Comparative Study
J Mol Biol. 2003 Aug 29;331(5):991-1004.
doi: 10.1016/s0022-2836(03)00865-9.

Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage

Eric J Snijder et al. J Mol Biol.

Abstract

The genome organization and expression strategy of the newly identified severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted using recently published genome sequences. Fourteen putative open reading frames were identified, 12 of which were predicted to be expressed from a nested set of eight subgenomic mRNAs. The synthesis of these mRNAs in SARS-CoV-infected cells was confirmed experimentally. The 4382- and 7073-amino acid residue SARS-CoV replicase polyproteins are predicted to be cleaved into 16 subunits by two viral proteinases (bringing the total number of SARS-CoV proteins to 28). A phylogenetic analysis of the replicase gene, using a distantly related torovirus as an outgroup, demonstrated that, despite a number of unique features, SARS-CoV is most closely related to group 2 coronaviruses. Distant homologs of cellular RNA processing enzymes were identified in group 2 coronaviruses, with four of them being conserved in SARS-CoV. These newly recognized viral enzymes place the mechanism of coronavirus RNA synthesis in a completely new perspective. Furthermore, together with previously described viral enzymes, they will be important targets for the design of antiviral strategies aimed at controlling the further spread of SARS-CoV.

Figures

Figure 1
Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and the ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally numbered nsp1–nsp16 (see also Table 1). In the 3′-terminal part of the genomes, homologous structural protein genes are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV) and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the locations of the body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al., ref. 76). The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique domain; PLpro, papain-like cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain; ADRP, adenosine diphosphate-ribose 1″-phosphatase; ExoN, 3′-to-5′ exonuclease; CLpro, chymotrypsin-like proteinase; RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific endoribonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase; CPD, cyclic phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. and Gorbalenya et al.
Figure 2
Phylogenetic analysis of coronavirus replicase genes. SARS-CoV replicase ORF1b amino acid sequences (Entrez Genomes accession number NC_004718 (AY274119)) were compared with those from viruses representing the three coronavirus subgroups and the genus Torovirus. Group 1: transmissible gastroenteritis virus (TGEV), NC_002306; human coronavirus 229E (HCoV-229E), NC_002645; porcine epidemic diarrhea virus (PEDV), NC_003436. Group 2: mouse hepatitis virus A59 (MHV-A59), NC_001846; bovine coronavirus (BCoV-Lun) AF391542. Group 3: infectious bronchitis virus (IBV), strains Beaudette (NC_001451) and LX4 (AY223860). Torovirus: equine torovirus (EToV), X52374. A multiple protein alignment of these sequences was generated with the help of the ClustalX 1.82 program and was adjusted manually. Two regions of poor conservation were removed from the alignment, which was converted subsequently into the nucleotide form. All columns containing gaps were removed. The resulting alignment contains the following SARS-CoV sequences fused: 13,623–13,859, 14,310–18,857 and 20,076–21,482. It included 5487 characters with 3207 of them being parsimony-informative. Using the PAUP program (version 4.0.0d55) and parsimony criterion, an exhaustive tree search of the 135,135 evaluated trees identified the best tree having a score of 10,927 and the second best tree having a score of 10,964; the worst tree had a score of 13,611. A total of 1000 bootstrap trials were conducted using the parsimony criterion and a branch-and-bound search to generate a bootstrap 50% majority-rule consensus tree. The frequency of occurrence of particular bifurcations in bootstraps is indicated at the nodes. Similar trees with similarly high bootstrap support above 960 were obtained using the NJ method that was applied to distance matrices obtained for either nucleotide or amino acid alignments (not shown).
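The 135,135 trees evaluated in the exhaustive search agree with the number of distinct unrooted, fully bifurcating topologies for the nine sequences listed in this legend (SARS-CoV, three group 1 viruses, two group 2 viruses, two IBV strains and EToV), which is the double factorial (2n − 5)!!. The Python sketch below only illustrates that count. The function name is hypothetical, and the code does not reproduce the PAUP search or its scoring.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Count distinct unrooted, fully bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# The nine sequences named in the Figure 2 legend.
taxa = [
    "SARS-CoV", "TGEV", "HCoV-229E", "PEDV", "MHV-A59",
    "BCoV-Lun", "IBV-Beaudette", "IBV-LX4", "EToV",
]
print(unrooted_tree_count(len(taxa)))  # 135135
```

Because the count grows so quickly with each added sequence, an exhaustive search is practical for a data set of this size but not for much larger ones.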
Figure 3
SARS-CoV subgenomic mRNA synthesis. (A) Organization of ORFs in the 3′ end of the SARS-CoV genome with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic indicated with an asterisk (∗). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with SARS-CoV, Frankfurt-1 (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details. Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3′ end both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs), confirming the presence of common 5′ and 3′ sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker. (C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands. Whereas genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the 3′ end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus strand that would then serve as template for the transcription of subgenomic mRNAs.
Figure 4
Sequence alignments of protein families that include cellular enzymes involved in RNA processing and their nidovirus homologs. Our in-depth comparative sequence analysis (see Materials and Methods) revealed a statistically significant relationship between functionally uncharacterized proteins (domains) of nidoviruses, including SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron excision to produce mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5). Shown are alignments for key regions of a few selected members of the following groups of enzymes: (A) XendoU family; (B) ExoN family; (C) 2′-O-MT family; (D) CPD family; and (E) ADRP family. These protein families may be known also under other names. Cellular homologs, not necessarily including proteins involved in the discussed RNA processing pathways, are listed in the top segment of each alignment and nidovirus proteins in the bottom segment. In the CPD family, along with group 2 coronavirus representatives, proteins of two rotaviruses (double-stranded RNA viruses), which were identified in this study, are listed. In both segments, residues are highlighted independently: black for absolutely conserved residues and different shades of grey to indicate different levels of conservation; amino acid similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) F, W, Y; and (v) I, L, M, V. Positions occupied by identical or similar residues in all proteins under comparison are indicated with an asterisk (∗) and colon (:), respectively, in the inter-segment row. For the ExoN family, three motifs conserved in the DEDD superfamily and Zn-finger unique for the ExoN family are indicated. 
Database accession numbers for nidovirus genome sequences: SARS-CoV, Entrez Genomes accession number NC_004718 (AY274119); MHV-A59, NC_001846; BCoV-Lun, AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436; TGEV, NC_002306; equine torovirus (EToV), X52374; equine arteritis virus (EAV), X53459; porcine reproductive and respiratory syndrome virus (PRRSV), M96262; gill-associated virus (GAV), AF227196. Abbreviations and NCBI protein database ID number or SwissProt names of the remaining protein sequences are: (A) Npun 0562, hypothetical protein of Nostoc punctiforme, ZP_00106190; Poliv smB, pancreatic protein of Paralichthys olivaceus, BAA88246; Celeg Pp11, placental protein 11-like precursor of Caenorhabditis elegans, NP_492590; Xlaev endoU, endoU protein of Xenopus laevis, CAD45344; pp1b, ORF1b-encoded part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2, PAB-dependent poly(A)-specific ribonuclease subunit PAN2 of Saccharomyces cerevisiae, P53010; Mycge DPO3, DNA polymerase III polC-type, containing exonuclease domain, of Mycoplasma genitalium, P47277; Bacsu DING, probable ATP-dependent helicase dinG homolog, containing exonuclease domain, of Bacillus subtilis, P54394; Ecoli DP3E, DNA polymerase III, epsilon chain, containing exonuclease domain, of Escherichia coli, P03007 (PDB: 1J53 and 1J54); Ecoli RNT, exoribonuclease T of Escherichia coli, P30014. (C) Hsap AKA, A-kinase anchoring protein 18 gamma of Homo sapiens, AAF28106; Athal CPD1, putative CPD1 of Arabidopsis thaliana, CAA16750; Athal CPD2, putative CPD2 of Arabidopsis thaliana, CAA16751; yeast YG59, hypothetical 26.7 kDa protein of yeast, P53314; Ecoli LIGT, 2′-5′ RNA ligase of Escherichia coli, P37025; ns2, non-structural protein (ORF2-encoded) of the coronaviruses HCoV-OC43 (AAA74377), BCoV-Quebec (P18517), and MHV-A59 (P19738); EToV pp1a, C-terminal fragment of EToV pp1a, S11237; HRoV VP3, VP3 of human rotavirus, BAA84964; ARoV VP3, VP3 of avian rotavirus PO-13, BAA24128.
(D) Ecoli o177, putative polyprotein of Escherichia coli, AAC74129; Hsap Y1268a, KIAA1268 protein of Homo sapiens, BAA86582; Hsap H2A1.1, histone macroH2A1.1 of Homo sapiens, AAC33434; yeast YMX7, hypothetical 32.1 kDa protein of yeast, Q04299; yeast YBN2, hypothetical 19.9 kDa protein of yeast, P38218. (E) Yeast YBR1, putative ribosomal RNA methyltransferase (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P38238; yeast SPB1, putative rRNA methyltransferase SPB1 of yeast, P25582; yeast YGN6, putative ribosomal RNA methyltransferase YGL136c (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P53123; Ecoli FTSJ, cell division protein of Escherichia coli, NP_417646.
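The inter-segment row described in this legend, with an asterisk for identical residues and a colon for similar residues, can be illustrated with a short Python sketch that uses the five similarity groups given above. The function names and the toy sequences are hypothetical, and treating any column that contains a gap as unmarked is an assumption that the legend does not state.

```python
# Similarity groups as listed in the Figure 4 legend.
SIMILARITY_GROUPS = [set("DENQ"), set("ST"), set("KR"), set("FWY"), set("ILMV")]


def column_marker(column: str) -> str:
    """Return '*' for an identical column, ':' for a similar one, else ' '."""
    residues = set(column.upper())
    if "-" in residues:  # assumption: gapped columns are left unmarked
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return ":"
    return " "


def marker_row(sequences: list[str]) -> str:
    """Build the marker row for equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_marker("".join(col)) for col in zip(*sequences))


# Toy alignment (hypothetical, not taken from the paper).
toy = ["DKSFA", "DRTWA", "DKSYG"]
print(repr(marker_row(toy)))  # '*::: '
```

In the figure the markers apply to all proteins under comparison, so the cellular and nidovirus segments would be passed together as one list of sequences.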
Figure 5
Nidoviruses encode homologs of cellular enzymes involved in RNA processing. (A) The cellular pathways for processing of pre-U16 snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic activities indicated. For details, see the text. Homologs of the highlighted enzymes have been identified in nidoviruses (see also Figure 1 and the text). (B) Table summarizing the conservation of homologs of the cellular enzymes presumably involved in RNA processing in SARS-CoV and different nidovirus groups.

References

    1. Peiris J.S.M., Lai S.T., Poon L.L.M., Guan Y., Yam L.Y.C., Lim W. Coronavirus as a possible cause of severe acute respiratory syndrome. Lancet. 2003;361:1319–1325.
    2. Ksiazek T.G., Erdman D., Goldsmith C.S., Zaki S.R., Peret T., Emery S. A novel coronavirus associated with severe acute respiratory syndrome. N. Engl. J. Med. 2003;348:1953–1966.
    3. Drosten C., Gunther S., Preiser W., van der Werf S., Brodt H.R., Becker S. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N. Engl. J. Med. 2003;348:1967–1976.
    4. Marra M.A., Jones S.J., Astell C.R., Holt R.A., Brooks-Wilson A., Butterfield Y.S. The Genome sequence of the SARS-associated coronavirus. Science. 2003;300:1399–1404.
    5. Rota P.A., Oberste M.S., Monroe S.S., Nix W.A., Campagnoli R., Icenogle J.P. Characterization of a novel coronavirus associated with severe acute respiratory syndrome. Science. 2003;300:1394–1399.