Molecular Evolution of Human Coronavirus Genomes
Abstract
Human coronaviruses (HCoVs), including SARS-CoV and MERS-CoV, are zoonotic pathogens that originated in wild animals. HCoVs have large genomes that encode a fixed array of structural and nonstructural components, as well as a variety of accessory proteins that differ in number and sequence even among closely related CoVs. Thus, in addition to recombination and mutation, HCoV genomes evolve through gene gains and losses. In this review we summarize recent findings on the molecular evolution of HCoV genomes, with special attention to recombination and adaptive events that generated new viral species and contributed to host shifts and to HCoV emergence.
Trends
Human coronaviruses (HCoVs) are zoonotic pathogens with large and complex genomes. Some HCoV accessory proteins were acquired from host genes, and some were lost or split during HCoV evolution. Most likely SARS-CoV ORF8 became dispensable during the shift to the human/civet host.
HCoV spike proteins adapted to use diverse cellular receptors. This occurred by divergence followed, in some cases, by convergent evolution to bind the same receptor.
Recombination and positive selection shaped the diversity of CoV genomes, especially the S gene. Positive selection in the S gene of MERS-CoV and related CoVs mainly acted on the heptad repeats.
In MERS-CoV and other lineage C beta-CoVs, positive selection targeted the nonstructural components, particularly ORF1a. Most adaptive events occurred in nsp3, which acts as a viral protease and contributes to suppression of interferon responses.
Human Coronaviruses Are Zoonotic Pathogens
The recent emergence of severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV) (order Nidovirales, family Coronaviridae, subfamily Coronavirinae) as dangerous zoonoses stirred great interest in the ecology and evolution of coronaviruses. Before the SARS-CoV epidemic, only two HCoVs were known: HCoV-229E and HCoV-OC43. Two additional HCoVs, HCoV-NL63 and HCoV-HKU1, were identified in clinical specimens in 2004–2005 [1]. These viruses originated in animals and mainly cause respiratory disease in humans (Figure 1A, Key Figure). Specifically, all HCoVs are thought to have a bat origin, with the exception of lineage A beta-CoVs, which may have reservoirs in rodents [2]. The phylogenetic relationships of HCoVs and other animal CoVs mentioned in this review are summarized in Figure 1A.
A number of field studies identified and sequenced viruses related to HCoVs in wildlife reservoirs, and phylogenetic reconstruction provided important clues about the events that most likely led to the introduction of HCoVs into human populations. Several excellent recent reviews discuss the knowns and unknowns of HCoV origin in terms of reservoir species and amplification hosts and, more generally, of CoV ecology 1, 3, 4, 5. In this review we instead focus on the molecular evolution of HCoV genomes. The general concepts of evolutionary analyses in viruses are outlined in Box 1, whereas the approaches most commonly applied to CoV sequence evolution (phylogenetic reconstruction, detection of recombination, and identification of selection signatures) are summarized in Boxes 1 and 2.
HCoV Genome Organization
CoVs are positive-sense, single-stranded RNA viruses with a likely ancient origin, and HCoVs emerged repeatedly during the past 1000 years (Box 3). All CoVs have nonsegmented genomes with a similar organization. About two thirds of the genome consists of two large overlapping open reading frames (ORF1a and ORF1b; see Glossary), which are translated into the pp1a and pp1ab polyproteins. These polyproteins are processed to generate 16 nonstructural proteins (nsp1 to nsp16). The remaining portion of the genome includes ORFs for the structural proteins: spike (S), envelope (E), membrane (M), and nucleoprotein (N). Different viruses also encode a variable number of accessory proteins (Figure 1B).
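The ORF concept defined in the Glossary can be made concrete with a short scan. The Python sketch below reports forward-strand ORFs, running from an ATG to the first in-frame stop codon, in each of the three reading frames. It is a simplification assumed only for illustration. Real CoV annotation must also handle the -1 ribosomal frameshift that joins ORF1a and ORF1b into pp1ab, and this sketch ignores the reverse strand. The function name and its parameters are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for each forward-strand ORF.

    start is the 0-based position of the ATG; end is exclusive and includes
    the stop codon. Open frames that reach the sequence end without a stop
    codon are not reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length in codons, stop included
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


print(find_orfs("ATGAAATAGCC"))  # [(0, 0, 9)]
```

Each reading frame is scanned independently, so ORFs found in different frames may overlap, much as ORF1a and ORF1b do.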
Among RNA viruses, CoVs have exceptionally long genomes (up to 32 kb). Genome expansion in CoVs is believed to be at least partially mediated by increased replication fidelity. Although estimates of the mutation rate for CoVs differ, possibly depending on the phase of CoV adaptation to novel hosts, several studies have shown that these viruses may possess an unusually high replication fidelity 6, 7, 8. Indeed, a major step that allowed genome expansion in CoVs and, more generally, in Nidovirales, was the acquisition of a set of RNA-processing enzymes that improved the low fidelity of RNA replication [9]. These enzymes include an RNA 3′-to-5′ exoribonuclease (ExoN) and possibly an endoribonuclease (NendoU) [9]. Additional evidence, though, suggests that features distinct from replication fidelity underlie genome expansion in Nidovirales. These include a peculiar genome organization [9] and a processive replication complex [10].
Importantly, CoV genome expansion allowed the acquisition and maintenance of genes encoding diverse accessory proteins that may promote virus adaptation to specific hosts and often contribute to the suppression of immune responses, as well as to virulence. Accessory proteins differ in number and sequence even among CoVs belonging to the same lineage (Figure 1B), raising interesting questions about their origin and evolution.
Gene Gains and Gene Losses
The acquisition (or loss) of novel protein-coding genes has the potential to drastically modify viral phenotypes. Thus, tracing these gain/loss events may identify important turning points in viral evolution.
Among SARS-CoV accessory proteins, the origin of ORF8 long remained unclear. SARS-CoV-related (SARSr) bat viruses had been isolated, but they encode divergent ORF8 proteins, with an amino acid identity to SARS-CoV ORF8 of around 33% 11, 12, 13. Very recently, SARSr-BatCoVs from Rhinolophus sinicus (Rs) and Rhinolophus ferrumequinum (Rf) were isolated 14, 15, and analysis of their ORF8 region revealed high sequence identity with civet/human SARS-CoV. Two groups concluded that recombination within SARSr-Rs-CoVs, or between SARSr-Rs-CoVs and SARSr-Rf-CoVs, led to the acquisition of an ORF8 closely related to that of civet/human SARS-CoV and ultimately gave rise to the virus responsible for the human epidemic 14, 15. Lau and coworkers also reported that the ORF8 region has a dN/dS of 3.5 in SARS-CoVs isolated from humans (but not in SARSr-BatCoVs), indicating the action of positive selection (Box 1) [14]. This finding becomes even more relevant because, early in the human epidemic, SARS-CoVs acquired a signature 29-nucleotide deletion that split ORF8 into two functional ORFs (ORF8a and ORF8b) [16]. These findings suggest that rapid evolution of ORF8 might facilitate host shifts [14]. However, this possibility is called into question by additional human SARS-CoV isolates that carry independent and larger deletions in the ORF8 region [16]. An alternative explanation is therefore that ORF8 activity became dispensable in the human host. If this were the case, relaxed purifying selection, rather than positive selection, might account for the high dN/dS. To disentangle these possibilities, we analyzed ORF8 in human and civet viruses that carry an intact gene, as well as in bat viruses.
Although we confirmed that dN/dS is well above 1 for human/civet SARS-CoV ORF8, we detected no evidence of positive selection using the M7/M8 ‘site models’ from PAML (Box 2) or with PARRIS (PARtitioning approach for Robust Inference of Selection) [17] (Figure 2A). Instead, we obtained evidence that relaxation of natural selection [18] in ORF8 accompanied the shift from bats to civets/humans (Figure 2A). These results suggest no major adaptive role for ORF8 during the human SARS-CoV epidemic and support the view that ORF8 is dispensable for virulence and transmission at least in the human/civet host.
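Because the argument above depends on how dN/dS is estimated (Box 1), a minimal sketch of the classic Nei-Gojobori (1986) counting method may help. It compares two aligned coding sequences, counts synonymous and nonsynonymous sites and differences, and then applies the Jukes-Cantor correction. This is a simpler, swapped-in technique. It is not the codon-model maximum-likelihood approach (PAML site models, PARRIS) used in the analyses described here. The sketch assumes an ungapped alignment with no internal stop codons, and it counts changes to stop codons as nonsynonymous (one common convention). All function names are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks stop codons.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """Synonymous and nonsynonymous sites of one codon (changes to stop count as nonsynonymous)."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """Synonymous/nonsynonymous differences averaged over all mutational pathways
    that avoid intermediate stop codons."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    sd_total = nd_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diff_pos):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            sd_total, nd_total, n_paths = sd_total + sd, nd_total + nd, n_paths + 1
    if n_paths == 0:  # assumption: if every pathway hits a stop, count all as nonsynonymous
        return 0.0, float(len(diff_pos))
    return sd_total / n_paths, nd_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (undefined for p >= 0.75)."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Nei-Gojobori dN, dS and dN/dS for two aligned, ungapped coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("internal stop codon at position %d" % i)
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S, N = S + (s1 + s2) / 2, N + (n1 + n2) / 2  # sites averaged over both sequences
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no nonsynonymous sites")
    dN, dS = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    if dS > 0:
        ratio = dN / dS
    else:
        ratio = math.inf if dN > 0 else math.nan  # omega is undefined without synonymous change
    return dN, dS, ratio
```

A pairwise ratio above 1 cannot, on its own, distinguish positive selection from relaxed purifying selection. Separating the two requires model-based tests such as the site-model and relaxation analyses described above.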
A similar gene loss from the genome of a bat-derived ancestor occurred during the evolution of HCoV-229E. CoVs closely related to HCoV-229E were recently isolated from African hipposiderid bats [19], and a CoV belonging to the same species as HCoV-229E had previously been described in captive alpacas suffering from an acute respiratory syndrome 20, 21 (Figure 1A). Analysis of these viral genomes indicated that, compared with HCoV-229E, they carry an additional ORF at the 3′ end of the genome [20] (Figure 1B). This ORF is designated ORF8 but shares no homology with the homonymous SARS-CoV gene. Its function is unknown, and it shows limited similarity to any other CoV gene [20]. We analyzed the sequences of recently identified alpha-CoVs from camels [22] and found that these viruses also encode ORF8 (Figure 1B). It is presently unknown whether the loss of ORF8 conferred some advantage during the host shift to humans or whether, as with SARS-CoV ORF8, the gene simply became dispensable in the human host.
Another interesting feature of some CoVs is that they encode phosphodiesterases (PDEs) (Figure 1B). These viral enzymes cleave 2′,5′-oligoadenylate, the product of OAS proteins. This prevents activation of the cellular endoribonuclease RNase L and thereby blocks interferon (IFN)-induced antiviral responses [23]. The PDE activity of the mouse hepatitis virus (MHV) NS2a protein is critical for hepatovirulence [23]. HCoV-OC43 and other nonhuman lineage A beta-CoVs encode NS2a proteins with a high degree of sequence similarity to the MHV PDE (Figure 1B). A protein with structural and sequence homology to NS2a is also encoded by an unrelated virus, group A rotavirus. In rotavirus, the PDE activity resides in the C-terminal portion of VP3, a virulence factor [24]. Interestingly, both VP3 and NS2a carry the two motifs characteristic of the 2H-PDE family, and both share very little sequence similarity with the PDE domain of a cellular protein, AKAP7 [24] (Figure 2B). Nevertheless, AKAP7 and the viral PDEs display structural homology (Figure 2B), and murine AKAP7 can complement an inactive MHV NS2a gene [25]. From an evolutionary standpoint, these observations suggest that: (i) beta-CoVs and rotaviruses independently acquired PDE activities; and (ii) AKAP7 served as the source gene in both viral genera (Figure 2B). More recently, PDE activity was also discovered in the NS4b protein of MERS-CoV (Figure 1B) and other lineage C beta-CoVs [26]. Like the PDEs of lineage A beta-CoVs and rotaviruses, NS4b belongs to the 2H-phosphoesterase family and has a predicted structure homologous to AKAP7 [26] (Figure 2B). Whether NS4b was acquired by capturing a vertebrate AKAP7 remains to be determined. Even so, the fact that distinct viruses most likely acquired PDE activity independently underscores the importance of these enzymes for viral fitness.
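The 2H signature given in the Glossary, two H-Φ-[S/T]-Φ motifs separated by roughly 80 residues, lends itself to a crude pattern scan. The Python sketch below is only an illustration and rests on stated assumptions. The hydrophobic residue set and the 40 to 140 residue spacing window are arbitrary choices, and the function names are hypothetical. Homology claims like those discussed above rest on structural comparison, not on a motif match.

```python
import re
from typing import List, Tuple

# Assumption: which residues count as hydrophobic (Φ) is a modeling choice.
HYDROPHOBIC = "AVILMFWC"
# Zero-width lookahead so that overlapping matches are all reported.
MOTIF = re.compile(rf"(?=H[{HYDROPHOBIC}][ST][{HYDROPHOBIC}])")


def motif_positions(protein: str) -> List[int]:
    """0-based start of every H-Φ-[S/T]-Φ match."""
    return [m.start() for m in MOTIF.finditer(protein.upper())]


def candidate_2h_pairs(protein: str, min_gap: int = 40,
                       max_gap: int = 140) -> List[Tuple[int, int]]:
    """Pairs of motif starts whose spacing lies within [min_gap, max_gap] (hypothetical bounds)."""
    pos = motif_positions(protein)
    return [(a, b) for i, a in enumerate(pos) for b in pos[i + 1:]
            if min_gap <= b - a <= max_gap]


toy = "HLSV" + "G" * 76 + "HITL"  # hypothetical sequence, not a real PDE
print(candidate_2h_pairs(toy))  # [(0, 80)]
```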
It was recently proposed that CoVs (and other viruses) acquired additional genes from their hosts [27]. Hemagglutinin-esterases (HEs) are encoded by lineage A beta-CoVs (e.g., HCoV-HKU1 and HCoV-OC43) (Figure 1B), as well as by influenza C virus and toroviruses. Structural analysis suggested that these viral enzymes derive from an ancestral host lectin. It is unclear, however, whether acquisition occurred once in an ancestral virus and was followed by speciation, or whether it occurred multiple times [27]. Incidentally, the N-terminal domain of the CoV spike protein is also believed to derive from a cellular lectin [28]. Unlike the influenza C virus enzyme, CoV HEs lack membrane-fusion activity and are accessory to the spike protein. They serve primarily as receptor-destroying enzymes (RDEs), aiding viral detachment from carbohydrates present on infected cells 29, 30. Consistent with this role, HEs are present only in the genomes of lineage A beta-CoVs, most of which use sialic acids as coreceptors [1] (Figure 1B). These observations suggest that sialic acid-binding spike proteins coevolved with HE genes serving as RDEs. This hypothesis is supported by two observations: the MHV spike protein evolved from an ancestral sugar-binding preference to a protein-binding mode, and several MHV strains lost expression of HE 27, 28 (Figure 1B).
Finally, it is important to note that artificial selection can lead to unintended changes in viral genomes. Such changes most likely arise during passage in culture, which both relieves the virus of pressures exerted in vivo (e.g., by the host immune system) and drives viral adaptation to the in vitro system. One example is the loss of a full-length ORF4 in the HCoV-229E prototype strain. Owing to a two-nucleotide deletion, this strain carries a split gene encoding two proteins (ORF4a and ORF4b) 31, 32 (Figure 1B). Conversely, clinical isolates display a full-length ORF4 sequence [32]. An intact ORF4 is also observed in bat and camel viruses related to HCoV-229E 19, 22, whereas the alpaca alpha-CoV displays a one-nucleotide insertion that causes a frameshift [20] (Figure 1B). Because only a single alpaca CoV genome is available, it is impossible to determine whether this insertion is representative of the alpaca CoV population or reflects a sequencing error.
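A two-nucleotide deletion or a one-nucleotide insertion disrupts a full-length ORF because any indel whose length is not a multiple of three changes the reading frame downstream. This usually exposes a premature stop codon. The Python sketch below illustrates the effect with a hypothetical toy sequence, not the actual HCoV-229E or alpaca CoV ORF4. The helper names are illustrative.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks stop codons.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq: str) -> str:
    """Translate frame 0 of a DNA sequence; a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def shifts_frame(indel_length: int) -> bool:
    """An indel shifts the downstream reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Hypothetical toy gene (not a real CoV sequence).
original = "ATGAAAGCTGATAGCTGGCTGTAA"
# Remove positions 6-7 ("GC") to mimic a two-nucleotide deletion.
mutant = original[:6] + original[8:]

print(translate(original))  # MKADSWL*
print(translate(mutant))    # MK**LAV
```

After the deletion, the downstream codons are read in a new frame, and a stop codon appears at the third codon. The rest of the original coding sequence is no longer translated from the upstream start codon.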
Overall, these observations suggest that the loss of full-length ORF4 results from passaging in cell culture, a process that often generates attenuated viruses. An interesting insight into the role of ORF4a came from the observation that its protein product regulates virus production in vitro by functioning as a viroporin [33]. The full-length ORF4 product most likely performs the same function.
Viroporins were also detected in SARS-CoV, HCoV-OC43, and HCoV-NL63 34, 35 (Figure 1B). As expected given the relatedness of the two viruses, the proteins from HCoV-NL63 and HCoV-229E share substantial sequence similarity. Limited similarity is also observed with the SARS-CoV protein, especially in the first and second transmembrane regions. This suggests either a common origin or independent acquisition followed by convergent optimization of residues in the transmembrane helices (Figure 2C). Conversely, the HCoV-OC43 protein (encoded by ORF5, originally denoted NS12.9) is unrelated to the other CoV viroporins in terms of both sequence and domain topology [34] (Figure 2C). Instead, a protein homologous to the HCoV-OC43 viroporin is encoded by MHV (accessory protein NS5a) and functions as an antagonist of IFN-induced antiviral responses 34, 36. Whether the HCoV-OC43 viroporin has the same IFN-antagonizing activity remains to be investigated. However, mutant viruses lacking ORF5 display growth defects in vitro and in vivo, as well as reduced virulence in mice [34]. Interestingly, the viroporins from SARS-CoV, HCoV-NL63, and HCoV-229E can complement the viroporin-defective HCoV-OC43 mutant in vitro [34]. Thus, the conserved function of CoV viroporins was most likely attained through convergent evolution, by acquisition of unrelated genes.
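Transmembrane regions like those compared above are the kind of feature that hydropathy analysis flags. The Python sketch below is a rough, swapped-in illustration, not the method used in the cited studies. It computes a Kyte-Doolittle sliding-window profile and merges consecutive windows above a threshold into candidate membrane-spanning segments. The 19-residue window and the 1.6 cutoff are commonly used defaults. The toy sequence and the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each window; index i covers residues i .. i + window - 1."""
    protein = protein.upper()
    return [sum(KD[aa] for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge consecutive above-threshold windows into (start, end-exclusive) residue spans."""
    scores = hydropathy(protein, window)
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            segments.append((start, i - 1 + window))  # last window was i - 1
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1 + window))
    return segments


toy = "K" * 10 + "L" * 22 + "K" * 10  # hypothetical: one hydrophobic stretch
print(candidate_tm_segments(toy))  # [(5, 37)]
```

Window averaging smears the boundaries, so the reported span is wider than the leucine stretch itself (residues 10 to 31). Hydropathy alone cannot show whether similar transmembrane segments share a common origin.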
Evolution of Structural and Nonstructural Proteins
CoV genomes evolve not only by gene gains and losses but also through subtler changes that modify protein sequences, and recombination plays an important role in reassorting variants.
Several excellent reviews have focused on the evolutionary history of SARS-CoV genomes in terms of recombination and natural selection 37, 38, 39; hereafter, SARS-CoV will be mentioned only to draw comparisons with other CoVs.
From an evolutionary standpoint, nonstructural proteins have attracted less attention than structural components. This is likely because proteins exposed on the virus surface are the preferential targets of the host immune response. Describing their variability and evolutionary dynamics is therefore clearly relevant to the development of preventive strategies (e.g., vaccines) and treatment options (e.g., administration of neutralizing antibodies). Moreover, structural proteins, and the S protein in particular, mediate the first, essential steps of infection and most likely represent the major determinants of host and tissue tropism.
In CoVs, the S protein includes two functionally distinct units. The S1 region contains an N-terminal domain (NTD) and the receptor-binding domain (RBD, also referred to as the C-terminal domain or CTD). The S2 region includes the fusion peptide, two heptad repeats (HR1 and HR2), and the transmembrane region (Figure 3A) [38]. A striking feature of HCoV spike proteins is that they have adapted to use diverse cellular receptors, and receptor usage is not congruent with HCoV phylogeny: closely related viruses may use different receptors (Figure 1B). For instance, HCoV-229E uses aminopeptidase N (ANPEP), whereas HCoV-NL63 exploits ACE2, which is also the receptor of the relatively divergent SARS-CoV (Figure 1B). How these binding specificities evolved is presently unclear. The latest developments on this topic and, more generally, on the evolution of structural and nonstructural proteins are detailed below for the five HCoVs other than SARS-CoV.
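One way to make the mismatch between phylogeny and receptor usage explicit is to count how many receptor switches a given tree requires. The Python sketch below applies Fitch parsimony, a swapped-in illustrative technique, to the receptor assignments stated in this review: ANPEP for HCoV-229E and TGEV, and ACE2 for HCoV-NL63 and SARS-CoV. The tree topology is a hypothetical simplification, not the phylogeny of Figure 1A. The function name is illustrative.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch parsimony: candidate root states and the minimum number of state changes."""
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


receptor = {"HCoV-229E": "ANPEP", "HCoV-NL63": "ACE2", "TGEV": "ANPEP", "SARS-CoV": "ACE2"}
# Hypothetical, simplified topology for illustration only.
tree = ((("HCoV-229E", "HCoV-NL63"), "TGEV"), "SARS-CoV")
root_states, changes = fitch(tree, receptor)
print(changes)  # 2
```

A tree that grouped these viruses by receptor would need only one switch. The extra switch on the phylogeny-like topology is what incongruence means in parsimony terms. Parsimony gives only a lower bound on the number of switches and says nothing about how each new binding mode arose.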
MERS-CoV
The evolutionary analysis of MERS-CoV is a rapidly moving field, as sequences from the latest phases of the epidemic have only just become available. Analysis of an ever-increasing number of sequences from MERS-CoV and related beta-CoVs revealed that genetic variability in the S gene was shaped by recombination and positive selection. Both ancient and recent intra-spike recombination events have been described 22, 40, 41. Notably, recombination events with breakpoints within the S gene occurred in camels in Saudi Arabia and gave rise to the MERS-CoV lineage that spread to South Korea.
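Intra-gene recombination is often first noticed as a switch, along the alignment, in which putative parent a sequence most closely resembles. The Python sketch below shows this similarity-plot idea under simplifying assumptions. The three sequences are already aligned, of equal length, and ungapped. The window and step sizes are arbitrary, the toy sequences are hypothetical, and the function names are illustrative. Formal recombination tests (Boxes 1 and 2) are still needed to assess significance and to locate breakpoints precisely.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_parent(query: str, parent_a: str, parent_b: str,
                   window: int = 100, step: int = 50) -> List[Tuple[int, str]]:
    """Label each window 'A', 'B' or 'tie' by which parent the query matches more closely."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        ia = identity(q, parent_a[start:start + window])
        ib = identity(q, parent_b[start:start + window])
        calls.append((start, "A" if ia > ib else "B" if ib > ia else "tie"))
    return calls


def switch_points(calls: List[Tuple[int, str]]) -> List[int]:
    """Window starts at which the closest parent changes (ties are skipped)."""
    informative = [(s, c) for s, c in calls if c != "tie"]
    return [s for (_, prev), (s, c) in zip(informative, informative[1:]) if c != prev]


# Hypothetical toy data: the query copies parent A, then parent B, switching at position 100.
parent_a = "ACGT" * 50
parent_b = "TGCA" * 50
query = parent_a[:100] + parent_b[100:]
calls = closest_parent(query, parent_a, parent_b, window=50, step=50)
print(switch_points(calls))  # [100]
```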
Analysis of positive selection in MERS-CoV spike genes indicated that several adaptive variants arose in MERS-CoV and in phylogenetically related CoVs [42]. Contrary to common expectation, and unlike what happened during the SARS-CoV host shift to humans, positive selection did not target the RBD. Instead, most adaptive substitutions were detected in the region encompassing the heptad repeats, which are of central importance for virus cell entry (Figure 3A) 42, 43. In other CoVs, variants in the heptad repeats were previously shown to affect host or tissue tropism 44, 45, 46. Interestingly, during the South Korean outbreak, MERS-CoVs carrying point mutations in the spike protein RBD emerged and spread rapidly [47]. These viruses showed decreased binding to the cellular receptor [47] (Figure 3A). Several immune epitopes are located in the RBD. These findings therefore point to the possibility that MERS-CoV is evolving to avoid the binding of neutralizing antibodies, at the cost of reduced receptor-binding affinity [47]. If this were the case, MERS-CoV adaptation to humans may have begun with events that modulated host tropism through changes in the heptad repeats, followed by the emergence of variants that escape immune responses. In MERS-CoV and other lineage C beta-CoVs, positive selection also targeted the nonstructural components, particularly ORF1a [48]. Most adaptive events occurred in nsp3, a multifunctional protein that acts as a viral protease and contributes to the suppression of interferon responses through its deubiquitinating and deISGylating activities [49]. Selection in nsp3 is ongoing among MERS-CoVs isolated from humans and camels [48].
As with the S protein, however, no major selective event was associated with camel-to-human transmission. One exception is a positively selected change in nsp3 (R911C) observed only among human-derived viruses, suggesting that viral adaptation to our species was the underlying pressure [48].
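The M7/M8 comparison mentioned earlier for ORF8 is a likelihood-ratio test, and similar nested-model comparisons underlie many codon-based scans for positive selection. The Python sketch below assumes that the log-likelihoods of the two nested models have already been obtained, for example from PAML codeml output. The example values are hypothetical, not results from the studies discussed here. Twice the log-likelihood difference is conventionally compared with a chi-square distribution with 2 degrees of freedom. Its survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math
from typing import Tuple


def lrt_m7_vs_m8(lnl_m7: float, lnl_m8: float) -> Tuple[float, float]:
    """Return (2 * delta lnL, p-value) for M8 (beta + omega > 1) against M7 (beta only)."""
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))  # M8 nests M7, so negative values are numerical noise
    p_value = math.exp(-stat / 2.0)  # chi-square survival function with 2 degrees of freedom
    return stat, p_value


# Hypothetical log-likelihoods for illustration only.
stat, p = lrt_m7_vs_m8(lnl_m7=-1000.0, lnl_m8=-995.0)
print(stat, p)
```

A significant test only indicates that some sites have ω > 1. Identifying which sites are involved requires a separate, per-site calculation.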
HCoV-229E
A recent analysis indicated that HCoV-229E may have recombined with the alpaca alpha-CoV within the S gene, as suggested by the distinct phylogenetic trees for the S1 and S2 regions [19]. In addition, HCoV-229E carries a deletion in the S gene relative to bat viruses [19]. Recent sequencing showed that this deletion is also present in the alpaca CoV S gene and in camel-derived alpha-CoVs [22]. This finding is particularly interesting because deletions in the NTD are associated with changes in tissue tropism in transmissible gastroenteritis virus (TGEV). The spike of this porcine virus confers dual tropism for the respiratory and intestinal tracts, whereas porcine respiratory coronavirus (PRCV), an N-terminally deleted variant, infects only the respiratory tract 50, 51. In bats, CoVs are mainly restricted to the gastrointestinal tract, whereas in humans and camelids they infect the upper and lower respiratory airways. It will be important to determine whether the S gene deletion in HCoV-229E and camelid alpha-CoVs is indeed responsible for a change in tissue tropism.
HCoV-NL63
Recombination contributed to shaping the diversity of the S gene among HCoV-NL63 viruses. Recombination between an ancestral HCoV-NL63 virus and the related PEDV was also detected in the M gene, whose 3′ portion is more similar to PEDV than to HCoV-229E [52]. Like SARS-CoV, HCoV-NL63 uses its RBD to bind ACE2. The two viruses bind the same site on the cellular receptor, but their RBDs show no sequence similarity. Interestingly, the RBDs of SARS-CoV and HCoV-NL63 do not display any structural similarity either. HCoV-NL63 contacts ACE2 with three discontinuous beta-loops, whereas SARS-CoV binds the receptor through a continuous subdomain [53] (Figure 3B). These observations suggest either that the two viruses independently acquired the ability to bind the same ACE2 region through convergent evolution or that they shared an ACE2-binding ancestor long ago. Strikingly, TGEV, which is phylogenetically related to HCoV-NL63, uses two regions corresponding to the HCoV-NL63 beta-loops to bind a distinct cellular receptor, ANPEP (Figure 1B, Figure 3B) [54]. Finally, HCoV-229E, which shares sequence homology with HCoV-NL63 and TGEV (Figure 1A), binds ANPEP but engages a region distinct from that bound by TGEV [55]. Overall, these data highlight the extraordinary plasticity of CoV RBDs and their complex evolutionary dynamics, in which divergent evolution can be followed by convergent adaptation to the same receptor. Some CoVs add further complexity by using other cellular attachment molecules to complement the function of the RBD. Indeed, the S protein of HCoV-NL63 exploits heparan sulfate proteoglycans to adhere to host cells [56]. Interestingly, MHV can gain a similar ability to bind heparan sulfate through relatively few in vitro-acquired mutations in the S protein [57].
In line with the view that heparan sulfate is a nonspecific receptor, the mutant MHVs display expanded host tropism [57]. This highlights the potential relevance of combinatorial receptor usage or receptor shifts for interspecies transmission.
HCoV-OC43
Recombination seems to be rampant among HCoV-OC43 viruses. It contributed to the origin of genotypes A to E, as well as of viruses that do not belong to these major genotypes 58, 59, 60. To our knowledge, no study has analyzed the fitness of recombinant viruses or, more generally, of viruses belonging to distinct genotypes. Nonetheless, two reports indicated that genotype D has become predominant in East Asia 58, 59. Whether this is due to population-level acquired immunity against the older genotypes A and B or to viral features unrelated to antigenicity remains to be determined.
The extensive recombination in HCoV-OC43 suggests that natural selection is best inferred from sequences belonging to the same genotype. Recombination among genotypes can generate spurious signals of positive selection in methods that assume a single phylogeny. In one such within-genotype analysis, positive selection was found to act on the S gene of genotype D viruses [61]. Interestingly, several sites with a high posterior probability of positive selection are located in the NTD, which is involved in the binding of sialic acids.
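The posterior probability of positive selection reported for individual sites comes from applying Bayes' rule to the site classes of a codon model. The Python sketch below shows the naive empirical Bayes calculation for a single site. Its inputs are assumptions: the class proportions and the per-class log-likelihoods are hypothetical numbers, and the function name is illustrative. PAML's Bayes empirical Bayes procedure goes further by also accounting for uncertainty in the estimated model parameters.

```python
import math
from typing import List, Sequence


def site_class_posteriors(proportions: Sequence[float],
                          log_likelihoods: Sequence[float]) -> List[float]:
    """P(class k | site data) = p_k * f_k / sum_j p_j * f_j, computed in log space."""
    log_joint = [math.log(p) + ll for p, ll in zip(proportions, log_likelihoods)]
    top = max(log_joint)  # subtract the maximum to avoid underflow
    weights = [math.exp(x - top) for x in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-class model: 90% of sites with omega < 1, 10% with omega > 1.
posterior = site_class_posteriors([0.9, 0.1], [math.log(1e-6), math.log(2e-5)])
print(posterior[1])  # posterior probability that this site belongs to the omega > 1 class
```

Here the site fits the ω > 1 class 20 times better than the other class, but that class is rare. The resulting posterior is therefore only about 0.69, below the 0.95 cutoff that is often applied.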
One positively selected site was located in the CTD, a region of unclear function in the HCoV-OC43 S protein, because no protein receptor has been identified to date [61]. However, recent data from HKU1 suggest that, by analogy, a protein receptor for HCoV-OC43 may exist [62] (see below).
HCoV-HKU1
The structure of the HKU1 S protein was recently solved. The glycan-binding site is located in the NTD and is conserved with bovine coronavirus (BCoV) S1 [63]. Nonetheless, antibodies against the CTD, but not those against the NTD, block HKU1 infection of human tracheal–bronchial epithelial cells. This suggests that the CTD is the major RBD and that a protein receptor for HKU1 exists [62]. As in HCoV-NL63, glycans may mediate only the initial attachment to host cells.
A recent survey of HKU1 clinical isolates from different geographic origins indicated that most viruses from Colorado form a subclade in the HKU1 phylogeny and carry three distinctive substitutions in the S protein within the NTD, CTD, and close to the S1/S2 cleavage site (W197F, F613Y, and H716D, respectively) [64]. It will be interesting to assess whether these differences are functional and derive from a selective process.
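Substitutions such as W197F, F613Y, and H716D (or R911C in nsp3) follow the usual notation: reference residue, position, new residue. The Python sketch below checks each reference residue before applying the change. It assumes that positions are 1-based on the same reference protein used to report them. The toy sequence and the function name are hypothetical.

```python
import re
from typing import Iterable

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, substitutions: Iterable[str]) -> str:
    """Apply substitutions like 'W197F' (1-based positions), checking the reference residue."""
    residues = list(protein)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside the protein")
        if residues[pos - 1] != ref:
            raise ValueError(f"{sub}: reference residue is {residues[pos - 1]}, not {ref}")
        residues[pos - 1] = alt
    return "".join(residues)


print(apply_substitutions("MWKH", ["W2F", "H4D"]))  # MFKD
```

Validating the reference residue catches the most common error in this kind of work: numbering variants against a different reference sequence from the one used to report them.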
Concluding Remarks
Thanks to high-throughput techniques, a large number of complete CoV genomes have become available to the scientific community, and many more will become available in the near future. Field studies have contributed enormously to widening our knowledge of the diversity of CoVs hosted by different vertebrates. Epidemiological surveys have also provided CoV sequences from distinct geographic areas, associated with different disease phenotypes. In parallel, resources have been created to store and mine these data (e.g., the Virus Pathogen Database and Analysis Resource, ViPR [65]). These advances have made it possible to trace the evolutionary history of the large and complex CoV genomes in unprecedented detail. The emerging picture indicates that CoV genomes display high plasticity in terms of gene content and recombination. The long CoV genome expands the sequence space available for adaptive mutation, and the spike protein can adapt with relative ease to exploit different cellular receptors. These features are likely to underlie the alarming propensity of CoVs for host jumps. Despite these advances, major challenges remain (see Outstanding Questions). Efforts to identify the viral genetic determinants that favor interspecies transmission should be pursued as an effective strategy to prevent, or prepare for, future HCoV emergence.
Glossary
dN: the number of nonsynonymous substitutions per nonsynonymous site.
dS: the number of synonymous substitutions per synonymous site.
Hemagglutinin-esterases (HEs): a family of viral proteins that mediate binding to O-acetylated sialic acids.
Homology: the relationship between elements (e.g., genes, proteins) that derive from a common ancestor.
Lectins: a group of proteins with carbohydrate-recognition activity. Lectins are categorized into many distinct families depending on their structural and functional properties.
Maximum likelihood (ML): a statistical method for estimating population parameters from a data sample. Given one or more unknown parameters and a data sample, the ML estimates of the parameters are the values that maximize the probability of obtaining the observed data.
Open reading frame (ORF): the part of a reading frame that contains no stop codons. An ORF is a continuous stretch of nucleotide triplets that has the potential to code for a protein or a peptide.
Phosphodiesterases (PDEs): enzymes that break a phosphodiester bond. PDEs belonging to the 2H family are characterized by two H-Φ-[S/T]-Φ motifs (where Φ is a hydrophobic residue) separated by an average of 80 residues.
Positive selection: the accumulation of favorable amino acid-replacing substitutions, which results in more nonsynonymous changes than expected under neutrality (dN/dS > 1).
Purifying selection: the elimination of deleterious amino acid-replacing substitutions, which results in fewer nonsynonymous changes than expected under neutrality (dN/dS < 1); it is also referred to as negative selection.
Viroporins: hydrophobic viral proteins that can form channels after insertion into the host cell membrane and oligomerization.
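The Maximum likelihood entry can be illustrated with the simplest substitution model. Under the Jukes-Cantor (JC69) model, assumed here purely for illustration, the ML estimate of the distance between two aligned sequences has a closed form. The Python sketch below finds the same value by numerically maximizing the log-likelihood, which is what general ML software must do for models without closed-form solutions. The counts are hypothetical and the function names are illustrative. The search assumes at least one difference and a proportion of differing sites below 0.75.

```python
import math


def jc_log_likelihood(distance: float, n_sites: int, n_diff: int) -> float:
    """JC69 log-likelihood of observing n_diff differing sites out of n_sites."""
    p = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))  # probability that a site differs
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def ml_distance(n_sites: int, n_diff: int, lo: float = 1e-9, hi: float = 5.0,
                tol: float = 1e-9) -> float:
    """Golden-section search for the maximum of the (unimodal) log-likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1, x2 = b - g * (b - a), a + g * (b - a)
        if jc_log_likelihood(x1, n_sites, n_diff) > jc_log_likelihood(x2, n_sites, n_diff):
            b = x2  # the maximum lies in [a, x2]
        else:
            a = x1  # the maximum lies in [x1, b]
    return (a + b) / 2.0


def jc_closed_form(n_sites: int, n_diff: int) -> float:
    """Analytical ML estimate: d = -3/4 ln(1 - 4p/3) with p = n_diff / n_sites."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical counts: 10 differences over 100 aligned sites.
print(ml_distance(100, 10), jc_closed_form(100, 10))
```

The two printed values agree up to the search tolerance. This shows that the familiar Jukes-Cantor correction is itself a maximum likelihood estimate.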