Pavol Sulo, Dana Szabóová, Peter Bielik, Silvia Poláková, Katarína Šoltys, Katarína Jatzová, Tomáš Szemes, The evolutionary history of Saccharomyces species inferred from completed mitochondrial genomes and revision in the ‘yeast mitochondrial genetic code’, DNA Research, Volume 24, Issue 6, December 2017, Pages 571–583, https://doi.org/10.1093/dnares/dsx026
Abstract
The yeast Saccharomyces are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code slightly differs from well-established yeast mitochondrial code as GUG is used rarely as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered as a distinct species due to mtDNA nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species.
1. Introduction
Because of advanced genetics and simple and easy production of mutants, the yeast Saccharomyces cerevisiae is the most widely used unicellular eukaryotic model organism. It is the best known member of the genus Saccharomyces with accepted species: S. cerevisiae, S. cariocanus, S. paradoxus, S. mikatae, S. kudriavzevii, S. arboricolus, S. bayanus var. bayanus, S. bayanus var. uvarum and S. pastorianus.1 These yeast species are involved in fermenting sugars and produce enormous amounts of ethanol; however, most of the strains used to manufacture beer, wine or fuel ethanol are interspecific hybrids.2–4 Therefore, the taxonomic classifications of alloploid species S. pastorianus and S. bayanus do not agree with the outcome of whole genome sequencing.4,5
These discrepancies arose exclusively from the comparison of nuclear chromosomes. The data for much smaller mitochondrial genomes encoding only eight proteins required for oxidative phosphorylation, one ribosomal protein, two ribosomal RNAs (rRNA), the RNA component of RNase P, and a set of 24 tRNA genes were largely overlooked or unavailable.6,7 Despite large numbers of nuclear genome sequences now accessible for different Saccharomyces species and isolates,4,8–10 completed mitochondrial genome sequences are known only for S. cerevisiae,6 S. paradoxus,11 S. pastorianus12 and Saccharomyces eubayanus.5 Recently, mitochondrial DNA (mtDNA) from S. bayanus and S. pastorianus was sequenced but not annotated.13 The problem is associated with the short DNA reads (100–200 nt) generated by most next-generation sequencing platforms in genomic studies, especially by Illumina MiSeq.14–17 This approach can be used routinely for sequencing mtDNA from different yeast species18 but not from many species of Saccharomyces.9,16,17 Their mitochondrial genomes can be assembled only by combination of different gap-filling programmes19 or by extensive manual editing.20,21
The purpose of this work was to obtain complete mitochondrial genomes for all species assigned to Saccharomyces with the aim of comparing their genetic content to that of S. cerevisiae, as many different Saccharomyces mtDNAs can substitute, to some extent, for the original molecule in S. cerevisiae.22
2. Materials and methods
2.1. Yeast strains and mtDNA sequencing
Sequenced strains are being used as taxonomic standards (type strains)1 in whole genome studies,4,10 or their mtDNA is being transferred to S. cerevisiae in order to prepare xenomitochondrial cybrids.22 Their origin is described in detail in reference 22. Saccharomyces arboricolus NRRL Y-63701 and S. pastorianus NRRL Y-27171NT were kindly provided by the Agricultural Research Service Culture Collection, US Department of Agriculture, Peoria, IL, USA. Yeast cells were grown at 28 °C in YPD medium (2% glucose, 1% yeast extract and 1% peptone). The mtDNA was purified mainly by differential centrifugation23 or alternatively by bisbenzimide/CsCl buoyant density centrifugation.24,25 mtDNA sequencing libraries were prepared using the Nextera Library Preparation Kit (Illumina, San Diego, USA) and sequenced with average genome coverage between 176 and 3,698× on an Illumina MiSeq sequencing system using a MiSeq Reagent kit v3 (Illumina) with reads 2 × 100 bp in length. The sequencing summary is listed in Table 1.
Strain | Paired reads | Average length | Coverage Xa | DNA | Contigs/nts | Coverage % | mtDNA size |
---|---|---|---|---|---|---|---|
S. cerevisiae NRRL Y-12632T | 2,578,141 | 147 | 4,234 | Df | 5/87,765 | 98 | 89,507 |
S. paradoxus CBS 2908 | 1,580,034 | 96 | 2,283 | Gr | 5/66,147 | 99.6 | 66,436 |
S. paradoxus CBS 7400 | 2,242,369 | 96 | 3,014 | Gr | 2/70,538 | 98.7 | 71,419 |
S. cariocanus CBS 7994T | 1,897,384 | 96 | 2,354 | Gr | 10/75,729 | 98 | 77,380 |
S. mikatae CBS 8839T | 413,405 | 189 | 917 | Df | 2/85,137 | 99.9 | 85,211 |
S. kudriavzevii CBS 8840T | 2,259,934 | 96 | 2,692 | Gr | 3/80,240 | 99.5 | 80,588 |
S. arboricolus NRRL Y-63701 | 535,328 | 139 | 1,073 | Df | 9/70,040 | 101 | 69,363 |
S. pastorianus NRRL Y-27171T | 555,165 | 158 | 1,271 | Df | 7/68,800 | 99.7 | 69,019 |
S. bayanus CBS 380T | 1,221,753 | 85 | 1,604 | Gr | 1/64,742 | 100 | 64,736 |
S. uvarum CBS 395Tb | 25,298,820 | 36 | 14,059 | – | 34/51,450 | 79.4 | 64,779 |
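The two coverage columns of Table 1 appear to follow from the other columns by simple ratios. The sketch below is an assumption inferred from the tabulated numbers, not a formula stated by the authors: fold coverage is taken as reads × average read length / mtDNA size, and coverage % as contig nucleotides / mtDNA size. The function names are hypothetical.

```python
def fold_coverage(reads: int, avg_read_length: float, genome_size: int) -> float:
    """Assumed definition: total sequenced bases divided by genome size."""
    return reads * avg_read_length / genome_size


def percent_assembled(contig_nts: int, genome_size: int) -> float:
    """Assumed definition: share of the final mtDNA length covered by contigs."""
    return 100.0 * contig_nts / genome_size


# Values copied from the S. cerevisiae NRRL Y-12632T row of Table 1.
cerevisiae_fold = fold_coverage(2_578_141, 147, 89_507)
cerevisiae_pct = percent_assembled(87_765, 89_507)
print(round(cerevisiae_fold), round(cerevisiae_pct))
```

Under these assumed definitions, a coverage % above 100 (as for S. arboricolus NRRL Y-63701) simply means that the contigs overlapped before they were merged into the final molecule.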
2.2. mtDNA assembly and annotation
Sequencing provided several hundred megabytes of data in more than one million reads, 90–190 nt long. Paired reads were trimmed, short reads (often with erroneous sequences) removed and assembled into individual contigs using CLC Genomics Workbench 9.5 (Qiagen, Hilden, DE). From thousands of contigs with an average size of ∼1,500 nt, only those containing mtDNA by BLASTN comparison to already known mtDNA from related yeasts (S. cerevisiae NC001224; S. paradoxus JQ862335, S. pastorianus EU852811) were selected. Individual contigs with mtDNA segments were assembled into a single molecule using the Vector NTI v.9.0 (v.10) software package from InforMax, Inc. Overall, 1–10 contigs were obtained covering 98–101% of the entire sequence, depending on the purification method (Table 1). All mtDNAs were interrupted in the redundant intergenic regions, mostly in the origins of replication (ori ∼270 nt) and GC clusters. Contigs were manually edited and linked together from individual reads as previously reported.21 Some gaps were sealed by direct sequencing using fluorescent dye terminator sequencing chemistry. Assembly of S. mikatae CBS 8839 (NRRL Y-27341T) mtDNA was confirmed by direct sequencing as previously described.11 The assembled mitochondrial genomes were evaluated by mapping back the raw reads using CLC Genomics Workbench 9.5. However, a few substitutions, indels, and the occasional GC cluster can be found if our mtDNA is aligned to the mtDNA obtained from the same strains by different groups.13
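The contig selection step can be illustrated with a short sketch. It assumes the BLASTN comparison was exported in the standard 12-column tabular format (BLAST+ `-outfmt 6`: query, subject, percent identity, alignment length, and so on). The function name and both thresholds are hypothetical; the text does not state the cut-offs actually used.

```python
def select_mt_contigs(blast_lines, min_identity=80.0, min_align_len=200):
    """Return IDs of contigs with at least one qualifying hit to a reference mtDNA.

    blast_lines: iterable of tab-separated BLAST tabular lines.
    min_identity, min_align_len: illustrative thresholds (assumptions).
    """
    selected = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        contig_id = fields[0]
        identity = float(fields[2])
        align_len = int(fields[3])
        if identity >= min_identity and align_len >= min_align_len:
            selected.add(contig_id)
    return selected


# Made-up records for illustration only.
example = [
    "contig_1\tNC001224\t95.20\t1500\t70\t2\t1\t1500\t100\t1599\t0.0\t2400",
    "contig_2\tNC001224\t78.00\t1200\t260\t4\t1\t1200\t500\t1699\t1e-50\t300",
    "contig_3\tJQ862335\t99.00\t90\t1\t0\t1\t90\t10\t99\t1e-30\t160",
]
print(sorted(select_mt_contigs(example)))
```

Only contigs passing both filters are kept for the manual assembly that follows.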
Rough gene annotation was carried out with MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl).27 Precise exon and intron positions were corrected according to comparison with annotated sequences from several strains of S. cerevisiae and S. paradoxus.6,11,20 Intron and endonuclease nomenclature follows reference 28, and the intergenic open reading frames (ORFs) are numbered according to references 6 and 29. The sizes of rRNA were inferred from the model elaborated for their equivalents from S. cerevisiae mitochondria.30,31 The 5′ and 3′ ends of rnpB RNA were deduced from the secondary structure consensus.32 Gene nomenclature followed the rules described in GOBASE.33
Saccharomyces arboricolus CBS 10644 mtDNA was assembled from the sequence CM0015799 by manual editing from individual reads (EMBL: ERP001702, ERP001703, ERP001704). Saccharomyces uvarum CBS 395T was assembled from SRR147290 reads downloaded from the Sequence Read Archive, NCBI.26
2.3. Phylogeny
Phylogenetic relationships were analysed by the Maximum likelihood phylogeny PhyML program included in the CLC Genomics Workbench 9.5 package.34 Phylogenetic trees were constructed from unambiguously aligned portions of the concatenated DNA sequences coding for proteins using the neighbor-joining algorithm and the best model (GTR + G + I) selected by model testing tool (hLRT, BIC, AIC, AICc).34 The stability of individual branches was assessed by the bootstrap method.35 DNA concatemers for mitochondrial protein coding sequences belonging to other Saccharomyces strains were extracted from GenBank deposits. The nuclear DNA-derived phylogenetic tree was constructed from protein-coding genes used in population studies.36–39 Because of the hybrid alloploid nature of S. bayanus, S. uvarum and S. pastorianus we used gene sequences from the species that are the least related to S. cerevisiae variants. Corresponding DNA was obtained from assembled contigs occasionally sealed manually from reads (S. cerevisiae NRRL Y-12632T, S. cariocanus CBS 7994T, S. arboricolus NRRL Y-63701, S. paradoxus CBS 2908, S. bayanus CBS 380T). Other concatemers were obtained from sequences, scaffolds (contigs) deposited in the Saccharomyces genome database (http://www.yeastgenome.org/) for S. cerevisiae S288c, S. paradoxus CBS 432,6 for S. pastorianus Weihenstephan 34/70,12 for S. kudriavzevii CBS 8840T and S. mikatae CBS 883940 and for S. arboricolus CBS 10644.9 Data for S. bayanus CBS 380 were obtained from contigs assembled from SRR147291 and for S. uvarum CBS 395T were obtained from contigs assembled from SRR147290.26 Data for S. pastorianus NRRL Y-27171NT (CBS 1538) were compiled from contigs assembled by us and scaffolds BBYX01000001.13 Data for S. eubayanus CBS 12357T were obtained from the final genome assembly and annotations in GenBank JMCK00000000.5 Data accessibility. DNA sequences: GenBank accession nos KY095834–KY095933.
3. Results and discussion
3.1. General features of genes, the genetic code and GUG as a translation start
We have successfully completed mitochondrial genomes for nine Saccharomyces species, and their general features are listed in Table 2 (GenBank accession numbers KX657740–KX657750). All mtDNAs are circularly mapped and their size decreases with phylogenetic distance from S. cerevisiae (89.5 kb) to ∼64.7 kb for S. bayanus. GC content varies from 14% for S. paradoxus strains to 19.3% for S. pastorianus and correlates positively with the presence of introns and negatively with the size of AT-rich intergenic spacers. The mtDNAs contain the basic set of genes known in Saccharomyces, coding for the components of cytochrome oxidase (cox1, cox2, cox3), cytochrome b (cob), subunits of ATPase (atp6, atp8, atp9), both rRNA subunits (rnl, rns), the rps3 gene for ribosomal protein, the rnpB gene for the RNA subunit of RNase P and the tRNA package24 (Fig. 1).
Species | Strain | Size (bp) | GC (%) | Genes (%) | Spacers (%) | Introns (%) | Accession number |
---|---|---|---|---|---|---|---|
S. cerevisiae | NRRL Y-12632NT | 89,507 | 16.8 | 15.6 | 64.1 | 20.3 | KX657745 |
S. paradoxus | CBS 2908 | 66,436 | 14.0 | 20.3 | 70.7 | 9.0 | KX657748 |
S. paradoxus | CBS 7400 | 71,419 | 14.5 | 18.9 | 65.6 | 15.6 | KX657749 |
S. cariocanus | CBS 7994T | 77,380 | 15.6 | 17.8 | 69.8 | 12.5 | KX657744 |
S. mikatae | CBS 8839T | 85,211 | 16.1 | 16.1 | 62.5 | 21.4 | KX657747 |
S. kudriavzevii | CBS 8840T | 80,588 | 16.0 | 17.1 | 68.9 | 13.9 | KX657746 |
S. arboricolus | NRRL Y-63701 | 69,363 | 14.6 | 19.7 | 71.4 | 8.9 | KX657741 |
S. arboricolusa | CBS 10644T | 71,124 | 14.5 | 19.1 | 69.6 | 11.3 | KX657740a |
S. pastorianus | NRRL Y-27171NT | 69,019 | 19.3 | 19.9 | 65.9 | 14.2 | KX657750 |
S. bayanus | CBS 380T | 64,736 | 16.3 | 21.0 | 63.9 | 15.1 | KX657743 |
S. uvarum | CBS 395T | 64,779 | 16.4 | 21.0 | 63.9 | 15.1 | KX657742 |
aAssembled from individual reads (EMBL: ERP001702, ERP001703, ERP001704).9
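For readers who want to reproduce the GC (%) and composition columns of Table 2 from a sequence and its annotation, a minimal sketch follows. The function names are hypothetical, and treating genes, spacers and introns as three non-overlapping classes that sum to the genome length is an assumption based on the rows adding up to about 100%.

```python
def gc_percent(sequence: str) -> float:
    """GC content in percent, ignoring any character other than A, C, G or T."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def composition_shares(genes_bp: int, spacers_bp: int, introns_bp: int):
    """Percent of the genome in each class (assumed to be non-overlapping)."""
    total = genes_bp + spacers_bp + introns_bp
    return tuple(100.0 * part / total for part in (genes_bp, spacers_bp, introns_bp))


print(round(gc_percent("ATATGC"), 1))      # short made-up sequence
print(composition_shares(20, 70, 10))      # made-up lengths
```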
Figure 1. The genetic organization of Saccharomyces mtDNA. For simpler comparison, the circular genomes, exported from Vector NTI, were aligned at the beginning of the large rRNA subunit (rnl). Protein-coding genes, ribosomal RNA, rnpB are marked as arrows and bar, tRNA genes as black lines, introns with white rectangles, intronic and free-standing ORFs by gray arrows and replication origins with black circles. Gene nomenclature follows the rules described in GOBASE (atp for ATP synthetase subunits, cox for cytochrome oxidase subunits, cob for cytochrome b, rns for small rRNA ribosomal subunit, rnl for large rRNA ribosomal subunit, T2, C, H, etc. for particular tRNA coding genes, rps3 for ribosomal protein and rnpB for the RNA subunit of RNase P). Sizes are given on the bottom line in kbp.
The genetic code in Saccharomyces mtDNA differs from the universal code in that TGA is read as tryptophan, CTN as threonine and ATA as methionine, while CGN codes for arginine with CGA and CGC absent; this is recognized as the ‘yeast mitochondrial code’ (transl_table = 3).41–43 The trnR2 gene (anticodon ACG) was found in all Saccharomyces mtDNAs. Thus, all four CGN codons possess coding potential. However, the CGU codon is present in exons only once (besides S. arboricolus) coding for a non-conserved arginine residue in rps3 protein (Supplementary Material S1). In addition, the CGU codon occurs quite frequently in intron-coded or free-standing ORFs, whereas the CGG codon is present only in GC clusters located inside them. The arginine CGA and CGC codons considered as unassigned are also used sporadically in maturase/homing endonucleases (HE) reading frames coded by introns41–44 (Supplementary Material S1). The most convincing evidence is a CGC codon present in the cox1I2 reading frame in a number of S. cerevisiae strains including type strain NRRL Y-12632 and the well-studied model strain S288c. This reverse transcriptase/maturase protein is required for group II intron splicing from cox1 pre-mRNA as well as for intron mobility.45,46 Apparently, tRNA coded by the trnR2 gene is required in Saccharomyces predominantly for the expression of mobile elements and is therefore gained and lost in different clades from the Saccharomyces/Kluyveromyces complex.47,48
Protein-coding genes start with ATG, with some exceptions. The translation start in the S. cariocanus cox3 gene is GTG (GUG), reported to be nearly as effective as the original ATG in the S. cerevisiae cox2 gene.49 Translation initiation at the nearby downstream AUG is not an option because mutagenesis of the regular initiation codon does not allow translation from this site.49 Instead, a stem loop structure in the mRNA sequence of the first six codons plays a role in the recognition of the translation start.50 The first 50 nucleotides in the cox3 genes of all sequenced Saccharomyces are identical, which emphasizes the reliability of GUG as the initiation codon (Fig. 2A, B). Atypical start codons are present in many mitochondrial systems, but GUG as a start codon is known only in some birds, invertebrates, plants and protists, but not in yeast and molds.43,44 Also, a maternally inherited and practically homoplasmic mtDNA mutation, 8527 A > G, changing the initiation codon AUG into GUG, was observed in the human atp6 gene. The patient harbouring this mutation exhibited clinical symptoms of Leber hereditary optic neuropathy, but his mother was healthy.51 The second exception is free-standing ORF1s in S. kudriavzevii, S. mikatae, S. bayanus, S. uvarum, S. arboricolus and S. pastorianus, where GUG is used as a translation start site. This gene with unknown function codes for a hypothetical maturase-like protein; its start codon overlaps with the 3′ end of the cox2 gene and the reading frame extends 16 bp downstream of the cox2 stop codon. Consequently, the ‘yeast mitochondrial code’ should be updated to accept CGA and CGC as arginine codons and GTG as the alternative initiation codon (Fig. 2C). The preferable termination codon in exons is UAA, and UAG is used only occasionally in cox2 of S. kudriavzevii and S. mikatae and more often in intron-coded ORFs. Codon usage in the reading frames shows a strong bias towards codons ending with T or A (about 10:1), as previously observed in the mitochondrial genomes of other yeasts6,11 (Supplementary Material S1).
Figure 2. GTG as the initiation codon and revised yeast mitochondrial code. (A) Initiation codons in cox3 gene. (B) Initiation codons in ORF1 gene. Sequences overlapping with cox2 3′end are underlined. (C) Revised yeast mitochondrial code. Exceptional codons are grey-shadowed.
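The revised code described above can be sketched as a small translation table. This is an illustration under stated assumptions, not the authors' software: the table starts from the universal code and applies only the reassignments named in the text (TGA as tryptophan, CTN as threonine, ATA as methionine, all four CGN codons as arginine), and it treats ATG and GTG as the start codons. The function names and the test sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Universal code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
UNIVERSAL_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_revised_yeast_mito_table():
    """Universal code plus the yeast mitochondrial reassignments named in the text."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, UNIVERSAL_AA))
    table["TGA"] = "W"              # TGA read as tryptophan
    table["ATA"] = "M"              # ATA read as methionine
    for third in BASES:
        table["CT" + third] = "T"   # CTN read as threonine
        table["CG" + third] = "R"   # all four CGN codons accepted as arginine
    return table


START_CODONS = {"ATG", "GTG"}       # GTG as the alternative initiation codon


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA alphabet), stopping at the first stop codon."""
    table = build_revised_yeast_mito_table()
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")     # initiator codon is decoded as methionine
            continue
        residue = table[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Made-up sequence exercising GTG start, CTT, TGA, ATA, CGC and a TAA stop.
print(translate("GTGCTTTGAATACGCTAA"))
```

Starting from the universal table keeps the reassignments visible as a short list of explicit overrides.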
All sequenced mitochondrial genomes exhibit sequence conservation at the nucleotide level within exons (excluding extremely variable rps3), with at least 94% sequence identity between the most unrelated species and a substitution rate of 11%. The substitutions are mostly neutral and have a low impact on the protein sequence, as they generate nearly identical amino acid residues. Therefore, the overall protein identity, even among less-related species, is 97%. Interestingly, the atp9 protein sequence is identical in all sequenced species. The same DNA substitution rate (6%) in the other small gene, atp8, is responsible for the three changes at the protein level (Supplementary Material S2).
3.2. Introns and free-standing ORFs
Introns are present in Saccharomyces only within cox1, cob and rnl genes.22,27 Although the majority are group I introns, four of them belong to group II, and only two of these (cox1I1, cox1I2) code for a protein with the maturase/reverse transcriptase motif. The intervening sequences are inserted into the same sites as has been described in other Saccharomyces, especially more than 100 strains of S. cerevisiae11,19,20,21,52 (Fig. 3). However, a detailed analysis indicates that the insertion site for cox1I3α should be shifted to position 242. Position 243 would generate lysine instead of an asparagine residue in the GGFGN section in both S. arboricolus molecules. This motif is conserved across the tree of life from yeast to humans. G-G-F-G-N-[WY]-[FL]-[MV] is a part of the well-recognized water membrane interface motif in cytochrome c and quinol oxidases (MeMotif—http://projects.biotec.tu-dresden.de/memotif).53
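The effect of the insertion-site choice on the conserved motif can be checked mechanically. The sketch below converts the PROSITE-style pattern G-G-F-G-N-[WY]-[FL]-[MV] into a regular expression and tests two made-up peptide fragments, one with asparagine and one with lysine at the fifth position. The helper name and both fragments are hypothetical and only handle the simple pattern syntax used here.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (residues and [..] sets joined by '-')."""
    return "".join(pattern.split("-"))


MOTIF = re.compile(prosite_to_regex("G-G-F-G-N-[WY]-[FL]-[MV]"))

with_asparagine = "IFGGFGNWLVPL"   # made-up fragment, N at the fifth motif position
with_lysine = "IFGGFGKWLVPL"       # same fragment with K instead of N

print(bool(MOTIF.search(with_asparagine)), bool(MOTIF.search(with_lysine)))
```

A conceptual translation that yields lysine at this position fails the motif test, which is the kind of signal that argues for moving the annotated insertion site.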
Figure 3. Occurrence of introns and free-standing ORFs. Introns and free-standing ORFs with at least one motif characteristic for the homing endonucleases (LAGLIDADG, GIY-YIG, etc.) are in black; without reading frame are white; with truncated ORF interrupted by stop codon are spotted. Numbers indicate the base preceding the intron insertion site in the CDS. Positions of introns, known as mobile, in S. cerevisiae are marked in bold6; I, group I introns; II, group II introns. ORF1, ORF2 and ORF4 correspond to the nomenclature used in reference29 (their ORF5 is an ORF coded by the rnl ω intron). Only ORFs containing at least one endonuclease/maturase motif were considered.
A number of introns also contain ORFs in phase with the upstream exon, sometimes interrupted by a frame-shift stop codon, often due to the insertion of GC clusters. Only those possessing at least one motif characteristic for the HE (LAGLIDADG, GIY-YIG, etc.) were considered as potentially active54,55 (Fig. 3).
However, there are two exceptions: cox1I5β and cobI1α, where an ORF coding for a nuclease/maturase motif is not fused to the upstream exon. cox1I5β is not known as mobile and the splicing requires at least five different nuclear-encoded factors and an ORF from cobI3.6,56 The reason is that the 3′ splice site lies unusually far from the catalytic core of the intron; therefore, the ORF 357 amino acids (AA) long containing two LAGLIDADG motifs in the S288c strain was not considered as potentially active.56 But a recent S. cerevisiae mitochondrial transcriptome study revealed an alternative splice site that puts the cox1I5β ORF in frame with the upstream exon.57 We found the same ACTTATTATATATT consensus of the alternative splice site in several other Saccharomyces species (S. mikatae, S. cariocanus, S. bayanus, S. uvarum, S. pastorianus strains and S. eubayanus and S. paradoxus CBS 432; Fig. 4A).
Figure 4. Alternative splicing of cox1I5β (A) and cobI1α (B). Arrows mark exons–introns boundaries. Consensus of alternative splice site as described in reference57 and hypothetical alternative splice site in cobI1α are underlined.
Perhaps, a similar mechanism can be used in cobI1α. We found a hypothetical alternative splice site sharing sequence similarity AAATATTAG(G/A)T(A/T)AATTTA allowing the translation of an intron reading frame with an upstream exon (Fig. 4B). In S. cariocanus, S. pastorianus, S. mikatae and S. eubayanus it contains a 169–297 AA-long reading frame with one GIY-YIG endonuclease maturase motif.58
Introns, especially those containing HEs, represent a class of mobile elements spread horizontally into an intron-less allele by ‘homing’. They can be found in different species in the same position of the same gene in a chain-involved cyclical gain and loss.19,59,60 Consequently, they are randomly distributed among the different species.19,20 However, there are a few remarkable exceptions. Intron cox1I3β has not been found in S. cerevisiae although more than a hundred strains have been sequenced. Besides both S. arboricolus strains, cox1I3β is present in all other Saccharomyces species.19–21 Other evidence of uneven cox1I3β distribution comes from S. cerevisiae xenomitochondrial cybrids containing S. paradoxus mtDNA. They exhibit a slower growth rate on a non-fermentable carbon source, decreased respiration capacity and reduced cytochrome aa3 content associated with the inefficient splicing of cox1I3β.22 Apparently, the occurrence of cox1I3β is related to Dobzhansky–Muller nuclear incompatibilities in nucleo-mitochondrial communication that may result in the divergence of yeast species.22
The orphan intron cox1I4γ is very rare and occurs only in S. bayanus CBS 380 and S. uvarum CBS 395. We were unable to find homologs despite an extensive search of the literature and databases.19 Likewise, cox1I4α, well known for encoding the HE I-SceII with a LAGLIDADG motif,61 was not identified in any Saccharomyces species other than S. cerevisiae. Unlike cox1I4γ, it can be found in less closely related species of the Saccharomyces/Kluyveromyces complex, namely Naumovia castellii62 and Kluyveromyces marxianus.19,63
Intron homing is catalyzed by HEs64,65 that have a different evolutionary history and can exist on their own.66,67 Such free-standing ‘endonucleases’, named ORFs,29 have already been described in S. cerevisiae and S. pastorianus mtDNA, but their occurrence is strain dependent.6,29 ORF1 orthologs are the most frequent in the different Saccharomyces species (Fig. 3). They code for one or two LAGLIDADG motifs and a DNA-binding helix-turn-helix domain.68 Because of a number of stop codons or a GC cluster insertion, ORF2 is present mostly in truncated form.19,21 A reading frame coding for one LAGLIDADG motif was found only in S. cariocanus. We were unable to detect ORF3 in any ‘non-cerevisiae’ Saccharomyces. Apparently this dubious ORF does not encode a functional protein. Full-size ORF4 can be found in S. cerevisiae and S. arboricolus. Among all previously identified small ORFs, regions with significant homology to ORF5 (ref. 6) were found only in S. paradoxus11 and S. cariocanus. Therefore, ORF5 is considered a probable genomic artifact,69 although very small proteins coded by mtDNA are expected, since at least two small proteins were reported to be coded by rns in mammalian mitochondria.70,71
3.3. Intergenic DNA, GC clusters and origins of replication
The putative origins of replication in S. cerevisiae and S. paradoxus are ∼270 bp in length and contain GC blocks separated by AT-rich stretches.6,72 According to a recent bioinformatic analysis of strain S288c, the origin is a five-part consensus sequence [GGGGGAGGGGGTGGGTGAT ∼200 A/T-rich GGGTCCC 29 A/T-rich GGGACC] downstream of the 20-base promoter [DDWDWTAWAAGT↓ARTADDDD].57 Saccharomyces cerevisiae mtDNA contains seven or eight ori elements, depending on the strain, but only ori 2, 3 and 5 are active. They contain a transcription initiation site adjacent to the five-part consensus sequence, providing a contiguous transcript of >350 nucleotides. The other ori sites are disabled by the insertion of a G-class GC cluster CCCGGTTTCTTACGAAACCGGGACCTCGGAGA(C/A)GT into the promoter site.57,73 The ori elements are present in all Saccharomyces species. Their number decreases with phylogenetic distance from S. cerevisiae: S. paradoxus possesses eight, whereas species from the lager beer species/S. uvarum clade have only four (Table 3). All of them can be potentially active as they are adjacent to the hypothetical promoter sequence DDWDWTAWAAGT↓ARTADDDD57 (Supplementary Material S4A).
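Locating candidate promoters next to ori elements amounts to searching a sequence for a degenerate IUPAC consensus. The following is only a minimal sketch of that idea, not the procedure used in this study: the function names `iupac_to_regex` and `find_promoters` are hypothetical, the cleavage arrow is dropped from the consensus, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes (D = A/G/T, W = A/T, R = A/G, ...).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# 20-base promoter consensus quoted in the text, without the cleavage arrow.
PROMOTER = "DDWDWTAWAAGTARTADDDD"


def iupac_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_promoters(seq: str, consensus: str = PROMOTER) -> list[int]:
    """Return 0-based start positions of consensus matches on the given strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({iupac_to_regex(consensus)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

To cover both strands, the same scan would be repeated on the reverse complement of the sequence.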
Table 3. ori elements and number of GC clusters per class (columns G to Classified/total) in Saccharomyces mtDNA.

Saccharomyces strain | ori | G | V | M1 | M1′ | M2 | M2′ | M2″ | M3 | M4 | U | SpS* | Classified/total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S. cerevisiae NRRL Y-12632T | 7/2a | 5 | 4 | 48 | 7 | 46 | 1 | 6 | 11 | 4 | 50 | – | 127/177 |
S. paradoxus CBS 2908 | 8/8 | – | 8 | 20 | 1 | 12 | – | 1 | 1 | 1 | 30 | – | 46/76 |
S. paradoxus CBS 7400 | 8/8 | – | 9 | 19 | 1 | 12 | 1 | 1 | 1 | 1 | 32 | – | 45/77 |
S. cariocanus CBS 7994T | 7/7 | – | 37 | 10 | 2 | 23 | 5 | – | – | – | 72 | – | 77/149 |
S. mikatae CBS 8839T | 7/7 | – | 13 | 32 | 7 | 30 | 1 | – | – | 10 | 50 | – | 90/143 |
S. kudriavzevii CBS 8840T | 8/8 | – | 8 | 26 | 9 | 37 | 5 | – | – | – | 51 | – | 85/136 |
S. arboricolus NRRL Y-63701 | 6/6 | – | 1 | 6 | 9 | 6 | 5 | 2 | – | – | 76 | 7 | 36/112 |
S. arboricolus CBS 10644T | 6/6 | – | 1 | 7 | 2 | 9 | 5 | 6 | – | 2 | 63 | 6 | 38/105 |
S. pastorianus NRRL Y-27171T | 4/4 | 5 | 3 | 13 | – | 44 | 1 | 6 | 3 | 1 | 101 | 6 | 82/183 |
S. bayanus CBS 380T | 4/4 | 5 | 13 | 16 | – | 17 | – | – | – | – | 38 | 7 | 57/95 |
S. uvarum CBS 395T | 4/4 | 5 | 13 | 17 | – | 17 | – | – | – | – | 38 | 7 | 58/96 |
S. cerevisiae S288c | 8/3 | 5 | 4 | 55 | 7 | 37 | 9 | 13 | 3 | 9 | 48 | – | 127/177 |
S. eubayanus CBS 12357T | 4/4 | 1 | – | 26 | – | 33 | – | – | – | – | 56 | 5 | 64/120 |
S. pastorianus WS 34/70 | 4/4 | 6 | 3 | 13 | 1 | 50 | 3 | 10 | – | – | 85 | 9 | 86/171 |
S. paradoxus CBS 432T | 8/8 | – | 7 | 14 | – | 8 | – | 1 | 1 | 1 | 50 | – | 32/82 |
a: ori2, considered as active, is absent. G, V, M1, M1′, M2, M2′, M2″, M3 and M4 are GC cluster classes identified according to the consensus described in detail in reference 20, with various degrees of degeneration (Supplementary Material S3C). U: unclassified GC clusters, defined as at least 20 nt long with GC content ≥ 40%. SpS: species-specific GC clusters, namely the consensus sequence TCGTNWCGYACCGTCCAATWGGACGGTACG found in the S. bayanus, S. uvarum, S. pastorianus clade and GGGGTCCC N(28–30) GGGGTCCC found in S. arboricolus. Strains assembled in this work are marked in bold.
GC clusters were originally defined as regions 35 bp long on average with 45–62% GC74 and are in general the major source of noticeable intergenic region polymorphism.6 According to their characteristic primary structures, they have been assigned to seven families which exhibit varying degrees of homology.73,74 Recently, the M2″ class consensus was reduced to 14 bases (TCCGGCCGAAGGAG).20 Therefore, we considered as GC clusters only regions at least 20 nt long with GC content ≥ 40%. The average number of GC clusters in S. cerevisiae is above 100 and their occurrence decreases in other Saccharomyces species20 from 90 to 30 (Table 3). A G-class ori-specific GC cluster has been found in every species from the S. bayanus, S. uvarum, S. pastorianus clade, but remote from ori. The most frequent are the M1 and M2 classes, although members of other classes can be found. In addition, a species-specific GC-rich sequence (GGGGTCCC N(28–30) GGGGTCCC) was recognized in both S. arboricolus strains and a GC cluster with the consensus TCGTNWCGYACCGTCCAATWGGACGGTACG in species from the lager beer species/S. uvarum clade.
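The length and GC-content thresholds above can be illustrated with a simple sliding-window scan. This is a sketch under stated assumptions rather than the method used here: the window-merging strategy and the names `gc_fraction` and `find_gc_rich_regions` are hypothetical, and the text does not specify how cluster boundaries were actually determined.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_gc_rich_regions(seq: str, window: int = 20, min_gc: float = 0.40):
    """Return half-open (start, end) intervals of merged GC-rich windows.

    Every window of `window` nt whose GC fraction is >= min_gc is flagged;
    flagged windows that overlap or touch are merged into one region.
    This is an assumed procedure, not the authors' pipeline.
    """
    seq = seq.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        if gc_fraction(seq[start:start + window]) >= min_gc:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Because whole windows are merged, a reported region includes some AT-rich flank on each side of the GC-rich core; trimming to the first and last G or C would be a natural refinement.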
3.4. Transcription units and adjacent motifs
The highly conserved sequence motif WTATAAGTA is known as the yeast mitochondrial transcription initiation site.6,75–77 Recently, transcriptome analysis experimentally confirmed the sequence DDWDWTAWAAGT↓ARTADDDD as the consensus promoter site in S. cerevisiae57 and reduced the 19 potential transcription initiation sites6 to 11. We found these consensus promoter sites in all sequenced Saccharomyces at approximately the same positions as reported for S. cerevisiae. Because of the rearrangement of the trnS1-rps3 block in the mtDNA of the S. bayanus, S. uvarum, S. pastorianus clade, an additional transcription site was found upstream of the trnS1 gene (Supplementary Material S4A). When all hypothetical promoter sites are compared, the S. cerevisiae consensus57 becomes slightly more degenerate, mainly at the 5′ end, giving DNNDNTAWAAGT↓ARTADDDD. This is closer to the original nonanucleotide WTATAAGTA, despite the phage T7 origin of the yeast mitochondrial RNA polymerase. Apparently, the 23-base phage T7 consensus promoter is reduced in Saccharomyces mtDNA.
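The relationship between the S. cerevisiae consensus and the more degenerate genus-wide consensus can be checked position by position: a consensus is a relaxation of another if, at every position, its IUPAC code admits at least the same bases. The sketch below illustrates that comparison; the helper names `relaxed_positions` and `is_generalisation` are hypothetical and the cleavage arrows are omitted.

```python
# Bases admitted by each IUPAC nucleotide code.
IUPAC_SETS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "W": {"A", "T"}, "S": {"C", "G"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

S_CEREVISIAE = "DDWDWTAWAAGTARTADDDD"  # consensus from reference 57
GENUS_WIDE = "DNNDNTAWAAGTARTADDDD"    # consensus over all compared sites


def is_generalisation(strict: str, relaxed: str) -> bool:
    """True if `relaxed` admits every sequence that `strict` admits."""
    if len(strict) != len(relaxed):
        raise ValueError("consensus sequences must have equal length")
    return all(IUPAC_SETS[a] <= IUPAC_SETS[b] for a, b in zip(strict, relaxed))


def relaxed_positions(strict: str, relaxed: str) -> list[int]:
    """1-based positions where `relaxed` admits strictly more bases."""
    if len(strict) != len(relaxed):
        raise ValueError("consensus sequences must have equal length")
    return [
        i
        for i, (a, b) in enumerate(zip(strict, relaxed), start=1)
        if IUPAC_SETS[a] < IUPAC_SETS[b]
    ]
```

Applied to the two consensus strings, the differences fall at positions 2, 3 and 5, that is, within the 5′ part of the motif, while the TAWAAGT core and the 3′ part are unchanged.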
The 3′ termini of mitochondrial mRNAs in budding yeast are recognized by the postulated dodecamer motif 5′-AAUAA(U/C)AUUCUU-3′ in the 3′ UTR.77–79 A processing site was confirmed by transcriptome analysis, in which an alternative heptakaidecamer 5′-AATAATATTCTTAT↓AGTCCGGCC↓CGCCC, including part of the M2 GC cluster, is recognized instead.57 Variations of the dodecamer have been found in the 3′ UTR of all protein-coding transcription units in all Saccharomyces species, in most cases up to 200 nucleotides downstream of the termination codon (Supplementary Material S4B). The putative atp8 transcription termination site exhibits the most degenerate consensus sequence. An interesting feature is the presence of multiple cleavage sites for ORF1 in S. mikatae, S. kudriavzevii and S. arboricolus. The alternative heptakaidecamer transcription processing site was found only in S. cerevisiae.
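Because the dodecamer occurs in variant forms, a search that tolerates a small number of mismatches is more informative than an exact match. The following sketch scans a fixed window downstream of a stop codon for dodecamer-like sites. It is illustrative only: the function names, the mismatch tolerance of one and the toy coordinates are assumptions, while the 200-nt window follows the distance mentioned above.

```python
# DNA form of 5'-AAUAA(U/C)AUUCUU-3'; Y stands for C or T.
DODECAMER = "AATAAYATTCTT"
ADMITTED = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}


def mismatches(site: str, motif: str = DODECAMER) -> int:
    """Number of positions where site is not admitted by the motif code."""
    return sum(base not in ADMITTED[code] for base, code in zip(site, motif))


def scan_downstream(seq: str, stop_end: int, max_dist: int = 200,
                    max_mismatches: int = 1):
    """Return (offset, site, n_mismatches) for dodecamer-like sites.

    stop_end is the 0-based index of the first base after the stop codon;
    offsets are counted from that base. Only sites lying entirely within
    max_dist nt downstream are reported.
    """
    region = seq.upper()[stop_end:stop_end + max_dist]
    k = len(DODECAMER)
    hits = []
    for i in range(len(region) - k + 1):
        site = region[i:i + k]
        n = mismatches(site)
        if n <= max_mismatches:
            hits.append((i, site, n))
    return hits
```

Raising `max_mismatches` makes the scan more permissive, which may be needed for strongly degenerate sites such as the one downstream of atp8, at the cost of more spurious hits in AT-rich sequence.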
3.5. Gene order and synteny
The most profound feature is the alteration in gene order, which mainly involves the trnfM-rnpB-trnP, rns-trnW, trnE-cob and trnS1-rps3 gene clusters. S. cerevisiae NRRL Y-12632 mtDNA, with a length of almost 90 kb, is among the largest genomes and has the same gene order as 105 other strains.20,21 S. paradoxus CBS 2908 and CBS 7400 mtDNAs differ in size (66 kb versus 71 kb), which is associated with intron occurrence in the cob gene. Their gene order is the same as in 14 other strains.11,19 They differ from S. cerevisiae by the excision of the trnFT1V-cox3-trnfM-rnpB-trnP-rns-trnW segment, which is translocated behind the trnS1-rps3 gene cluster, where rns-trnW is inverted (Figs 1 and 5). Surprisingly, the 77 kb mtDNA of S. cariocanus is syntenic to that of S. cerevisiae, although these species can be distinguished by the cox1I3β intron present in S. cariocanus (Figs 1 and 3). S. mikatae mtDNA is >85 kb long and its gene architecture differs from that of S. cerevisiae by the movement and inversion of the trnfM-rnpB-trnP gene cluster behind the rns-trnW locus (Fig. 5). S. kudriavzevii mtDNA is ∼80 kb long and its gene order differs from that of S. cerevisiae by the excision of rns-trnW and its relocation to the complementary strand at the 3′ end of the trnS1-rps3 gene cluster (Fig. 5). Both S. arboricolus mtDNAs are around 70 kb in size and their genome architecture is extensively reshuffled relative to S. cerevisiae. The rns-trnW cluster is excised and transferred to the 3′ end of the atp9-trnS1-rps3 gene cluster. The remaining trnfM-rnpB-trnP-cox1-atp8-atp6 genome segment is translocated to the end of the cob gene (Figs 1 and 5). S. bayanus, S. eubayanus, S. uvarum and S. pastorianus share the same mtDNA architecture. In addition to all the alterations known for S. arboricolus, the characteristic transcription unit atp9-trnS1-rps3 is broken and the trnS1-rps3 part is inverted at the same location (Figs 1 and 5). Strains from the lager beer species/S. uvarum clade are classified as separate species despite their hybrid nature, although they exhibit strikingly similar mtDNA sequences. The mtDNAs of S. uvarum CBS 395T and S. bayanus CBS 380T are nearly identical.
Figure 5. Comparison of mitochondrial gene order in Saccharomyces. Individual gene clusters are highlighted by different shades of grey. Pentagon symbols indicate the transcription direction.
Alteration in gene order within yeast genera is not frequent and is associated with genome size. The majority of yeast mtDNAs are relatively small (< 40 kbp), with the genes located on the same strand. Fewer than 15% of mtDNAs exceed 50 kb; in these, the genes are scattered in conserved blocks (e.g. cox1-atp8-atp6, rnl-trn) coded on both strands with altered transcription orientation.7 Evidently, larger genomes such as those of Saccharomyces are more prone to rearrangements. An extremely high degree of gene conservation and synteny is characteristic among and within species possessing smaller mtDNA from the Lachancea,80 Torulaspora81 and Yarrowia clades.82 However, significant mitochondrial genome rearrangement was observed in yeasts with larger mtDNA from the Nakaseomyces83 and Dekkera/Brettanomyces genera.84 Apparently, alteration in mtDNA gene order depends most of all on the size of intergenic regions and becomes evident when their ratio to genes exceeds 60%.7 The mechanism of gene order alteration was deduced exclusively from DNA sequences, proposing an intermediate with duplicated segments, associated with conserved gene blocks mimicking transcription units.11,84,85 This is not the case in S. bayanus, S. eubayanus, S. uvarum and S. pastorianus, where the atp9-trnS1-rps3 transcription unit is broken apart and the transposition of trnS1-rps3 to the opposite strand requires the formation of a de novo promoter.
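Differences in gene order between two circular genomes can be quantified by counting breakpoints, that is, adjacencies between neighbouring blocks in one genome that are not preserved in the other once strand is taken into account. The sketch below is a toy illustration: the block orders in `REFERENCE` and `REARRANGED` are hypothetical and do not reproduce the published maps, and the function names are assumptions.

```python
def flip(block):
    """Return the same block read on the opposite strand."""
    name, strand = block
    return (name, -strand)


def canonical(a, b):
    """Strand-independent form of the adjacency 'a followed by b'.

    Reading the other strand turns (a, b) into (flip(b), flip(a)), so the
    smaller of the two tuples is used as a unique representative.
    """
    return min((a, b), (flip(b), flip(a)))


def adjacencies(order):
    """Set of canonical adjacencies of a circular order of (name, strand)."""
    n = len(order)
    return {canonical(order[i], order[(i + 1) % n]) for i in range(n)}


def breakpoints(reference, other):
    """Number of adjacencies in `reference` that are absent from `other`."""
    return len(adjacencies(reference) - adjacencies(other))


# Hypothetical toy orders (strand +1 or -1); NOT the real mtDNA maps.
REFERENCE = [
    ("cox1-atp8-atp6", 1), ("trnE-cob", 1), ("atp9-trnS1-rps3", 1),
    ("rns-trnW", 1), ("trnfM-rnpB-trnP", 1),
]
# Same blocks with rns-trnW moved and inverted.
REARRANGED = [
    ("cox1-atp8-atp6", 1), ("trnE-cob", 1), ("rns-trnW", -1),
    ("atp9-trnS1-rps3", 1), ("trnfM-rnpB-trnP", 1),
]
```

In this toy case, moving and inverting a single block breaks three adjacencies, whereas rotating the circular map or reading it from the opposite strand breaks none.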
3.6. Phylogeny and taxonomy
As mentioned above, S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. arboricolus, S. bayanus, S. uvarum and S. pastorianus are currently the only species accepted by taxonomists as Saccharomyces.1 However, whole (nuclear) genome studies demonstrated that S. pastorianus is an alloploid of S. cerevisiae and the newly described species S. eubayanus, and that S. bayanus is a hybrid between S. uvarum and S. eubayanus with a small contribution of S. cerevisiae.4,5,13,26,86 Saccharomyces cariocanus is reproductively isolated from S. paradoxus by four chromosomal translocations but not by sequence and is therefore considered to be a S. paradoxus subspecies.4,5,86 All these data came exclusively from nuclear genetic information, although mitochondrial genomes are more susceptible to mutations than their nuclear counterparts.
The accuracy of taxonomic classification as well as the evolutionary history can be inferred from phylogenetic trees derived from concatenated DNA or protein sequences of mitochondrial or nuclear origin. However, support for many lineages in present phylogenetic trees is often weak and more robust analyses of relationships will require whole genome comparisons.87 To shed light on these taxonomy–phylogeny pitfalls we constructed phylogenetic trees from unambiguously aligned portions of the concatenated mtDNA sequences coding for proteins (Fig. 6A). Because of limited gene availability, we used 10 nuclear genes as a good compromise, especially as they had been used in population studies36–39 (Fig. 6B). Branching in both trees does not correlate with the basic taxonomic classification or with the tree derived from the combined sequences of the D1/D2 LSU RNA gene and ITS used in taxonomy1 (Fig. 6). Phylogenetic comparison of mitochondrial and nuclear genes clearly demonstrates that S. cariocanus should be assigned as a separate species. In the nuclear tree it belongs to the S. paradoxus clade; however, in the mitochondrial tree it forms a statistically significant separate branch. Considering its mtDNA gene order, which differs from that of S. paradoxus, and the presence of the cox1I3β intron, S. cariocanus should be designated as a separate species (Fig. 6). This conclusion, rather than horizontal transfer, is supported by the low spore viability obtained from crosses with the S. paradoxus tester as well as by the absence of any S. cerevisiae genes in the nuclear genome.4,5,86 Multilocus phylogenies based on conserved mitochondrial genes (except the extremely variable rps3) are largely representative; however, they occasionally do not correlate with trees derived from nuclear genes.7,84 The most plausible explanation for this discrepancy is different selection pressures on mitochondrial and nuclear genes, where complex patterns of hybridization and introgression are involved.7,84 However, the S. cariocanus paradox can also be explained as a partially developed divergence, because the genus Saccharomyces is considered a continuum of taxa differentiating towards speciation.1
Figure 6. Saccharomyces phylogeny. Both trees were constructed from unambiguously aligned concatenated DNA sequences using the maximum likelihood program PhyML. (A) mtDNA-derived phylogeny from DNA sequences coding for proteins in the order cox1, atp8, atp6, cob, atp9, cox2, cox3. (B) Nuclear DNA-derived phylogenetic tree from the CCA1, CYT1, MLS1, RPS5, LAS1, MET4, NUP116, ZDS2, PDR10 and DSN1 protein-coding genes used in population studies.36–39 The branch length is proportional to the nucleotide differences indicated by the bar. The numbers given at the nodes are the frequencies of a given branch appearing in 1,000 bootstrap replications. All are above 50%, indicating good statistical support. Naumovia castellii NRRL Y-12630 was used as an outgroup.
Even more complicated relationships are found among the hybrid species from the lager beer species/S. uvarum clade (Fig. 6). Historically, the clade consists of two species associated with human activities: S. pastorianus is a typical bottom-fermenting lager beer yeast and S. bayanus (CBS 380) was isolated from turbid beer. Only S. uvarum (CBS 395) was isolated from a natural substrate (blackcurrant).26 The origin of S. pastorianus was studied in detail and has been elucidated by the discovery of a new species, S. eubayanus.5 All modern S. pastorianus are aneuploid descendants of a tetraploid S. eubayanus/S. cerevisiae hybrid (2n, 2n) with limited loss of contribution from either parent.4,5,13,88 Apparently, the S. pastorianus strain sequenced in this work, as well as a number of other strains, inherited mtDNA exclusively from S. eubayanus.13,89 The systematics of S. bayanus and S. uvarum has been confusing and controversial for decades. S. uvarum (S. bayanus var. uvarum) strains represent a nearly pure lineage that contains very little genetic input from other Saccharomyces species, although the type strain CBS 395 does not produce viable spores and sequencing revealed that the strain is aneuploid.26 However, a number of different strains (including CBS 7001, the ‘genomic type strain’) are good sporulants and introgression from other Saccharomyces species is low.4,90 S. bayanus is not a ‘species’ in an evolutionary sense and should rather be understood as a product of the artificial brewing environment with no occurrence in nature. Therefore, all known strains of S. bayanus (including the type strain CBS 380) are hybrids of S. eubayanus and S. uvarum that contain contributions from S. cerevisiae in at least some cases.26 Comparison of mtDNA clearly demonstrates that the mtDNA of S. bayanus CBS 380 originates from S. uvarum. A similar conclusion was reported in reference 91, which compared part of the cox2 gene in a collection of S. eubayanus and its interspecies hybrids. A model of S. bayanus formation involves multiple hybridization events of S. pastorianus with wild strains of S. uvarum. In the case of S. bayanus CBS 380 this happened only recently, as the mtDNAs of S. uvarum CBS 395T and S. bayanus CBS 380T are nearly identical. They differ only in four substitutions and five indels, of which one is the insertion of a GC cluster and one a TTATTTAC repeat. Therefore, comparison of mtDNA should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species.
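Counts such as "four substitutions and five indels" are typically read off a pairwise alignment, with each contiguous run of gap characters counted as one indel event regardless of its length. The sketch below shows one way to do this for two pre-aligned sequences. It is an illustration under that counting convention, not the procedure used here, and the function name `count_differences` is hypothetical.

```python
def count_differences(aln_a: str, aln_b: str, gap: str = "-"):
    """Count substitution columns and indel events in a pairwise alignment.

    Returns (substitutions, indels). A run of consecutive gap columns in
    the same sequence is counted as a single indel event.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    indels = 0
    open_gap = None  # which sequence carries the currently open gap run
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no information
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if open_gap != side:
                indels += 1  # a new gap run starts here
                open_gap = side
        else:
            open_gap = None
            if a != b:
                substitutions += 1
    return substitutions, indels
```

With this convention, an inserted 8-nt repeat such as TTATTTAC contributes one indel, exactly like a single-base insertion.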
4. Conclusions
We sequenced mtDNA from a variety of Saccharomyces species by Illumina MiSeq. All are circularly mapped molecules decreasing in size with phylogenetic distance from S. cerevisiae but with similar gene content, including regulatory and selfish elements such as origins of replication, introns, free-standing ORFs and GC clusters. Their most profound feature is species-specific alteration in gene order, apparently accompanying the speciation process in yeasts with larger mtDNA. Conserved mtDNA gene order seems to be a species-specific feature, as shown by the comparison of more than 100 S. cerevisiae strains20,21 and ∼15 S. paradoxus strains19 as well as 50 strains of Lachancea thermotolerans.80 On the other hand, reshuffling of mitochondrial genes evidently accompanies the yeast speciation process if mitochondrial genomes are large enough.7 The genetic code differs from the well-known yeast mitochondrial code, as GUG is used as the translation start in the S. cariocanus cox3 gene as well as in some free-standing ORF1 orthologs. The arginine codons CGA and CGC, considered unassigned, are present in maturases/HEs coded by introns. Because of the alternative splicing of cox1I5β and cobI1α, AUA, previously considered an initiation codon, is not used as one. The multilocus phylogeny inferred from mtDNA does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that S. cariocanus should be assigned as a separate species and that S. bayanus CBS 380 should not be considered a distinct species, because its mtDNA is nearly identical to S. uvarum mtDNA. Apparently, comparison of mtDNAs should be included in genomic studies, as it is an important tool to understand the origin and evolutionary history of some yeast species.
Acknowledgements
The authors thank two anonymous reviewers for their valuable comments. The study is a result of implementation of project REVOGENE – Research Center for Molecular Genetics (ITMS 26240220067) supported by the Research & Development Operational Program funded by the ERDF. Part of this work was funded by grants from VEGA 1/0360/12 and 1/0048/16.
Conflict of interest
None declared.
Accession numbers
KX657740, KX657741, KX657742, KX657743, KX657744, KX657745, KX657746, KX657747, KX657748, KX657749, KX657750
Supplementary data
Supplementary data are available at DNARES online.
References
CLC Genomics Workbench 9.5.2. https://www.qiagenbioinformatics.com/
Author notes
Present address: Department of Membrane Biochemistry, Centre of Biosciences, Slovak Academy of Sciences, Bratislava 84005, Slovakia
Edited by Prof. Takashi Ito