PLoS One. 2012;7(12):e48837.
doi: 10.1371/journal.pone.0048837. Epub 2012 Dec 12.

The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation

Konstantinos Mavromatis et al. PLoS One. 2012.

Abstract

Background: The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation.

Methodology/principal findings: In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and the sources of error in them should be considered prior to analysis.

Conclusion: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The distribution of projects among the 12 sequencing methods used.
Dark green indicates the sequencing methods with more than 5 sequenced projects; these were used in downstream analysis.
Figure 2. Assembly quality as assessed by the number of scaffolds in draft assemblies.
Data is shown for the six sequencing methods with more than 5 projects. Indicated are the range from upper to lower quartile (boxes), the median (thick black line), and the minimum/maximum values.
Figure 3. Assembly quality for the draft genomes included in this analysis.
Assembly quality is assessed by (a) the number of gaps in the draft assemblies, and (b) gap size expressed as a percentage of genome length. Data is shown for the six sequencing methods with more than 5 projects.
Figure 4. Genes missed in draft assemblies.
Data is shown for the sequencing methods with more than 5 projects. (a) Missed gene sequences, i.e., the number of genes in the finished genome whose nucleotide sequence is absent from the draft assembly. (b) Unrecognized genes, i.e., the number of genes whose nucleotide sequence is present in the draft assembly but that were not predicted by Prodigal (v2.5).
Figure 5. Misassemblies as detected by low gene quality.
Low quality genes are genes present in the finished genome that had a similarity (tBLASTn) to the draft genome but the alignment was either short (<50% of the gene length) or identity was <90%. Data is shown for the six sequencing methods with more than 5 projects.
Figure 6. Distributions of functions, based on COG group assignments, of gene sequences missing in draft assemblies.
Data is shown for six sequencing technologies. Illumina+PacBio is omitted, as there are currently only eight such genome projects, none with any missing genes.
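
The gene categories defined in the captions of Figures 4 and 5 can be read as a simple classification rule. The following Python sketch is illustrative only and is not the authors' pipeline: the `Hit` record, the `classify_gene` function, the category labels, and the order in which the checks are applied are all assumptions made for this example. Only the thresholds (alignment shorter than 50% of the gene length, identity below 90%) come from the Figure 5 caption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical summary of the best alignment of a finished-genome gene
    against the draft assembly (for example, parsed from tBLASTn output)."""
    aligned_fraction: float  # aligned length / gene length, in the range 0..1
    identity: float          # percent identity, in the range 0..100


def classify_gene(hit: Optional[Hit], predicted_in_draft: bool) -> str:
    """Assign a finished-genome gene to one of the caption categories.

    The precedence of the checks is an assumption of this sketch.
    """
    if hit is None:
        # Figure 4(a): nucleotide sequence absent from the draft assembly.
        return "missed"
    if hit.aligned_fraction < 0.5 or hit.identity < 90.0:
        # Figure 5: alignment is short (<50% of gene length) or identity <90%.
        return "low_quality"
    if not predicted_in_draft:
        # Figure 4(b): sequence present but not called by the gene predictor.
        return "unrecognized"
    return "recovered"


if __name__ == "__main__":
    examples = [
        (None, False),
        (Hit(0.40, 99.0), True),
        (Hit(0.95, 85.0), True),
        (Hit(1.00, 99.0), False),
        (Hit(1.00, 99.0), True),
    ]
    for hit, predicted in examples:
        print(hit, predicted, classify_gene(hit, predicted))
```

Both thresholds are strict inequalities, so a gene aligned over exactly 50% of its length at exactly 90% identity is not flagged as low quality in this sketch.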

Grants and funding

This work was performed under the auspices of the US Department of Energy Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract no. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract no. DE-AC52-07NA27344, Los Alamos National Laboratory under contract no. DE-AC02-06NA25396, and Oak Ridge National Laboratory under contract no. DE-AC05-00OR22725. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.