Removing noise from pyrosequenced amplicons
- PMID: 21276213
- PMCID: PMC3045300
- DOI: 10.1186/1471-2105-12-38
Abstract
Background: In many environmental genomics applications, a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity, as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that, because of the large read numbers and the lack of consensus sequences, it is vital to distinguish noise from true sequence diversity in these data. Failing to do so inflates estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.
Results: AmpliconNoise outperforms alternative algorithms, substantially reducing per-base error rates for both the GS FLX and the latest Titanium protocol. All three sources of error lead to inflation of diversity estimates. In particular, chimera formation has a hitherto unrealised importance, which varies according to amplification protocol. We show that AmpliconNoise allows accurate estimates of OTU number. Just as importantly, AmpliconNoise generates the right OTUs even at low sequence differences. We demonstrate that Perseus has very high sensitivity, finding 99% of chimeras, which is critical when these are present at high frequencies.
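The idea of using sequence abundances to find chimeras can be illustrated with a toy sketch. This is not the Perseus implementation. It assumes equal-length, pre-aligned sequences, restricts candidate parents to strictly more abundant sequences, and replaces the trained classifier with a fixed threshold (`min_gain`, a hypothetical parameter). The function names are also illustrative.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_two_parent_model(query, parents):
    """Find the closest chimera built from a prefix of one parent and a suffix of another.

    Returns (mismatches, left_parent, right_parent, breakpoint).
    """
    best = None
    for left in parents:
        for right in parents:
            if left == right:
                continue
            for bp in range(1, len(query)):
                model = left[:bp] + right[bp:]
                d = hamming(query, model)
                if best is None or d < best[0]:
                    best = (d, left, right, bp)
    return best


def flag_chimeras(abundances, min_gain=2):
    """Flag sequences explained much better by two parents than by one.

    Assumption: only sequences with strictly higher abundance can be parents.
    The fixed min_gain threshold stands in for a trained classifier.
    """
    flagged = {}
    for query, n in abundances.items():
        parents = [s for s, m in abundances.items() if m > n]
        if len(parents) < 2:
            continue
        one_parent = min(hamming(query, p) for p in parents)
        d, left, right, bp = best_two_parent_model(query, parents)
        if one_parent - d >= min_gain:
            flagged[query] = (left, right, bp)
    return flagged


# Toy data: two abundant "true" sequences, one chimera of them, one noisy read.
abundances = {
    "A" * 12: 100,
    "C" * 12: 80,
    "A" * 6 + "C" * 6: 5,
    "A" * 11 + "T": 3,
}
flagged = flag_chimeras(abundances)
```

In this toy data set the low-abundance sequence `AAAAAACCCCCC` matches a two-parent model exactly but is six mismatches from either parent alone, so it is flagged. The single-base variant of the first sequence is explained equally well by one parent and is left alone.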
Conclusions: AmpliconNoise followed by Perseus is a very effective pipeline for the removal of noise. In addition, the principles behind the algorithms, namely the inference of true sequences using Expectation-Maximization (EM) and the treatment of chimera detection as a classification or 'supervised learning' problem, will be equally applicable to new sequencing technologies as they appear.
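The principle of inferring true sequences by EM can be sketched on a toy problem. The following is a minimal illustration, not the AmpliconNoise algorithm. It assumes equal-length reads, uses Hamming distance in place of the flowgram or alignment distances a real denoiser would need, fixes the candidate "true" sequences to the observed unique reads, and models the probability of a read given its source as `exp(-distance / sigma)`. `sigma` and `min_weight_reads` are hypothetical parameters chosen for illustration.

```python
import math
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def em_denoise(reads, sigma=1.0, n_iter=200, min_weight_reads=0.5):
    """Estimate mixture weights over candidate true sequences by EM.

    Candidates are the unique observed reads (a simplification); only the
    mixture weights are updated. Candidates whose expected read count falls
    below min_weight_reads are treated as noise and dropped.
    """
    counts = Counter(reads)
    seqs = sorted(counts)
    n_total = sum(counts.values())
    # Assumed error model: likelihood decays exponentially with distance.
    lik = [[math.exp(-hamming(s, c) / sigma) for c in seqs] for s in seqs]
    weights = [counts[s] / n_total for s in seqs]

    for _ in range(n_iter):
        expected = [0.0] * len(seqs)
        for i, s in enumerate(seqs):
            # E-step: responsibility of each candidate for reads equal to s.
            joint = [weights[j] * lik[i][j] for j in range(len(seqs))]
            norm = sum(joint)
            for j, v in enumerate(joint):
                expected[j] += counts[s] * v / norm
        # M-step: new weights are the expected fractions of reads.
        weights = [e / n_total for e in expected]

    return {
        seqs[j]: weights[j] * n_total
        for j in range(len(seqs))
        if weights[j] * n_total >= min_weight_reads
    }


# Toy data: two abundant sequences, each with one single-base noisy variant.
reads = (
    ["ACGTACGT"] * 50 + ["ACGTACGA"]
    + ["TTGTACCC"] * 30 + ["TTGTACCA"]
)
denoised = em_denoise(reads)
```

With these settings the two singleton variants lose their weight to the abundant sequences one mismatch away, so only two sequences survive. A naive count of unique reads would have reported four, which is the inflation effect the abstract describes.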
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-1.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-2.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-3.gif)
![Figure 4](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-4.gif)
![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-5.gif)
![Figure 6](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-6.gif)
![Figure 7](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-7.gif)
![Figure 8](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3045300/bin/1471-2105-12-38-8.gif)