PeerJ. 2014 Jun 5;2:e425.
doi: 10.7717/peerj.425. eCollection 2014.

FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares

Genivaldo Gueiros Z Silva et al. PeerJ. 2014.

Abstract

One of the major goals in metagenomics is to identify the organisms present in a microbial community from unannotated shotgun sequencing reads. Taxonomic profiling has valuable applications in biological and medical research, including disease diagnostics. Most currently available approaches do not scale well with increasing data volumes, which is important because both the number and the lengths of the reads provided by sequencing platforms keep increasing. Here we introduce FOCUS, an agile composition-based approach that uses non-negative least squares (NNLS) to report the organisms present in metagenomic samples and profile their abundances. FOCUS was tested with simulated and real metagenomes, and the results show that our approach accurately predicts the organisms present in microbial communities. FOCUS was implemented in Python. The source code and web server are freely available at http://edwards.sdsu.edu/FOCUS.

Keywords: Metagenomes; Modeling; k-mer.
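
The composition-based NNLS idea described in the abstract can be illustrated with a minimal sketch. This is not the FOCUS source code: the function names (`kmer_profile`, `nnls_projected_gradient`), the choice of k, and the projected-gradient solver are assumptions made for illustration, and a production tool would more likely call an established NNLS routine. The sketch builds one k-mer frequency column per reference genome and then looks for non-negative weights whose weighted sum best matches the k-mer profile of the sample.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return relative k-mer frequencies of seq, ordered over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def nnls_projected_gradient(A, b, iters=2000):
    """Minimise 0.5 * ||Ax - b||^2 subject to x >= 0 (hypothetical helper).

    A is a list of rows (one per k-mer), each with one entry per reference.
    Uses plain projected gradient descent, chosen here only for brevity.
    """
    rows, cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe step size.
    step = 1.0 / (sum(a * a for row in A for a in row) or 1.0)
    x = [0.0] * cols
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - b[i]
                    for i in range(rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(rows))
                for j in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(cols)]
    return x


# Toy reference sequences (invented for illustration, not real genomes).
references = {
    "ref_AT": "AT" * 100,
    "ref_GC": "GC" * 100,
    "ref_mixed": "AACCGGTT" * 25,
}
columns = [kmer_profile(seq) for seq in references.values()]
A = [list(row) for row in zip(*columns)]  # rows = k-mers, columns = references

# Toy sample profile: 70% ref_AT and 30% ref_GC.
b = [0.7 * a + 0.3 * c for a, c, _ in zip(*columns)]

weights = nnls_projected_gradient(A, b)
total = sum(weights) or 1.0
for name, w in zip(references, weights):
    print(f"{name}: {w / total:.3f}")
```

Normalising the weights so that they sum to one turns them into relative abundances. The non-negativity constraint is what keeps the estimate interpretable, since a reference cannot contribute a negative share of the sample.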

Figures

Figure 1. Workflow of the FOCUS program.
Figure 2. Genera-level taxonomy classification sorted by FOCUS prediction for the metagenome from a diseased human oral cavity using FOCUS, MetaPhlAn, MG-RAST, PhymmBL, RAIphy, Taxy, and FOCUS (mean). Error bars represent the standard deviation for the tested metagenome.
Figure 3. Scalability test using different subsets of the metagenome from a diseased human oral cavity with FOCUS, MetaPhlAn, MG-RAST, PhymmBL, RAIphy, and Taxy.
Figure 4. Genera-level taxonomy classification sorted by FOCUS prediction for the metagenome from a healthy human oral cavity using FOCUS, MetaPhlAn, MG-RAST, PhymmBL, RAIphy, Taxy, and FOCUS (mean). Error bars show the standard deviation for the real metagenome.
Figure 5. Genera-level taxonomy classification sorted by FOCUS prediction for the metagenome from a fecal metagenomic sample of a healthy human using FOCUS, MetaPhlAn, MG-RAST, PhymmBL, RAIphy, Taxy, and FOCUS (mean). Error bars show the standard deviation for the real metagenome.
Figure 6. Heat map representing the distance between the FOCUS and MetaPhlAn results for 300 metagenomes from the Human Microbiome Project across 15 body sites. The distance was computed as the Euclidean distance between the results of both tools.
Figure 7. Genera-level taxonomy classification for the SimShort dataset using FOCUS, PhymmBL, RAIphy, and FOCUS (mean).
Figure 8. Class-level taxonomy classification for the SimHC dataset using FOCUS, PhymmBL, RAIphy, and FOCUS (mean).
Figure 9. Genera-level taxonomy classification for the SimHC dataset using FOCUS, MetaPhlAn, MG-RAST, PhymmBL, RAIphy, Taxy, GASiC, and FOCUS (mean).
Figure 10. Numerical evaluation of the synthetic metagenomes, computed as the Euclidean distance between the real and the predicted abundances.
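
The Euclidean distance used in the figure captions to compare abundance profiles can be sketched as follows. The function name `profile_distance`, the dictionary representation, and the example values are assumptions for illustration and are not taken from the paper. Taxa missing from one profile are treated as having zero abundance.

```python
import math


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two taxon -> abundance mappings.

    A taxon absent from one profile counts as abundance 0.0 there.
    """
    taxa = set(profile_a) | set(profile_b)
    return math.sqrt(sum(
        (profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) ** 2 for t in taxa
    ))


# Invented abundances (percent) for two hypothetical tool outputs.
tool_one = {"genus_x": 60.0, "genus_y": 40.0}
tool_two = {"genus_x": 55.0, "genus_y": 35.0, "genus_z": 10.0}
print(profile_distance(tool_one, tool_two))
```

A distance of zero means the two profiles agree exactly; larger values indicate greater disagreement between the predicted and reference abundances, or between two tools.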

Grants and funding

GGZS and DAC were supported by NSF Grants (DEB-1046413 and CNS-1305112 to RAE). BED was supported by NWO Veni (016.111.075), CAPES/BRASIL and the Dutch Virgo Consortium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
