mBio. 2013 Sep 17;4(5):e00592-13.
doi: 10.1128/mBio.00592-13.

Ecological patterns of nifH genes in four terrestrial climatic zones explored with targeted metagenomics using FrameBot, a new informatics tool

Qiong Wang et al. mBio. 2013.

Abstract

Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
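The nearest-neighbor step and the 80% identity threshold mentioned above can be illustrated with a small sketch. This is not FrameBot's algorithm, which aligns DNA reads to protein references while allowing frameshifts. The sketch assumes the query has already been translated and aligned to each reference without gaps, and the names `percent_identity`, `nearest_reference`, `ref_A`, and `ref_B`, as well as the toy peptide strings, are hypothetical and are not real nifH sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching residues between two pre-aligned, equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def nearest_reference(query: str, references: dict) -> tuple:
    """Return (name, identity) of the reference most similar to the query."""
    best_name = max(references, key=lambda name: percent_identity(query, references[name]))
    return best_name, percent_identity(query, references[best_name])


# Toy data (hypothetical, for illustration only).
references = {
    "ref_A": "MAKGEFVLSK",
    "ref_B": "MAKGDFVLTR",
}
query = "MAKGEFVLSR"

name, identity = nearest_reference(query, references)
# Per the abstract, a reference at 80% identity or above is the regime
# in which FrameBot was generally the more accurate tool.
has_close_reference = identity >= 80.0
print(name, identity, has_close_reference)
```

In this toy case the query differs from `ref_A` at one of ten positions and from `ref_B` at two, so `ref_A` is reported as the nearest match.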

Importance: High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool's utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
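The effect of a frameshift on the downstream translation can be shown with a minimal sketch. It assumes the standard genetic code and a made-up 18-base sequence (not a real nifH fragment). The function `translate` is a hypothetical helper written for this illustration and is not part of FrameBot.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna: str) -> str:
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Made-up sequence for illustration only.
original = "ATGGCTAAAGGTGAATTC"

# Simulate a single-base deletion (an indel sequencing error) at index 4.
with_deletion = original[:4] + original[5:]

print(translate(original))       # intact reading frame
print(translate(with_deletion))  # every codon after the deletion is misread
```

Only the first residue survives the deletion. Every codon after the error is read out of frame, which is why such reads need correction before protein-level comparison.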

Figures

FIG 1
FrameBot performance using reference sequences at various percentages of identity to query sequences. Target protein sequences were chosen from the FunGene site (http://fungene.cme.msu.edu) at various distances from the known defined community sequences. The error rates at 100% identity represent baseline sequencing errors. The test genes are nifH (nitrogenase reductase) (A); bphA (biphenyl dioxygenase alpha subunit) (B); and but (butyryl-CoA: acetate CoA-transferase) (C). Dotted lines represent the overall error rates for FragGeneScan and HMMFrame on the same amplicon data. The error rate from HMMFrame for nifH shown here (0.36%) is calculated from an HMM trained on the group I, II, and III sequences from the augmented Zehr reference set. When trained on the entire augmented Zehr reference set, the error rate rose to 0.67%, and when trained on the group I-only sequences, the error rate was 0.34%.
FIG 2
Relative abundances of NEON reads grouped by nearest matches at the phylum and class levels, averaged for each site (observatory) as indicated by state. The three most dominant genera in alphaproteobacteria are also shown. Other, all phyla with less than 0.5% nearest matches from any site.
FIG 3
Principal component analysis of NEON samples. (A) PC1 and PC2. (B) PC2 and PC3. The input data were standardized using the Wisconsin square root normalization as implemented in R. Ellipses represent 1 standard deviation of the points from the centroid. The soil environmental variables were fitted to the ordination using the envfit method from the labdsv R package. Arrows were plotted for variables with significance of fit ≤ 0.01.
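The standardization named in the FIG 3 caption can be sketched as follows, assuming it refers to a Wisconsin double standardization applied to square-root-transformed abundances: each column (taxon) is divided by its maximum, then each row (sample) by its total. The authors worked in R, so this Python version, the function name `wisconsin_sqrt`, and the toy count matrix are illustrative assumptions only.

```python
import math


def wisconsin_sqrt(counts):
    """Square-root transform, then Wisconsin double standardization.

    counts: list of rows (samples), each a list of taxon abundances.
    """
    rooted = [[math.sqrt(v) for v in row] for row in counts]

    # Step 1: scale each column (taxon) by its maximum.
    col_max = [max(col) for col in zip(*rooted)]
    scaled = [
        [v / m if m > 0 else 0.0 for v, m in zip(row, col_max)]
        for row in rooted
    ]

    # Step 2: scale each row (sample) by its total.
    result = []
    for row in scaled:
        total = sum(row)
        result.append([v / total if total > 0 else 0.0 for v in row])
    return result


# Toy abundance table: 2 samples x 3 taxa (made-up values).
counts = [
    [4, 0, 9],
    [16, 1, 0],
]
standardized = wisconsin_sqrt(counts)
for row in standardized:
    print(row)
```

After this step each sample row sums to 1, so samples with very different read depths contribute on a comparable scale to the ordination.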

References

    1. Stackebrandt E, Ebers J. 2006. Taxonomic parameters revisited: tarnished gold standards. Microbiol. Today 33:153–155.
    2. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37:D141–D145.
    3. Dagan T, Artzy-Randrup Y, Martin W. 2008. Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proc. Natl. Acad. Sci. U. S. A. 105:10039–10044.
    4. Zhang Y, Sun Y. 2011. HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors. BMC Bioinformatics 12:198. doi:10.1186/1471-2105-12-198.
    5. Rho M, Tang H, Ye Y. 2010. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 38:e191. doi:10.1093/nar/gkq747.
