Comparative Study
BMC Res Notes. 2014 Sep 16;7:651.
doi: 10.1186/1756-0500-7-651.

Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT

Gabriel Moreno-Hagelsieb et al. BMC Res Notes.

Abstract

Background: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy for annotation quality is the estimation of overannotation by comparing annotated coding genes against the SwissProt database. NCBI's BLAST (BLAST+) is the common software of choice to compare these sequences. Newer programs that run in a fraction of the time taken by BLAST+ might miss matches that BLAST+ would find. However, their results might still be useful for calculating overannotation. We thus decided to compare the overannotation estimates yielded by three such programs, UBLAST, LAST and the BLAST-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary.
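
The abstract does not give the formula behind the overannotation estimate, so the following Python sketch illustrates only a simplified stand-in: the fraction of annotated proteins with no accepted match in SwissProt. It assumes hits are available in the 12-column tab-separated layout that BLAST+ writes with -outfmt 6 (query ID in the first column, E-value in the eleventh). The function names and the 1e-6 E-value cutoff are hypothetical choices made for illustration, not values taken from the paper.

```python
def matched_queries(tabular_hits, max_evalue=1e-6):
    """Return the IDs of query proteins with at least one accepted hit.

    tabular_hits: iterable of lines in 12-column tab-separated format
    (query ID first, E-value eleventh). max_evalue is an illustrative
    cutoff, not one stated in the abstract.
    """
    matched = set()
    for line in tabular_hits:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            matched.add(fields[0])
    return matched


def unmatched_fraction(all_protein_ids, matched):
    """Fraction of annotated proteins with no accepted SwissProt match.

    This is a simplified proxy and may differ from the paper's estimate.
    """
    ids = set(all_protein_ids)
    if not ids:
        raise ValueError("no annotated proteins given")
    return len(ids - matched) / len(ids)
```

Running the same two functions over the output of each program, converted to the same tabular layout, would give one value per genome and per program, which is the kind of quantity the comparison in this study relies on.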

Findings: We found that all three programs, UBLAST, LAST and BLAT, tend to produce overannotation estimates similar to those obtained with BLAST+. As would be expected, results varied the most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest running algorithm, and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in time was modest compared to that obtained from UBLAST, LAST, or BLAT.
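
One way to quantify how far a faster program's estimates stray from those of BLAST+ is to take the per-genome absolute difference and summarize it. The sketch below assumes each program's estimates are held in a dictionary keyed by genome identifier; the function names and the choice of mean and maximum as summaries are illustrative assumptions, not the statistics reported in the paper.

```python
import statistics


def deviations_from_baseline(baseline, other):
    """Per-genome absolute difference between two sets of estimates.

    baseline, other: dicts mapping genome ID to an overannotation estimate.
    Only genomes present in both dicts are compared.
    """
    shared = baseline.keys() & other.keys()
    return {genome: abs(other[genome] - baseline[genome]) for genome in shared}


def summarize(deviations):
    """Mean and maximum of the per-genome deviations."""
    values = list(deviations.values())
    if not values:
        raise ValueError("no genomes in common")
    return {"mean": statistics.mean(values), "max": max(values)}
```

Calling summarize(deviations_from_baseline(blast_estimates, ublast_estimates)), where both arguments are hypothetical dictionaries built by the reader, would yield a single pair of numbers per program that can be compared across UBLAST, LAST and BLAT.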

Conclusions: Although the faster programs miss sequence matches otherwise found by NCBI's BLAST, the overannotation estimates are very similar, and thus these programs can be used with confidence for this task.

Figures

Figure 1
Ten-genomes experiment. (A) UBLAST, LAST and BLAT ran in less than a hundredth of the time taken by NCBI’s BLAST, with LAST running the fastest; (B) all these programs matched fewer genome proteins to proteins in the SwissProt database than BLAST+ did, with BLAT showing the lowest numbers and the highest variation; (C) UBLAST produced the overannotation estimates most similar to those produced by BLAST+, while BLAT produced the most dissimilar ones. Filtering the SwissProt database at different identity thresholds did not have much of an effect on speed or on overannotation estimates.
Figure 2
Effect of overannotation. (A) As in the small experiment, the faster programs tended to match a lower proportion of genome proteins to the SwissProt database than BLAST+, with BLAT missing the highest proportion of matches. (B) The difference between the overannotation estimates produced by the faster programs compared to those produced by BLAST+ tended to be small and increase only modestly with the original overannotation estimate.
Figure 3
Effect of matches to SwissProt. The difference in overannotation estimates, as compared against those produced with BLAST+, seems more pronounced for genomes with fewer proteins finding matches in the SwissProt database. This effect is more noticeable with BLAT.