NCBI prokaryotic genome annotation pipeline
- PMID: 27342282
- PMCID: PMC5001611
- DOI: 10.1093/nar/gkw569
Abstract
Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.
Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
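The evidence-first idea described in the abstract can be illustrated with a toy sketch. This is not the GeneMarkS+ algorithm or any part of PGAP; it only shows the general principle that homology-supported placements take precedence and statistical (ab initio) predictions fill regions that lack external evidence. The `Gene` type, the `merge_annotations` function, and the coordinates are hypothetical and were chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model: 1-based inclusive coordinates on one strand."""
    start: int
    end: int
    strand: str
    source: str  # "homology" or "ab_initio" (labels assumed for this sketch)


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two models share at least one base on the same strand."""
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end


def merge_annotations(evidence: list[Gene], ab_initio: list[Gene]) -> list[Gene]:
    """Keep every evidence-based model; add an ab initio prediction only
    when it does not overlap any evidence-based model."""
    merged = list(evidence)
    for pred in ab_initio:
        if not any(overlaps(pred, ev) for ev in evidence):
            merged.append(pred)
    return sorted(merged, key=lambda g: (g.start, g.end))


# Made-up example: one homology-supported gene and two ab initio predictions.
evidence = [Gene(100, 400, "+", "homology")]
predictions = [
    Gene(120, 400, "+", "ab_initio"),  # conflicts with the evidence, dropped
    Gene(600, 900, "-", "ab_initio"),  # no evidence in this region, kept
]
result = merge_annotations(evidence, predictions)
for gene in result:
    print(gene.start, gene.end, gene.strand, gene.source)
```

A real pipeline would reconcile conflicting models rather than simply discard them (for example by adjusting start sites); the sketch drops them only to keep the precedence rule visible.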
Figures
![Figure 1.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig1.gif)
![Figure 2.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig2.gif)
![Figure 3.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig3.gif)
![Figure 4.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig4.gif)
![Figure 5.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig5.gif)
![Figure 6.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig6.gif)
![Figure 7.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/5001611/bin/gkw569fig7.gif)