Nucleic Acids Res. 2014 Jan;42(Database issue):D553-9.
doi: 10.1093/nar/gkt1274. Epub 2013 Dec 6.

RefSeq microbial genomes database: new representation and annotation strategy


Tatiana Tatusova et al. Nucleic Acids Res. 2014 Jan.

Erratum in

Abstract

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the public archives of the International Nucleotide Sequence Database Collaboration. These genomes can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to sequence genomes at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the bioinformatics tools used for annotation, analysis and visualization. New strategies have been developed for the annotation and representation of reference genomes and of sequence variations derived from population studies and clinical outbreaks.
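The abstract notes that these records are reachable through Entrez. The following is a minimal sketch, assuming programmatic access through NCBI's E-utilities ESearch endpoint, of how a search URL could be built with only the Python standard library. The choice of the `nuccore` database and the example search term are illustrative assumptions, not taken from the paper. The helper name `esearch_url` is hypothetical. Database names and field tags should be checked against the current E-utilities documentation before use.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper) returning record IDs for `term` in `db`.

    urlencode() percent-encodes the query, e.g. spaces become '+'
    and square brackets become %5B / %5D.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative query (an assumption): RefSeq nucleotide records for E. coli.
    url = esearch_url("nuccore", "Escherichia coli[Organism] AND refseq[filter]")
    print(url)
    # Fetching the result needs network access, e.g.:
    # from urllib.request import urlopen
    # print(urlopen(url).read().decode())
```

Building the URL separately from fetching it keeps the query logic testable offline. The actual HTTP request is left as a commented step.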


Figures

Figure 1.
Distribution of bacterial species by phyla. Top four phyla, each with >100 sequenced species: Proteobacteria (1828), Firmicutes (978), Actinobacteria (747) and the Bacteroidetes/Chlorobi group (408).
Figure 2.
M. tuberculosis RGTB327 alignments to the reference genome of M. tuberculosis H37Rv. Vertical red lines show sequence mismatches caused by indels, which result in a large number (∼900) of frameshifted genes. These indels are likely caused by sequencing or assembly errors.
Figure 3.
(A) Protein sequence in NP_414555 record annotated on the reference genome of E. coli str. K-12 substr. MG1655 is represented by WP_000516135. (B) This sequence has been annotated on 1285 genomes from 16 Escherichia and Shigella species.
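Figure 3 illustrates one identical protein sequence being represented by a single record that is annotated on many genomes. The following is a minimal sketch of that idea, assuming exact, case-insensitive sequence identity as the grouping criterion. The function name, the toy sequences and the `PROT…` identifiers are hypothetical placeholders. They are not RefSeq WP_ accessions and do not reproduce NCBI's annotation pipeline.

```python
def build_nonredundant_index(records):
    """Collapse (genome_id, protein_sequence) pairs into one entry per unique sequence.

    Returns a list of (placeholder_id, sequence, genomes) tuples in order of
    first appearance. placeholder_id is a hypothetical identifier, and genomes
    is the sorted list of genomes on which that exact sequence occurs.
    """
    genomes_by_seq = {}  # dicts preserve insertion order (Python 3.7+)
    for genome_id, seq in records:
        key = seq.upper()  # assumption: comparison ignores letter case
        genomes_by_seq.setdefault(key, set()).add(genome_id)

    index = []
    for n, (seq, genomes) in enumerate(genomes_by_seq.items(), start=1):
        index.append((f"PROT{n:06d}", seq, sorted(genomes)))
    return index


if __name__ == "__main__":
    # Toy data: one sequence shared by three genomes, plus a single variant.
    records = [
        ("genomeA", "MKV"),
        ("genomeB", "MKV"),
        ("genomeC", "mkv"),
        ("genomeC", "MKA"),
    ]
    for pid, seq, genomes in build_nonredundant_index(records):
        print(pid, seq, len(genomes), genomes)
```

Storing each distinct sequence once and keeping the list of genomes that carry it is what lets a single protein record, like WP_000516135 in panel B, stand for an annotation found on many genomes.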
