PeerJ. 2019 Jan 14;6:e6233.
doi: 10.7717/peerj.6233. eCollection 2019.

Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments

Hugo R Barajas et al. PeerJ. 2019.

Abstract

Background: The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The Streptococcus genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types, and it is now subject to in-depth comparative genomic analysis.

Methods: Core and pan-genomes were used to describe the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared to phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny reconstruction was used as a reference against which to compare the average nucleotide identity (ANI) and genomic similarity score (GSS) dendrograms.
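
As an illustration of the core and pan-genome concepts used here, the following is a minimal Python sketch. It assumes that proteins have already been grouped into ortholog families by some upstream step; the function name, the genome labels, and the family IDs are hypothetical and do not come from the study.

```python
from typing import Dict, Set, Tuple


def core_and_pan_genome(
    families_by_genome: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) for a collection of genomes.

    core: ortholog families present in every genome.
    pan:  ortholog families present in at least one genome.
    """
    if not families_by_genome:
        return set(), set()
    family_sets = list(families_by_genome.values())
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


# Toy input (hypothetical labels): three genomes and their ortholog families.
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam2", "fam5"},
}
core, pan = core_and_pan_genome(toy)
print(sorted(core))  # ['fam1', 'fam2']
print(len(pan))      # 5
```

In this toy case the core shrinks or stays the same as genomes are added, while the pan-genome can only grow, which is why the size of the core depends on the strains included.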

Results: The core genome, in this work, consists of 404 proteins that are shared by all 108 Streptococcus genomes. Across species, the average identity of the pairwise compared core proteins decreases in proportion to lower GSS values. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while also resolving 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history by comparing orthologous proteins. Additionally, GSS proved to be the most useful metric for genus and species comparisons, where ANI metrics failed due to false positives when comparing different species.

Discussion: Understanding genomic variability and species relatedness is the goal of tools like GSS, which uses the maximum pairwise shared orthologous sequences for its calculation. Because it relies on amino acid alignment scores rather than nucleotide alignments, and normalizes by positive matches, it allows long evolutionary distances (above the species level) to be included. Newly sequenced species and strains could be easily placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitously conserved gene features; thus, it reflects the mosaic structure and dynamism of gene acquisition and loss in bacterial genomes.
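
The abstract does not give the GSS formula, so the following Python sketch only illustrates the general idea of a score built from ortholog alignment scores and normalized to the range 0 to 1. The normalization by self-alignment scores, the function name, and all toy values are assumptions made for illustration, not the authors' definition.

```python
from typing import Dict, Tuple


def similarity_score(
    pair_scores: Dict[Tuple[str, str], float],
    self_scores_a: Dict[str, float],
    self_scores_b: Dict[str, float],
) -> float:
    """Normalized similarity between genomes A and B (illustrative only).

    pair_scores maps (protein_in_A, protein_in_B) ortholog pairs to their
    amino acid alignment score. self_scores_* map each protein to the score
    of aligning it against itself.

    Assumption: the summed ortholog scores are divided by the larger of the
    two summed self-scores, so identical genomes score 1.0.
    """
    if not pair_scores:
        return 0.0
    shared = sum(pair_scores.values())
    self_a = sum(self_scores_a[pa] for pa, _ in pair_scores)
    self_b = sum(self_scores_b[pb] for _, pb in pair_scores)
    return shared / max(self_a, self_b)


# Toy values (hypothetical): two ortholog pairs between genomes A and B.
pairs = {("a1", "b1"): 80.0, ("a2", "b2"): 45.0}
self_a = {"a1": 100.0, "a2": 50.0}
self_b = {"b1": 100.0, "b2": 60.0}

score = similarity_score(pairs, self_a, self_b)
distance = 1.0 - score  # a dendrogram would be built from such distances
print(score, distance)  # 0.78125 0.21875
```

A matrix of such pairwise distances could then be passed to any standard clustering or neighbor-joining routine to obtain a dendrogram.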

Keywords: Comparative genomics; Core genome; Genomic similarity score; Streptococcus.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Core genome variability amongst different streptococci clades.
Each core protein, for each Streptococcus species, was aligned against the reference S. pyogenes. The pairwise identity of each core protein, calculated by global sequence alignment, was sorted and plotted. The dendrogram shows a summary of genomic similarity score (GSS) distances. The identity variability highlights the species diversity even for the conserved coding genes. S. pyogenes (spy), S. dysgalactiae (sdy), S. agalactiae (sag), S. parauberis (spu), S. iniae (sin), S. uberis (sub), S. equi subsp. zooepidemicus (seq_z), S. equi subsp. equi (seq_z), S. suis (ssu), S. thermophilus (sth), S. salivarius (ssa), S. mutans (smu), S. intermedius (sint), S. oligofermentans (sol), S. sanguinis (ssan), S. gordonii (sgo), S. parasanguinis (sps), S. pasteurianus (spas), S. oralis (sor), S. pneumoniae (spn), S. pseudopneumoniae (sppn), S. mitis (smi), S. gallolyticus (sga), S. macedonicus (sma), S. lutetiensis (slu), S. infantarius (sinf), B. subtilis (bs), and B. licheniformis (bl).
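
The per-protein comparison described in this caption can be sketched as follows. This is a minimal Python example that computes identity from a global (Needleman-Wunsch) alignment. The scoring values (match +1, mismatch -1, gap -1), the use of alignment length as the denominator, and the toy sequences and labels are assumptions for illustration; the abstract does not state which aligner or substitution matrix was used.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical columns in one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identical residues.
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    # Two empty sequences have no columns to compare.
    return identical / columns if columns else 0.0


# Toy data (hypothetical): one core protein in a reference and three species.
reference = "MKVLAT"
orthologs = {"sp_x": "MKVLAT", "sp_y": "MKILAT", "sp_z": "MKLAT"}
identities = {sp: global_identity(reference, seq) for sp, seq in orthologs.items()}
# Sorted identities, as one would do before plotting them per species.
print(sorted(identities.values(), reverse=True))
```

Repeating this for every core protein of a species and sorting the resulting values would give one curve of the kind the caption describes.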

Figure 2. Genomic similarity score outperforms 16S rRNA strain resolution and solves genus-wide comparisons when compared to ANI.
(A) Neighbor-joining 16S rRNA reconstruction, with 1,000 bootstraps. (B) Average nucleotide identity dendrogram. (C) Genomic similarity score (GSS) dendrogram. Some paraphyletic groups of streptococci, classified according to clinical or practical use (Kilian, 2007), are pyogenic, suis, salivarius, mutans, and mitis. The suis clade is rearranged closer to the mitis group, and resolution at the species level is achieved in the GSS dendrogram compared to the single marker gene and ANI dendrograms. S. pyogenes (spy), S. dysgalactiae (sdy), S. agalactiae (sag), S. parauberis (spu), S. iniae (sin), S. uberis (sub), S. equi subsp. zooepidemicus (seq_z), S. equi subsp. equi (seq_z), S. suis (ssu), S. thermophilus (sth), S. salivarius (ssa), S. mutans (smu), S. intermedius (sint), S. oligofermentans (sol), S. sanguinis (ssan), S. gordonii (sgo), S. parasanguinis (sps), S. pasteurianus (spas), S. oralis (sor), S. pneumoniae (spn), S. pseudopneumoniae (sppn), S. mitis (smi), S. gallolyticus (sga), S. macedonicus (sma), S. lutetiensis (slu), S. infantarius (sinf), B. subtilis (bs), and B. licheniformis (bl).

Grants and funding

Hugo R. Barajas, Miguel F. Romero, and Shamayim Martinez-Sanchez had graduate student fellowships from CONACyT. Luis David Alcaraz received funding from DGAPA-PAPIIT-UNAM TA2001171 and SEP-CONACyT Ciencia Básica 237387. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
