Genome Biol. 2010;11(10):R107.
doi: 10.1186/gb-2010-11-10-r107. Epub 2010 Oct 29.

Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species

Claudio Donati et al. Genome Biol. 2010.

Erratum in

  • Genome Biol. 2011;12(10):140

Abstract

Background: Streptococcus pneumoniae is one of the most important causes of microbial diseases in humans. The genomes of 44 diverse strains of S. pneumoniae were analyzed and compared with strains of non-pathogenic streptococci of the Mitis group.

Results: Despite evidence of extensive recombination, the S. pneumoniae phylogenetic tree revealed six major lineages. With the exception of serotype 1, the tree correlated poorly with capsular serotype, geographical site of isolation and disease outcome. The distribution of dispensable genes--genes present in more than one strain but not in all strains--was consistent with phylogeny, although horizontal gene transfer events attenuated this correlation in the case of ancient lineages. Homologous recombination, involving short stretches of DNA, was the dominant evolutionary process of the core genome of S. pneumoniae. Genetic exchange occurred both within and across the borders of the species, and S. mitis was the main reservoir of genetic diversity of S. pneumoniae. The pan-genome size of S. pneumoniae increased logarithmically with the number of strains and linearly with the number of polymorphic sites of the sampled genomes, suggesting that acquired genes accumulate proportionately to the age of clones. Most genes associated with pathogenicity were shared by all S. pneumoniae strains, but were also present in S. mitis, S. oralis and S. infantis, indicating that these genes are not sufficient to determine virulence.

Conclusions: Genetic exchange with related species sharing the same ecological niche is the main mechanism of evolution of S. pneumoniae. The open pan-genome guarantees the species a quick and economical response to diverse environments.

Figures

Figure 1
Maximum likelihood phylogenetic tree obtained using the SNPs of the core genome of the 44 S. pneumoniae genomes. The tree has been rooted using the four S. mitis genomes as outgroup, but note that the branch connecting the S. pneumoniae clade to the S. mitis clade is not to scale. The branches are annotated with their bootstrap support (numbers in italics). Red bars indicate strains belonging to the same sequence type (ST), while blue bars indicate strains belonging to the same clonal complex (CC). The six major lineages are identified by Roman numerals I to VI.
Figure 2
Split network obtained using the SNPs of the core genome to depict the impact of recombination on 44 S. pneumoniae strains. In this representation, all the conflicting phylogenetic signals due to each SNP are represented as alternative bipartitions that account for the non-tree-like structure of the inner part of the network. The six lineages highlighted in Figure 1 are also indicated.
Figure 3
The S. pneumoniae pan-genome according to the finite supragenome model. (a) Number of new genes as a function of the number of sequenced genomes. The predicted number of new genes drops sharply to zero when the number of genomes exceeds 50. (b) Number of core genes as a function of the number of sequenced genomes. The number of core genes converges to 1,647 for number of genomes n→∞.
Figure 4
The S. pneumoniae pan-genome according to the power law model. The number of specific genes is plotted as a function of the number (n) of strains sequentially added (see Materials and methods). For each n, points are the values obtained for the different strain combinations; red symbols are the average of these values, and error bars represent standard deviations. The superimposed line is a fit with a decaying power law y = A/n^B. The fit parameters are A = 295 ± 117 and B = 1.0 ± 0.15.
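
The power-law fit in the Figure 4 caption can be evaluated directly. The following is a minimal sketch, not the authors' code: it uses only the central values of A and B quoted above (the uncertainties are ignored), and the function names `new_genes` and `cumulative_new_genes` are hypothetical. With B close to 1, summing A/n over successive genomes behaves like A·ln(n), which is consistent with the logarithmic growth of the pan-genome reported in the abstract.

```python
import math

A = 295.0  # fitted amplitude from the caption (central value; +/- 117 ignored)
B = 1.0    # fitted exponent from the caption (central value; +/- 0.15 ignored)


def new_genes(n, a=A, b=B):
    """Strain-specific genes predicted for the n-th genome: y = a / n**b."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return a / n ** b


def cumulative_new_genes(n_max, a=A, b=B):
    """Total genes added by genomes 2..n_max, on top of the first genome.

    The size of the first genome is not given in the caption, so only
    the increment is computed here.
    """
    return sum(new_genes(n, a, b) for n in range(2, n_max + 1))


if __name__ == "__main__":
    for n in (2, 10, 44, 100):
        print(n, round(new_genes(n), 1), round(cumulative_new_genes(n), 1))
    # For b = 1, doubling the number of genomes adds roughly a * ln(2) genes.
    print(round(A * math.log(2), 1))
```

Because each doubling of the sample adds a roughly constant number of genes when B = 1, the predicted increment never drops to zero, which is the sense in which the power law model describes an open pan-genome.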
Figure 5
Size of the pan-genome versus the number of polymorphic sites. The slope of the fitted line gives the ratio between the rate of acquisition of new genes and the population mutation rate, ω/θ = 0.017 ± 0.0017. In the inset, the size of the pan-genome (red dots) and the number of polymorphic sites (black dots) are shown as a function of the number of genomes. The lines are least-squares fits with a logarithmic law. The error bars represent the standard deviation of the data.
Figure 6
Histogram of the number of genomes sharing variable regions of size greater than 500 bp. The distribution is bimodal, with most of the variable regions either being present in most of the strains, or being present only in a small number of strains.
Figure 7
Histogram of the parsimony score Sp of the presence/absence of the variable regions of size greater than 500 bp, computed for the tree shown in Figure 1. For a given dispensable region, Sp represents the number of acquisition and loss events (Sp = Na + Nl, where Na and Nl are the number of acquisitions and losses, respectively) required for its pattern of presence/absence on the tree in Figure 1. The colors indicate the number of acquisitions Na, while the number of losses can be calculated as Nl = Sp - Na. For simplicity, all segments with Na > 1 have been collapsed into a single bar. Since an acquisition followed by a recombination event can always be explained by multiple acquisitions, events with Na > 1 are possible intra-species recombination events.
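
To make the decomposition Sp = Na + Nl concrete, the sketch below counts acquisitions and losses for one presence/absence pattern on a small rooted binary tree. It is an illustration under stated assumptions, not the method of the paper: the caption does not say how ancestral states were reconstructed, so Fitch parsimony is swapped in here, ambiguous root states are resolved as "absent", and the tree, strain names and function names are all hypothetical.

```python
def annotate(node, presence):
    """Bottom-up Fitch pass.

    A leaf is a strain name (str); an internal node is a (left, right) tuple.
    Returns (state_set, annotated_children), where 1 = present, 0 = absent.
    """
    if isinstance(node, str):
        return ({1 if presence[node] else 0}, ())
    children = tuple(annotate(child, presence) for child in node)
    left, right = children[0][0], children[1][0]
    # Intersection if the children agree on a state, otherwise the union.
    return ((left & right) or (left | right), children)


def parsimony_events(tree, presence):
    """Top-down pass: return (Na, Nl) for one region on a binary tree.

    Assumption: an ambiguous root is taken as 'absent' (state 0). Other
    choices can give a different Na/Nl split for the same total Sp.
    """
    gains = losses = 0
    stack = [(annotate(tree, presence), None)]
    while stack:
        (states, children), parent = stack.pop()
        # Keep the parent's state when allowed, else take the only option.
        state = parent if parent in states else min(states)
        if parent == 0 and state == 1:
            gains += 1
        elif parent == 1 and state == 0:
            losses += 1
        stack.extend((child, state) for child in children)
    return gains, losses


if __name__ == "__main__":
    tree = ((("a", "b"), "c"), ("d", "e"))  # hypothetical five-strain tree
    pattern = {"a": True, "b": False, "c": False, "d": True, "e": False}
    n_a, n_l = parsimony_events(tree, pattern)
    print("Na =", n_a, "Nl =", n_l, "Sp =", n_a + n_l)
```

In this toy pattern the region is present in two strains that are not each other's closest relatives, so the reconstruction needs more than one acquisition, which is the Na > 1 case the caption flags as a possible intra-species recombination event.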
Figure 8
Average value of D' plotted as a function of the distance (in base pairs) along the chromosome between the pairs of polymorphic sites. The green line is a least-squares fit with the exponential function y = A + B·e^(-x/x0), with A = 0.07103 ± 0.0002, B = 0.201 ± 0.001 and x0 = 896 ± 7.
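
The exponential fit in the Figure 8 caption can be explored numerically. This is a minimal sketch using only the central fit values quoted above; the function names `d_prime_fit` and `half_decay_distance` are hypothetical, and the "half-decay distance" is a derived quantity introduced here for illustration, not a number reported in the source.

```python
import math

A = 0.07103  # long-distance plateau of D' (central fit value)
B = 0.201    # amplitude of the decaying component (central fit value)
X0 = 896.0   # characteristic decay length in base pairs (central fit value)


def d_prime_fit(x_bp, a=A, b=B, x0=X0):
    """Fitted average D' at separation x_bp: y = a + b * exp(-x_bp / x0)."""
    if x_bp < 0:
        raise ValueError("distance must be non-negative")
    return a + b * math.exp(-x_bp / x0)


def half_decay_distance(x0=X0):
    """Separation at which the decaying component falls to half of b."""
    return x0 * math.log(2)


if __name__ == "__main__":
    for x in (0, 500, 896, 5000):
        print(x, round(d_prime_fit(x), 4))
    print("half-decay distance (bp):", round(half_decay_distance()))
```

A decay length of under 1 kb means linkage between polymorphic sites is largely lost within a gene-sized distance, which fits the abstract's statement that homologous recombination involves short stretches of DNA.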
Figure 9
Maximum likelihood phylogenetic tree obtained using the SNPs of the core genome of the 44 strains of S. pneumoniae, 4 strains of S. mitis and 1 strain each of S. oralis and S. infantis. For clarity the clade containing the S. pneumoniae strains has been collapsed. The numbers on the internal nodes label the last common ancestor of the S. pneumoniae species (1), of the S. mitis species (2), and of the S. pneumoniae-S. mitis complex (3).
Figure 10
Variation of the number of dispensable and core genes upon the addition of new species or strains. Strains are added sequentially, starting with the 44 S. pneumoniae strains followed by the S. mitis (region I), S. oralis and S. infantis (region II), S. sanguinis and S. pyogenes (region III) strains.
Figure 11
Presence/absence pattern of the PI-1 and PI-2 pilus-encoding islets, psrP, and allelic variants of the core pspA and pspC genes. To show the degree of correlation with the phylogeny of S. pneumoniae, the data are reported on the phylogenetic tree of the S. pneumoniae strains. Only the topology of the tree is shown; branch lengths are not to scale. Red bars mark strains of the same ST, and blue bars mark strains of different STs, but of the same CC. For PI-1, PI-2 and psrP, green squares indicate presence while gray squares indicate absence. For pspA and pspC, the numbers indicate the allelic variants defined according to [42,47].

References

    1. Kilian M, Poulsen K, Blomqvist T, Havarstein LS, Bek-Thomsen M, Tettelin H, Sorensen UB. Evolution of Streptococcus pneumoniae and its close commensal relatives. PLoS ONE. 2008;3:e2683. doi: 10.1371/journal.pone.0002683.
    2. Hakenbeck R, Balmelle N, Weber B, Gardes C, Keck W, de Saizieu A. Mosaic genes and mosaic chromosomes: intra- and interspecies genomic variation of Streptococcus pneumoniae. Infect Immun. 2001;69:2477–2486. doi: 10.1128/IAI.69.4.2477-2486.2001.
    3. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarity Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA. 2005;102:13950–13955. doi: 10.1073/pnas.0506758102.
    4. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD, Hu FZ. Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol. 2007;189:8186–8195. doi: 10.1128/JB.00690-07.
    5. Redfield RJ, Findlay WA, Bosse J, Kroll JS, Cameron AD, Nash JH. Evolution of competence and DNA uptake specificity in the Pasteurellaceae. BMC Evol Biol. 2006;6:82. doi: 10.1186/1471-2148-6-82.
