Comparative Study
Genome Res. 2005 Nov;15(11):1576-83.
doi: 10.1101/gr.3709305.

Calibrating a coalescent simulation of human genome sequence variation

Stephen F Schaffner et al. Genome Res. 2005 Nov.

Abstract

Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
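Two of the characteristics named in the abstract, allele frequency and population differentiation, can be illustrated with a minimal sketch. It is not taken from the authors' software: the function names are hypothetical, haplotypes are assumed to be coded 0/1 with 1 as the derived allele, and the FST shown is the textbook single-site form for two equally weighted populations, which may differ from the estimator used in the paper.

```python
from collections import Counter


def allele_frequency_spectrum(haplotypes):
    """Count segregating sites by derived-allele count.

    haplotypes: list of equal-length lists of 0/1 (1 = derived allele),
    one inner list per sampled chromosome.
    Returns {derived_count: number_of_sites}, skipping fixed sites.
    """
    n = len(haplotypes)
    # zip(*haplotypes) walks the matrix column by column (site by site).
    counts = [sum(site) for site in zip(*haplotypes)]
    return dict(Counter(c for c in counts if 0 < c < n))


def fst(p1, p2):
    """Single-site FST = (H_T - H_S) / H_T for two equally weighted
    populations with allele frequencies p1 and p2 (illustrative form)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return 0.0  # site is fixed for the same allele in both populations
    return (h_total - h_within) / h_total
```

With these definitions, identical frequencies in both populations give an FST of 0, and alleles fixed for opposite states give an FST of 1.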

Figures

Figure 1.
Fit of standard neutral model to empirical data. Comparison of simulated data under standard neutral model to empirical data on autosomes. Error bars represent one standard error. (A,B) Linkage disequilibrium (measured by r² and D′) as a function of distance. (Solid line) Standard neutral model; (squares) West African data; (triangles) European data. (A) r² as a function of physical distance. (B) D′ as a function of genetic distance. (C,D) Genetic distance (FST) and allele frequency spectrum for data and standard neutral model. (White) Data; (gray) model. (C) FST between European and West African populations. (D) European allele frequency spectrum.
Figure 2.
Comparison of best-fit model with empirical data, autosomes. Error bars represent one standard error. (A,B,C) Allele frequency spectrum. (White) Data; (black) model. (A) West African. (B) East Asian. (C) European sample. (D,E,F) Fraction of alleles that are ancestral/chimpanzee, binned by allele frequency. (White) Data; (black) model. (D) West African. (E) East Asian. (F) European. (G,H,I) Linkage disequilibrium (r²) versus physical distance. (Points) Data; (line) model. (G) West African. (H) East Asian. (I) European. (J,K,L) Fraction of marker pairs with perfect LD (D′ = 1.0) versus genetic distance. (J) West African. (K) East Asian. (L) European. (M) Genetic distance (FST). (White) Data; (black) model.
Figure 3.
Demographic model. (N1) Ancestral population size. (N2) African population size. (N3) non-African population size. (Texp) Time of ancestral population expansion (if any). Bottlenecks are indicated by constrictions. (Not shown: recurring migration between African and European populations, and between Asian and African populations.)
Figure 4.
Comparison of best-fit model with empirical data, X-chromosome. Error bars represent one standard error. (A,B,C) Allele frequency spectrum. (White) Data; (black) best-fit model; (gray) standard neutral model. (A) West African. (B) East Asian. (C) European sample. (D,E,F) Fraction of alleles that are ancestral/chimpanzee, binned by allele frequency. (White) Data; (black) best-fit model; (gray) standard neutral model. (D) West African. (E) East Asian. (F) European. (G,H) Linkage disequilibrium (r²) versus physical distance. (Points) Data; (solid line) best-fit model; (dashed line) standard neutral model. (G) West African. (H) European. (East Asian omitted because of poor statistics.) (I,J) Fraction of marker pairs with perfect LD (D′ = 1.0) versus genetic distance. (I) West African. (J) European. (Points) Data; (solid line) best-fit model; (dashed line) standard neutral model. (East Asian omitted because of poor statistics.) (K) Genetic distance (FST). (White) Data; (black) best-fit model; (gray) standard neutral model. (L,M) Fraction of sequence in haplotype blocks of different sizes. (White) Data; (black) best-fit model; (gray) standard neutral model. (L) West African. (M) non-African (European + East Asian).
Figure 5.
Comparison of best-fit model with data: 52 gene regions. Here 40 genes are genotyped in three populations; long genes were subdivided into smaller regions. The mean FST and heterozygosity are shown (black), and compared to the same measures for simulated data (gray); simulated regions were 120 kb long with 30 ± 10 SNPs per region. (A) Yoruba sample. (B) Chinese sample. (C) CEPH sample.
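
The two linkage disequilibrium measures used in the figure captions, r² and D′, can be computed for a pair of biallelic sites as in the sketch below. This is an illustration under stated assumptions, not the authors' code: the function name `ld_stats` is hypothetical, alleles are assumed to be coded 0/1 on phased haplotypes, and D′ is reported as an absolute value.

```python
def ld_stats(site_a, site_b):
    """Return (r_squared, abs_d_prime) for two biallelic sites.

    site_a, site_b: equal-length lists of 0/1 alleles, where index i in
    both lists refers to the same phased haplotype.
    """
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")

    r_squared = d * d / denom

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r_squared, abs(d) / d_max
```

For example, sites `[1, 1, 0, 0]` and `[1, 0, 0, 0]` give D′ = 1 but r² of about 1/3: only three of the four possible haplotypes are present, which is the "perfect LD (D′ = 1.0)" condition counted in Figures 2 and 4, even though the two sites are not fully correlated.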

Web site references

    1. http://www.broad.mit.edu/~sfs/cosi; authors' Web site.
