Evolution. 2023 Oct 3;77(10):2113-2127.
doi: 10.1093/evolut/qpad120.

Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models

Vivak Soni et al. Evolution.

Abstract

The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.

Keywords: background selection; demography; distribution of fitness effects; genetic hitchhiking; genome scans; selective sweeps.

Figures

Figure 1.
Nucleotide diversity over a physical distance for different values of 2Nₑs and τ, estimated using Equation 13 from Kim and Stephan (2000).
Figure 2.
Example patterns around a single selective sweep for different values of 2Nₑs in an equilibrium population with fixed mutation and recombination rates. In each case, 2Nₑs values go from lowest (top panel) to highest: 100; 1,000; 10,000. The red data point is the position of the beneficial fixation. (a) Inference results from SweepFinder2. Blue data points are CLR values inferred for each window. The red dashed line is the threshold for sweep detection, determined by the highest CLR value across 200 simulated replicates in which no beneficial mutations are occurring. Inference was performed at each SNP (see Methods Section for further details). (b) Sweep inference with the H12 statistic. Blue data points are H12 values estimated for each window. As with SweepFinder2, the red dashed line is the threshold for sweep detection. Inference was performed across 1 kb windows for each SNP, with the SNP at the center of each window. For the underlying summary statistics (Tajima’s D; π; and r²), see Supplementary Figure S2.
Figure 3.
Sweep inference and summary statistics for a single simulation replicate of recurrent selective sweeps for different values of 2Nₑs in an equilibrium population with fixed mutation and recombination rates. In each case, 2Nₑs values go from lowest (top panel) to highest: 100; 1,000; 10,000. For all panels, red data points are the positions of beneficial fixations within the 0.5N generations prior to sampling. (a) Inference results from SweepFinder2. Blue data points are CLR values inferred for each window. The red dashed line is the threshold for sweep detection, determined by the highest CLR value across 200 simulated replicates in which no beneficial mutations are modeled. Inference was performed at each SNP (see Methods Section for further details). (b) Sweep inference with the H12 statistic. Blue data points are H12 values estimated for each window. As with SweepFinder2, the red dashed line is the threshold for sweep detection. Inference was performed across 1 kb windows for each SNP, with the SNP at the center of each window. (c–e) Summary statistics across the simulated region.
Figure 4.
ROC curves, showing the change in true-positive rate (TPR) as the false-positive rate (FPR) increases, for sweep inference in an equilibrium population with fixed mutation and recombination rates across 200 simulated replicates, for 10 kb windows. (a) ROC curves for SweepFinder2 when using a null background SFS (i.e., the background SFS is generated across a simulation run in which all else is modeled identically, except that no beneficial mutations occur). (b) ROC curves for SweepFinder2 when using an empirical background SFS (i.e., the background SFS is the empirical data itself). (c) ROC curves for H12.
Figure 5.
ROC curves, showing the change in true-positive rate (TPR) as the false-positive rate (FPR) increases, for sweep inference in populations with differing demographic histories, across 200 replicates each, for windows of size 10 kb. The panels on the left are for inference with SweepFinder2 and on the right with the H12 statistic. Where population size change occurs, it is instantaneous, occurring N generations prior to sampling.
Figure 6.
ROC curves comparing sweep inference for fixed and variable recombination and mutation rates under equilibrium demographic conditions, across 200 simulation replicates, for SweepFinder2 with the null background SFS (left) and for H12 (right), for 10 kb windows. Dashed lines indicate variable rates, while solid lines indicate fixed rates. For variable rates, each 10 kb region has a rate drawn from a distribution such that each simulated replicate has the same mean rate as the fixed rate comparison (see Methods for further details).
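
The captions above describe two evaluation steps: a detection threshold set to the highest statistic value seen across sweep-free replicates, and ROC curves that trace the true-positive rate (TPR) against the false-positive rate (FPR). The following is a minimal Python sketch of that logic, not the authors' pipeline. The function names, the strict "greater than" detection rule, and all numbers are assumptions made for illustration; they are not values from the paper.

```python
from typing import List, Sequence, Tuple


def null_threshold(null_replicates: Sequence[Sequence[float]]) -> float:
    """Highest statistic value in any window of any sweep-free replicate."""
    return max(max(rep) for rep in null_replicates)


def roc_points(sweep_scores: Sequence[float],
               null_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, from the strictest threshold to the most lenient.

    sweep_scores: statistic values in windows that truly contain a sweep.
    null_scores:  statistic values in windows that contain no sweep.
    """
    # Every observed score is a candidate threshold; a window is called
    # positive when its score is at or above the threshold.
    thresholds = sorted(set(sweep_scores) | set(null_scores), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score calls nothing
    for t in thresholds:
        tpr = sum(s >= t for s in sweep_scores) / len(sweep_scores)
        fpr = sum(s >= t for s in null_scores) / len(null_scores)
        points.append((fpr, tpr))
    return points


# Toy numbers, invented purely for illustration (hypothetical CLR-like scores).
null_reps = [[0.4, 1.1, 0.7], [0.9, 2.0, 0.3]]
sweep_windows = [0.5, 2.5, 3.0, 1.0]
null_windows = [x for rep in null_reps for x in rep]

threshold = null_threshold(null_reps)
# Assumed detection rule: a window is called a sweep if it exceeds the threshold.
detected = [s > threshold for s in sweep_windows]
curve = roc_points(sweep_windows, null_windows)
```

With these toy inputs the threshold is 2.0, so only the two highest-scoring sweep windows are detected. A threshold taken as the maximum over null replicates is conservative by construction, which is one way weaker sweeps can go undetected.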
