PLoS Genet. 2015 Feb 23;11(2):e1005004.
doi: 10.1371/journal.pgen.1005004. eCollection 2015 Feb.

Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps

Nandita R Garud et al. PLoS Genet.

Abstract

Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Neutral demographic models.
We considered six neutral demographic models for the North American D. melanogaster population: (A) An admixture model as proposed by Duchen et al. [45]. (B) An admixture model with the European population undergoing a bottleneck. This model was also tested by Duchen et al. [45], but the authors found it to have a poor fit. See S1 Table for parameter estimates and symbol explanations for models A and B. (C) A constant N_e = 10^6 model. (D) A constant N_e = 2.7×10^6 model fit to Watterson’s θ_W measured in short intron autosomal polymorphism data from the DGRP data set. (E) A severe short bottleneck model and (F) a shallow long bottleneck model fit to short intron regions in the DGRP data set using DaDi [47]. See S2 Table for parameter estimates for models E and F. All models except for the constant N_e = 10^6 model fit the DGRP short intron data in terms of S and π (S3 Table).
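Model D is described as fit to Watterson’s θ_W. As a reminder of what that estimator measures, here is a minimal sketch using the standard definition θ_W = S / a_n, where a_n is the sum of 1/i for i from 1 to n-1. The function names, the per-site normalization, and the conversion to N_e through θ = 4·N_e·μ are illustrative assumptions and not the authors' pipeline. The mutation rate of 10^-9 used in the usage note is borrowed from the Fig 6 caption and may not be the value used for this fit.

```python
def watterson_theta(num_segregating_sites, sample_size, num_sites=1):
    """Watterson's estimator, S / a_n, optionally normalized per site."""
    if sample_size < 2:
        raise ValueError("need at least two sampled sequences")
    # a_n = 1 + 1/2 + ... + 1/(n-1)
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n / num_sites


def ne_from_theta(theta_per_site, mutation_rate):
    """Assumes a diploid population with theta = 4 * N_e * mu (illustrative)."""
    return theta_per_site / (4.0 * mutation_rate)
```

Under these assumptions, a per-site θ_W of about 0.0108 with μ = 10^-9 per bp per generation corresponds to N_e of about 2.7×10^6, the value quoted for model D.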
Fig 2. Elevated long-range LD in DGRP.
LD in DGRP data is elevated as compared to any neutral demographic model, especially for long distances. Pairwise LD was calculated in DGRP data for regions of the D. melanogaster genome with ρ ≥ 5×10^-7 cM/bp. Neutral demographic simulations were generated with ρ = 5×10^-7 cM/bp. Pairwise LD was averaged over 3×10^4 simulations in each neutral demographic scenario.
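The caption does not say which pairwise LD measure was used. As an illustration only, the sketch below computes the common r² statistic between two biallelic sites coded as 0/1 across sampled haplotypes. The choice of r² and the function name are assumptions.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites given 0/1 alleles per haplotype."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                    # undefined if a site is monomorphic
    d = p_ab - p_a * p_b                       # coefficient of linkage disequilibrium
    return d * d / denom
```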
Fig 3. Number of adaptive haplotypes in sweeps of varying softness.
The number of origins of adaptive mutations on unique haplotype backgrounds was measured in simulated sweeps of varying softness arising from (A) de novo mutations with θ_A values ranging from 10^-2 to 10^2 and (D) SGV with starting frequencies ranging from 10^-6 to 10^-1. Sweeps were simulated under a constant N_e = 10^6 demographic model with a recombination rate of 5×10^-7 cM/bp, selection strength of s = 0.01, partial frequency of the adaptive allele after selection has ceased of PF = 1 and 0.5, and in sample sizes of 145 individuals. 1000 simulations were averaged for each data point. Additionally we show sample haplotype frequency spectra for (B) incomplete and (C) complete sweeps arising from de novo mutations as well as (E) incomplete and (F) complete sweeps arising from SGV. In (G) we show haplotype frequency spectra for a random simulation under the six neutral models considered in this paper. The height of the first bar (light blue) in each frequency spectrum indicates the frequency of the most prevalent haplotype in the sample of 145 individuals, and heights of subsequent colored bars indicate the frequency of the second, third, and so on most frequent haplotypes in a sample. Grey bars indicate singletons. Sweeps generated with a low θ_A or low starting partial frequency of the adaptive allele prior to the onset of selection have one frequent haplotype in the sample and look hard. In contrast, sweeps look increasingly soft as the θ_A or starting partial frequency of the adaptive allele prior to the onset of selection increase and there are multiple frequent haplotypes in the sample.
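A haplotype frequency spectrum of the kind described in this caption can be tabulated as follows. This is a minimal sketch, assuming each sampled haplotype is represented as a string of alleles. The function name and the return layout are illustrative.

```python
from collections import Counter


def haplotype_frequency_spectrum(haplotypes):
    """Return (frequencies of non-singleton haplotypes sorted from most to
    least common, fraction of the sample made up of singletons)."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample is empty")
    counts = sorted(Counter(haplotypes).values(), reverse=True)
    ranked = [c / n for c in counts if c > 1]           # the colored bars
    singletons = sum(1 for c in counts if c == 1) / n   # the grey bars
    return ranked, singletons
```

In a sample that looks hard, the first entry of the ranked list dominates. In a sample that looks soft, several entries have comparable size.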
Fig 4. Haplotype homozygosity statistics.
Depicted are squares of haplotype frequencies for hard (red) and soft (blue) sweeps. Each edge of the square represents haplotype frequencies ranging from 0 to 1. The top row shows incomplete hard sweeps with one prevalent haplotype present in the population at frequency p_1, and all other haplotypes present as singletons. The bottom row shows incomplete soft sweeps with one primary haplotype with frequency p_1 and a second, less abundant haplotype at frequency p_2, with the remaining haplotypes present as singletons. H1 is the sum of the squares of frequencies of each haplotype in a sample and corresponds to the total colored area. Hard sweeps are expected to have a higher H1 value than soft sweeps. In H12, the first and second most abundant haplotype frequencies in a sample are combined into a single combined haplotype frequency and then homozygosity is recalculated using this revised haplotype frequency distribution. By combining the first and second most abundant haplotypes into a single group, H12 should have more similar power to detect hard and soft sweeps than H1. H2 is the haplotype homozygosity calculated after excluding the most abundant haplotype. H2 is expected to be larger for soft sweeps than for hard sweeps. We ultimately use the ratio H2/H1 to differentiate between hard and soft sweeps as we expect this ratio to have even greater discriminatory power than H2 alone.
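The definitions in this caption translate directly into a few lines of code. The sketch below follows the verbal descriptions of H1, H12, H2, and H2/H1 given above. The function name and the string encoding of haplotypes are assumptions, and this is not the authors' implementation.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Compute H1, H12, H2, and H2/H1 from a list of sampled haplotypes."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample is empty")
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)     # sum of squared haplotype frequencies
    h12 = h1 + 2.0 * p1 * p2           # equals (p1 + p2)^2 + sum of p_i^2 for i > 2
    h2 = h1 - p1 * p1                  # homozygosity excluding the top haplotype
    return {"H1": h1, "H12": h12, "H2": h2, "H2/H1": h2 / h1}
```

For a hard-looking sample with frequencies 0.8, 0.1, 0.1, this gives H12 = 0.82 and H2/H1 of about 0.03. For a soft-looking sample with frequencies 0.4, 0.4, 0.1, 0.1, it gives H12 = 0.66 and H2/H1 of about 0.53, so H12 stays high in both cases while H2/H1 separates them.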
Fig 5. H12 values in sweeps of varying softness.
H12 values were measured in simulated sweeps arising from (A) de novo mutations with θ_A values ranging from 10^-2 to 10^2 and (B) SGV with starting frequencies ranging from 10^-6 to 10^-1. Sweeps were simulated under a constant N_e = 10^6 demographic model with a recombination rate of 5×10^-7 cM/bp, selection strength of s = 0.01, ending partial frequencies of the adaptive allele after selection has ceased, PF = 1 and 0.5, and in samples of 145 individuals. Each data point was averaged over 1000 simulations. H12 values rapidly decline as the softness of a sweep increases and as the ending partial frequency of the adaptive allele decreases. In (C) and (D), s was varied while keeping PF constant at 0.5 for sweeps from de novo mutations and SGV, respectively. H12 values increase as s increases, though for very weak s we observe a ‘hardening’ of sweeps where fewer adaptive alleles reach establishment frequency. In (E) and (F), the time since selection ended (T_E) was varied for incomplete (PF = 0.5) and complete (PF = 1) sweeps respectively while keeping s constant at 0.01. As the age of a sweep increases, sweep signatures decay and H12 loses power.
Fig 6. Power analysis of H12 and iHS under different sweep scenarios.
The plots show ROC curves for H12 and iHS under various sweep scenarios with the specified selection coefficients (s), and the time of the end of selection (T_E) in units of 4N_e generations. In all scenarios, the ending partial frequency of the adaptive allele was 0.5. False positive rates (FPR) were calculated by counting the number of neutral simulations that were misclassified as sweeps under a specific cutoff. True positive rates (TPR) were calculated by counting the number of simulations correctly identified as sweeps under the same cutoff. Hard and soft sweeps were simulated from de novo mutations with θ_A = 0.01 and 10, respectively, under a constant effective population size of N_e = 10^6, a neutral mutation rate of 10^-9 per bp per generation, and a recombination rate of 5×10^-7 cM/bp. A total of 5000 simulations were conducted for each evolutionary scenario. H12 performs well in identifying recent and strong selective sweeps, and is more powerful than iHS in identifying soft sweeps.
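The FPR and TPR bookkeeping described in this caption can be sketched as below. The sketch assumes that larger values of the statistic are more sweep-like, which holds for H12, and that each simulation has been summarized by a single number. The function name and the way cutoffs are enumerated are illustrative choices.

```python
def roc_points(neutral_values, sweep_values):
    """Return (FPR, TPR) pairs, one per cutoff, from strictest to most lenient."""
    if not neutral_values or not sweep_values:
        raise ValueError("both simulation sets must be non-empty")
    cutoffs = sorted(set(neutral_values) | set(sweep_values), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every value calls nothing a sweep
    for cutoff in cutoffs:
        # neutral simulations misclassified as sweeps at this cutoff
        fpr = sum(v >= cutoff for v in neutral_values) / len(neutral_values)
        # sweep simulations correctly identified at the same cutoff
        tpr = sum(v >= cutoff for v in sweep_values) / len(sweep_values)
        points.append((fpr, tpr))
    return points
```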
Fig 7. Elevated H12 values and long-range LD in DGRP data.
(A) Genome-wide H12 values in DGRP data are elevated as compared to expectations under any neutral demographic model tested. Plotted are H12 values for DGRP data reported in analysis windows with ρ ≥ 5×10^-7 cM/bp. Red dots overlaid on the distribution of H12 values for DGRP data correspond to the highest H12 values in outlier peaks of the DGRP scan at the 50 top peaks depicted in Fig. 8A. Note that most of the points in the tail of the H12 values calculated in DGRP data are part of the top 50 peaks as well. Neutral demographic simulations were generated with ρ = 5×10^-7 cM/bp. Plotted are the results of approximately 1.3×10^5 simulations under each neutral demographic model, representing ten times the number of analysis windows in DGRP data.
Fig 8. H12 and iHS scan in DGRP data along the four autosomal arms.
(A) H12 scan. Each data point represents the H12 value calculated over an analysis window of size 400 SNPs centered at the particular genomic position. Grey points indicate regions in the genome with recombination rates lower than 5×10^-7 cM/bp, which we excluded from our analysis. The orange line represents the 1-per-genome FDR line calculated under a neutral demographic model with a constant population size of 10^6 and a recombination rate of 5×10^-7 cM/bp. Red and blue points highlight the top 50 H12 peaks in the DGRP data relative to the 1-per-genome FDR line. Red points indicate the peaks that overlap the top 10% of 100Kb windows with an enrichment of SNPs with |iHS| > 2 in B. We identify three well-characterized cases of selection in D. melanogaster at Ace, CHKov1, and Cyp6g1 as the three highest peaks. (B) iHS scan. Plotted are the number of SNPs in 100Kb windows with |iHS| > 2. Highlighted in red and blue are the top 10% of 100Kb windows (a total of 95 windows). Red points correspond to those windows that overlap the top 50 peaks in the H12 scan. The positive controls, Ace, CHKov1, and Cyp6g1, are all among the top 10% of windows.
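A windowed H12 scan of the kind plotted in panel A can be sketched as follows. The window size of 400 SNPs comes from the caption. The step between windows, the string encoding of haplotypes, the function names, and the absence of any missing-data handling are assumptions made for illustration.

```python
from collections import Counter


def h12(haplotypes):
    """H12: homozygosity after pooling the two most common haplotypes."""
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    return sum(freqs[:2]) ** 2 + sum(p * p for p in freqs[2:])


def h12_scan(samples, positions, window=400, step=50):
    """samples: one allele string per sampled chromosome, one character per SNP.
    positions: genomic coordinate of each SNP, in the same order.
    Returns (coordinate of the SNP at the window midpoint, H12) per window.
    The default step of 50 SNPs is a hypothetical value."""
    if not samples or any(len(s) != len(positions) for s in samples):
        raise ValueError("every sample needs one allele per SNP position")
    results = []
    for start in range(0, len(positions) - window + 1, step):
        window_haps = [s[start:start + window] for s in samples]
        center = positions[start + window // 2]
        results.append((center, h12(window_haps)))
    return results
```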
Fig 9. Haplotype frequency spectra for the top 10 peaks and extreme outliers under neutral demographic scenarios.
(A) Haplotype frequency spectra for the top 10 peaks in the DGRP scan with H12 values ranging from highest to lowest. For each peak, the frequency spectrum corresponding to the analysis window with the highest H12 value is plotted, which should be the “hardest” part of any given peak. At all peaks there are multiple haplotypes present at high frequency, compatible with signatures of soft sweeps shown in Fig. 5. None of the cases have a single haplotype present at high frequency, as would be expected for a hard sweep. (B) In contrast, the haplotype frequency spectra corresponding to the extreme outliers under the six neutral demographic scenarios have critical H12_o values that are significantly lower than the H12 values at the top 10 peaks.
Fig 10. H2/H1 values measured in sweeps of varying softness.
Similar to Fig. 5, H2/H1 values were measured in simulated sweeps arising from (A) de novo mutations with θ_A values ranging from 10^-2 to 10^2 and (B) SGV with starting frequencies ranging from 10^-6 to 10^-1. Sweeps were simulated under a constant N_e = 10^6 demographic model with a recombination rate of 5×10^-7 cM/bp, selection strength of s = 0.01, ending partial frequencies of the adaptive allele after selection ceased, PF = 1 and 0.5, and in samples of 145 individuals. Each data point was averaged over 1000 simulations. H2/H1 values rapidly increase with increasing softness of a sweep, but do not depend strongly on PF. In (C) and (D), s was varied while keeping PF constant at 0.5 for sweeps from de novo mutations and SGV, respectively. In the case of sweeps from SGV, H2/H1 values increase as s increases, reflecting a hardening of sweeps with smaller s. In (E) and (F), the time since selection ended (T_E) was varied for incomplete (PF = 0.5) and complete (PF = 1) sweeps respectively while keeping s constant at 0.01. As the age of a sweep increases, the sweep signature decays and H2/H1 approaches one.
Fig 11. Range of H12 and H2/H1 values expected for hard and soft sweeps.
Bayes factors (BFs) were calculated for a grid of H12 and H2/H1 values to demonstrate the range of H12 and H2/H1 values expected under hard versus soft sweeps. Each panel shows the results for a specific evolutionary scenario defined by the underlying demographic model, the θ_A value used for simulating soft sweeps, and the recombination rate as specified below. BFs were calculated by taking the ratio of the number of soft sweep versus hard sweep simulations that were within a Euclidean distance of 10% of a given pair of H12 and H2/H1 values. Red portions of the grid represent H12 and H2/H1 values that are more easily generated by hard sweeps, while grey portions represent regions of space more easily generated under soft sweeps. Each panel presents the results from one million hard and soft sweep simulations. Hard sweeps were always generated with θ_A = 0.01. (A), (B), and (C) compare the range of BFs obtained when soft sweeps are generated under θ_A = 5, 10, and 50, keeping the recombination rate (ρ) constant at 5×10^-7 cM/bp. (A), (D), and (E) compare the range of BFs obtained when ρ is varied among 5×10^-7, 10^-7, and 10^-6 cM/bp, keeping θ_A constant at 10. (A) and (F) compare the range of BFs generated under the constant N_e = 10^6 and admixture demographic models for θ_A = 10 and ρ = 5×10^-7 cM/bp. When H12 values are smaller than 0.05, there is little evidence for a sweep, and most BFs are smaller than one. As H12 values become larger, virtually all sweeps with H2/H1 values > 0.05 are soft. The H12 and H2/H1 values for the top 50 peaks in the DGRP scan are overlaid in yellow. All sweep candidates have H12 and H2/H1 values that are more easily generated by soft sweeps than hard sweeps in most scenarios.
(A) Soft sweeps simulated with θ_A = 10, ρ = 5×10^-7 cM/bp, and a constant N_e = 10^6 demographic model. (B) Soft sweeps simulated with θ_A = 5, ρ = 5×10^-7 cM/bp, and a constant N_e = 10^6 demographic model. (C) Soft sweeps simulated with θ_A = 50, ρ = 5×10^-7 cM/bp, and a constant N_e = 10^6 demographic model. (D) Soft sweeps simulated with θ_A = 10, ρ = 10^-7 cM/bp, and a constant N_e = 10^6 demographic model. (E) Soft sweeps simulated with θ_A = 10, ρ = 10^-6 cM/bp, and a constant N_e = 10^6 demographic model. (F) Soft sweeps simulated with θ_A = 10, ρ = 5×10^-7 cM/bp, and an admixture demographic model.
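The Bayes factor calculation described in this caption amounts to counting simulations near a grid point. The sketch below is one possible reading: the "10%" tolerance is interpreted as a radius equal to 10% of the grid point's distance from the origin, which is an assumption, as are the function name and the handling of empty neighborhoods. It also assumes equal numbers of soft and hard sweep simulations, so that the raw count ratio is meaningful.

```python
import math


def bayes_factor(point, soft_sims, hard_sims, tolerance=0.10):
    """point and each simulation are (H12, H2/H1) pairs.
    Returns (#soft simulations near point) / (#hard simulations near point)."""
    radius = tolerance * math.hypot(*point)   # assumed meaning of "10%"

    def count_near(sims):
        return sum(1 for sim in sims if math.dist(sim, point) <= radius)

    n_soft = count_near(soft_sims)
    n_hard = count_near(hard_sims)
    if n_hard == 0:
        # no hard sweep lands nearby: unbounded support, or undefined if
        # no simulation of either kind is close to this grid point
        return math.inf if n_soft > 0 else math.nan
    return n_soft / n_hard
```

A BF above one at a grid point means soft sweep simulations produce that (H12, H2/H1) pair more often than hard sweep simulations do.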
