Nat Commun. 2018 Sep 14;9(1):3753.
doi: 10.1038/s41467-018-05936-5.

Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans

Jedidiah Carlson et al. Nat Commun.

Abstract

A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Mutation rates vary according to sequence context. (a) Heatmap of estimated relative mutation rates for all possible A > G and C > T transition subtypes, up to a 7-mer resolution (high-resolution heatmaps for all possible subtypes are included in Supplementary Fig. 1). The leftmost panels show the relative mutation rates for the 1-mer types, and the subsequent panels to the right show these rates stratified by increasingly broader sequence context. Each 4 × 4 grid delineates a set of 16 subtypes, defined by the upstream sequence (y-axis) and downstream sequence (x-axis) from the central (mutated) nucleotide. Boxed regions indicate motifs previously identified by Aggarwala and Voight as hypermutable (pink) or hypomutable (green), relative to their similar subtypes. (b) Zoomed-in view showing hypermutable NTT[A > T]AAA subtypes relative to other 7-mer A > T subtypes.
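As a rough illustration of what a "relative mutation rate" per K-mer subtype could look like, the sketch below divides the number of observed variants of each subtype by the number of times its motif occurs in a reference sequence. This estimator is an assumption, since the caption does not state the formula; the toy sequence, the variant list, and the function names are hypothetical. Strands are collapsed so that the central reference base is always A or C, which matches the A > G and C > T labelling used in the caption.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(motif):
    """Collapse strands so the central base is A or C.

    Returns the canonical motif and whether it was flipped.
    """
    if motif[len(motif) // 2] in "GT":
        return revcomp(motif), True
    return motif, False


def relative_rates(seq, variants, k=3):
    """Assumed estimator: variants of a subtype / occurrences of its motif.

    variants is a list of (0-based position, alternate base) pairs.
    """
    flank = k // 2
    motif_counts = Counter()
    for i in range(flank, len(seq) - flank):
        motif, _ = canonical(seq[i - flank:i + flank + 1])
        motif_counts[motif] += 1

    variant_counts = Counter()
    for pos, alt in variants:
        if pos < flank or pos >= len(seq) - flank:
            continue  # not enough flanking sequence to define the motif
        motif, flipped = canonical(seq[pos - flank:pos + flank + 1])
        if flipped:
            alt = alt.translate(COMPLEMENT)
        variant_counts[(motif, alt)] += 1

    return {key: n / motif_counts[key[0]] for key, n in variant_counts.items()}


# Hypothetical toy data, not taken from the paper.
toy_seq = "ACGTACGA"
toy_variants = [(1, "T"), (2, "A"), (4, "G")]
rates = relative_rates(toy_seq, toy_variants, k=3)
for (motif, alt), rate in sorted(rates.items()):
    print(motif, ">", alt, round(rate, 3))
```

Setting `k=7` would give 7-mer subtypes of the kind shown in the heatmap, given a sequence long enough to contain them.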
Fig. 2
Discordance between ERV-estimated and common SNV-estimated mutation rates. (a) Relationship between 7-mer relative mutation rates estimated among BRIDGES ERVs (x-axis) and the 1000G intergenic SNVs (y-axis) on a log-log scale. We note that the strength of this correlation is driven by hypermutable CpG > TpG transitions. (b) Type-specific 2D-density plots, as situated in the scatterplot of panel a. The dashed line indicates the expected relationship if no bias is present. (c) Heatmap showing the ratio between the relative mutation rates for each 7-mer mutation subtype. Subtypes with higher rates among the 1000G SNVs (relative to ERV-derived rates) are shaded gold, and subtypes with lower rates in the 1000G SNVs are shaded green. Relative differences are truncated at 2 and 0.5, as only 2.5% of subtypes showed differences beyond this range.
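The truncation described for the heatmap can be sketched as a clamped ratio. The snippet below is a minimal illustration that assumes the ratio is the 1000G-derived rate divided by the ERV-derived rate, as the shading description suggests; the rate values and function names are hypothetical.

```python
def truncated_ratio(rate_common, rate_erv, lower=0.5, upper=2.0):
    """Ratio of common-SNV rate to ERV rate, clamped to [lower, upper]."""
    ratio = rate_common / rate_erv
    return min(max(ratio, lower), upper)


def shade(ratio):
    """Colour convention from the caption: gold if higher in 1000G, green if lower."""
    if ratio > 1:
        return "gold"
    if ratio < 1:
        return "green"
    return "neutral"


# Hypothetical (1000G rate, ERV rate) pairs, not values from the paper.
example_rates = {
    "subtype_1": (0.030, 0.010),  # raw ratio of about 3, truncated to 2
    "subtype_2": (0.010, 0.040),  # raw ratio 0.25, truncated to 0.5
    "subtype_3": (0.012, 0.010),  # raw ratio of about 1.2, left as is
}
for name, (common, erv) in example_rates.items():
    r = truncated_ratio(common, erv)
    print(name, round(r, 3), shade(r))
```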
Fig. 3
Distributions of statistically significant mutagenic effects of genomic features. (a) Effects of seven genomic features where associations with multiple mutation types were detected. For features with bidirectional effects, we separately plotted distributions of positive associations (OR > 1; above dashed line) and negative associations (OR < 1; below dashed line). The number of 7-mer subtypes within each type for which that feature is statistically significant in a positive or negative direction is shown above or below each distribution. Distributions are only shown for types with 10 or more 7-mer subtypes associated in the same direction. *Odds ratios for the three continuously valued features (recombination rate, replication timing, and GC content) indicate the change in odds of mutability per 10% increase in the value of that feature. Effects in CpG islands tend to be stronger than those of other features, so they are shown on a wider scale. (b) Distributions of significant mutagenic effects for the 5 features only associated with CpG > TpG transitions.
Fig. 4
Comparison of goodness-of-fit for different mutation rate estimation strategies. For each mutation type and each model i, we calculated $\Delta\mathrm{AIC}_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}$ as a measure of relative model performance, with lower values of ΔAIC indicating better fit to the GoNL/ITMI de novo mutation data. ΔAIC is shown on the horizontal axis on an arcsinh scale. For each mutation type, the best-fitting model thus has ΔAIC = 0. Models with ΔAIC < 10 (grey-shaded area) are considered comparable to the optimal model, whereas models with ΔAIC > 10 are considered to explain substantially less variation than the optimal model.
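The ΔAIC comparison in the Fig. 4 caption can be illustrated with a short sketch. The model names and AIC values below are hypothetical and are not results from the paper; the caption does not say how a ΔAIC of exactly 10 is treated, so this sketch groups it with the worse-fitting models as an assumption.

```python
import math


def delta_aic(aic_by_model):
    """Return AIC_i - AIC_min for every model i."""
    aic_min = min(aic_by_model.values())
    return {name: aic - aic_min for name, aic in aic_by_model.items()}


def classify(delta, threshold=10.0):
    """Below the threshold counts as comparable to the best model.

    Treating delta == threshold as worse is an assumption of this sketch.
    """
    return "comparable" if delta < threshold else "substantially worse"


# Hypothetical AIC values for one mutation type, not taken from the paper.
aic = {
    "model_A": 10250.0,
    "model_B": 10120.0,
    "model_C": 10031.5,
    "model_D": 10025.0,
}
deltas = delta_aic(aic)
for name, d in deltas.items():
    # math.asinh gives the arcsinh-scaled position used for the horizontal axis
    print(name, d, round(math.asinh(d), 3), classify(d))
```

The arcsinh transform behaves like a logarithm for large values but is defined at zero, which is why it can place the best-fitting model (ΔAIC = 0) on the same axis as much worse models.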
