Hum Genet. 2024 Mar;143(3):211-232.
doi: 10.1007/s00439-024-02644-7. Epub 2024 Feb 23.

Novel genotype-phenotype correlations, differential cerebellar allele-specific methylation, and a common origin of the (ATTTC)n insertion in spinocerebellar ataxia type 37

Marina Sanchez-Flores et al. Hum Genet. 2024 Mar.

Abstract

Spinocerebellar ataxia type 37 (SCA37) is a rare disease originally identified in ataxia patients from the Iberian Peninsula presenting with a pure cerebellar syndrome. SCA37 patients carry a pathogenic intronic (ATTTC)n repeat insertion, flanked by two polymorphic (ATTTT)n repeats, in the Disabled-1 (DAB1) gene, leading to cerebellar dysregulation. Herein, we determine the precise configuration of the pathogenic 5'(ATTTT)n-(ATTTC)n-3'(ATTTT)n SCA37 alleles by CRISPR-Cas9 enrichment and long-read nanopore sequencing, reveal their epigenomic signatures in SCA37 lymphocytes, fibroblasts, and cerebellar samples, and establish new molecular and clinical correlations. The pathogenic 5'(ATTTT)n-(ATTTC)n-3'(ATTTT)n allele configurations revealed repeat instability and differential methylation signatures. Age of onset correlated negatively with the length of the (ATTTC)n and positively with the length of the 3'(ATTTT)n. Geographic origin and gender also correlated significantly with age of onset. Furthermore, machine learning yielded significant predictive regression models for age of onset and disease evolution that consider gender, the (ATTTC)n, the 3'(ATTTT)n, and seven CpG positions differentially methylated in SCA37 cerebellum. A common 964-kb genomic region spanning the (ATTTC)n insertion was identified in all SCA37 patients analysed from Portugal and Spain, indicating a common origin of the SCA37 mutation in the Iberian Peninsula approximately 859 years ago (95% CI 647-1378). In conclusion, we demonstrate that CRISPR/Cas9 enrichment combined with nanopore long-read sequencing, which avoids PCR amplification bias, accurately determines the size and configuration of the regulatory 5'(ATTTT)n-(ATTTC)n-3'(ATTTT)n repeat tract, which is relevant for accurate genetic diagnosis of SCA37. Moreover, we identify novel significant genotype-phenotype correlations in SCA37 and differential cerebellar allele-specific methylation signatures that may underlie pathogenic DAB1 dysregulation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Long-read nanopore sequencing of the CRISPR/Cas9-enriched genomic region containing the DAB1 ATTTT/ATTTC repeat tract. a Integrative Genomics Viewer (IGV) display of the entire region of interest within DAB1 intron 11, successfully enriched by CRISPR/Cas9 and captured from the cerebellum of SCA37 patient SPA001. Long reads were phased with WhatsHap to reconstruct the wild-type (top) and expanded (bottom) haplotypes. Read counts and repeat sizes for WT-(ATTTT)n and SCA37-(ATTTC)n alleles in blood lymphocytes and cerebellar samples from SPA001 b and SPA002 c. d No allele dropout was observed: read counts of expanded alleles did not differ from those of normal alleles (two-sample t test; p value = 0.63, n = 14). The dot indicates an outlier WT read count for the high-molecular-weight (HMW) DNA extracted from SPC0001 fibroblasts. Waterfall plots generated with the Guppy super-accuracy (sup) basecalling mode showed pure (ATTTT)n and (ATTTC)n repeat tracts in SPA001 and SPA002 PBLs (e and g) and cerebella (f and h). No interruptions were identified in any 5ʹ(ATTTT)n–(ATTTC)n–3ʹ(ATTTT)n repeat tract (Suppl. Figure 5). Notably, SCA37 alleles sequenced by long reads showed an ATTTTTTT sequence preceding the 5ʹ(ATTTT)n, in contrast to the ATTTATTT sequence preceding the WT-(ATTTT)n alleles.
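As a rough illustration of what a "pure" repeat tract and its configuration mean, the Python sketch below parses a single basecalled read into unit counts for the 5ʹ(ATTTT)n, (ATTTC)n, and 3ʹ(ATTTT)n segments. It is a hypothetical helper (`tract_configuration` does not come from the study) and assumes an error-free, uninterrupted tract; it is not the waterfall-plot or repeat-sizing pipeline used here.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a pure 5'(ATTTT)n block, then a pure (ATTTC)n block,
# then a pure 3'(ATTTT)n block, each with at least one unit.
TRACT = re.compile(r"((?:ATTTT)+)((?:ATTTC)+)((?:ATTTT)+)")
UNIT = 5  # pentanucleotide repeat unit length


def tract_configuration(read: str) -> Optional[Tuple[int, int, int]]:
    """Return (5'ATTTT, ATTTC, 3'ATTTT) unit counts, or None if no pure
    ATTTT-ATTTC-ATTTT tract is present (e.g. a wild-type allele)."""
    match = TRACT.search(read.upper())
    if match is None:
        return None
    return tuple(len(block) // UNIT for block in match.groups())


if __name__ == "__main__":
    # Toy read (not real data): ATTTTTTT prefix, then 3 + 4 + 2 units.
    toy = "GGATTTTTTT" + "ATTTT" * 3 + "ATTTC" * 4 + "ATTTT" * 2 + "GCA"
    print(tract_configuration(toy))           # (3, 4, 2)
    print(tract_configuration("ATTTT" * 10))  # None
```

Real nanopore reads contain basecalling errors, so a practical sizing tool would tolerate mismatches (for example through alignment or fuzzy matching) rather than demand exact units as this regular expression does.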
Fig. 2
Tissue-specific length, variability, and instability index of the WT-(ATTTT)n and the inserted SCA37-(ATTTC)n repeat tracts. a The SCA37-(ATTTC)n repeat showed higher repeat variability than the WT-(ATTTT)n. A slightly higher instability index, biased towards contraction, was observed for the SCA37-(ATTTC)n repeat compared with the WT-(ATTTT)n in blood samples (b, d). b The (ATTTC)n instability index in cerebellum (average instability index = +3.21) revealed a tissue-specific expansion bias compared with blood samples (average instability index = −0.33). In fibroblasts, the instability index showed contraction for both the pathogenic (ATTTC)n c (instability index = 1.62) and the WT-(ATTTT)n e (instability index = −1.4) repeat tracts, compared with blood (average instability index = −0.33 for (ATTTC)n and +0.02 for WT-(ATTTT)n; Suppl. Table 8 and Suppl. Figure 10).
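The instability index summarizes whether repeat sizes in a sample skew toward expansion (positive) or contraction (negative) relative to the main allele. The caption does not spell out the formula, so the sketch below assumes one commonly applied peak-height-weighted definition, adapted to per-read repeat sizes: each distinct size is a peak weighted by its read count, peaks below a relative threshold are discarded, and the index is the weighted mean distance (in repeat units) from the modal allele. The 20% threshold and the function name are illustrative assumptions, not parameters reported in this study.

```python
from collections import Counter
from typing import Iterable


def instability_index(repeat_sizes: Iterable[int], threshold: float = 0.2) -> float:
    """Peak-height-weighted instability index from per-read repeat sizes.

    Sizes whose read count is below `threshold` x the tallest peak are ignored
    (an assumed noise filter). Positive values indicate an expansion bias,
    negative values a contraction bias, relative to the modal allele.
    """
    counts = Counter(repeat_sizes)
    if not counts:
        raise ValueError("no repeat sizes given")
    tallest = max(counts.values())
    peaks = {size: n for size, n in counts.items() if n >= threshold * tallest}
    # Main allele = most supported size; ties go to the first size seen.
    modal = max(peaks, key=peaks.get)
    total = sum(peaks.values())
    # Weighted sum of each peak's distance (in units) from the modal allele.
    return sum((n / total) * (size - modal) for size, n in peaks.items())


if __name__ == "__main__":
    toy_sizes = [10] * 6 + [12] * 3 + [9] * 1  # hypothetical read-level sizes
    print(round(instability_index(toy_sizes), 3))  # 0.667
```

Whether a threshold was applied in this study, and at what level, is not stated here; changing it can shift the index when minor size classes are numerous.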
Fig. 3
Repeat instability of the 5ʹ(ATTTT)n upstream and the 3ʹ(ATTTT)n downstream of the inserted (ATTTC)n repeat in the mutant pathogenic allele. a In the SCA37 allele, the 3ʹ(ATTTT)n located downstream of the (ATTTC)n pentanucleotide repeat insertion (right) showed higher repeat variability (average SD ± 3.9) than the upstream 5ʹ(ATTTT)n (left; average SD ± 2.7). b Remarkably, the 3ʹ(ATTTT)n showed the highest repeat variability among Spanish (75–91), Portuguese (58–90), and German (408–420) cases. c In cerebellum, the instability index of both the 5ʹ(ATTTT)n and 3ʹ(ATTTT)n flanking the (ATTTC)n in pathogenic alleles revealed a tissue-specific contraction bias compared with blood samples (cerebellar 5ʹ(ATTTT)n instability index = −1.59; blood 5ʹ(ATTTT)n instability index = 0.11; cerebellar 3ʹ(ATTTT)n instability index = −2.5; blood 3ʹ(ATTTT)n instability index = +0.33). d In fibroblasts, the instability index also showed a contraction bias for both the 5ʹ(ATTTT)n and 3ʹ(ATTTT)n repeat tracts in pathogenic alleles compared with blood (fibroblast 5ʹ(ATTTT)n instability index = −3.07; fibroblast 3ʹ(ATTTT)n instability index = −1.53) (Suppl. Table 8 and Suppl. Figure 10).
Fig. 4
CpG methylation signatures of the SCA37 region within DAB1 on 1p32 in cerebellum, peripheral blood cells, and fibroblasts. For methylation studies, SCA37 and wild-type alleles were classified using WhatsHap software. a Allele-specific methylation signatures were similar between the two SCA37 cerebellar samples (SPA001-CB and SPA002-CB), compared with blood b and fibroblasts c. d Three global differentially methylated regions (DMRs) were identified in SCA37 pathogenic alleles compared with WT-(ATTTT)n alleles. A hypomethylated region (R1) upstream of the 5ʹ(ATTTT)n–(ATTTC)n–3ʹ(ATTTT)n SCA37 tract, spanning chr1:57367323 to chr1:57371263, showed an 8.84% mean reduction in methylation frequency. In contrast, two regions downstream of the SCA37 repeat tract, R2 (chr1:57364049 to chr1:57367009) and R3 (chr1:57361325 to chr1:57363731), were hypermethylated relative to WT alleles, with global methylation frequency increases of 5.51% and 5.34%, respectively. b Blood samples did not reveal significant differences in methylation frequencies between WT and SCA37 alleles. c Fibroblasts showed a slight global increase of methylation frequencies in SCA37 alleles compared with WT alleles, of 6.44% in R1 (chr1:57361325 to chr1:57363731) and 13.44% in R2 (chr1:57364049 to chr1:57367009). The y-axis represents methylation frequencies (%), and the genomic positions on the x-axis indicate DMR coordinates. Blue and orange lines represent smoothed methylation frequencies for wild-type and SCA37 disease alleles, respectively. The position of the 5ʹ(ATTTT)n–(ATTTC)n–3ʹ(ATTTT)n SCA37 tract is indicated by a red vertical dotted line.
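To make the allele-specific comparison concrete, the sketch below turns phased per-read CpG calls into per-haplotype methylation frequencies and reports the mean SCA37-minus-WT difference over a given interval, the kind of summary quoted for R1 to R3. It assumes reads have already been assigned to haplotypes (for example with WhatsHap) and that each call is a simple methylated/unmethylated flag; the input layout and function names are hypothetical, no smoothing is applied, and it does not perform statistical DMR detection.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Call = Tuple[str, int, bool]  # (haplotype label, CpG position, methylated?)


def methylation_frequencies(calls: Iterable[Call]) -> Dict[Tuple[str, int], float]:
    """Percentage of methylated read calls per (haplotype, CpG position)."""
    methylated: Dict[Tuple[str, int], int] = defaultdict(int)
    covered: Dict[Tuple[str, int], int] = defaultdict(int)
    for haplotype, position, is_methylated in calls:
        covered[(haplotype, position)] += 1
        methylated[(haplotype, position)] += int(is_methylated)
    return {key: 100.0 * methylated[key] / covered[key] for key in covered}


def mean_difference(freqs, case: str, control: str, start: int, end: int) -> float:
    """Mean (case - control) frequency over CpGs in [start, end] covered on both haplotypes."""
    positions = sorted({pos for (_, pos) in freqs if start <= pos <= end})
    diffs = [freqs[(case, p)] - freqs[(control, p)]
             for p in positions
             if (case, p) in freqs and (control, p) in freqs]
    return sum(diffs) / len(diffs) if diffs else float("nan")


if __name__ == "__main__":
    toy_calls = [  # hypothetical phased calls, not study data
        ("WT", 100, True), ("WT", 100, False), ("SCA37", 100, True), ("SCA37", 100, True),
        ("WT", 200, True), ("SCA37", 200, False),
    ]
    freqs = methylation_frequencies(toy_calls)
    print(mean_difference(freqs, "SCA37", "WT", 100, 200))  # -25.0
```

Averaging per-CpG differences gives each covered CpG equal weight regardless of read depth; weighting by coverage would be an equally reasonable alternative.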
Fig. 5
Novel genotype–phenotype associations and predictive linear regression models established in SCA37. a Significant Pearson correlation coefficients were obtained using “Age of onset” as the dependent variable and “SCA37-(ATTTC)n” (n = 56), “SCA37-3ʹ(ATTTT)n” (n = 22), “Country” (n = 56), and “Gender” (n = 56) as independent variables. b and c Scatterplots based on the regression model of the dependent variable “Age of onset” with the independent variables “Gender” and “(ATTTC)n”. b Linear regression fit (red line) with its 95% confidence interval (red shading), plotted over the observed (“actual”) values of the dependent variable (z-scores) against their predicted (“fitted”) values. c Residuals of the regression model plotted against the predicted values of the dependent variable. d Bee swarm plot summarizing the distribution of SHAP values for each variable of the regression model. Male gender and shorter “(ATTTC)n” (pink dots) have a higher impact value in the age of onset prediction model than female gender and longer “(ATTTC)n” (blue dots). Ranking of the selected models for age of onset e and disease evolution f using the dataset of methylated CpG regions; the five most relevant models (lowest BIC and highest R²) are indicated. g and h The best model for age of onset (R² = 0.998, n = 8; p value < 0.0007) includes the variable 3ʹ(ATTTT)n and the CpG regions “chr1:57361330” in DMR3 and “chr1:57360976” in R4. i and j The best model of disease evolution (R² = 0.999, n = 7; p value < 0.0008) was obtained with the CpG regions “chr1:57367557” in DMR1, “chr1:57362080” in DMR3, and “chr1:57360845” in R4. k and l The best prediction model for “Age of onset” considering the “(ATTTC)n” includes two CpG regions, “chr1:57367004” and “chr1:57365681”, both located in DMR2 (R² = 0.932, n = 8). m and n A significant model of “disease evolution” associated with the independent variable “(ATTTC)n” combined with two CpG regions, “chr1:57370049” and “chr1:57368270”, both located in DMR1 (R² = 0.926, n = 7).
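The model-ranking step in panels e and f (lowest BIC, highest R²) can be sketched with ordinary least squares. The Python code below, which assumes NumPy is available, fits every subset of up to three candidate predictors, computes R² and a Gaussian-likelihood BIC, n·ln(RSS/n) + k·ln(n) with constants dropped, and sorts models by BIC. The exhaustive-subset search, this BIC form, and all names are illustrative assumptions rather than the authors' machine-learning pipeline; categorical predictors such as gender would need numeric coding (for example 0/1) first.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> Tuple[np.ndarray, float, float]:
    """OLS with an intercept; returns (coefficients, R^2, BIC)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    rss = float(residuals @ residuals)
    tss = float(((y - y.mean()) ** 2).sum())
    k = design.shape[1]  # number of fitted parameters, intercept included
    # Gaussian-likelihood BIC up to an additive constant; undefined if rss == 0.
    bic = n * np.log(rss / n) + k * np.log(n)
    return coef, 1.0 - rss / tss, float(bic)


def rank_models(features: Dict[str, Sequence[float]], y: Sequence[float],
                max_terms: int = 3) -> List[Tuple[float, float, Tuple[str, ...]]]:
    """Fit all predictor subsets of size 1..max_terms; return (BIC, R^2, names) sorted by BIC."""
    y_arr = np.asarray(y, dtype=float)
    results = []
    for size in range(1, max_terms + 1):
        for subset in itertools.combinations(sorted(features), size):
            X = np.column_stack([np.asarray(features[name], dtype=float) for name in subset])
            _, r2, bic = fit_ols(X, y_arr)
            results.append((bic, r2, subset))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical toy data (n = 8), not values from the study.
    x1 = np.arange(8.0)
    x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    noise = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.02])
    y = 50.0 - 2.0 * x1 + noise
    for bic, r2, names in rank_models({"x1": x1, "x2": x2}, y, max_terms=2):
        print(names, round(r2, 3), round(bic, 2))
```

With very small samples, R² rises with every added predictor, so a penalized criterion such as BIC is needed to compare models of different sizes; an exact fit (RSS = 0) would make this BIC form undefined.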
Fig. 6
Importance value and impact of methylated CpG regions in predictive models. SHAP values and relative importance of the 15 most recurrent methylated CpG regions in regression models of “Age of onset” a and “Disease evolution” b. c SHAP values and relative importance in a model of age of onset using the 3ʹ(ATTTT)n and the CpG regions “chr1:57365870” in DMR2 and “chr1:57361330” in DMR3 (see Fig. 5e).
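SHAP values attribute a prediction to individual predictors. For a linear model whose features are treated as independent, they have a closed form, φ_j(x) = β_j (x_j − E[x_j]): each predictor's contribution is its coefficient times its deviation from the background mean. The minimal NumPy sketch below assumes such a linear model; the caption does not specify which model or SHAP estimator produced these plots, and the function name is hypothetical.

```python
import numpy as np


def linear_shap(coef, X, background) -> np.ndarray:
    """Exact SHAP values for a linear model under feature independence.

    coef:       (p,) slopes of the fitted model (intercept excluded)
    X:          (n, p) samples to explain
    background: (m, p) reference samples; their column means are the baseline
    Returns an (n, p) array whose rows sum to f(x) - mean f(background).
    """
    coef = np.asarray(coef, dtype=float)
    baseline = np.asarray(background, dtype=float).mean(axis=0)
    return (np.asarray(X, dtype=float) - baseline) * coef


if __name__ == "__main__":
    # Hypothetical model f(x) = b0 + 2*x_a - 1*x_b (toy coefficients, not study values).
    phi = linear_shap([2.0, -1.0], [[3.0, 5.0]], [[0.0, 0.0], [2.0, 4.0]])
    print(phi.tolist())  # [[4.0, -3.0]]
```

A bee swarm summary such as the one in Fig. 5d typically draws one dot per sample and predictor, that is, one entry of this array.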
Fig. 7
Distinctive SCA37 haplotypes and common origin of the SCA37 mutation. a Haplotype analysis revealed a 964-kb shared region (red box) segregating with the causative SCA37 mutation in all Iberian SCA37 patients, indicating a common origin of the SCA37 mutation in the Iberian Peninsula approximately 859 years ago (95% CI: 647–1378). Red bars represent informative SNP positions. b DMLE+ 2.3 analysis showing the posterior probability density of the mutation age for a population growth rate r = 0.17 and a proportion of sampled disease-bearing chromosomes f = 0.14, assuming an intergenerational interval of 25 years. The estimated median age is 34 generations. Green bars show the 95% confidence interval, between 26 and 55 generations. The y-axis shows how often each number of generations resulted from the iterations. Outcomes considering the total or the southwestern Iberian Peninsula population with either 20 or 25 years/generation are included in Fig. S10 of Additional file 2. c Haplotype network showing the phylogenetic relationship among five different SCA37 pathogenic alleles. Circle size is proportional to the number of chromosomes; line length is proportional to the genetic distance among haplotypes. d Phylogenetic reconstruction based on genetic distances (DA) between the five haplotypes. The numbers next to nodes represent a measure of support for each node. The scale bar (0.01) indicates the number of genetic changes (nucleotide substitutions per site).
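The shared-haplotype idea in panel a can be illustrated by extending outward from a marker next to the insertion until carrier haplotypes stop agreeing. The Python sketch below is a simplified, hypothetical version: it assumes phased carrier haplotypes encoded as allele strings over the same ordered SNP positions and returns the span between the outermost concordant SNPs, which is a lower bound on the shared region (the true boundaries lie between the last concordant and the first discordant markers). It does not estimate mutation age, which here was done with DMLE+.

```python
from typing import List, Optional, Sequence, Tuple


def shared_segment(positions: Sequence[int], haplotypes: List[str],
                   focal_index: int) -> Optional[Tuple[int, int]]:
    """Span (bp) of consecutive SNPs around focal_index on which all carrier
    haplotypes carry the same allele; None if they already differ at the focal SNP."""

    def concordant(i: int) -> bool:
        # True when every haplotype has the same allele at SNP i.
        return len({hap[i] for hap in haplotypes}) == 1

    if not concordant(focal_index):
        return None
    left = right = focal_index
    while left > 0 and concordant(left - 1):
        left -= 1
    while right < len(positions) - 1 and concordant(right + 1):
        right += 1
    return positions[left], positions[right]


if __name__ == "__main__":
    # Toy example: five SNPs, three carrier haplotypes (hypothetical).
    pos = [100_000, 350_000, 600_000, 900_000, 1_200_000]
    haps = ["ACGTA", "ACGTT", "TCGTA"]
    print(shared_segment(pos, haps, focal_index=2))  # (350000, 900000)
```

Real data would also need to allow for genotyping or phasing errors, since a single miscalled allele would truncate the segment in this strict version.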
