Nucleic Acids Res. 2016 May 5;44(8):e73.
doi: 10.1093/nar/gkv1525. Epub 2016 Jan 5.

Robust detection of alternative splicing in a population of single cells

Joshua D Welch et al. Nucleic Acids Res. 2016.

Abstract

Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3' bias and technical noise. These unique properties of single cell RNA-seq data make study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. Importantly, SingleSplice is tailored to the unique demands of single cell analysis, detecting isoform usage differences without attempting to infer expression levels for full-length transcripts. Using data from spike-in transcripts, we found that our approach detects variation in isoform usage among single cells with high sensitivity and specificity. We also applied SingleSplice to data from mouse embryonic stem cells and discovered a set of genes that show significant biological variation in isoform usage across the set of cells. A subset of these isoform differences are linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle.

Figures

Figure 1.
Diagram of SingleSplice method. (A) SingleSplice constructs an expression-weighted splice graph directly from aligned reads (top), then identifies alternative splicing modules (ASMs) and calculates the coverage on each ASM path (indicated in black, red, yellow and green). (B) For each ASM path, a distribution is fit to capture the expected variation in coverage due to technical noise. (C) SingleSplice computes the expected variation in isoform usage by sampling repeatedly from the fitted noise distributions. The resulting sampled values are used to compute an empirical P-value for the null hypothesis that the observed variation in isoform usage results from technical noise alone.
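
The sampling step in panel (C) can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation: the noise model (a gamma distribution with a dropout probability, parameterized by a hypothetical `(mean, variance, dropout)` tuple), the test statistic (variance of the ratio a/(a+b) across cells) and the `+1` correction in the P-value are all assumptions made for illustration.

```python
import random
import statistics

def sample_coverage(mean, var, dropout, rng):
    """Draw one noisy coverage value: zero with probability `dropout`, else gamma."""
    if rng.random() < dropout:
        return 0.0
    shape = mean * mean / var   # moment-matched gamma shape (assumed noise model)
    scale = var / mean          # moment-matched gamma scale
    return rng.gammavariate(shape, scale)

def ratio_variance(a_values, b_values):
    """Variance of a/(a+b) over cells where at least one path has coverage."""
    ratios = [a / (a + b) for a, b in zip(a_values, b_values) if a + b > 0]
    return statistics.pvariance(ratios) if len(ratios) > 1 else 0.0

def empirical_p_value(obs_a, obs_b, noise_a, noise_b, n_sims=1000, seed=0):
    """Fraction of noise-only simulations whose ratio variance reaches the observed one."""
    rng = random.Random(seed)
    observed = ratio_variance(obs_a, obs_b)
    n_cells = len(obs_a)
    exceed = 0
    for _ in range(n_sims):
        sim_a = [sample_coverage(*noise_a, rng) for _ in range(n_cells)]
        sim_b = [sample_coverage(*noise_b, rng) for _ in range(n_cells)]
        if ratio_variance(sim_a, sim_b) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero (a common convention).
    return (exceed + 1) / (n_sims + 1)
```

A small P-value here means that noise-only simulations rarely produce as much ratio variation as was observed, which is the sense in which the null hypothesis of technical noise alone would be rejected.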
Figure 2.
Fitting a technical noise model using spike-in transcripts. (A) Gamma regression model to predict variance in coverage as a function of mean expression level. The observed data are shown as black points and the gamma fit is drawn in red. (B) Logistic regression model predicting dropout rate as a function of mean expression level. The observed data are shown as black points, and the regression line is shown in red. (C) Expected (line) and observed (histogram) ratio distributions for a pair of spike-in transcripts showing no ratio change. Note that expectation and observation match very well in this case, indicating that the model effectively predicts the effects of technical noise. (D) Expected (line) and observed (histogram) ratio distributions for a pair of spike-in transcripts showing simulated isoform switching. Note that the observed ratio values differ significantly from what is expected based on technical noise alone.
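
As a rough illustration of the dropout component in panel (B), the sketch below fits a logistic curve relating dropout rate to mean expression. It covers only that component, not the gamma regression of panel (A). The function names, the use of log mean expression as the predictor and the plain gradient-descent fit are assumptions; the caption does not say how the authors fitted their model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_dropout_model(log_means, dropout_rates, lr=0.2, n_iter=20000):
    """Fit dropout = sigmoid(b0 + b1 * log_mean) by gradient descent on cross-entropy."""
    b0, b1 = 0.0, 0.0
    n = len(log_means)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(log_means, dropout_rates):
            err = sigmoid(b0 + b1 * x) - y   # gradient of cross-entropy w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def predict_dropout(b0, b1, log_mean):
    """Predicted dropout probability for a transcript with the given log mean expression."""
    return sigmoid(b0 + b1 * log_mean)
```

Fitted on spike-in transcripts, a model of this shape could then supply a dropout probability for any ASM path from its mean expression level. In practice a statistics library would usually replace the hand-written fit.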
Figure 3.
Accounting for effects of cell size. (A) Variation in the relative proportions of reads mapping to spike-in transcripts and cellular transcripts indicates that the amount of cellular RNA varies reproducibly during the cell cycle. (B) Since spike-in transcripts are added at constant amounts, their measured expression levels should vary randomly across the set of cells. Instead, PCA using only reads per kilobase length per million reads (RPKMs) from spike-in transcripts before cell size normalization predicts cell cycle stage. (C) Spike-in expression levels should fluctuate randomly due to technical noise, but instead spike-in expression levels before normalization are strongly correlated with each other and with cell size. Note how closely the blue, orange and grey lines trend together. (D) Normalizing for cell size using the fraction of reads that come from spike-in versus cellular RNA removes this effect.
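
One way to picture the normalization in panel (D) is sketched below. The caption does not give the formula, so the scaling used here (dividing each cell's spike-in values by that cell's spike-in read fraction, then rescaling by the mean fraction) is an assumption, as is the function name `normalize_spike_ins`.

```python
def normalize_spike_ins(spike_rpkm, spike_reads, cellular_reads):
    """Rescale per-cell spike-in expression by the cell's spike-in read fraction.

    spike_rpkm:     one list of spike-in expression values per cell
    spike_reads:    reads mapped to spike-ins, per cell
    cellular_reads: reads mapped to cellular transcripts, per cell
    """
    fractions = [s / (s + c) for s, c in zip(spike_reads, cellular_reads)]
    mean_fraction = sum(fractions) / len(fractions)
    # A larger cell dilutes the spike-ins (smaller fraction), so its values are scaled up.
    return [
        [value * mean_fraction / frac for value in cell]
        for cell, frac in zip(spike_rpkm, fractions)
    ]
```

Under this assumption, two cells that received the same amount of spike-in RNA but contain different amounts of cellular RNA end up with the same normalized spike-in values.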
Figure 4.
Testing the sensitivity and specificity of SingleSplice using spike-in transcripts. (A) True negative examples are created by pairing spike-in transcripts. Any variation in the ratio of these transcripts is due to technical noise. (B) Scatter plot showing expected (SingleSplice prediction) ratio variance versus observed ratio variance for true negative test cases. Each box represents a single pair of spike-ins, and area of the box is proportional to the mean expression level. Test cases where SingleSplice correctly identified the pair of spike-ins as showing no isoform variation are colored green. (C) True positive examples are created by swapping half of the measured expression levels of a pair of spike-in transcripts. Ratio variation in these examples comes from technical noise and simulated isoform switching. (D) Scatter plot showing expected versus observed ratio variance for true positive test cases. Test cases where SingleSplice correctly identified the pair of spike-ins as showing significant isoform variation are colored green.
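
The construction of true positive examples in panel (C) can be sketched as follows. The caption says only that half of the measured expression levels of a pair of spike-ins are swapped; choosing that half at random, and the function name `make_true_positive`, are assumptions for illustration.

```python
import random

def make_true_positive(levels_a, levels_b, seed=0):
    """Swap the two spike-ins' expression levels in a random half of the cells."""
    rng = random.Random(seed)
    n = len(levels_a)
    swapped = set(rng.sample(range(n), n // 2))   # cells chosen for swapping (assumed random)
    new_a = [levels_b[i] if i in swapped else levels_a[i] for i in range(n)]
    new_b = [levels_a[i] if i in swapped else levels_b[i] for i in range(n)]
    return new_a, new_b, swapped
```

The unswapped pair serves as a true negative, since its ratio varies only through technical noise. The swapped pair mimics isoform switching in half of the cells while leaving each cell's total coverage unchanged.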
Figure 5.
Discovery of splicing changes during the cell cycle. (A) Expected (line) and observed (histogram) ratio distributions for the Rbm25 gene. Note that the isoform usage differs significantly from what is expected based on technical noise alone. (B–D) The Hnrnpc, Snhg3 and Rbm25 genes show isoform usage changes during the cell cycle. The exon-intron structure (5′ to 3′ in direction of transcription) of each pair of ASM paths is shown above the corresponding plot. The ratios shown in these panels are computed with respect to the top ASM path – i.e. a ratio of 0 corresponds to only the bottom ASM path, and a ratio of 1 indicates only the top ASM path. (E) PCA using isoform ratios alone separates cells according to cell cycle stage.
