Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
- PMID: 29928470
- PMCID: PMC6004614
Abstract
We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data are confounded by technical variation arising from experimental errors and cell type-specific biases. Current approaches perform a global normalization before analyzing biological signals, which resolves neither missing data nor variation that depends on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We establish identifiability and weak convergence guarantees for our method and present a scalable Gibbs inference algorithm. This method improves cluster inference on both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
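The core idea in the abstract is that normalization and clustering are inferred jointly: each cell carries its own scale factor, so a cell is compared with a cluster after accounting for its technical scaling rather than after one fixed global normalization. The Python sketch below illustrates that idea under simplifying assumptions of our own; it is not the authors' model or code. It assumes each cell's expression vector follows x_j ~ N(a_j μ_k, σ² I) with known σ², a Gaussian prior μ_k ~ N(0, τ² I), a Gaussian prior a_j ~ N(1, ρ²) on the cell-specific scale, and a Chinese restaurant process prior with concentration `conc` on the labels. The function name `scaled_dpmm_gibbs` and the parameters `conc`, `sigma2`, `tau2` and `rho2` are hypothetical. Each sweep updates labels with μ_k integrated out (a collapsed Gibbs step), then draws μ_k and a_j from their Gaussian conditionals.

```python
import numpy as np


def scaled_dpmm_gibbs(X, n_iter=20, conc=1.0, sigma2=0.25, tau2=10.0, rho2=0.25, seed=0):
    """Hypothetical sketch: Gibbs sampling for a DP mixture with cell-specific scales.

    Assumed (simplified) model, not the paper's exact formulation:
        z ~ CRP(conc),  mu_k ~ N(0, tau2 * I),  a_j ~ N(1, rho2),
        x_j | z_j = k  ~ N(a_j * mu_k, sigma2 * I)
    Returns the final cluster labels z and cell scale factors a.
    """
    rng = np.random.default_rng(seed)
    n, g = X.shape
    norms = np.linalg.norm(X, axis=1)
    a = norms / norms.mean()      # crude global normalization, used only as a starting point
    z = np.arange(n)              # start with every cell in its own cluster

    def mu_posterior(Xk, ak):
        # Gaussian posterior of mu_k given its member cells (same precision for every gene)
        prec = 1.0 / tau2 + np.sum(ak ** 2) / sigma2
        return (ak @ Xk) / sigma2 / prec, prec

    for _ in range(n_iter):
        # 1) Collapsed label update: mu_k is integrated out, scales a are held fixed.
        for j in range(n):
            z[j] = -1                                  # take cell j out of its cluster
            labels = np.unique(z[z >= 0])
            logp = np.empty(len(labels) + 1)
            for i, k in enumerate(labels):
                idx = z == k
                mean, prec = mu_posterior(X[idx], a[idx])
                var = a[j] ** 2 / prec + sigma2        # posterior predictive variance per gene
                resid = X[j] - a[j] * mean
                logp[i] = np.log(idx.sum()) - 0.5 * (resid @ resid / var + g * np.log(var))
            var0 = a[j] ** 2 * tau2 + sigma2           # prior predictive variance for a new cluster
            logp[-1] = np.log(conc) - 0.5 * (X[j] @ X[j] / var0 + g * np.log(var0))
            p = np.exp(logp - logp.max())
            c = rng.choice(len(p), p=p / p.sum())
            if c < len(labels):
                z[j] = labels[c]
            else:
                z[j] = labels.max() + 1 if len(labels) else 0
        _, z = np.unique(z, return_inverse=True)       # relabel clusters as 0..K-1

        # 2) Draw each mu_k from its Gaussian conditional given labels and scales.
        K = z.max() + 1
        mu = np.empty((K, g))
        for k in range(K):
            mean, prec = mu_posterior(X[z == k], a[z == k])
            mu[k] = mean + rng.standard_normal(g) / np.sqrt(prec)

        # 3) Draw each scale a_j | mu_{z_j}, x_j (conjugate Gaussian update).
        m = mu[z]
        prec_a = 1.0 / rho2 + np.sum(m * m, axis=1) / sigma2
        mean_a = (1.0 / rho2 + np.sum(m * X, axis=1) / sigma2) / prec_a
        a = mean_a + rng.standard_normal(n) / np.sqrt(prec_a)
    return z, a
```

Because a_j is resampled inside the loop, each cell's scale is re-estimated relative to the cluster it currently belongs to, instead of being fixed once by a global normalization pass; the norm-based initialization only seeds the chain. A usage call is simply `z, a = scaled_dpmm_gibbs(X)` on a cells-by-genes float array `X`.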
Similar articles
- DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data. Bioinformatics. 2018 Jan 1;34(1):139-146. doi: 10.1093/bioinformatics/btx490. PMID: 29036318.
- Clustering distributions with the marginalized nested Dirichlet process. Biometrics. 2018 Jun;74(2):584-594. doi: 10.1111/biom.12778. Epub 2017 Sep 28. PMID: 28960246.
- Expression analysis of RNA sequencing data from human neural and glial cell lines depends on technical replication and normalization methods. BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):412. doi: 10.1186/s12859-018-2382-0. PMID: 30453873.
- Hierarchical Dirichlet process model for gene expression clustering. EURASIP J Bioinform Syst Biol. 2013 Apr 12;2013(1):5. doi: 10.1186/1687-4153-2013-5. PMID: 23587447.
- A compositional model to assess expression changes from single-cell RNA-seq data. Ann Appl Stat. 2021 Jun;15(2):880-901. doi: 10.1214/20-aoas1423. Epub 2021 Jul 12. PMID: 37332668.
Cited by
- scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data. Brief Bioinform. 2024 Mar 27;25(3):bbae148. doi: 10.1093/bib/bbae148. PMID: 38600665.
- scCURE identifies cell types responding to immunotherapy and enables outcome prediction. Cell Rep Methods. 2023 Nov 20;3(11):100643. doi: 10.1016/j.crmeth.2023.100643. PMID: 37989083.
- A new and effective two-step clustering approach for single cell RNA sequencing data. BMC Genomics. 2023 Nov 9;23(Suppl 6):864. doi: 10.1186/s12864-023-09577-x. PMID: 37946133.
- Essential procedures of single-cell RNA sequencing in multiple myeloma and its translational value. Blood Sci. 2023 Nov 2;5(4):221-236. doi: 10.1097/BS9.0000000000000172. eCollection 2023 Oct. PMID: 37941914. Review.
- scKINETICS: inference of regulatory velocity with single-cell transcriptomics data. Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i394-i403. doi: 10.1093/bioinformatics/btad267. PMID: 37387147.