Sparse canonical methods for biological data integration: application to a cross-platform study
- PMID: 19171069
- PMCID: PMC2640358
- DOI: 10.1186/1471-2105-10-34
Abstract
Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. Yet data integration is an important and fundamental issue that will be widely encountered in post-genomic studies, when transcriptomics, proteomics and metabolomics data from different platforms are analyzed simultaneously to understand the mutual interactions between the data sets. In this high-dimensional setting, variable selection is crucial to obtain interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed for either a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines.
Results: We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results.
Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches gave similar results, although they highlighted the same phenomena with different priorities. Both outperformed CIA, which tended to select redundant information.