Bioinformatics. 2023 Apr 3;39(4):btad162.
doi: 10.1093/bioinformatics/btad162.

Con-AAE: contrastive cycle adversarial autoencoders for single-cell multi-omics alignment and integration

Xuesong Wang et al. Bioinformatics. 2023.

Abstract

Motivation: We have entered the multi-omics era, in which cells can be measured from several complementary aspects. Integrating or matching data from different feature spaces that describe the same cells can therefore give a more comprehensive view. However, this is particularly challenging for single-cell multi-omics data, which are very sparse and extremely high-dimensional. Although some techniques can measure scATAC-seq and scRNA-seq simultaneously, the resulting data are usually highly noisy because of the limitations of the experimental environment.

Results: To advance single-cell multi-omics research, we address these challenges with contrastive cycle adversarial autoencoders (Con-AAE), a novel framework that aligns and integrates single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data. Con-AAE efficiently maps these highly sparse and noisy data from their separate feature spaces into a coordinated subspace, where alignment and integration become easier. We demonstrate its advantages on several simulated and real-world datasets.
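Once both modalities are embedded in a shared subspace, alignment reduces to matching each cell in one modality to a cell in the other. The sketch below is a minimal illustration, assuming the embeddings are already computed and that matching uses Euclidean nearest neighbors; `align_nearest` and `alignment_accuracy` are hypothetical helpers, not part of the Con-AAE code, and the paper's actual matching procedure and metric may differ.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def align_nearest(z_rna, z_atac):
    """Hypothetical helper: for each scRNA-seq embedding, return the index of
    the closest scATAC-seq embedding in the shared latent space."""
    matches = []
    for z in z_rna:
        dists = [euclidean(z, w) for w in z_atac]
        matches.append(min(range(len(dists)), key=dists.__getitem__))
    return matches

def alignment_accuracy(matches, true_partner):
    """Assumed metric: fraction of cells matched to their true partner."""
    hits = sum(1 for m, t in zip(matches, true_partner) if m == t)
    return hits / len(matches)

# Toy 2-D embeddings (not real data). The scATAC-seq list is shuffled, so
# true_partner[i] is the index of scRNA-seq cell i's counterpart.
z_rna = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
z_atac = [[5.1, 4.9], [0.1, -0.1], [0.9, 1.2]]
true_partner = [1, 2, 0]

matches = align_nearest(z_rna, z_atac)
print(matches)                                    # [1, 2, 0]
print(alignment_accuracy(matches, true_partner))  # 1.0
```

Nearest-neighbor matching is used here only because it is the simplest way to read off correspondences from a coordinated subspace; it needs no pairwise information at training time, which matches the unsupervised alignment setting described in Figure 1.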

Availability and implementation: Zenodo: https://zenodo.org/badge/latestdoi/368779433. GitHub: https://github.com/kakarotcq/Con-AAE.

Figures

Figure 1.
(a) scRNA-seq and scATAC-seq data measure different aspects of the same cell. We aim to identify the correspondence between the two kinds of data from the same set of cells. (b) The Con-AAE framework uses two autoencoders to map the two kinds of sequencing data into two low-dimensional manifolds, and uses an adversarial loss and a latent cycle-consistency loss to make the two spaces as unified as possible. For the alignment task, we train the models without pairwise information but explicitly account for data noise through self-supervised contrastive learning. For the integration task, we feed annotated data to help the model learn.
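The adversarial loss in Figure 1(b) pushes the two latent spaces toward each other: a discriminator tries to tell which modality a latent code came from, and the encoders try to make that impossible. The sketch below assumes the standard GAN binary cross-entropy formulation (with label flipping for the encoder objective); it is an illustration, not the paper's exact objective, and the probabilities are toy numbers rather than outputs of a trained model.

```python
import math

EPS = 1e-12  # keeps log() away from 0

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def discriminator_loss(d_rna, d_atac):
    """Discriminator objective: output 1 for scRNA-seq codes, 0 for scATAC-seq codes."""
    losses = [bce(p, 1) for p in d_rna] + [bce(p, 0) for p in d_atac]
    return sum(losses) / len(losses)

def encoder_adversarial_loss(d_rna, d_atac):
    """Encoder objective: flip the labels to fool the discriminator, which
    pulls the two latent distributions toward each other."""
    losses = [bce(p, 0) for p in d_rna] + [bce(p, 1) for p in d_atac]
    return sum(losses) / len(losses)

# Each number is the discriminator's probability that a latent code came
# from scRNA-seq (toy values).
confident = ([0.9, 0.8], [0.1, 0.2])  # discriminator separates the modalities
confused = ([0.5, 0.5], [0.5, 0.5])   # latent spaces are indistinguishable

print(discriminator_loss(*confident) < encoder_adversarial_loss(*confident))  # True
print(round(discriminator_loss(*confused), 4))                               # 0.6931
```

When the encoders succeed, the discriminator is reduced to guessing (probability 0.5), and both losses settle at ln 2, which is the sense in which the two spaces become "as unified as possible".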
Figure 2.
(a) The embedding produced by the first encoder passes successively through the second decoder and the second encoder to produce a cycled embedding, and the consistency between the original and cycled embeddings can then be checked. (b) The contrastive loss minimizes the distance between positive pairs and maximizes the distance between negative pairs, which makes our method more robust to noise.
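Both losses in Figure 2 can be sketched with toy linear maps standing in for the neural encoders and decoders. Assumptions: the maps, dimensions, and function names are hypothetical; the cycle is shown in one direction only (scRNA-seq embedding through the scATAC-seq decoder and encoder); and the contrastive term uses the classic margin-based form, since this section does not give Con-AAE's exact formula or how positive and negative pairs are built.

```python
import math

def matvec(m, v):
    """Multiply a matrix m (list of rows) by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def latent_cycle_loss(z_rna, dec_atac, enc_atac):
    """(a) scRNA-seq embedding -> scATAC-seq decoder -> scATAC-seq encoder;
    penalize the squared distance between original and cycled embeddings."""
    x_atac_hat = matvec(dec_atac, z_rna)     # pseudo scATAC-seq profile
    z_cycled = matvec(enc_atac, x_atac_hat)  # back to the latent space
    return sq_dist(z_rna, z_cycled)

def contrastive_loss(z1, z2, positive, margin=1.0):
    """(b) Margin-based contrastive loss (Hadsell et al., 2006): pull positive
    pairs together; push negative pairs apart until they are `margin` away."""
    d = math.sqrt(sq_dist(z1, z2))
    return d ** 2 if positive else max(0.0, margin - d) ** 2

# Hypothetical linear maps: 2-D latent space, 3-D scATAC-seq feature space.
dec_atac = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
enc_perfect = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # enc(dec(z)) == z
enc_lossy = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]]    # halves the first coordinate

z = [0.3, -0.2]
print(latent_cycle_loss(z, dec_atac, enc_perfect))          # 0.0
print(round(latent_cycle_loss(z, dec_atac, enc_lossy), 4))  # 0.0225

ref, near, far = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]
print(round(contrastive_loss(ref, near, positive=True), 4))  # 0.25
print(round(contrastive_loss(ref, far, positive=False), 4))  # 0.0
```

The cycle term is zero only when decoding and re-encoding preserves the embedding, and the margin keeps negative pairs from being pushed apart indefinitely once they are already well separated.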
Figure 3.
Integration performance on 24 simulated datasets with varying data sizes and signal-to-noise ratios (SNRs). The horizontal axis shows the SNR, and the vertical axis shows the percentage of correct integration. Con-AAE outperforms the other methods in most cases. As the SNR decreases and the dataset size grows, the performance of all methods degrades significantly; however, Con-AAE still performs well, demonstrating its scalability and robustness.
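The caption does not define "correct integration". Because Figure 1 notes that annotated data are used for the integration task, one plausible reading is label-transfer accuracy: a query cell counts as correctly integrated if its nearest reference cell in the shared latent space has the same cell-type label. The sketch below implements that assumed definition; the helper names, labels, and embeddings are hypothetical.

```python
import math

def nearest_index(z, pool):
    """Index of the vector in `pool` closest to z (Euclidean distance)."""
    return min(range(len(pool)), key=lambda j: math.dist(z, pool[j]))

def label_transfer_accuracy(z_query, y_query, z_ref, y_ref):
    """Assumed metric: percentage of query cells whose nearest reference cell
    in the shared latent space carries the same cell-type label."""
    correct = sum(
        1 for z, y in zip(z_query, y_query) if y_ref[nearest_index(z, z_ref)] == y
    )
    return 100.0 * correct / len(z_query)

# Toy embeddings: annotated scRNA-seq cells as the reference, scATAC-seq
# cells as the query (illustrative labels, not real data).
z_ref = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.2, 3.9]]
y_ref = ["B", "B", "T", "T"]
z_query = [[0.1, 0.0], [3.9, 4.1], [2.1, 2.0]]
y_query = ["B", "T", "T"]  # the third cell lies between clusters and is misassigned

print(round(label_transfer_accuracy(z_query, y_query, z_ref, y_ref), 1))  # 66.7
```

Under this reading, lower SNR blurs the clusters in the latent space, so more query cells land nearest to a reference cell of the wrong type, which is consistent with the degradation the caption describes.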
Figure 4.
Box plots of alignment performance on 24 simulated datasets with varying data sizes and SNRs. In most cases, Con-AAE has nearly the highest upper whisker, lower whisker, median, and upper and lower quartiles, indicating that its overall performance distribution is higher than those of the other methods.
Figure 5.
Comparison of Con-AAE with state-of-the-art (SOTA) methods on the four real-world datasets. The top panel shows integration performance, and the bottom panel shows alignment performance. The horizontal axis of the top panel and the vertical axis of the bottom panel are percentages. Con-AAE performs best on both criteria. Note that the identification of pairwise correspondences between single cells is termed an "anchor" (Stuart et al. 2019). Cross-Modal-anchor indicates that anchor information is provided when training Cross-Modal.

References

Andrew G, Arora R, Bilmes J, et al. Deep canonical correlation analysis. In: International Conference on Machine Learning. 1247–55. PMLR, 2013.
Argelaguet R, Velten B, Arnol D, et al. Multi-omics factor analysis—a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol 2018;14:e8124.
Argelaguet R, Arnol D, Bredikhin D, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol 2020;21:1–17.
Bersanelli M, Mosca E, Remondini D, et al. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016;17:167–77.
Bińkowski M, et al. Demystifying MMD GANs. In: International Conference on Learning Representations, 2018.