Bioinformatics. 2015 Jun 15;31(12):1974-80.
doi: 10.1093/bioinformatics/btv088. Epub 2015 Feb 11.

Identification of cell types from single-cell transcriptomes using a novel clustering method

Chen Xu et al. Bioinformatics. 2015.

Abstract

Motivation: Recent advances in single-cell technologies have brought new insights into complex biological phenomena. In particular, genome-wide single-cell measurements such as transcriptome sequencing enable the characterization of cellular composition as well as functional variation in homogeneous cell populations. An important step in single-cell transcriptome analysis is to group cells that belong to the same cell type based on their gene expression patterns. The corresponding computational problem is to cluster a noisy, high-dimensional dataset with substantially fewer objects (cells) than variables (genes).

Results: In this article, we describe a novel algorithm, shared nearest neighbor (SNN)-Cliq, that clusters single-cell transcriptomes. SNN-Cliq uses the concept of shared nearest neighbors, which has advantages in handling high-dimensional data. When evaluated on a variety of synthetic and real experimental datasets, SNN-Cliq outperformed the state-of-the-art methods tested. More importantly, the clustering results of SNN-Cliq reflect the cell types or origins with high accuracy.

Availability and implementation: The algorithm is implemented in MATLAB and Python. The source code can be downloaded at http://bioinfo.uncc.edu/SNNCliq.
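The shared-nearest-neighbor idea in the Results can be illustrated with a short Python sketch. It is a generic SNN graph construction in the spirit of Jarvis-Patrick clustering, not a reproduction of SNN-Cliq: the Euclidean distance, the rule that an edge needs at least one shared neighbor, and the edge weight (the count of shared neighbors) are assumptions made for illustration, and the function names `knn_lists` and `snn_graph` are hypothetical.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """For each point, return the indices of its k nearest neighbors
    (Euclidean distance, the point itself excluded), nearest first."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Stable sort: ties are broken by the lower index.
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def snn_graph(points, k):
    """Hypothetical SNN graph as a dict {(i, j): weight} with i < j.
    Assumption: an edge exists only if i and j share at least one of their
    k nearest neighbors, and its weight is the number of shared neighbors."""
    nn = [set(lst) for lst in knn_lists(points, k)]
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        shared = len(nn[i] & nn[j])
        if shared > 0:
            edges[(i, j)] = shared
    return edges


if __name__ == "__main__":
    # Two well-separated groups of three 2D points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(snn_graph(pts, k=2))
```

With k = 2 on these two well-separated triangles, every edge stays within a triangle, so the two groups form separate components of the graph. Because the weights depend on neighborhood overlap rather than on raw distances, they are less sensitive to the uneven distance scales typical of high-dimensional data, which is the general motivation for SNN-based similarity.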


Figures

Fig. 1.
(A–C) SNN graphs constructed with k = 5 (A), 8 (B) and 10 (C) for a synthetic 2D dataset containing six perceptual clusters with high, mid and low densities. Edge weights are not shown for clarity. (D–F) Performance of SNN-Cliq on three synthetic 2D datasets with distinct structures. Datasets are from Veenman et al. (2002) (D), Gionis et al. (2007) (E) and Fu and Medico (2007) (F). Data points grouped in the same cluster by the algorithm are shown in the same color.
Fig. 2.
The effects of parameters on the clustering results for the synthetic dataset shown in Figure 1A. (A) Number of clusters detected as a function of k. (B–E) Number of clusters and ARI (adjusted Rand index; see Supplementary Text for how it is calculated) at different parameter settings.
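The ARI reported in the figures is the adjusted Rand index, a pair-counting agreement score that equals 1 for identical partitions (up to relabeling) and is near 0, possibly negative, for random ones. The paper defers its exact calculation to the Supplementary Text, so the sketch below assumes the standard Hubert-Arabie form; the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index, assuming the standard Hubert-Arabie formula:
    (index - expected) / (max_index - expected), with pair counts from the
    contingency table of the two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    # Pairs of items placed together in both labelings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everything in one cluster);
        # by convention the partitions are treated as identical.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

Permuting the cluster names (the first call) leaves the score at 1.0, whereas splitting each true class of three cells across two clusters (the second call) drops it to about 0.24.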
Fig. 3.
Comparison of the clustering results from different algorithms on the human cancer cell dataset (Ramsköld et al., 2012) (A), the human embryonic cell dataset (Yan et al., 2013) (B) and the mouse embryonic cell dataset (Deng et al., 2014) (C). In each heatmap, each row represents an individual cell and each column corresponds to the clustering result produced by one of the four methods. Cells grouped in the same cluster by a method are displayed in the same color in that column; cells treated as noise or singletons by the method are shown in black. The embryo origins of cells from the same stage are distinguished by the first number in the cell names.
Fig. 4.
Evaluation of clustering algorithms by the external validation measures purity, ARI and F1 score. The gold-standard classes are determined by cell types or developmental stages. For the mouse embryonic cell dataset, the gold standard also takes into account the library preparation technique (Smart-Seq or Smart-Seq2).
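Of the three external measures, purity is the simplest: each cluster is credited with its most common gold-standard class, and purity is the fraction of cells that match their cluster's majority class. The following is a minimal sketch under that standard definition; the function name `purity` is hypothetical, and because the captions do not say how cells left as noise or singletons were scored, this sketch simply treats every predicted label, including any noise label, as an ordinary cluster.

```python
from collections import Counter, defaultdict


def purity(labels_true, labels_pred):
    """Fraction of items whose cluster's majority gold-standard class matches
    their own class. Assumption: every predicted label (including any label
    used for noise) is treated as an ordinary cluster."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    members = defaultdict(list)
    for true_label, cluster in zip(labels_true, labels_pred):
        members[cluster].append(true_label)
    # For each cluster, count the items belonging to its most common true class.
    matched = sum(Counter(classes).most_common(1)[0][1] for classes in members.values())
    return matched / len(labels_true)


if __name__ == "__main__":
    print(purity([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

Here the middle cluster mixes one cell from each class, so five of six cells are matched and purity is 5/6. Purity alone rewards over-splitting (putting every cell in its own cluster gives 1.0), which is one reason it is usually reported alongside ARI and F1 score.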

References

    1. Beyer K., et al. (1999) When is “nearest neighbor” meaningful? In: Beeri C., Buneman P. (eds.) ICDT ’99: Proceedings of the 7th International Conference on Database Theory. Springer-Verlag, London, UK, pp. 217–235.
    2. Brennecke P., et al. (2013) Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods, 10, 1093–1095.
    3. Buganim Y., et al. (2012) Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell, 150, 1209–1222.
    4. Carey V., et al. (2011) RBGL: an interface to the BOOST graph library. R package version 1.40.1.
    5. Deng Q., et al. (2014) Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science, 343, 193–196.
