Comparative Study
2002 Jul 9;99(14):9121-6.
doi: 10.1073/pnas.132656399. Epub 2002 Jun 24.

Cluster analysis of gene expression dynamics

Marco F Ramoni et al. Proc Natl Acad Sci U S A.

Abstract

This article presents a Bayesian method for model-based clustering of gene expression dynamics. The method represents gene-expression dynamics as autoregressive equations and uses an agglomerative procedure to search for the most probable set of clusters given the available data. The main contributions of this approach are the ability to take into account the dynamic nature of gene expression time series during clustering and a principled way to identify the number of distinct clusters. As the number of possible clustering models grows exponentially with the number of observed time series, we have devised a distance-based heuristic search procedure able to render the search process feasible. In this way, the method retains the important visualization capability of traditional distance-based clustering and acquires an independent, principled measure to decide when two series are different enough to belong to different clusters. The reliance of this method on an explicit statistical representation of gene expression dynamics makes it possible to use standard statistical techniques to assess the goodness of fit of the resulting model and validate the underlying assumptions. A set of gene-expression time series, collected to study the response of human fibroblasts to serum, is used to identify the properties of the method.
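
The idea of representing each cluster by a shared autoregressive equation, and merging two clusters only when the merged model scores better, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`ar_design`, `fit_cluster`, `bic`, `should_merge`) are hypothetical, the fit is ordinary least squares, and the Bayesian information criterion (BIC) is swapped in as a simple stand-in for the posterior model probability that the article uses.

```python
import numpy as np

def ar_design(series, p=1):
    """Build (X, y) for an AR(p) regression: y[t] = b0 + sum_j b_j * series[t - j]."""
    s = np.asarray(series, dtype=float)
    y = s[p:]
    lags = [s[p - j: len(s) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    return X, y

def fit_cluster(cluster, p=1):
    """Pool every series in a cluster and fit one shared AR(p) model by least squares."""
    parts = [ar_design(s, p) for s in cluster]
    X = np.vstack([X_i for X_i, _ in parts])
    y = np.concatenate([y_i for _, y_i in parts])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return beta, rss, len(y)

def bic(cluster, p=1):
    """BIC of a cluster under Gaussian errors (assumption: stand-in for the Bayesian score)."""
    _, rss, n = fit_cluster(cluster, p)
    k = p + 2                       # intercept, p lag coefficients, error variance
    sigma2 = max(rss / n, 1e-12)    # floor avoids log(0) on a perfect fit
    return n * np.log(sigma2) + k * np.log(n)

def should_merge(cluster_a, cluster_b, p=1):
    """Merge when one shared model scores better (lower BIC) than two separate ones."""
    return bic(cluster_a + cluster_b, p) < bic(cluster_a, p) + bic(cluster_b, p)
```

An agglomerative search in this spirit would start with every series in its own cluster, rank candidate pairs by a distance between their series (for example, Euclidean distance), and call `should_merge` on the closest pairs first, stopping when no merge improves the score.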

Figures

Figure 1
Diagnostic plots for the clustering model identified by the method when the autoregressive order is p = 1. The first row reports a histogram of standardized residuals. The second row reports the scatter plot of fitted values vs. observed values. The third row shows the scatter plot of fitted values vs. standardized residuals. The fourth row displays, in black, the four cluster average profiles, computed as averages of the observed time series in each cluster, and, in blue, the averages of the fitted time series in each cluster. In these plots, the x axis reports time in hours.
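
The quantities plotted in these diagnostics can be computed for one cluster along the following lines. This is a rough sketch under stated assumptions: the function name `ar1_diagnostics` and the returned dictionary keys are hypothetical, the AR(1) model is fitted by ordinary least squares rather than by the Bayesian procedure of the article, and residuals are standardized by the residual standard deviation.

```python
import numpy as np

def ar1_diagnostics(cluster):
    """Diagnostics for one cluster; rows are time series, columns are time points."""
    data = np.asarray(cluster, dtype=float)
    x_prev = data[:, :-1].ravel()          # value at time t - 1
    y = data[:, 1:].ravel()                # value at time t
    X = np.column_stack([np.ones_like(x_prev), x_prev])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    fitted = X @ beta
    resid = y - fitted
    dof = max(len(y) - X.shape[1], 1)
    scale = np.sqrt(np.sum(resid ** 2) / dof)
    # Guard against a perfect fit, where the residual scale is zero.
    std_resid = resid / scale if scale > 0 else np.zeros_like(resid)

    shape = data[:, 1:].shape
    return {
        "observed": y,                      # for the fitted vs. observed scatter plot
        "fitted": fitted,
        "std_resid": std_resid,             # for the histogram and residual scatter plot
        "observed_profile": data[:, 1:].mean(axis=0),
        "fitted_profile": fitted.reshape(shape).mean(axis=0),
    }
```

The first time point of each series has no predecessor under AR(1), so both average profiles start at the second time point.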
Figure 2
Binary tree (dendrogram) and labeled gene expression display showing the clustering model obtained by our method on the data reported in Iyer et al. (8). The numbers on the branch points of the tree represent how many times the merging of two series renders the model more probable. The model identifies four distinct clusters containing 3 (Cluster 1), 216 (Cluster 2), 293 (Cluster 3), and 5 (Cluster 4) time series.
Figure 3
A zoom of the dendrogram in Fig. 2, with details of the probability of merging.

References

    1. Schena M, Shalon D, Davis R W, Brown P O. Science. 1995;270:467–470.
    2. Lockhart D J, Dong H, Byrne M C, Follettie M T, Gallo M V, Chee M S, Mittmann M, Wang C, Kobayashi M, Horton H, Brown E L. Nat Biotechnol. 1996;14:1675–1680.
    3. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E S, Golub T R. Proc Natl Acad Sci USA. 1999;96:2907–2912.
    4. Butte A J, Tamayo P, Slonim D, Golub T R, Kohane I S. Proc Natl Acad Sci USA. 2000;97:12182–12186.
    5. Alter O, Brown P O, Botstein D. Proc Natl Acad Sci USA. 2000;97:10101–10106.
