Genome Res. 2006 Jan;16(1):123-31.
doi: 10.1101/gr.4074106. Epub 2005 Dec 12.

Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS)

Gregory E Crawford et al. Genome Res. 2006 Jan.

Abstract

A major goal in genomics is to understand how genes are regulated in different tissues, stages of development, diseases, and species. Mapping DNase I hypersensitive (HS) sites within nuclear chromatin is a powerful and well-established method of identifying many different types of regulatory elements, but in the past it has been limited to analysis of single loci. We have recently described a protocol to generate a genome-wide library of DNase HS sites. Here, we report high-throughput analysis, using massively parallel signature sequencing (MPSS), of 230,000 tags from a DNase library generated from quiescent human CD4+ T cells. Of the tags that uniquely map to the genome, we identified 14,190 clusters of sequences that group within close proximity to each other. By using a real-time PCR strategy, we determined that the majority of these clusters represent valid DNase HS sites. Approximately 80% of these DNase HS sites uniquely map within one or more annotated regions of the genome believed to contain regulatory elements, including regions 2 kb upstream of genes, CpG islands, and highly conserved sequences. Most DNase HS sites identified in CD4+ T cells are also HS in CD8+ T cells, B cells, hepatocytes, human umbilical vein endothelial cells (HUVECs), and HeLa cells. However, approximately 10% of the DNase HS sites are lymphocyte specific, indicating that this procedure can identify gene regulatory elements that control cell type specificity. This strategy, which can be applied to any cell line or tissue, will enable a better understanding of how chromatin structure dictates cell function and fate.

Figures

Figure 1.
Clustering analysis of sequences from DNase and random libraries. (A) The total number of clusters was determined for 15 different window sizes. For window sizes <5000 bp, there are larger numbers of clusters for the DNase library. For very large window sizes (100,000 bp), the majority of the DNase and random libraries cluster within only a few regions of the genome. (B) Example of how unique tags cluster into different sizes. The greatest spread between DNase and random libraries occurs at a 500-bp window. (C) Within this optimal window size, there are twice as many DNase clusters of two compared with random. Clusters of three or more are rarely found in random libraries.
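The window-based clustering described in this caption can be illustrated with a minimal Python sketch. It assumes a single-linkage rule: a tag joins the current cluster when it lies within `window` bp of the previous tag on the same chromosome. The paper's exact rule may differ, and the function names and tag coordinates here are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_tags(tags, window=500):
    """Group (chrom, position) tags into clusters.

    Assumption: a tag is added to the current cluster if it is within
    `window` bp of the previous tag on the same chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in tags:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= window:
                current.append(pos)
            else:
                clusters.append((chrom, current))
                current = [pos]
        clusters.append((chrom, current))
    return clusters


def cluster_size_histogram(clusters):
    """Count how many clusters contain 1, 2, 3, ... tags."""
    return Counter(len(members) for _, members in clusters)


# Hypothetical uniquely mapped tags (chromosome, coordinate).
tags = [
    ("chr1", 100), ("chr1", 350), ("chr1", 700), ("chr1", 5000),
    ("chr2", 120), ("chr2", 9000), ("chr2", 9400),
]
clusters = cluster_tags(tags, window=500)
hist = cluster_size_histogram(clusters)
```

Running the same functions on a DNase library and on a random library over several window sizes would give the kind of comparison shown in panels A to C.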
Figure 2.
Validation of DNase clusters by real-time PCR. Delta (Δ) Ct values represent the number of additional cycles to achieve threshold amplification from nuclear DNA treated with DNase I compared with nuclear DNA not treated with DNase I. (A) Most primer sets that flank random regions of the genome display Δ Ct values less than two. Approximately 20% of primer sets that flank sequences from the DNase library that do not cluster with other sequences display Δ Ct values greater than two. Eighty percent of primer sets that flank DNase clusters of two or more display Δ Ct values greater than two. (B) The percentage of primer sets that have Δ Ct values greater than two was determined for each cluster size (% validated sites). Clusters of two were ∼50% accurate at identifying valid HS sites, while clusters of three or more were highly accurate at identifying valid HS sites. (C) The distribution of Δ Ct values was determined for different cluster sizes. Note that the highest cluster sizes have the highest Δ Ct values.
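As a rough illustration of the Δ Ct criterion in this caption, the sketch below computes Δ Ct as the Ct of DNase-treated DNA minus the Ct of untreated DNA and applies the "greater than two" cutoff. The Ct values and primer-set names are hypothetical, and the fold-depletion conversion assumes ideal doubling per PCR cycle, which the caption does not state.

```python
def delta_ct(ct_treated, ct_untreated):
    """Additional cycles needed to reach threshold after DNase I treatment."""
    return ct_treated - ct_untreated


def fold_depletion(dct, efficiency=2.0):
    """Approximate template loss, assuming `efficiency`-fold amplification per cycle."""
    return efficiency ** dct


def percent_validated(delta_cts, threshold=2.0):
    """Percentage of primer sets whose delta Ct exceeds the threshold."""
    validated = [d for d in delta_cts if d > threshold]
    return 100.0 * len(validated) / len(delta_cts)


# Hypothetical (Ct treated, Ct untreated) pairs for four primer sets.
measurements = {
    "cluster_a": (27.5, 24.0),
    "cluster_b": (25.0, 24.5),
    "cluster_c": (28.0, 25.0),
    "random_1": (24.8, 24.2),
}
delta_cts = {name: delta_ct(t, u) for name, (t, u) in measurements.items()}
validated_pct = percent_validated(list(delta_cts.values()))
```

Under the ideal-doubling assumption, a Δ Ct of two corresponds to roughly a fourfold loss of intact template at that site.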
Figure 3.
Location of DNase clusters of three or more relative to the annotated genome. (A) DNase clusters were mapped to each chromosome, and the density of sites per Mb was determined (blue bars). DNase clusters are significantly overrepresented on chromosomes 17 and 19, which are known to be especially gene rich. No differences were detected when the density of DNase clusters per gene was determined for each chromosome (red bars). (B) The location of DNase clusters relative to genes was determined. Multiples represent DNase clusters that were <2 kb from more than one gene. (C) For comparison, a library of randomly chosen coordinates was also mapped relative to genes. (D) The percentage of DNase and random sites that map to annotated regions of the genome often used to search for gene regulatory elements: regions <2 kb upstream of genes, within CpG islands, and within multispecies conserved sequences (MCS). (E) A Venn diagram shows the percentage of DNase clusters that map within one or more annotated regions of the genome (each region is represented by a circle or oval). “Outside” represents the percentage of DNase HS sites that do not map to any of the three categories. (F) Most human DNase HS sites are also hypersensitive at orthologous regions (mouse cluster) in mouse. These regions displayed higher Δ Ct values than did randomly selected controls. (G) Size of DNase HS sites (in base pairs) was calculated by subtracting the start position from the stop position of each DNase cluster.
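Three of the quantities in this caption are simple arithmetic, sketched below in Python: cluster size (panel G), site density per Mb (panel A), and whether a cluster falls in the 2 kb upstream of a gene (panel D). The function names, the strand-aware upstream definition, and all coordinates are assumptions for illustration, not the authors' annotation pipeline.

```python
def cluster_span(start, stop):
    """Size of a DNase cluster in bp: stop position minus start position."""
    return stop - start


def density_per_mb(n_sites, chrom_length_bp):
    """Number of sites per megabase of chromosome."""
    return n_sites / (chrom_length_bp / 1_000_000)


def within_upstream(cluster_start, cluster_stop, tss, strand, distance=2000):
    """True if the cluster overlaps the `distance` bp immediately upstream of a TSS.

    Assumption: "upstream" is taken relative to the gene's strand.
    """
    if strand == "+":
        region_start, region_stop = tss - distance, tss
    else:
        region_start, region_stop = tss, tss + distance
    return cluster_start < region_stop and cluster_stop > region_start


# Hypothetical values, not taken from the paper.
size_bp = cluster_span(1000, 1350)
density = density_per_mb(150, 50_000_000)
upstream_plus = within_upstream(9000, 9300, tss=10000, strand="+")
upstream_minus = within_upstream(9000, 9300, tss=10000, strand="-")
```

Dividing the per-chromosome site count by the number of genes instead of the length in Mb gives the per-gene density shown by the red bars.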
Figure 4.
An example of multiple genome-wide technologies used to identify gene regulatory elements. This is a screen shot from the UCSC genome browser ENCODE region Enr232 (chr9: 127,144,681-127,454,484). Shown in the DNase I-HS/NHGRI track are the locations of DNase clusters of two or more, as well as other data tracks. Names and location of exons and introns are indicated in RefSeq Gene track. The conservation track measures the degree of sequence conservation among human, chimp, mouse, rat, and chicken. The Promoter/Stanford track displays relative activity of predicted promoters in luciferase reporter assays (Trinklein et al. 2003). ChIP/L1 displays ChIP-chip data for RNA polymerase II (Pol2) and transcription initiation factor TFIID subunit 1 (TAF1) from HeLa cells, as determined by the University of California at San Diego (Kim et al. 2005). Note the overlap of many experimental data types.
Figure 5.
DNase HS sites identify genes that have higher levels of expression using microarray analyses. (A) Average expression values of genes that had a DNase HS site nearby were compared to average expression value of all genes. Genes that had a DNase HS site nearby had higher levels of gene expression in all primary tissues. In addition, the highest levels of gene expression were from peripheral blood cell types, including natural killer (NK), monocytes, B cells, and CD8+ and CD4+ T cells. (B) Average expression values of genes that are associated with different cluster sizes were determined from CD4+ T cells as well as the averaged gene expression from all primary tissues (all cell types). “All genes” represents the average expression value of all genes on the Affymetrix U133A expression array.
Figure 6.
Cell type specificity of CD4+ T cell-specific DNase HS sites using real-time PCR. Red squares identify “outlier” DNase clusters that were hypersensitive in CD4+ T cells, but not hypersensitive in other cell types. Δ Ct values (both x- and y-axes) mark the relative hypersensitivity of each DNase cluster for each cell type. (A) Two independent CD4+ T cell preparations show that this method is highly reproducible. The single outlier represents rare data points that are >3 SD from the mean. (B) Only one outlier was detected between CD4+ and CD8+ T cells. (C) Five DNase clusters were identified as not hypersensitive in B cells. (D,E,F) Additional DNase clusters were identified as not hypersensitive in hepatocytes, human umbilical vein endothelial cells (HUVEC), and HeLa cells.
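The ">3 SD from the mean" criterion can be sketched as follows. This assumes the rule is applied to the per-cluster difference between the two cell types' Δ Ct values, which the caption does not spell out. The data are hypothetical and `outlier_indices` is an illustrative name.

```python
from statistics import mean, stdev


def outlier_indices(dct_x, dct_y, n_sd=3.0):
    """Indices of clusters whose delta Ct difference (x - y) lies more than
    `n_sd` sample standard deviations from the mean difference.

    Assumption: outliers are called on paired differences between cell types.
    """
    diffs = [x - y for x, y in zip(dct_x, dct_y)]
    mu = mean(diffs)
    sd = stdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - mu) > n_sd * sd]


# Hypothetical delta Ct values for 20 clusters in CD4+ T cells and a second
# cell type. The last cluster is hypersensitive only in CD4+ T cells.
offsets = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1,
           0.2, -0.2, 0.1, 0.0, -0.1, 0.3, -0.3, 0.2, 0.0, 4.0]
cd4 = [4.0] * len(offsets)
other = [4.0 - d for d in offsets]

outliers = outlier_indices(cd4, other)
```

With a small number of clusters the sample SD is dominated by the extreme point itself, so a fixed 3 SD cutoff only becomes meaningful once enough clusters are measured.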

References

    1. Barnett, V. and Lewis, T. 1998. Outliers in statistical data. John Wiley and Sons, West Sussex, UK.
    2. Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M., Ewan, M., et al. 2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18: 630-634.
    3. Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835-847.
    4. Crawford, G.E., Holt, I.E., Mullikin, J.C., Tai, D., National Institutes of Health Intramural Sequencing, Blakesley, R., Bouffard, G., Young, A., Masiello, C., Green, E.D., et al. 2004. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc. Natl. Acad. Sci. 101: 992-997.
    5. The Encode Consortium. 2004. The ENCODE (Encyclopedia of DNA Elements) Project. Science 306: 636-640.

Web site references

    1. http://genome.ucsc.edu/ENCODE/; UCSC genome browser for ENCODE regions
    2. http://genome.ucsc.edu/; UCSC genome browser
    3. http://www.ncbi.nlm.nih.gov/geo; Gene Expression Omnibus
    4. http://www.bioconductor.org; Bioconductor
    5. http://research.nhgri.nih.gov/DNaseHS/; List of DNase HS sites described in this paper