Nature. 2012 Sep 6;489(7414):83-90.
doi: 10.1038/nature11212.

An expansive human regulatory lexicon encoded in transcription factor footprints

Shane Neph et al. Nature. 2012.

Abstract

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

Figures

Figure 1
Figure 1. Parallel profiling of genomic regulatory factor occupancy across 41 cell types
a, DNaseI footprinting of K562 cells identifies the individual nucleotides within the MTPN promoter that are bound by NRF1. b, Example locus harboring eight clearly defined DNaseI footprints in Th1 and SK-N-SH_RA cells, with TRANSFAC database motif instances indicated below. c, Heatmaps showing per-nucleotide DNaseI cleavage (left) and vertebrate conservation by phyloP (right) for 4,262 NRF1 motifs within K562 DHSs ranked by the local density of DNaseI cleavages. Green ticks indicate the presence of DNaseI footprints over motif instances. Blue ticks indicate the presence of ChIP-seq peaks over the motif instances. d, Lowess regression of NRF1, USF, NFE2, and NFYA K562 ChIP-seq signal intensities versus DNaseI footprinting occupancy (footprint occupancy score) at K562 DNaseI footprints containing NRF1, USF, NFE2, and NFYA motifs.
Figure 2
Figure 2. DNaseI footprints mark sites of in vivo protein occupancy
a, Schematic and plots showing the effect of T/C SNV rs4144593 on protein occupancy and chromatin accessibility. Bar graph y-axis is the number of DNaseI cleavage events containing either the T or C allele. Middle plots show T or C allele-specific DNaseI cleavage profiles from 10 cell lines heterozygous for the T/C alleles at rs4144593. Right plots show DNaseI cleavage profiles from 18 cell lines homozygous for the C allele at rs4144593 and 1 cell line homozygous for the T allele at rs4144593. Cleavage plots are cut off at 60% cleavage height. b, The average CpG methylation within IMR90 DNaseI footprints, IMR90 DHSs (but not in footprints) and non-hypersensitive genomic regions in IMR90 cells. CpG methylation is significantly depleted in DNaseI footprints (P < 2.2 × 10^−16, Mann-Whitney test).
Figure 3
Figure 3. Footprint structure parallels TF structure and is imprinted on the human genome
a, The co-crystal structure of Upstream Stimulatory Factor (USF) bound to its DNA ligand is juxtaposed above the average nucleotide-level DNaseI cleavage pattern (blue) at motif instances of USF in DNaseI footprints. Nucleotides that are sensitive to cleavage by DNaseI are colored blue on the co-crystal structure. The motif logo generated from USF DNaseI footprints is displayed below the DNaseI cleavage pattern. Below is a randomly ordered heatmap showing the per-nucleotide DNaseI cleavage for each motif instance of USF in DNaseI footprints. b, The per-base DNaseI hypersensitivity (blue) and vertebrate phylogenetic conservation (red) for all DNaseI footprints in dermal fibroblasts matching three well-annotated transcription factor motifs. The white box indicates the width of the consensus motif. The number of motif occurrences within DNaseI footprints is indicated below each graph.
Figure 4
Figure 4. A highly stereotyped chromatin structural motif marks sites of transcription initiation in human promoters
a, A 35–55 base-pair footprint is the predominant feature of many promoter DHSs and is in tight spatial coordination with the transcription start site. b, Heatmap of the per-nucleotide DNaseI cleavage pattern at 5,041 instances of this stereotypical footprint in K562 cells. c, Aggregate per-base DNaseI cleavage profile (blue line) and mean per-nucleotide conservation score (phyloP) surrounding instances of this stereotypical footprint in K562 cells (red dashed line). d, Aggregate strand corrected CAGE sequencing data (green line) and the average nearest 5’ end of a spliced EST (orange line) surrounding instances of this stereotypical footprint in K562 cells.
Figure 5
Figure 5. Distinguishing direct and indirect binding of transcription factors
Heatmap of the enrichment of pairs of transcription factors in a direct-indirect association. Direct peaks are defined by ChIP occupancy accompanied by a footprint overlapping a compatible motif. Indirect peaks do not have a compatible motif. The color of each cell is determined by the fraction of indirect peaks that co-localize with the direct peaks of another factor.
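The direct/indirect definition in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(chromosome, start, end)` half-open interval tuples, and the toy coordinates are all assumptions made for illustration, and the input `motif_footprints` is assumed to already contain only footprints that overlap a motif compatible with the ChIP-ed factor.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_direct_indirect(
    peaks: List[Interval], motif_footprints: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split one factor's ChIP peaks into direct and indirect sets.

    A peak is called direct when it overlaps a footprint that sits on a
    compatible motif for that factor; otherwise it is called indirect.
    """
    direct, indirect = [], []
    for peak in peaks:
        if any(overlaps(peak, fp) for fp in motif_footprints):
            direct.append(peak)
        else:
            indirect.append(peak)
    return direct, indirect


def colocalization_fraction(
    indirect_a: List[Interval], direct_b: List[Interval]
) -> float:
    """Fraction of factor A's indirect peaks that overlap a direct peak of factor B."""
    if not indirect_a:
        return 0.0
    hits = sum(1 for p in indirect_a if any(overlaps(p, d) for d in direct_b))
    return hits / len(indirect_a)


# Toy data (invented coordinates, for illustration only).
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
footprints_a = [("chr1", 120, 135)]
peaks_b = [("chr1", 480, 560), ("chr2", 400, 500)]
footprints_b = [("chr1", 510, 525)]

direct_a, indirect_a = split_direct_indirect(peaks_a, footprints_a)
direct_b, indirect_b = split_direct_indirect(peaks_b, footprints_b)
fraction_a_via_b = colocalization_fraction(indirect_a, direct_b)
```

Computing this fraction for every ordered pair of factors would give one value per heatmap cell. The quadratic overlap scan is chosen for readability; for genome-scale peak sets, sorted intervals or an interval tree would be the more practical choice.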
Figure 6
Figure 6. De novo motif discovery expands the human regulatory lexicon
a, Overview of de novo motif discovery using DNaseI footprints. b, Annotation of the 683 de novo-derived motif models using previously identified transcription factor motifs. 394 of these de novo-derived motifs match a motif annotated within the TRANSFAC, JASPAR or UniPROBE databases, whereas 289 are novel motifs (pie chart). The de novo consensus matching TRANSFAC, JASPAR or UniPROBE sequences cover the majority of each database (bar chart). c, Example of a DNaseI footprint found in multiple cell types that is annotated solely by one of the novel de novo-derived motifs. d, Box-and-whisker plot comparing the average nucleotide diversity at instances of the 289 novel de novo-derived motif models to instances of motifs present in databases of known specificities (x-axis). The blue bar indicates the average nucleotide diversity (π) at 4-fold degenerate coding sites (width is equal to 95% confidence interval); gold bar indicates π at all coding sites (width is equal to 95% confidence interval). e, Phylogenetic conservation (red dashed) and per-base DNaseI hypersensitivity (blue) for all DNaseI footprints in dermal fibroblast cells matching two novel de novo-derived motifs. The white box indicates width of consensus motif. f, Per-nucleotide mouse liver DNaseI cleavage patterns at occurrences of the motifs in (e) at DNaseI footprints identified in mouse liver.
Figure 7
Figure 7. Multi-lineage DNaseI footprinting reveals cell-selective gene regulators
a, Comparative footprinting of the nerve growth factor gene (VGF) promoter in multiple cell types reveals both conserved (NRF1, USF and SP1) and cell-selective (NRSF) DNaseI footprints. b, Shown is a heatmap of footprint occupancy computed across 12 cell types (columns) for 89 motifs (rows), including well-characterized cell/tissue-selective regulators, and novel de novo-derived motifs (red text). The motif models for some of these novel de novo-derived motifs are indicated next to the heatmap. c, The proportion of motif instances in DNaseI footprints within distal regulatory regions for known (black) and novel (red) cell-type specific regulators in (b) is indicated. Also noted are these values for a small set of known promoter-proximal regulators (green).

Similar articles

  • Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements.
    Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Cell Mol Life Sci. 2024 Jun 20;81(1):274. doi: 10.1007/s00018-024-05314-z. PMID: 38902506. Review.
  • Global reference mapping of human transcription factor footprints.
    Vierstra J, Lazar J, Sandstrom R, Halow J, Lee K, Bates D, Diegel M, Dunn D, Neri F, Haugen E, Rynes E, Reynolds A, Nelson J, Johnson A, Frerker M, Buckley M, Kaul R, Meuleman W, Stamatoyannopoulos JA. Nature. 2020 Jul;583(7818):729-736. doi: 10.1038/s41586-020-2528-x. Epub 2020 Jul 29. PMID: 32728250. Free PMC article.
  • Genomic footprinting.
    Vierstra J, Stamatoyannopoulos JA. Nat Methods. 2016 Mar;13(3):213-21. doi: 10.1038/nmeth.3768. PMID: 26914205. Review.
  • The accessible chromatin landscape of the human genome.
    Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, John S, Sandstrom R, Bates D, Boatman L, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Qu H, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Yan Y, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA. Nature. 2012 Sep 6;489(7414):75-82. doi: 10.1038/nature11232. PMID: 22955617. Free PMC article.
  • An integrated encyclopedia of DNA elements in the human genome.
    ENCODE Project Consortium. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247. PMID: 22955616. Free PMC article.
