Review
Nature. 2024 Jan;625(7993):41-50.
doi: 10.1038/s41586-023-06661-w. Epub 2023 Dec 13.

Hold out the genome: a roadmap to solving the cis-regulatory code

Carl G de Boer et al. Nature. 2024 Jan.

Abstract

Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
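
The claim that the genome is "too short" to cover the relevant sequence space can be illustrated with a rough counting argument. The sketch below is not from the article: it assumes a haploid human genome of roughly 3.1 billion base pairs, and the function names (`possible_sequences`, `max_genomic_windows`, `max_coverage`) are hypothetical helpers chosen for illustration. It compares the number of possible DNA sequences of a given length (4^L) with an upper bound on how many distinct windows of that length a genome could contain.

```python
# Rough counting sketch (illustrative assumptions, not the authors' analysis).
GENOME_LENGTH_BP = 3_100_000_000  # assumed approximate haploid human genome size


def possible_sequences(length: int) -> int:
    """Number of distinct DNA sequences of the given length (alphabet A, C, G, T)."""
    return 4 ** length


def max_genomic_windows(length: int, genome_length: int = GENOME_LENGTH_BP) -> int:
    """Upper bound on distinct windows: one per start position, on both strands."""
    return 2 * max(genome_length - length + 1, 0)


def max_coverage(length: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Largest fraction of sequence space the genome could possibly sample."""
    return min(1.0, max_genomic_windows(length, genome_length) / possible_sequences(length))


if __name__ == "__main__":
    for length in (10, 16, 20, 50):
        print(f"L={length:>3}  possible={possible_sequences(length):.3e}  "
              f"max coverage={max_coverage(length):.3e}")
```

Under these assumptions the bound is generous, because it treats every genomic window as unique and ignores repeats and homology. Even so, the fraction of sequence space a genome can sample falls off sharply as the window length grows, which is the gap that random and designed synthetic sequences are meant to fill.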
