Hold out the genome: a roadmap to solving the cis-regulatory code
- PMID: 38093018
- DOI: 10.1038/s41586-023-06661-w
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
© 2023. Springer Nature Limited.
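The abstract's point about sequence space can be illustrated with a minimal sketch. It assumes uniformly random synthesis and a hypothetical 80-bp library element, and it uses k-mer Jaccard similarity as a crude stand-in for a homology check. The function names, sequence length, and k value are illustrative choices and are not taken from the article.

```python
import random

ALPHABET = "ACGT"


def random_dna(length, rng):
    """Return one uniformly random DNA sequence of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_library(n_sequences, length, seed=0):
    """Build a reproducible library of random sequences (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_dna(length, rng) for _ in range(n_sequences)]


def kmer_set(seq, k):
    """Collect the distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=8):
    """k-mer Jaccard similarity: a rough proxy for shared (homologous) content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


# Assumed, illustrative sizes: five sequences of 80 bp each.
library = random_library(n_sequences=5, length=80, seed=42)

# Number of distinct 80-bp sequences, for comparison with a genome of
# roughly three billion base pairs.
possible_80mers = 4 ** 80

# Highest pairwise similarity within the library; a high value between a
# training and a test sequence would signal leakage through homology.
max_similarity = max(
    jaccard(a, b)
    for i, a in enumerate(library)
    for b in library[i + 1:]
)
```

In a real evaluation, a similar similarity check (or a proper alignment-based one) would be applied between training and held-out sequences, since near-duplicates across the split are one way homology could inflate apparent performance.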
Similar articles
- Information content differentiates enhancers from silencers in mouse photoreceptors. Elife. 2021 Sep 6;10:e67403. doi: 10.7554/eLife.67403. Elife. 2021. PMID: 34486522 Free PMC article.
- Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. Elife. 2020 Feb 11;9:e41279. doi: 10.7554/eLife.41279. Elife. 2020. PMID: 32043966 Free PMC article.
- Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics. 2018 May 31;19(1):202. doi: 10.1186/s12859-018-2187-1. BMC Bioinformatics. 2018. PMID: 29855387 Free PMC article.
- Understanding how cis-regulatory function is encoded in DNA sequence using massively parallel reporter assays and designed sequences. Genomics. 2015 Sep;106(3):165-170. doi: 10.1016/j.ygeno.2015.06.003. Epub 2015 Jun 10. Genomics. 2015. PMID: 26072432 Free PMC article. Review.
- Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA. Nat Immunol. 2004 Aug;5(8):768-74. doi: 10.1038/ni0804-768. Nat Immunol. 2004. PMID: 15282556 Review.
Cited by
- Circadian regulation of stereotypic chromatin conformations at enhancers. bioRxiv [Preprint]. 2024 Apr 24:2024.04.24.590818. doi: 10.1101/2024.04.24.590818. bioRxiv. 2024. PMID: 38712031 Free PMC article. Preprint.
- Epigenomic insights into common human disease pathology. Cell Mol Life Sci. 2024 Apr 11;81(1):178. doi: 10.1007/s00018-024-05206-2. Cell Mol Life Sci. 2024. PMID: 38602535 Free PMC article. Review.
- Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics. 2024 Mar 29;40(4):btae190. doi: 10.1093/bioinformatics/btae190. Bioinformatics. 2024. PMID: 38588559 Free PMC article.
- Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol. 2024 Mar;31(3):559-567. doi: 10.1038/s41594-024-01235-4. Epub 2024 Mar 6. Nat Struct Mol Biol. 2024. PMID: 38448573
- Evaluation and optimization of sequence-based gene regulatory deep learning models. bioRxiv [Preprint]. 2024 Feb 17:2023.04.26.538471. doi: 10.1101/2023.04.26.538471. bioRxiv. 2024. PMID: 38405704 Free PMC article. Preprint.