Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes
- PMID: 29773659
- PMCID: PMC6028130
- DOI: 10.1101/gr.230615.117
Abstract
In the conventional view of prokaryotic genome organization, promoters precede operons, and ribosome binding sites (RBSs) with the Shine-Dalgarno consensus precede genes. However, recent experimental research suggests a more diverse picture, which motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic of leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, a GeneMarkS-2 screen of ∼5000 representative prokaryotic genomes predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBSs in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts.
© 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press.
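To make the idea of a Shine-Dalgarno signal upstream of a gene start concrete, here is a minimal sketch in Python. It is not the GeneMarkS-2 method, which the abstract describes as model-based and self-trained. It is only a plain consensus scan, and the motif string `AGGAGG`, the spacer range of 4 to 12 nt, and the function name `best_sd_match` are all illustrative assumptions rather than values taken from the paper.

```python
# Illustrative consensus scan; NOT the GeneMarkS-2 algorithm.
# Assumption: "AGGAGG" is used as a Shine-Dalgarno core consensus.
SD_CORE = "AGGAGG"


def best_sd_match(upstream, motif=SD_CORE, spacer_range=(4, 12)):
    """Find the best ungapped motif hit in the region 5' of a start codon.

    `upstream` is the sequence immediately preceding the start codon
    (its last base sits directly before the codon). Only hits whose
    spacer (bases between the hit's end and the start codon) falls in
    `spacer_range` are considered; that range is an assumed default.
    Returns (matching_positions, spacer), or (0, None) if nothing
    matches at any allowed position.
    """
    upstream = upstream.upper()
    best_matches, best_spacer = 0, None
    for i in range(len(upstream) - len(motif) + 1):
        spacer = len(upstream) - (i + len(motif))
        if not spacer_range[0] <= spacer <= spacer_range[1]:
            continue
        window = upstream[i:i + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        if matches > best_matches:
            best_matches, best_spacer = matches, spacer
    return best_matches, best_spacer


if __name__ == "__main__":
    # Hypothetical upstream region: perfect core, 7 nt before the start codon.
    print(best_sd_match("TTTTAGGAGGTTTTTTT"))
```

A gene start whose upstream region yields no good hit under such a scan could be leaderless or could use a noncanonical RBS; telling those cases apart is what the abstract attributes to the dedicated signal models, which this sketch does not attempt.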
Similar articles
- An Experimental Approach to Genome Annotation: This report is based on a colloquium sponsored by the American Academy of Microbiology held July 19-20, 2004, in Washington, DC. Washington (DC): American Society for Microbiology; 2004. PMID: 33001599. Free Books & Documents. Review.
- Non-AUG start codons: Expanding and regulating the small and alternative ORFeome. Exp Cell Res. 2020 Jun 1;391(1):111973. doi: 10.1016/j.yexcr.2020.111973. Epub 2020 Mar 21. PMID: 32209305. Free PMC article. Review.
- In silico analysis of 5'-UTRs highlights the prevalence of Shine-Dalgarno and leaderless-dependent mechanisms of translation initiation in bacteria and archaea, respectively. J Theor Biol. 2016 Aug 7;402:54-61. doi: 10.1016/j.jtbi.2016.05.005. Epub 2016 May 4. PMID: 27155047.
- Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics. 2011 Jul 12;12:361. doi: 10.1186/1471-2164-12-361. PMID: 21749696. Free PMC article.
- GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001 Jun 15;29(12):2607-18. doi: 10.1093/nar/29.12.2607. PMID: 11410670. Free PMC article.
Cited by
- Phylogenetic Analysis and Comparative Genomics of Brucella abortus and Brucella melitensis Strains in Egypt. J Mol Evol. 2024 Jun;92(3):338-357. doi: 10.1007/s00239-024-10173-0. Epub 2024 May 29. PMID: 38809331. Free PMC article.
- Draft genome sequence data of methicillin-resistant Staphylococcus aureus, strain 4233. Data Brief. 2024 May 11;54:110492. doi: 10.1016/j.dib.2024.110492. eCollection 2024 Jun. PMID: 38799713. Free PMC article.
- Complete genome sequence of Paraburkholderia sp. strain 22B1P capable of utilizing 3-chlorobenzoate as a carbon source. Microbiol Resour Announc. 2024 Apr 11;13(4):e0123523. doi: 10.1128/mra.01235-23. Epub 2024 Mar 15. PMID: 38488372. Free PMC article.
- Complete genome sequences of six duckweed-associated bacterial strains for studying community assembly in synthetic plant microbiome. Microbiol Resour Announc. 2024 Apr 11;13(4):e0128023. doi: 10.1128/mra.01280-23. Epub 2024 Mar 1. PMID: 38426728. Free PMC article.
- Pro-SMP finder-A systematic approach for discovering small membrane proteins in prokaryotes. PLoS One. 2024 Feb 29;19(2):e0299169. doi: 10.1371/journal.pone.0299169. eCollection 2024. PMID: 38422081. Free PMC article.