GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions
- PMID: 11410670
- PMCID: PMC55746
- DOI: 10.1093/nar/29.12.2607
Abstract
Improving the accuracy of gene start prediction is one of a few remaining open problems in the computational prediction of prokaryotic genes. The difficulty stems from the absence of relatively strong sequence patterns that identify true translation initiation sites. In this paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions with models of regulatory sites near gene starts within an iterative hidden Markov model-based algorithm. The new gene prediction method, called GeneMarkS, uses an unsupervised training procedure and can be applied to a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions, and the Gibbs sampling multiple alignment program. GeneMarkS precisely predicted 83.2% of the translation starts of GenBank-annotated Bacillus subtilis genes and 94.4% of the translation starts in an experimentally validated set of Escherichia coli genes. We also observed that GeneMarkS detects prokaryotic genes, in the sense of identifying the open reading frames that contain real genes, with an accuracy matching that of the best gene detection methods currently in use. Besides refining protein N-terminal sequence data, accurate translation start prediction makes it possible to position precisely the sequence region upstream of a gene start. As a result, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to vary significantly, and the functional and evolutionary implications of this variability are discussed.
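The self-training idea summarized in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not the GeneMarkS algorithm: it replaces the hidden Markov model and the Gibbs sampler with a hard re-estimation loop over a position weight matrix, assumes the regulatory motif sits immediately upstream of the start codon with no spacer, and takes the candidate start positions as given. All names (`train_pwm`, `score`, `self_train_starts`, `toy_genes`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_pwm(windows, pseudocount=1.0):
    """Per-position log-odds scores (vs. a uniform background) from equal-length windows."""
    width = len(windows[0])
    total = len(windows) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        pwm.append(
            {b: math.log(((counts[b] + pseudocount) / total) / 0.25) for b in BASES}
        )
    return pwm


def score(pwm, window):
    """Sum of log-odds scores of a window under the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def self_train_starts(genes, width=6, max_iter=20):
    """Pick one start per gene without labeled data (toy, hard re-estimation).

    genes: list of (sequence, candidate_start_positions); every candidate
    position must be >= width so that an upstream window exists.
    """
    # Initial guess: the most upstream candidate (longest open reading frame).
    chosen = [min(cands) for _, cands in genes]
    for _ in range(max_iter):
        # Re-estimate the upstream motif model from the current start choices.
        windows = [seq[s - width:s] for (seq, _), s in zip(genes, chosen)]
        pwm = train_pwm(windows)
        # Re-predict each start as the candidate with the best upstream score.
        updated = [
            max(cands, key=lambda s, seq=seq: score(pwm, seq[s - width:s]))
            for seq, cands in genes
        ]
        if updated == chosen:  # predictions stopped changing
            break
        chosen = updated
    return chosen


# Toy data (assumption): three genes with a single candidate start and one
# gene with an extra, more upstream in-frame ATG that lacks the motif.
toy_genes = [
    ("AGGAGGATGAAACCC", [6]),
    ("AGGAGGATGGCTTAA", [6]),
    ("AGGAGGATGTTTGGG", [6]),
    ("TTCTCAATGAGGAGGATGAAA", [6, 15]),
]

print(self_train_starts(toy_genes))
```

On this toy input the loop starts from the longest open reading frame for every gene, learns the upstream motif from the majority, and then moves the start of the fourth gene to the downstream ATG that carries the motif. The point of the sketch is only the alternation between prediction and re-estimation; the method described in the abstract performs this alternation with GeneMark.hmm and Gibbs sampling.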
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38401.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38402.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38403a.gif)
![Figure 4](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38404.gif)
![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38405a.gif)
![Figure 6](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38406.gif)
![Figure 7](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38407.gif)
![Figure 8](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38408.gif)
![Figure 9](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/55746/bin/gke38409a.gif)