ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes
- PMID: 12626720
- PMCID: PMC152858
- DOI: 10.1093/nar/gkg254
Abstract
A new system, ZCURVE 1.0, for finding protein-coding genes in bacterial and archaeal genomes has been proposed. The algorithm is based on the Z curve representation of DNA sequences and emphasizes the global statistical features of protein-coding genes by taking into account the frequencies of bases at the three codon positions. Because ZCURVE 1.0 uses only 33 parameters to characterize coding sequences, it handles both typical and atypical cases well, whereas Markov-model-based methods such as Glimmer 2.02 train thousands of parameters, which may make them less adaptable. To compare the performance of the new system with that of Glimmer 2.02, both systems were run on 18 genomes that had not been annotated with Glimmer. Both systems were also compared on their predictions for a set of genes of known function. The average accuracy of the two systems is comparable; however, ZCURVE 1.0 predicts gene starts more accurately, has a lower additional prediction rate, and is more accurate in predicting horizontally transferred genes. Applying both systems jointly greatly improves gene-finding results. For a typical genome, e.g. Escherichia coli, ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
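As an illustration of what "frequencies of bases at three codon positions" can look like in the Z curve representation, the following is a minimal sketch, not the ZCURVE 1.0 implementation. It applies the standard Z curve transform (purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond contrasts) to the base frequencies at each codon position, giving nine values per open reading frame. The function name `z_curve_codon_features` is hypothetical, and the abstract does not specify how the full set of 33 parameters is built, so only these nine position-specific components are shown.

```python
from collections import Counter
from typing import List


def z_curve_codon_features(orf: str) -> List[float]:
    """Return nine values: the Z curve components (x, y, z) for codon
    positions 1, 2 and 3 of an in-frame sequence.

    Illustrative sketch only; this is an assumption about how the
    codon-position frequencies could be encoded, not the published code.
    """
    seq = orf.upper()
    features: List[float] = []
    for phase in range(3):
        # Bases at this codon position; ambiguous symbols are skipped.
        bases = [b for b in seq[phase::3] if b in "ACGT"]
        n = len(bases)
        if n == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        counts = Counter(bases)
        a, c, g, t = (counts[b] / n for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purine vs pyrimidine
            (a + c) - (g + t),  # y: amino vs keto
            (a + t) - (g + c),  # z: weak vs strong hydrogen bonding
        ])
    return features
```

For example, `z_curve_codon_features("ATGATGATG")` returns `[1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]`, because position 1 is always A, position 2 always T, and position 3 always G. A classifier would then be trained on such vectors from known coding and non-coding sequences.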
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/152858/bin/gkg254f1.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/152858/bin/gkg254f2.gif)