ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes

Feng-Biao Guo¹, Hong-Yu Ou, Chun-Ting Zhang

Nucleic Acids Res. 2003 Mar 15;31(6):1780-9. doi: 10.1093/nar/gkg254.

Comparative Study

Affiliation

¹ Department of Physics, Tianjin University, Tianjin 300072, China.

PMID: 12626720
PMCID: PMC152858

Abstract

A new system, ZCURVE 1.0, for finding protein-coding genes in bacterial and archaeal genomes is proposed. The algorithm, which is based on the Z curve representation of DNA sequences, emphasizes the global statistical features of protein-coding genes by taking into account the frequencies of bases at the three codon positions. Because ZCURVE 1.0 uses only 33 parameters to characterize coding sequences, it accommodates both typical and atypical genes better, whereas Markov-model-based methods such as Glimmer 2.02 train thousands of parameters, which may reduce their adaptability. To compare the performance of the new system with that of Glimmer 2.02, both systems were run on 18 genomes not annotated by the Glimmer system. The two systems were also compared on the prediction of a set of genes of known function. The average accuracy of the two systems is comparable; however, ZCURVE 1.0 predicts gene starts more accurately, has a lower additional prediction rate and predicts horizontally transferred genes more accurately. Using both systems jointly is shown to improve gene-finding results greatly. For a typical genome, e.g. Escherichia coli, ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
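The Z curve maps base frequencies onto three components: purines vs. pyrimidines (x), amino vs. keto bases (y) and weak vs. strong hydrogen bonding (z). The sketch below applies that transform separately to each of the three codon positions of an in-frame ORF, giving 9 phase-specific values. It is an illustration under stated assumptions, not the ZCURVE 1.0 implementation: the abstract does not say how the full set of 33 parameters is constructed, so only this mononucleotide part is shown, and the function names are hypothetical.

```python
from collections import Counter


def phase_base_frequencies(orf: str) -> list[dict[str, float]]:
    """Base frequencies at codon positions 1, 2 and 3 of an in-frame ORF.

    Bases after the last full codon are ignored. Non-ACGT symbols count
    toward the denominator but are not reported, so one phase's
    frequencies may sum to less than 1.
    """
    orf = orf.upper()
    n_codons = len(orf) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    freqs = []
    for phase in range(3):
        # Every third base starting at this phase: exactly n_codons symbols.
        counts = Counter(orf[phase:3 * n_codons:3])
        freqs.append({base: counts[base] / n_codons for base in "ACGT"})
    return freqs


def z_curve_phase_params(orf: str) -> list[float]:
    """Hypothetical helper: (x, y, z) for each codon position, 9 values total."""
    params: list[float] = []
    for f in phase_base_frequencies(orf):
        a, c, g, t = f["A"], f["C"], f["G"], f["T"]
        params.extend([
            (a + g) - (c + t),  # x: purines vs. pyrimidines
            (a + c) - (g + t),  # y: amino vs. keto bases
            (a + t) - (g + c),  # z: weak vs. strong hydrogen bonding
        ])
    return params
```

For example, in ATGAAATAA (codons ATG, AAA, TAA) the first and second codon positions each hold two A and one T, so x = y = 1/3 and z = 1 at both positions.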
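The abstract compares the two systems by overall accuracy, start-site accuracy and additional prediction rate without defining these measures. A common convention for prokaryotic gene finders is to count an annotated gene as found when a prediction shares its stop codon (3' end) and strand, and then to check separately whether the start also matches. The sketch below uses that convention; the `Gene` type, the `compare` function and the exact denominators (especially for the additional prediction rate) are assumptions and may differ from the paper's definitions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (inclusive)
    strand: str  # '+' or '-'

    def three_prime(self) -> tuple[int, str]:
        # The stop codon sits at the right end on '+' and the left end on '-'.
        return (self.end if self.strand == "+" else self.start, self.strand)

    def five_prime(self) -> int:
        # The start codon sits at the opposite end from the stop codon.
        return self.start if self.strand == "+" else self.end


def compare(annotated: list[Gene], predicted: list[Gene]) -> dict[str, float]:
    """Hypothetical metrics: genes are matched by shared 3' end and strand."""
    pred_by_stop = {g.three_prime(): g for g in predicted}
    ann_stops = {g.three_prime() for g in annotated}
    matched = [(a, pred_by_stop[a.three_prime()])
               for a in annotated if a.three_prime() in pred_by_stop]
    exact_start = sum(1 for a, p in matched if a.five_prime() == p.five_prime())
    additional = sum(1 for p in predicted if p.three_prime() not in ann_stops)
    return {
        # Fraction of annotated genes whose stop codon was predicted.
        "sensitivity": len(matched) / len(annotated) if annotated else 0.0,
        # Among those, fraction whose start codon was also predicted exactly.
        "start_accuracy": exact_start / len(matched) if matched else 0.0,
        # Fraction of predictions with no annotated counterpart (assumed denominator).
        "additional_rate": additional / len(predicted) if predicted else 0.0,
    }
```

Matching on the 3' end first separates two questions: whether a gene was found at all, and whether its start was placed correctly. That separation is what makes a claim like "more accurate gene start prediction" measurable on its own.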

Figures

**Figure 1**
Distributions of GC₃ versus GC₁ for 405 experimentally verified genes of known function (23) and for the 1206 and 3144 genes additionally predicted by ZCURVE 1.0 and Glimmer 2.02, respectively, in the genome of *P. aeruginosa*. Here GC₃ and GC₁ denote the GC content at the third and first codon positions, respectively. Almost all points for the experimentally verified genes lie in the region GC₃ > GC₁, whereas the points for the 1206 and 3144 genes additionally predicted by ZCURVE and Glimmer lie mainly in the regions GC₃ > GC₁ and GC₃ < GC₁, respectively. This indicates that most of the 3144 genes additionally predicted by Glimmer 2.02 are very unlikely to code for proteins, implying that Glimmer 2.02 has a high false positive rate for this genome.
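The GC₁ and GC₃ coordinates plotted in Figure 1 can be computed directly from an in-frame ORF. In this minimal sketch, `gc_by_codon_position` and `gc3_exceeds_gc1` are hypothetical names; the GC₃ > GC₁ test only restates the pattern the caption reports for *P. aeruginosa* and should not be read as a general coding-potential classifier or as part of ZCURVE 1.0.

```python
def gc_by_codon_position(orf: str) -> tuple[float, float, float]:
    """GC content at codon positions 1, 2 and 3 (GC1, GC2, GC3) of an in-frame ORF."""
    orf = orf.upper()
    n_codons = len(orf) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    # For each phase, count G or C among every third base (bools sum as 0/1).
    gc1, gc2, gc3 = (
        sum(base in "GC" for base in orf[phase:3 * n_codons:3]) / n_codons
        for phase in range(3)
    )
    return gc1, gc2, gc3


def gc3_exceeds_gc1(orf: str) -> bool:
    """True if the ORF falls in the GC3 > GC1 region of Figure 1."""
    gc1, _, gc3 = gc_by_codon_position(orf)
    return gc3 > gc1
```

For ATGGCCGCGTAA (codons ATG, GCC, GCG, TAA), GC₁ = 2/4 and GC₃ = 3/4, so the ORF falls in the GC₃ > GC₁ region.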

**Figure 2**
Relation between the overlapping ratio of long ORFs, defined in equation 7, and the G+C content. The mean overlapping ratio over the 18 bacterial or archaeal genomes studied here is 52.69, whereas the mean over 14 bacterial or archaeal genomes with relatively lower G+C content is only 1.77. Fitting the points with an exponential curve shows a turning point at about G+C = 56%, above which the value of p increases markedly.
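The caption reports an exponential fit of the overlapping ratio against G+C content but gives neither the fitting procedure nor how the turning point was located. One simple way to fit p = a·exp(b·GC) is ordinary least squares on ln p, sketched below with a hypothetical `fit_exponential` helper. It requires every p to be positive and is not necessarily the method used in the paper.

```python
import math


def fit_exponential(gc: list[float], p: list[float]) -> tuple[float, float]:
    """Fit p = a * exp(b * gc) by least squares on ln p; returns (a, b)."""
    if len(gc) != len(p) or len(gc) < 2:
        raise ValueError("need at least two paired points")
    if any(v <= 0 for v in p):
        raise ValueError("log-linear fit requires p > 0")
    ys = [math.log(v) for v in p]
    n = len(gc)
    mean_x = sum(gc) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in gc)
    if sxx == 0:
        raise ValueError("G+C values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc, ys))
    b = sxy / sxx                       # slope of ln p against G+C
    a = math.exp(mean_y - b * mean_x)   # back-transformed intercept
    return a, b
```

Fitting in log space weights relative rather than absolute errors, so the largest ratios dominate the fit less than they would under nonlinear least squares on p itself.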

Cited by

Going through phages: a computational approach to revealing the role of prophage in Staphylococcus aureus.
Sweet T Jr, Sindi S, Sistrom M. Access Microbiol. 2023 Jun 16;5(6):acmi000424. doi: 10.1099/acmi.0.000424. eCollection 2023. PMID: 37424556. Free PMC article.
An Evidence Theory and Fuzzy Logic Combined Approach for the Prediction of Potential ARF-Regulated Genes in Quinoa.
Sghaier N, Essemine J, Ayed RB, Gorai M, Ben Marzoug R, Rebai A, Qu M. Plants (Basel). 2022 Dec 23;12(1):71. doi: 10.3390/plants12010071. PMID: 36616201. Free PMC article.
Recombineering in Non-Model Bacteria.
Corts A, Thomason LC, Costantino N, Court DL. Curr Protoc. 2022 Dec;2(12):e605. doi: 10.1002/cpz1.605. PMID: 36546891. Free PMC article.
The genome and antigen proteome analysis of Spiroplasma mirum.
Liu P, Li Y, Ye Y, Chen J, Li R, Zhang Q, Li Y, Wang W, Meng Q, Ou J, Yang Z, Sun W, Gu W. Front Microbiol. 2022 Nov 2;13:996938. doi: 10.3389/fmicb.2022.996938. eCollection 2022. PMID: 36406404. Free PMC article.
Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction.
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Int J Mol Sci. 2022 Jul 26;23(15):8221. doi: 10.3390/ijms23158221. PMID: 35897818. Free PMC article.

References

1. Borodovsky M. and McIninch,J. (1993) GenMark: parallel gene recognition for both DNA strands. Comput. Chem., 17, 123–133.
2. Besemer J., Lomsadze,A. and Borodovsky,M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res., 29, 2607–2618.
3. Salzberg S.L., Delcher,A.L., Kasif,S. and White,O. (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res., 26, 544–548.
4. Delcher A.L., Harmon,D., Kasif,S., White,O. and Salzberg,S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res., 27, 4636–4641.
5. Frishman D., Mironov,A., Mewes,H.W. and Gelfand,M. (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes [published erratum appears in Nucleic Acids Res., 26, 3870]. Nucleic Acids Res., 26, 2941–2947.
