ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes
- PMID: 16401352
- PMCID: PMC1352377
- DOI: 10.1186/1471-2105-7-9
Abstract
Background: Highly accurate, statistics-based systems are necessary for viral and phage genome annotation. The GeneMark systems for gene-finding in virus and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach for viral and phage gene-finding to improve the quality of annotations, particularly for newly sequenced genomes.
Results: The new system ZCURVE_V has been run on 979 viral and 212 phage genomes, with satisfactory results. To allow a fair comparison with GeneMark, the currently available software of similar function, a total of 30 viral genomes that have not been annotated by GeneMark were selected for testing. The average specificity of the two systems is well matched; however, the average sensitivity of ZCURVE_V for smaller viral genomes (< 100 kb), which make up the majority of viral genomes sequenced so far, is higher than that of GeneMark. Additionally, for the genome of Amsacta moorei entomopoxvirus, which probably has the lowest genomic GC content among the sequenced organisms, the accuracy of ZCURVE_V is much better than that of GeneMark, because the latter predicts hundreds of false-positive genes. ZCURVE_V was also used to analyze well-studied genomes, such as HIV-1, HBV and SARS-CoV. Overall, the performance of ZCURVE_V is generally better than that of GeneMark. Finally, ZCURVE_V can be downloaded and run locally, which makes it easier to use, whereas GeneMark is not downloadable. Based on the above comparison, it is suggested that ZCURVE_V may serve as a preferred gene-finding tool for newly sequenced viral and phage genomes. However, it is also shown that the joint application of both systems, ZCURVE_V and GeneMark, leads to better gene-finding results. The system ZCURVE_V is freely available at: http://tubic.tju.edu.cn/Zcurve_V/.
Conclusion: ZCURVE_V may serve as a preferred gene-finding tool for viral and phage genomes, especially for newly sequenced anonymous viral and phage genomes.
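The comparison above rests on sensitivity and specificity, which the abstract does not define. The following is a minimal sketch of how such figures are often computed in gene-finding evaluations, assuming the common convention that sensitivity is the fraction of annotated genes that are predicted and specificity is the fraction of predicted genes that are annotated. The function name, the exact-coordinate matching rule, and the toy gene lists are all hypothetical illustrations; they are not taken from ZCURVE_V or GeneMark.

```python
from typing import Set, Tuple

# A gene is represented here as (start, stop, strand); this tuple layout
# is an assumption made for illustration only.
Gene = Tuple[int, int, str]


def sensitivity_specificity(
    annotated: Set[Gene], predicted: Set[Gene]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a set of gene predictions.

    Assumed convention (common in gene-finding literature):
      sensitivity = matched genes / annotated genes
      specificity = matched genes / predicted genes
    A prediction counts as matched only if its coordinates and strand
    agree exactly with an annotated gene (a simplifying assumption).
    """
    matched = len(annotated & predicted)
    sensitivity = matched / len(annotated) if annotated else 0.0
    specificity = matched / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data, invented for this example.
annotated = {(1, 300, "+"), (400, 900, "+"), (1000, 1500, "-"), (1600, 2000, "+")}
predicted = {(1, 300, "+"), (400, 900, "+"), (1000, 1500, "-"),
             (2100, 2400, "-"), (2500, 2700, "+")}

sn, sp = sensitivity_specificity(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

In this toy case three of the four annotated genes are recovered and two of the five predictions are unsupported, so a gene finder that over-predicts, as described for the low-GC entomopoxvirus genome, would show up as reduced specificity under this convention.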