VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families
- PMID: 33471063
- PMCID: PMC8830756
- DOI: 10.1093/bioinformatics/btab026
Abstract
Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.
Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86%, considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified.
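The accuracies above are each reported at a particular pair of values for the membership ratio and the confidence score. As a minimal sketch of how such cutoffs might be applied downstream, the Python snippet below keeps only the predictions that meet both thresholds. It assumes a tab-separated table with the columns `virus_name`, `class_name`, `membership_ratio` and `confidence_score`; that layout, the function name `filter_predictions`, the use of `>=` rather than `>`, and the example rows are all illustrative assumptions, not the tool's documented behaviour.

```python
import csv
import io


def filter_predictions(tsv_text, min_membership_ratio, min_confidence_score):
    """Return (virus_name, class_name) pairs that pass both thresholds.

    Assumed input: tab-separated text with a header row containing
    virus_name, class_name, membership_ratio and confidence_score.
    The column names are an assumption made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        ratio = float(row["membership_ratio"])
        confidence = float(row["confidence_score"])
        # Keep a prediction only if it meets both cutoffs.
        if ratio >= min_membership_ratio and confidence >= min_confidence_score:
            kept.append((row["virus_name"], row["class_name"]))
    return kept


# Made-up rows for illustration only; these are not results from the paper.
EXAMPLE_TSV = (
    "virus_name\tclass_name\tmembership_ratio\tconfidence_score\n"
    "contig_1\tGenusA\t0.85\t0.90\n"
    "contig_1\tGenusB\t0.15\t0.90\n"
    "contig_2\tGenusC\t0.40\t0.10\n"
)

if __name__ == "__main__":
    # Thresholds of 0.2 and 0.2 mirror the genus-level RefSeq setting
    # quoted in the abstract.
    for virus, label in filter_predictions(EXAMPLE_TSV, 0.2, 0.2):
        print(virus, label)
```

With the made-up rows, only the `contig_1` / `GenusA` prediction passes: the second row falls below the membership ratio cutoff and the third below the confidence score cutoff. Raising the thresholds (for example to 0.6 and 0.8, the prophage setting) trades the number of classified contigs for stricter predictions.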
Availability and implementation: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press.
Figures
![Fig. 1.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8830756/bin/btab026f1.gif)
![Fig. 2.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8830756/bin/btab026f2.gif)
![Fig. 3.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8830756/bin/btab026f3.gif)
![Fig. 4.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8830756/bin/btab026f4.gif)