Feature selection for gene prediction in metagenomic fragments
- PMID: 30026811
- PMCID: PMC6047368
- DOI: 10.1186/s13040-018-0170-z
Abstract
Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences.
Results: In this study, we apply a filter method to select relevant features from a large set of known features, instead of combining them using linear classifiers or ignoring their individual coding potential. We use minimum redundancy maximum relevance (mRMR) to select the most relevant features. Support vector machines (SVMs) are trained on these features, and the classification score is transformed into the posterior probability of the coding class. A greedy algorithm uses the probabilities of overlapping candidate genes to select the final genes. Instead of using one model for all sequences, we train an ensemble of SVM models on mutually exclusive datasets partitioned by GC content and use the appropriate model to classify candidate genes based on the GC content of their read.
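To make the mRMR step concrete, the following is a minimal sketch of greedy mRMR selection over discrete features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which mRMR criterion or discretization was used, so the sketch assumes the difference form (relevance minus mean redundancy) and already-discretized feature values. The function names and the toy feature names (`codon_usage`, `codon_usage_copy`, `orf_length_bin`) are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names by the mRMR difference criterion.

    features: dict mapping a feature name to its list of discrete values.
    labels:   list of class labels (e.g. 1 = coding, 0 = non-coding).
    Assumption: score = relevance(feature, labels) minus the mean
    redundancy between the feature and the features already selected.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: the label is 1 only when both informative features are 1.
    toy_features = {
        "codon_usage":      [0, 0, 1, 1, 0, 0, 1, 1],
        "codon_usage_copy": [0, 0, 1, 1, 0, 0, 1, 1],  # fully redundant copy
        "orf_length_bin":   [0, 1, 0, 1, 0, 1, 0, 1],
    }
    toy_labels = [0, 0, 0, 1, 0, 0, 0, 1]
    print(mrmr_select(toy_features, toy_labels, 2))
```

In the toy data the duplicated feature is as relevant as the original but completely redundant with it, so the redundancy penalty pushes the selection toward the independent feature instead. The selected features would then be passed to the SVM training step described above.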
Conclusion: Our proposed algorithm achieves an improvement over some existing algorithms. mRMR produces promising results in gene prediction. It improves classification performance and feature interpretation. Our research serves as a basis for future studies on feature selection for gene prediction.
Keywords: Feature selection; Gene prediction; Metagenomics; ORF; Prokaryotes; mRMR.
Conflict of interest statement
Not applicable. The authors declare that they have no competing interests. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.