A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification
- PMID: 33260643
- PMCID: PMC7730808
- DOI: 10.3390/ijms21239070
Abstract
Essential genes carry key genomic information that may be central to a comprehensive understanding of life and evolution. Because of this importance, the study of essential genes is considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular because they reduce the cost and time of traditional experiments. A few models have addressed this problem, but their performance is still unsatisfactory because of high-dimensional features and reliance on traditional machine learning algorithms. There is therefore a need for a novel model that improves predictive performance using DNA sequence features. This study used a natural language processing (NLP) model to learn from biological sequences by treating them as natural language words. The resulting NLP features were then learned by a supervised ensemble deep neural network. The proposed method identified essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. It outperformed the single models without ensembling, as well as the state-of-the-art predictors on the same benchmark dataset. This indicates the effectiveness of the proposed method for determining essential genes in particular, and for other sequencing problems in general.
Keywords: DNA sequencing; continuous bag of words; deep learning; ensemble learning; essential genetics and genomics; fastText; prediction model.
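As a rough illustration of what treating a DNA sequence as natural language words can look like, the sketch below splits a sequence into overlapping k-mers and joins them into a space-separated "sentence" of the kind a word-embedding model could consume. The helper names, the choice of k = 3, and the stride are assumptions made for this example; the abstract does not specify the tokenization used in the study.

```python
def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> list:
    """Split a DNA sequence into k-mer 'words' (k and stride are assumed values)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def to_sentence(seq: str, k: int = 3) -> str:
    """Join k-mers with spaces so the sequence resembles a sentence of words."""
    return " ".join(kmer_tokens(seq, k))


# Toy sequence for illustration only.
tokens = kmer_tokens("ATGCGT", k=3)
sentence = to_sentence("ATGCGT", k=3)
```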
Conflict of interest statement
The authors declare no conflict of interest.
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/7730808/bin/ijms-21-09070-g001.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/7730808/bin/ijms-21-09070-g002.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/7730808/bin/ijms-21-09070-g003.gif)