Int J Mol Sci. 2020 Nov 28;21(23):9070.
doi: 10.3390/ijms21239070.

A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification

Nguyen Quoc Khanh Le et al. Int J Mol Sci.

Abstract

Essential genes carry core genomic information that could be key to a comprehensive understanding of life and evolution. Because of their importance, the study of essential genes is considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular because they reduce the cost and time of traditional experiments. A few models have addressed this problem, but their performance remains unsatisfactory because of high-dimensional features and reliance on traditional machine learning algorithms. There is therefore a need for a new model that improves predictive performance using DNA sequence features. This study used a natural language processing (NLP) model to learn from biological sequences by treating them as natural-language words. The resulting NLP features were then learned by a supervised model, an ensemble deep neural network. The proposed method identified essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. It outperformed the single models without ensembling, as well as state-of-the-art predictors on the same benchmark dataset. These results indicate the effectiveness of the proposed method for identifying essential genes in particular, and for other sequence-based problems in general.
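For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, accuracy, and MCC follow from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data. AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn: essential genes predicted as essential / non-essential.
    tn/fp: non-essential genes predicted as non-essential / essential.
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=60, fn=40, tn=170, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy on imbalanced datasets because it uses all four cells of the confusion matrix, so a model that mostly predicts the majority class scores poorly.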

Keywords: DNA sequencing; continuous bag of words; deep learning; ensemble learning; essential genetics and genomics; fastText; prediction model.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Identification of essential genes at different levels of fastText n-grams. The 6-gram level (area under the receiver operating characteristic curve (AUC) = 0.78) performed better than the other levels.
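To illustrate what an n-gram level means for DNA, one common way to turn a sequence into word-like tokens is to slide a window of length k along it. The sketch below assumes overlapping k-mers with a stride of 1; the helpers `kmers` and `to_sentence` and the example sequence are hypothetical, and the caption does not state how the authors tokenized genes.

```python
def kmers(sequence, k=6):
    """Return all overlapping k-mers of a DNA sequence (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def to_sentence(sequence, k=6):
    """Join k-mers with spaces so a word-embedding tool can read them as words."""
    return " ".join(kmers(sequence, k))


# Hypothetical 8-nucleotide fragment, for illustration only.
print(kmers("ATGCGTAC", k=6))
print(to_sentence("ATGCGTAC", k=6))
```

A sequence of length n yields n - k + 1 tokens, so a sequence shorter than k produces none and would need separate handling.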
Figure 2
Performance results of identifying essential genes in representative cross-species datasets using the proposed model. Detailed information and predictive accuracy of all species are shown in Supplementary Table S2.
Figure 3
Workflow of the study for identifying essential genes from sequence information. The input comprised genes of different lengths containing different nucleotides. Word-embedding features were extracted with the fastText package and then learned by an ensemble deep neural network. The output of the ensemble network contained binary probabilities indicating whether each gene was essential. Red and green triangles, green circle, blue squares, and red pentagons are examples of data points.
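The final step of the workflow, combining several networks into one probability per gene, can be sketched as follows. This assumes soft voting (averaging the per-model probabilities) and a 0.5 decision threshold; the caption does not say how the authors combined their networks, and the function `soft_vote` and the probability values are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities and threshold them into labels.

    prob_lists: one list of P(essential) per model, all of equal length.
    Returns (averaged probabilities, binary labels).
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("every model must score the same number of genes")
    averaged = [
        sum(model[i] for model in prob_lists) / n_models
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical outputs of three networks for three genes.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
model_c = [0.8, 0.1, 0.5]
averaged, labels = soft_vote([model_a, model_b, model_c])
print(averaged, labels)
```

Averaging tends to smooth out the errors of individual networks, which is one plausible reason an ensemble can outperform its single members.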

