EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction
- PMID: 35676633
- PMCID: PMC9178860
- DOI: 10.1186/s12859-022-04756-1
Abstract
Background: Recent research suggests that epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all kinds of RNA. Accurate identification of RNA modifications is vital for understanding their functions and regulatory mechanisms. However, traditional experimental methods for identifying RNA modification sites are relatively complicated, time-consuming, and laborious. Machine learning approaches have been applied to extract features from RNA sequences and classify them computationally, which may supplement experimental approaches more efficiently. Recently, convolutional neural networks (CNN) and long short-term memory (LSTM) networks have achieved success in modification site prediction on account of their powerful representation learning. However, a CNN can learn local responses from spatial data but cannot learn sequential correlations, whereas an LSTM is specialized for sequential modeling and can access contextual representations but lacks the spatial feature extraction of a CNN. For these reasons, there is strong motivation to construct a prediction framework using natural language processing (NLP) and deep learning (DL).
Results: This study presents an ensemble multiscale deep learning predictor (EMDLP) that identifies RNA methylation sites using NLP and DL. It combines dilated convolution and bidirectional LSTM (BiLSTM), which helps to take better advantage of local and global information for site prediction. The first step of EMDLP is to represent the RNA sequences in an NLP way. Three encodings, namely RNA word embedding, one-hot encoding, and RGloVe (an improved learning method of word vector representation based on GloVe), are adopted to decipher sites from the viewpoints of local and global information. Then, a dilated convolutional bidirectional LSTM network (DCB) model is constructed, in which a dilated convolutional neural network (DCNN) is followed by a BiLSTM, to extract potential contributing features for methylation site prediction. Finally, the predictions from these three encoding methods are integrated by a soft vote to obtain better predictive performance. Experimental results on m1A and m6A show that EMDLP achieves an area under the receiver operating characteristic curve (AUROC) of 95.56% and 85.24%, respectively, and outperforms the state-of-the-art models. To maximize user convenience, a user-friendly webserver for EMDLP has been made publicly available at http://www.labiip.net/EMDLP/index.php ( http://47.104.130.81/EMDLP/index.php ).
Conclusions: We developed a predictor for m1A and m6A methylation sites.
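As a rough illustration of two steps named in the Results (one-hot encoding of an RNA sequence, and a soft vote over several models), the following Python sketch may help. It is not the authors' implementation: the function names, the A/C/G/U column order, the handling of unknown characters, the 0.5 threshold, and the example probabilities are all assumptions made for illustration only.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # assumed column order; the abstract does not specify one


def one_hot_encode(seq):
    """Return an (L, 4) matrix with one row per nucleotide.

    Assumption: T is treated as U, and any other character
    (for example N) is left as an all-zero row.
    """
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for pos, nt in enumerate(seq.upper().replace("T", "U")):
        if nt in index:
            encoded[pos, index[nt]] = 1.0
    return encoded


def soft_vote(probabilities, threshold=0.5):
    """Average positive-class probabilities across models, then threshold.

    probabilities: array-like of shape (n_models, n_samples).
    Returns the mean probability per sample and the 0/1 labels.
    """
    stacked = np.asarray(probabilities, dtype=np.float64)
    mean_prob = stacked.mean(axis=0)
    labels = (mean_prob >= threshold).astype(int)
    return mean_prob, labels


# Hypothetical outputs of three models (one per encoding) for two candidate sites.
example_probs = [[0.9, 0.2], [0.7, 0.4], [0.8, 0.6]]
mean_prob, labels = soft_vote(example_probs)
encoded = one_hot_encode("GGACU")
```

In this sketch each encoding would feed its own model, and only the per-model probabilities are combined, so a model that is confidently right can outweigh two that are marginally wrong. A hard (majority) vote would discard that confidence information.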
Keywords: Deep learning; Natural language processing; Predictor; RNA modification site.
© 2022. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.