Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Jun 8;23(1):221.
doi: 10.1186/s12859-022-04756-1.

EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction

Affiliations

EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction

Honglei Wang et al. BMC Bioinformatics. .

Abstract

Background: Recent research recommends that epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all sorts of RNA. Exact identification of RNA modification is vital for understanding their purposes and regulatory mechanisms. However, traditional experimental methods of identifying RNA modification sites are relatively complicated, time-consuming, and laborious. Machine learning approaches have been applied in the procedures of RNA sequence features extraction and classification in a computational way, which may supplement experimental approaches more efficiently. Recently, convolutional neural network (CNN) and long short-term memory (LSTM) have been demonstrated achievements in modification site prediction on account of their powerful functions in representation learning. However, CNN can learn the local response from the spatial data but cannot learn sequential correlations. And LSTM is specialized for sequential modeling and can access both the contextual representation but lacks spatial data extraction compared with CNN. There is strong motivation to construct a prediction framework using natural language processing (NLP), deep learning (DL) for these reasons.

Results: This study presents an ensemble multiscale deep learning predictor (EMDLP) to identify RNA methylation sites in an NLP and DL way. It organically combines the dilated convolution and Bidirectional LSTM (BiLSTM), which helps to take better advantage of the local and global information for site prediction. The first step of EMDLP is to represent the RNA sequences in an NLP way. Thus, three encodings, e.g., RNA word embedding, One-hot encoding, and RGloVe, which is an improved learning method of word vector representation based on GloVe, are adopted to decipher sites from the viewpoints of the local and global information. Then, a dilated convolutional Bidirectional LSTM network (DCB) model is constructed with the dilated convolutional neural network (DCNN) followed by BiLSTM to extract potential contributing features for methylation site prediction. Finally, these three encoding methods are integrated by a soft vote to obtain better predictive performance. Experiment results on m1A and m6A reveal that the area under the receiver operating characteristic(AUROC) of EMDLP obtains respectively 95.56%, 85.24%, and outperforms the state-of-the-art models. To maximize user convenience, a user-friendly webserver for EMDLP was publicly available at http://www.labiip.net/EMDLP/index.php ( http://47.104.130.81/EMDLP/index.php ).

Conclusions: We developed a predictor for m1A and m6A methylation sites.

Keywords: Deep learning; Natural language processing; Predictor; RNA modification site.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Chemical structures of modifications. a m1A modification. b m6A modification
Fig. 2
Fig. 2
Performance of the different models through fivefold cross-validation. The models are CNNRGloVe, DCNNRGloVe, BiLSTMRGloVe, and DCBRGloVe, respectively. "CNNRGloVe" employs the CNN model in Deeppromise; "DCBRGloVe" represents a self-built DCB model, including the DCNN and the BiLSTM stage; "DCNNRGloVe" denotes the DCBRGloVe removing the BiLSTM stage; "BiLSTMRGloVe" represents the DCBRGloVe without the DCNN stage
Fig. 3
Fig. 3
Performance of the DCB model based on One-hot encoding, RNA word embedding, Word2vec, and RGloVe
Fig. 4
Fig. 4
Performance of EMDLP and other methods on the independent test
Fig. 5
Fig. 5
Boxplot of eight metrics for comparative performance assessment of the four methods based on the pAerformance of 100 replications of four methods. a for the m1A independent dataset. b for the m6A independent dataset
Fig. 6
Fig. 6
Screenshot of EMDLP webserver. a Site input interface of EMDLP. b The prediction result returned by EMDLP
Fig. 7
Fig. 7
structure of our computational framework based on RGloVe, DCNN, and BiLSTM neural network to predict m1A methylation site
Fig. 8
Fig. 8
Structure of EMDLP predictor. The diagrams depicted our method's architecture. Three different DL classifiers predicted the methylation sequences and decided the final finding by a soft vote

Similar articles

Cited by

References

    1. Song ZT, Huang DY, Song BW, Chen KQ, Song YY, Liu G, Su JL, de Magalhaes JP, Rigden DJ, Meng J. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nat Commun. 2021;12(1):1–11. doi: 10.1038/s41467-020-20314-w. - DOI - PMC - PubMed
    1. Boccaletto P, Machnicka MA, Purta E, Piatkowski P, Baginski B, Wirecki TK, de Crecy-Lagard V, Ross R, Limbach PA, Kotter A, et al. MODOMICS: a database of RNA modification pathways 2017 update. Nucleic Acids Res. 2018;46(D1):303–307. doi: 10.1093/nar/gkx1030. - DOI - PMC - PubMed
    1. Sun WJ, Li JH, Liu S, Wu J, Zhou H, Qu LH, Yang JH. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucleic Acids Res. 2016;44(D1):259–265. doi: 10.1093/nar/gkv1036. - DOI - PMC - PubMed
    1. Xuan JJ, Sun WJ, Lin PH, Zhou KR, Liu S, Zheng LL, Qu LH, Yang JH. RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res. 2018;46(D1):327–334. doi: 10.1093/nar/gkx934. - DOI - PMC - PubMed
    1. Dunn DB. The occurence of 1-methyladenine in ribonucleic acid. Biochem Biophys Acta. 1961;46(1):198–200. doi: 10.1016/0006-3002(61)90668-0. - DOI - PubMed

LinkOut - more resources

-