Recurrent Neural Networks to Automatically Identify Rare Disease Epidemiologic Studies from PubMed
- PMID: 34457147
- PMCID: PMC8378621
Abstract
Rare diseases affect between 25 and 30 million people in the United States, and understanding their epidemiology is critical to focusing research efforts. However, little is known about the prevalence of many rare diseases. Because automated tools are lacking, epidemiological data are currently identified and collected through manual curation. To accelerate this process systematically, we developed a novel predictive model to programmatically identify epidemiologic studies on rare diseases from PubMed. Specifically, a long short-term memory recurrent neural network was developed to predict whether a PubMed abstract represents an epidemiologic study. Our model performed well on our validation set (precision = 0.846, recall = 0.937, AUC = 0.967) and obtained satisfactory results on the test set. This model thus shows promise to accelerate the pace of epidemiologic data curation in rare diseases and could be extended for use in other types of studies and in other disease domains.
©2021 AMIA - All rights reserved.
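For readers less familiar with the three metrics reported in the abstract, the following is a minimal sketch of how precision, recall, and AUC can be computed for a binary abstract classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = epidemiologic study)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the paper).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.65, 0.77, 0.20]

# Hypothetical decision threshold of 0.5 on the classifier's output score.
preds = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(labels, preds)
auc = roc_auc(labels, scores)
print(f"precision={precision:.3f} recall={recall:.3f} AUC={auc:.3f}")
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.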
Figures
![Figure 1:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f1.gif)
![Figure 2:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f2.gif)
![Figure 3:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f3.gif)
![Figure 4:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f4.gif)
![Figure 5:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f5.gif)
![Figure 6:](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8378621/bin/3475589f6.gif)