AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:325-334. eCollection 2021.

Recurrent Neural Networks to Automatically Identify Rare Disease Epidemiologic Studies from PubMed

Jennifer N John et al. AMIA Jt Summits Transl Sci Proc. 2021.

Abstract

Rare diseases affect between 25 and 30 million people in the United States, and understanding their epidemiology is critical to focusing research efforts. However, little is known about the prevalence of many rare diseases. Because automated tools are lacking, epidemiological data are currently identified and collected through manual curation. To accelerate this process systematically, we developed a novel predictive model to programmatically identify epidemiologic studies on rare diseases from PubMed. A long short-term memory recurrent neural network was developed to predict whether a PubMed abstract represents an epidemiologic study. Our model performed well on our validation set (precision = 0.846, recall = 0.937, AUC = 0.967) and obtained satisfactory results on the test set. This model thus shows promise to accelerate the pace of epidemiologic data curation in rare diseases and could be extended for use in other types of studies and in other disease domains.
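The validation metrics quoted in the abstract (precision, recall, AUC) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration, and the numbers have no connection to the reported results.

```python
# Illustrative only: toy labels and scores, not data from the paper.

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = epidemiologic study)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for eight abstracts.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1, 0.7, 0.2]

# Assumed threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, auc)
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a single model can be reported with all three.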

Figures

Figure 1: Text normalization example (the abstract is from)
Figure 2: The RNN model architecture. "Params" indicates the number of trainable parameters in the layer, and "units" indicates the number of basic computational nodes.
Figure 3: Stepwise results for the preparation of the positive dataset.
Figure 4: Stepwise results for the preparation of the negative dataset.
Figure 5: ROC curve for the holdout validation set.
Figure 6: Predictive results generated for the case study of Tay-Sachs disease.
