Am J Hum Genet. 2023 Oct 5;110(10):1661-1672.
doi: 10.1016/j.ajhg.2023.08.018. Epub 2023 Sep 22.

Literature-based predictions of Mendelian disease therapies

Cole A Deisseroth et al. Am J Hum Genet.

Abstract

In the effort to treat Mendelian disorders, correcting the underlying molecular imbalance may be more effective than symptomatic treatment. Identifying treatments that might accomplish this goal requires extensive and up-to-date knowledge of molecular pathways, including drug-gene and gene-gene relationships. To address this challenge, we present "parsing modifiers via article annotations" (PARMESAN), a computational tool that searches PubMed and PubMed Central for information to assemble these relationships into a central knowledge base. PARMESAN then predicts putatively novel drug-gene relationships, assigning an evidence-based score to each prediction. We compare PARMESAN's drug-gene predictions to all of the drug-gene relationships displayed by the Drug-Gene Interaction Database (DGIdb) and show that higher-scoring relationship predictions are more likely to match the directionality (up- versus down-regulation) indicated by this database. PARMESAN had more than 200,000 drug predictions scoring above 8 (as one example cutoff), for more than 3,700 genes. Among these predicted relationships, 210 were registered in DGIdb and 201 (96%) had matching directionality. This publicly available tool provides an automated way to prioritize drug screens to target the most-promising drugs to test, thereby saving time and resources in the development of therapeutics for genetic disorders.

Keywords: NLP; drug repurposing; drug screening; literature search; natural language processing; pathway analysis; text mining.

Conflict of interest statement

Declaration of interests: H.Y.Z. collaborates with UCB Pharma to modify levels of MAPT and ATXN1. R.S.D. is a paid consultant of AstraZeneca.

Figures

Figure 1
Summary sentence construction. From sentences in the literature, PARMESAN constructs summary sentences in the format “action modifier predicate target,” where the predicate is the effect the modifier has on the target gene (such as increase or decrease), and the action is what is being done to the modifier to achieve this effect (whether the modifier should be increased or decreased to have the mentioned effect on the target). “Action” is the only optional part of the summary sentence. We use an example extracted from Monteiro et al.
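The "action modifier predicate target" format can be pictured as a small record with one optional field. The sketch below is only an illustration of that format: the class name, field names, and the placeholder identifiers (GENE_A, GENE_B, DRUG_X) are hypothetical and are not taken from PARMESAN's code or from the Monteiro et al. example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SummarySentence:
    """Hypothetical container for one summary sentence."""
    modifier: str                 # the gene or drug acting on the target
    predicate: str                # effect on the target, e.g. "increases"
    target: str                   # the target gene
    action: Optional[str] = None  # what is done to the modifier (optional)

    def render(self) -> str:
        # Join the parts in "action modifier predicate target" order,
        # skipping the action when it is absent.
        parts = [self.action, self.modifier, self.predicate, self.target]
        return " ".join(p for p in parts if p)


# Placeholder examples, not real extractions.
with_action = SummarySentence("GENE_A", "decreases", "GENE_B", action="increasing")
without_action = SummarySentence("DRUG_X", "increases", "GENE_B")

print(with_action.render())
print(without_action.render())
```

Keeping the action as an optional field mirrors the caption's statement that it is the only part of the sentence that may be omitted.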
Figure 2
Accuracy of extracted modifier relationships. (A) We compare PARMESAN’s relationship extractions to the manually curated databases Reactome (for gene-gene relationships) and DGIdb (for drug-gene relationships). “MCDB” means “manually curated database.” At different score thresholds, we plot the consistency with the manually curated database (Y axis) against the total number of extracted relationships above that score threshold (X axis). To measure the consistency, we isolate the relationships that overlap with the manually curated database, and among them, determine the percent of the relationships that had matching directionality (whether drug A positively or negatively regulates gene B). As we lower our score threshold, the consistency declines, suggesting that these scores are indicative of the consistency with the manually curated databases.
(B) In four different trials, we randomly collected 100 relationships extracted by PARMESAN and checked the supporting articles to see whether at least one of them truly signified the relationship. “Correct” relationships were confirmed, with correct directionality, in at least one of the supporting articles. For example, PARMESAN states that gene A negatively regulates gene B, and a supporting article indicates that gene A negatively regulates gene B. “Misdirected” relationships were confirmed by at least one supporting article, but never with correct directionality. For example, PARMESAN states that gene A negatively regulates gene B, and the supporting articles indicate that gene A positively regulates gene B. “Incorrect” relationships were not confirmed by any of the supporting articles. For genes and drugs, we randomly select from (1) the full set of extracted relationships (“unfiltered”) and (2) the set of all predictions scoring over a power-analysis-defined threshold, 4 for genes and 3 for drugs (“high score”). The error bars represent the 95% confidence intervals for a binomial distribution.
In both the manual evaluation and the comparison to the manually curated relationships, PARMESAN’s accuracy improves when relationships are limited to those with a score above the given threshold. This suggests that these scores are a promising measure of confidence in an automatically extracted relationship.
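The consistency measure in (A) and the error bars in (B) can be sketched as follows. This is a minimal illustration on made-up data, not PARMESAN's implementation: the function names, data layout, and toy relationships are hypothetical, and the Wilson score interval is an assumed choice, since the caption only says the error bars are 95% confidence intervals for a binomial distribution.

```python
import math


def consistency_at_threshold(extracted, curated, threshold):
    """extracted: {(modifier, target): (direction, score)}
    curated:   {(modifier, target): direction}
    Returns (relationships above threshold, overlap with the curated
    database, fraction of the overlap with matching direction)."""
    above = {pair: d for pair, (d, s) in extracted.items() if s > threshold}
    overlap = [pair for pair in above if pair in curated]
    matching = sum(1 for pair in overlap if above[pair] == curated[pair])
    fraction = matching / len(overlap) if overlap else None
    return len(above), len(overlap), fraction


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion
    (an assumed method; the source does not name one)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Toy data (hypothetical): "+" is positive regulation, "-" is negative.
extracted = {
    ("d1", "g1"): ("+", 5),
    ("d2", "g2"): ("-", 4),
    ("d3", "g3"): ("+", 1),
    ("d4", "g4"): ("-", 6),
}
curated = {("d1", "g1"): "+", ("d2", "g2"): "+", ("d3", "g3"): "-"}

for threshold in (3, 0):
    print(threshold, consistency_at_threshold(extracted, curated, threshold))

# Interval for a hypothetical trial with 90 "correct" out of 100 sampled.
print(wilson_interval(90, 100))
```

Only relationships that also appear in the curated database contribute to the fraction, so the X-axis count (all relationships above the threshold) and the denominator of the Y-axis value are different numbers.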
Figure 3
Discovery of pre-2012 predictions over time. (A) We test PARMESAN’s long-term predictive capability with a time-capsule test, where PARMESAN makes predictions of modifiers using articles from before a given year, and we then compare those predictions to modifiers reported on after the given year. We use the example of MID1’s indirect effect on Huntingtin, putatively through its effect on MTOR (MIM: 601231). (B and C) We used PARMESAN to predict gene-gene (B) and drug-gene (C) relationships using only articles from before 2012, and show the fraction of those predictions that were consistent with relationships extracted before 2012 (black solid line) and before 2022 (black dashed line). The difference between these values (shown in red) represents the change in the fraction of predictions that were consistent with identified relationships over the decade after the predictions were made.
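The quantity plotted in (B) and (C) can be sketched on toy data as below. The function name, record layout, and relationships are hypothetical, and "consistent" is assumed here to mean that the same modifier-target pair was extracted with the same direction; the caption does not spell out the exact matching rule.

```python
def fraction_confirmed(predictions, extractions, cutoff_year):
    """predictions: {(modifier, target): direction}, made from early articles.
    extractions: list of (modifier, target, direction, year) records.
    Returns the fraction of predictions matched by an extraction
    published before cutoff_year."""
    known = {(m, t, d) for m, t, d, year in extractions if year < cutoff_year}
    if not predictions:
        return 0.0
    hits = sum(1 for (m, t), d in predictions.items() if (m, t, d) in known)
    return hits / len(predictions)


# Toy data (hypothetical gene names and years).
predictions = {
    ("A", "X"): "+",
    ("B", "X"): "-",
    ("C", "Y"): "+",
    ("D", "Y"): "-",
}
extractions = [
    ("A", "X", "+", 2010),  # already reported before the prediction year
    ("B", "X", "-", 2015),  # reported within the following decade
    ("C", "Y", "-", 2018),  # reported, but with the opposite direction
    ("D", "Y", "-", 2023),  # reported after the later cutoff
]

before_2012 = fraction_confirmed(predictions, extractions, 2012)
before_2022 = fraction_confirmed(predictions, extractions, 2022)
gain = before_2022 - before_2012  # the difference shown in red in the figure

print(before_2012, before_2022, gain)
```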
Figure 4
Comparison of PARMESAN’s gene-gene and drug-gene relationship predictions to manually curated relationships. We compared all of PARMESAN’s predictions to manually curated databases of drug-gene and gene-gene relationships. The accuracy, or “percent consistent,” has the same definition as it does in Figure 2A. We generated predictions from five knowledge bases: PARMESAN’s extractions from PubMed, PARMESAN’s extractions from PubMed Central, PARMESAN’s combined extractions from PubMed and PubMed Central, SemMedDB’s extractions, and the combined extractions from PARMESAN (PubMed and PubMed Central) and SemMedDB.
(A) The drug-gene relationship predictions were compared to the relationships presented in DGIdb. We take the top n predictions for a given number n (X axis) and observe the consistency in directionality with DGIdb. For example, PARMESAN (using PubMed alone) generated 453,892 predictions with scores above 2. Among the 255 predictions that scored above 2 and overlapped with DGIdb, 204 (80%) matched the directionality displayed by DGIdb. Therefore, the orange “PARMESAN (PubMed)” line contains the point at X = 453,892, Y = 0.8. The best predictions came from combining the extractions from PARMESAN and SemMedDB, although in this trial, the difference from using PARMESAN alone was not statistically significant.
(B) Gene-gene relationship predictions were compared to the gene-gene relationships presented in Reactome. This panel is formatted in the same way as (A). All prediction sets demonstrated increased accuracy with higher scores. In this setting, the combination of PARMESAN and SemMedDB showed the best predictive ability. Its differences from the other knowledge bases tested were all statistically significant.
(C) We compared PARMESAN’s genetic modifier predictions (using extractions from PubMed and PubMed Central combined) for ATXN1 and MAPT to corresponding modifier screens, and the consistent predictions outnumbered the contradicted ones at higher score thresholds.
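The curve in (A) plots, for the top n predictions, the fraction of database-overlapping predictions whose direction matches. A rough sketch of how such points could be computed is given below; the function name, tuple layout, and toy predictions are hypothetical and do not come from PARMESAN's code. The last lines reproduce the caption's worked point (204 matching out of 255 overlapping predictions).

```python
def top_n_curve(predictions, curated):
    """predictions: list of (modifier, target, direction, score).
    curated: {(modifier, target): direction}.
    Returns (n, fraction) points, where fraction is the share of
    curated-overlapping predictions among the top n that match direction."""
    ranked = sorted(predictions, key=lambda p: p[3], reverse=True)
    points = []
    overlap = matching = 0
    for n, (modifier, target, direction, _score) in enumerate(ranked, start=1):
        if (modifier, target) in curated:
            overlap += 1
            if curated[(modifier, target)] == direction:
                matching += 1
        if overlap:  # the fraction is undefined until something overlaps
            points.append((n, matching / overlap))
    return points


# Toy data (hypothetical).
toy_predictions = [
    ("d1", "g1", "+", 9.0),
    ("d2", "g2", "-", 7.0),
    ("d3", "g3", "+", 5.0),
    ("d4", "g4", "-", 2.5),
]
toy_curated = {("d1", "g1"): "+", ("d3", "g3"): "-", ("d4", "g4"): "-"}
points = top_n_curve(toy_predictions, toy_curated)
print(points)

# The caption's example: 204 of 255 overlapping predictions matched,
# which gives Y = 0.8 at X = 453,892.
caption_point = (453_892, 204 / 255)
print(caption_point)
```

Predictions with tied scores keep their input order here because Python's sort is stable; how ties are ranked in the original analysis is not stated in the caption.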
