Drug-induced liver injury (DILI) is a frequent cause of termination of drug development programs and a leading reason for drug withdrawal from the marketplace. Unfortunately, current preclinical testing strategies, including the regulatory-required animal toxicity studies or simple in vitro tests, are insufficiently powered to predict DILI in patients reliably. The limited predictive power of such testing strategies is mostly attributed to the complex nature of DILI, a poor understanding of its mechanism, a scarcity of human hepatotoxicity data and inadequate bioinformatics capabilities. With the advent of high-content screening assays, toxicogenomics and bioinformatics, multiple end points can be studied simultaneously to improve prediction of clinically relevant DILI. This review focuses on the current state of efforts to develop predictive models from diverse data sources for potential use in detecting human hepatotoxicity, and also aims to provide perspectives on how to further improve DILI prediction.
Background The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe this information is free text that often contains ambiguous semantic descriptions, which makes it difficult to retrieve useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on manual reading of the full text by experts, which is time consuming and labor intensive. Method In this study, topic modeling, a novel unsupervised text mining method, was applied to drug labeling with the goal of discovering "topics" that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used in this study. First, three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on the probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of the drugs. Results The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., antiinfectives for systemic use). We were also able to identify, via topics, potential adverse events that might arise from specific medications. Conclusions The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a hypothesis generation tool for inferring hidden relationships between concepts in biomedical documents, such as, in this study, drug safety and therapeutic use.
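One way to check whether the drugs grouped under a topic are enriched for a known safety concern is a one-sided hypergeometric test. The abstract reports P<0.05 but does not name the test that was used, so the choice of test here is an assumption, and the function name and the counts in the usage lines are hypothetical (only the total of 794 labels comes from the abstract).

```python
from math import comb


def hypergeom_enrichment_p(total_drugs, drugs_with_concern, topic_size, overlap):
    """One-sided P(X >= overlap) under the hypergeometric distribution.

    X is the number of drugs carrying the safety concern among `topic_size`
    drugs drawn without replacement from `total_drugs`.
    """
    max_k = min(topic_size, drugs_with_concern)
    denom = comb(total_drugs, topic_size)
    # math.comb returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(drugs_with_concern, k)
        * comb(total_drugs - drugs_with_concern, topic_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / denom


# Hypothetical counts: 794 labels in total, 80 flagged for liver injury,
# a topic grouping 20 drugs, 10 of which carry the liver injury flag.
p_value = hypergeom_enrichment_p(794, 80, 20, 10)
print(f"enrichment p-value: {p_value:.3g}")
```

With 100 topics tested, a multiple-testing correction would normally be applied on top of the per-topic p-values; the abstract does not say whether one was used.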
Insect pests, such as pantry beetles, are often associated with food contamination and public health risks. Machine learning has the potential to provide a more accurate and efficient way to detect their presence in food products, a task that is currently done manually. In our previous research, we demonstrated the feasibility of using Artificial Neural Network (ANN) based pattern recognition techniques for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model that improved the average accuracy up to 85%. In contrast, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus-level identification, but SVM showed slightly better accuracy for most species. Highly accurate species-level identification remains a challenge, especially in distinguishing between species of the same genus, which may require improvements in both imaging and machine learning techniques. In summary, our work illustrates a new SVM-based technique and provides a useful comparison with the ANN model in this context. We believe such insights will pave a better way forward for the application of machine learning to species identification and food safety.
Background Drug repositioning offers an opportunity to revitalize the slowing drug discovery pipeline by finding new uses for existing drugs. Our hypothesis is that drugs sharing similar side effect profiles are likely to be effective for the same disease, and thus repositioning opportunities can be identified by finding drug pairs with similar side effects documented in U.S. Food and Drug Administration (FDA) approved drug labels. The safety information in drug labels is usually obtained in clinical trials and augmented with observations from post-market use of the drug. Therefore, our drug repositioning approach can take advantage of more comprehensive safety information than the conventional de novo approach. Method A probabilistic topic model was constructed based on the terms in the Medical Dictionary for Regulatory Activities (MedDRA) that appeared in the Boxed Warning, Warnings and Precautions, and Adverse Reactions sections of the labels of 870 drugs. Fifty-two unique topics, each containing a set of terms, were identified by using topic modeling. The resulting probabilistic topic associations were used to measure the distance (similarity) between drugs. The success of the proposed model was evaluated by comparing a drug and its nearest neighbor (i.e., a drug pair) for common indications found in the Indications and Usage section of the drug labels. Results Given a drug with more than three indications, the model yielded a 75% recall, meaning 75% of drug pairs shared one or more common indications. This is significantly higher than the 22% recall rate achieved by random selection. Additionally, the recall rate grows rapidly as the number of drug indications increases and reaches 84% for drugs with 11 indications. The analysis also demonstrated that 65 drugs with a Boxed Warning, which indicates significant risk of serious and possibly life-threatening adverse effects, might be replaced with safer alternatives that do not have a Boxed Warning. In addition, we identified two therapeutic groups of drugs (Musculo-skeletal system and Anti-infective for systemic use) where over 80% of the drugs have a potential replacement with high significance. Conclusion Topic modeling can be a powerful tool for the identification of repositioning opportunities by examining the adverse event terms in FDA approved drug labels. The proposed framework not only suggests drugs that can be repurposed, but also provides insight into the safety of repositioned drugs.
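The evaluation described above (pair each drug with its nearest neighbor in topic space, then count how often the pair shares an indication) can be sketched as follows. The abstract does not say which distance was used, so the Hellinger distance here is an assumption; the function names, drug names, topic probabilities and indications are all hypothetical toy values.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two topic-probability vectors."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def nearest_neighbor(drug, topic_probs):
    """Return the other drug whose topic distribution is closest to `drug`."""
    others = (d for d in topic_probs if d != drug)
    return min(others, key=lambda d: hellinger(topic_probs[drug], topic_probs[d]))


def shared_indication_recall(topic_probs, indications):
    """Fraction of drugs whose nearest neighbor shares at least one indication."""
    hits = sum(
        1
        for drug in topic_probs
        if indications[drug] & indications[nearest_neighbor(drug, topic_probs)]
    )
    return hits / len(topic_probs)


# Hypothetical toy data: per-drug topic probabilities and label indications.
topic_probs = {
    "drug_a": [0.8, 0.1, 0.1],
    "drug_b": [0.7, 0.2, 0.1],
    "drug_c": [0.1, 0.1, 0.8],
    "drug_d": [0.1, 0.2, 0.7],
}
indications = {
    "drug_a": {"hypertension"},
    "drug_b": {"hypertension", "angina"},
    "drug_c": {"epilepsy"},
    "drug_d": {"migraine"},
}

recall = shared_indication_recall(topic_probs, indications)
print(f"recall: {recall:.2f}")
```

Restricting the loop to drugs with more than a given number of indications would reproduce the stratified recall figures the abstract reports.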
Background Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. Results In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5–100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequencies than exist natively in Sample A, including known variants with an allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. Conclusion These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
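The dilution arithmetic behind the admixtures can be illustrated with a short sketch. It assumes that Sample B carries no variant allele at the position of interest and that both samples contribute the same number of genome copies per unit of input, so the expected allele frequency scales linearly with the fraction of Sample A in the mix. The function names are hypothetical and the abstract does not state the actual mixing ratios.

```python
def admixed_vaf(vaf_in_a, fraction_a):
    """Expected variant allele frequency after mixing Sample A into Sample B.

    Assumes Sample B is reference-only at this position and that the two
    samples have equal genome copy number per unit of input.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return vaf_in_a * fraction_a


def fraction_a_for_target(vaf_in_a, target_vaf):
    """Fraction of Sample A needed to dilute a variant down to target_vaf."""
    if not 0.0 < target_vaf <= vaf_in_a:
        raise ValueError("target_vaf must be positive and no larger than vaf_in_a")
    return target_vaf / vaf_in_a


# A variant present at 1% in Sample A, diluted to the 0.02% level mentioned above.
needed = fraction_a_for_target(0.01, 0.0002)
print(f"fraction of Sample A in the mix: {needed:.4f}")
print(f"resulting allele frequency: {admixed_vaf(0.01, needed):.6f}")
```

Under these assumptions, a 1% variant reaches 0.02% when Sample A makes up about 2% of the mixture, which is roughly 1 part A to 49 parts B.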
The principle of homophily, that similarity breeds connections, has been well studied in the existing sociology literature. Several studies have observed this phenomenon by conducting surveys on human subjects. These studies have concluded that new ties are formed between similar individuals. This phenomenon has been used to explain several socio-psychological concepts such as segregation, community development, social mobility, etc. However, due to the nature of these studies and the limitations imposed by the involvement of human subjects, their conclusions are not easily extended to online social media. Social media, which is becoming the infinite space for interactions, has exceeded all the expectations in terms of growth, for reasons beyond human comprehension. New ties are formed in social media in the same way that they emerge in the real world. However, given the differences between the real world and online social media, do the same factors that govern the construction of new ties in the real world also govern the construction of new ties in social media? In other words, does homophily exist in social media? In this chapter, the authors study this highly significant question, propose a systematic approach based on two online social media sites, BlogCatalog and Last.fm, and report their findings along with some interesting observations.
The lack of sensitive and specific biomarkers for the early detection of mild cognitive impairment (MCI) and Alzheimer's disease (AD) is a major hurdle to improving patient management. A targeted, quantitative metabolomics approach using both 1H NMR and mass spectrometry was employed to investigate the performance of urine metabolites as potential biomarkers for MCI and AD. Correlation-based feature selection (CFS) and least absolute shrinkage and selection operator (LASSO) methods were used to develop biomarker panels, which were tested using support vector machine (SVM) and logistic regression models for diagnosis of each disease state. Metabolic changes were investigated to identify which biochemical pathways were perturbed in urine as a direct result of MCI and AD. Using SVM, we developed a model with 94% sensitivity, 78% specificity, and 78% AUC to distinguish healthy controls from AD sufferers. Using logistic regression, we developed a model with 85% sensitivity, 86% specificity, and an AUC of 82% for AD diagnosis as compared to cognitively healthy controls. Further, we identified 11 urinary metabolites that were significantly altered in AD patients (glucose, guanidinoacetate, urocanate, hippuric acid, cytosine, 2- and 3-hydroxyisovalerate, 2-ketoisovalerate, tryptophan, trimethylamine N-oxide, and malonate); these metabolites were also capable of diagnosing MCI, with a sensitivity of 76%, specificity of 75%, and accuracy of 81% as compared to healthy controls. This pilot study suggests that urine metabolomics may be useful for developing a test capable of diagnosing and distinguishing MCI and AD from cognitively healthy controls.
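For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, accuracy and AUC are computed from a classifier's output. It uses the standard definitions rather than the authors' code; the function names, labels and scores are hypothetical toy values, and the 0.5 decision threshold is an assumption.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the probability that a case outscores a control (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients (1) and four controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(diagnostic_metrics(y_true, y_pred))
print(f"AUC: {auc(y_true, scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is why a model can pair a high sensitivity with a more modest AUC.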
Objective To interrogate the pathogenesis of intrauterine growth restriction (IUGR) and apply Artificial Intelligence (AI) techniques to multi-platform metabolomic analysis, i.e. nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), for the prediction of IUGR. Materials and methods MS and NMR based metabolomic analyses were performed on cord blood serum from 40 IUGR (birth weight < 10th percentile) cases and 40 controls. Three variable selection algorithms, namely correlation-based feature selection (CFS), partial least squares regression (PLS) and learning vector quantization (LVQ), were tested for their diagnostic performance. For each selected set of metabolites, and for the panel of metabolites common to all three selection algorithms (the so-called overlapping set, OL), support vector machine (SVM) models were developed, with parameter selection performed using 10-fold cross-validation. Area under the receiver operating characteristic curve (AUC), sensitivity and specificity values were calculated for IUGR diagnosis. Metabolite set enrichment analysis (MSEA) was performed to identify which metabolic pathways were perturbed in cord blood serum as a direct result of IUGR. Results All selected metabolite sets and their overlapping set achieved statistically significant accuracies in the range of 0.78–0.82 for their optimized SVM models. The model utilizing all metabolites in the dataset had an AUC = 0.91 with a sensitivity of 0.83 and specificity equal to 0.80. CFS and OL (Creatinine, C2, C4, lysoPC.a.C16.1, lysoPC.a.C20.3, lysoPC.a.C28.1, PC.aa.C24.0) showed the highest performance with sensitivity (0.87) and specificity (0.87), respectively. MSEA revealed significantly altered metabolic pathways in IUGR cases. Dysregulated pathways include beta oxidation of very long fatty acids, oxidation of branched chain fatty acids, phospholipid biosynthesis, lysine degradation, urea cycle and fatty acid metabolism. Conclusion A systematically selected panel of metabolites was shown to accurately detect IUGR in newborn cord blood serum. Significant disturbances of hepatic function and energy-generating pathways were found in IUGR cases.
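The 10-fold cross-validation mentioned in the methods can be sketched as a fold assignment over the 40 cases and 40 controls. The abstract does not say whether the folds were stratified by class, so stratification is an assumption here, and the function name and seed are hypothetical. The model fitting step is left as a comment because the SVM settings are not given.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds while balancing each class across folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# 40 IUGR cases (1) and 40 controls (0), as in the study design.
labels = [1] * 40 + [0] * 40
folds = stratified_folds(labels, k=10)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Fit the model on train_idx and score it on test_idx here;
    # candidate parameters are compared by their mean score across folds.
    assert len(train_idx) + len(test_idx) == len(labels)
```

With this design each fold holds out 4 cases and 4 controls, so every sample is used for testing exactly once.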