Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide
- PMID: 29314757
- PMCID: PMC6030513
- DOI: 10.1002/jrsm.1287
Abstract
Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are rarely used in practice, in part because it is unclear how best to use the technology in a typical workflow. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters, and used that dataset to estimate the area under the receiver operating characteristic curve (AUROC). We demonstrate that ML approaches discriminate between RCTs and non-RCTs better than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies), together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
© 2018 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.
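The abstract's two use cases (high-sensitivity and high-precision strategies) both come down to choosing a probability cutoff on a classifier's output. The following is a minimal sketch of that idea in plain Python, not the authors' released software: the function names, the toy labels, and the toy scores are all hypothetical, and the cutoffs it prints are not the cutoffs recommended in the paper.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (RCT, non-RCT) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_precision(
    labels: Sequence[int], scores: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and precision when every score >= cutoff is called an RCT."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


def cutoff_for_sensitivity(
    labels: Sequence[int], scores: Sequence[float], target: float
) -> float:
    """Highest observed cutoff whose sensitivity reaches the target (systematic-review style)."""
    for c in sorted(set(scores), reverse=True):
        if sensitivity_precision(labels, scores, c)[0] >= target:
            return c
    raise ValueError("target sensitivity not reachable")


def cutoff_for_precision(
    labels: Sequence[int], scores: Sequence[float], target: float
) -> float:
    """Lowest observed cutoff whose precision reaches the target (rapid-review style)."""
    for c in sorted(set(scores)):
        if sensitivity_precision(labels, scores, c)[1] >= target:
            return c
    raise ValueError("target precision not reachable")


# Toy validation data (invented for illustration): 1 = RCT, 0 = non-RCT.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.98, 0.95, 0.90, 0.70, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]

sensitive_cutoff = cutoff_for_sensitivity(labels, scores, target=1.0)
precise_cutoff = cutoff_for_precision(labels, scores, target=1.0)

print("AUROC:", auroc(labels, scores))
print("high-sensitivity cutoff:", sensitive_cutoff,
      sensitivity_precision(labels, scores, sensitive_cutoff))
print("high-precision cutoff:", precise_cutoff,
      sensitivity_precision(labels, scores, precise_cutoff))
```

In this sketch the cutoffs are chosen on labelled validation data and then held fixed, so the low cutoff trades extra screening work for fewer missed RCTs, while the high cutoff does the reverse.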