Res Synth Methods. 2018 Dec;9(4):602-614.
doi: 10.1002/jrsm.1287. Epub 2018 Feb 7.

Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide


Iain J Marshall et al. Res Synth Methods. 2018 Dec.

Abstract

Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not widely used in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated the area under the receiver operating characteristic curve (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987; 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies), together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
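The abstract's two use cases differ only in where the probability cutoff is placed on the ROC curve. The sketch below (synthetic scores, not the Clinical Hedges evaluation; the cutoff values are illustrative) shows how an AUROC estimate and a high-sensitivity cutoff can be derived with scikit-learn:

```python
# Illustration only: synthetic classifier scores, not the paper's data.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=5000)                 # 1 = RCT, 0 = non-RCT
# Synthetic scores: RCTs cluster higher than non-RCTs
scores = np.clip(y * 0.6 + rng.normal(0.2, 0.25, 5000), 0, 1)

auroc = roc_auc_score(y, scores)
print(f"AUROC = {auroc:.3f}")

# High-sensitivity strategy (e.g. systematic reviews): choose the
# highest cutoff that still achieves the target sensitivity (TPR).
fpr, tpr, thresholds = roc_curve(y, scores)
high_sens = thresholds[tpr >= 0.99][0]
print(f"cutoff for >=99% sensitivity: {high_sens:.2f}")
```

A high-precision strategy (rapid reviews, clinical question answering) would instead raise the cutoff until the false-positive rate is acceptably low.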


Figures

Figure 1
Tree diagram: The false positive burden associated with using a high‐sensitivity search compounded by RCTs being a minority class. Illustrative figures, assuming that 1.6% of all articles are RCTs (based on PubMed search; approximately 423 000 in total), and a search filter with 98.4% sensitivity and 77.9% specificity (the performance of the Cochrane HSSS based on data from McKibbon et al9). The 2 blue shaded boxes together represent the search retrieval. The search filter thus retrieves a total of 6 201 349 articles, of which only 416 232 (or 6.7%) are actually RCTs (being the precision statistic) [Colour figure can be viewed at wileyonlinelibrary.com]
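The arithmetic behind the Figure 1 caption can be reproduced directly from prevalence, sensitivity, and specificity. The totals below differ slightly from the caption's (6 201 349 retrieved) because the published sensitivity and specificity are rounded; the precision lands at the same ~6.7%:

```python
# Worked version of the Figure 1 arithmetic (approximate; the published
# sensitivity/specificity figures are rounded).
total_articles = 26_437_500   # chosen so that 1.6% is exactly 423,000 RCTs
prevalence = 0.016            # fraction of articles that are RCTs
sensitivity = 0.984           # Cochrane HSSS sensitivity (McKibbon et al)
specificity = 0.779           # Cochrane HSSS specificity

rcts = total_articles * prevalence
non_rcts = total_articles - rcts

true_positives = rcts * sensitivity                 # RCTs retrieved
false_positives = non_rcts * (1 - specificity)      # non-RCTs retrieved
retrieved = true_positives + false_positives
precision = true_positives / retrieved

print(f"retrieved ~ {retrieved:,.0f} articles; precision ~ {precision:.1%}")
```

Even with 98.4% sensitivity, the rarity of RCTs means roughly 14 of every 15 retrieved articles are false positives.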
Figure 2
Receiver operating characteristic scatterplot for conventional database filters (based on data published by McKibbon et al9), with the 2 comparator strategies from this analysis labeled. RCT PT tag: the single‐term strategy based on the manually applied PT tag (the high‐precision comparator); Cochrane HSSS: the Cochrane Highly Sensitive Search Strategy (the high‐sensitivity comparator) [Colour figure can be viewed at wileyonlinelibrary.com]
Figure 3
The Cochrane Crowd/EMBASE project pipeline. Source articles (titles and abstracts) are identified via a sensitive database search filter. Articles already tagged as being RCTs (via Emtree PT tag) are sent directly to CENTRAL. Articles predicted to have <10% probability of being RCTs via an SVM classifier are directly excluded. The remaining articles are considered by the crowd
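The Figure 3 pipeline is a three-way triage rule. A minimal sketch (function and label names are illustrative, not the project's code):

```python
# Sketch of the Cochrane Crowd/EMBASE triage rule described in Figure 3.
def triage(has_rct_pt_tag: bool, svm_rct_probability: float) -> str:
    if has_rct_pt_tag:                 # already tagged as an RCT (Emtree PT tag)
        return "CENTRAL"
    if svm_rct_probability < 0.10:     # classifier confident it is not an RCT
        return "exclude"
    return "crowd"                     # uncertain: assessed by human screeners

print(triage(True, 0.05))    # tagged article goes straight to CENTRAL
print(triage(False, 0.02))   # low-probability article is excluded
print(triage(False, 0.40))   # everything else goes to the crowd
```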
Figure 4
Schematic illustrating the separating plane in support vector machines, here depicted in 2 dimensions. The separating plane (a straight line in this two‐dimensional case) is shown as the black line, and the margins are shaded in gray. The instances nearest to the margin (the support vectors) are highlighted in white
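A linear SVM of the kind illustrated in Figure 4 can be trained on title text in a few lines. This is a toy sketch, not the authors' pipeline; the titles and labels are invented for illustration:

```python
# Toy linear SVM text classifier (illustrative data, not the paper's).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

titles = [
    "A randomized controlled trial of drug X versus placebo",
    "Randomised double-blind trial of therapy Y in adults",
    "A retrospective cohort study of disease Z",
    "Case report: an unusual presentation of condition W",
]
labels = [1, 1, 0, 0]  # 1 = RCT, 0 = not an RCT

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigram + bigram features
X = vectorizer.fit_transform(titles)

clf = LinearSVC()  # finds the maximum-margin separating hyperplane
clf.fit(X, labels)

test_doc = vectorizer.transform(["A randomized trial of drug Q"])
print(clf.predict(test_doc))  # the side of the hyperplane gives the label
```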
Figure 5
Schematic illustrating convolutional neural network architecture for text classification. Here, y_i is the label (RCT or not) for document i, w is a weight vector associated with the classification layer, and x_i is the vector representation of document i induced by the CNN
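The mechanics of Figure 5 can be sketched in plain NumPy: convolution filters slide over word embeddings, max-pooling produces the document vector x_i, and the classification layer scores w · x_i. Dimensions and weights below are arbitrary, chosen only to make the shapes concrete:

```python
# Minimal forward pass of a CNN text classifier (random weights,
# illustrative dimensions; a real model would learn these by training).
import numpy as np

rng = np.random.default_rng(0)
n_words, embed_dim, n_filters, window = 30, 8, 4, 3

embeddings = rng.normal(size=(n_words, embed_dim))        # one row per token
filters = rng.normal(size=(n_filters, window * embed_dim))
w = rng.normal(size=n_filters)                            # classification layer

# Convolution: apply each filter to every window of 3 consecutive words
windows = np.stack([embeddings[i:i + window].ravel()
                    for i in range(n_words - window + 1)])
feature_maps = windows @ filters.T                        # (n_windows, n_filters)

x_i = feature_maps.max(axis=0)                            # max-pooling -> x_i
score = 1 / (1 + np.exp(-(w @ x_i)))                      # sigmoid of w . x_i
print(f"P(RCT) = {score:.3f}")
```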
Figure 6
Receiver operating characteristics of the machine learning algorithms trained on plain text alone: (1) support vector machine and (2) convolutional neural network, each shown as a single model and as the bagged result of 10 models (each trained on all RCTs and a different random sample of non‐RCTs). The points depict the 3 conventional database filters, which use plain text only and do not require use of MeSH/PT tags. The blue shaded area on the left of the figure is enlarged in the bottom right section
Figure 7
Left: Receiver operating characteristics curve (zoomed to accentuate variance) showing the effects of balanced sampling: the individual models are depicted in light blue; the magenta curve depicts the performance of the consensus classification (the mean probability score of being an RCT from the component models). Right: Cumulative performance (area under the receiver operating characteristics curve) of bagging multiple models trained on balanced samples. Performance increases until approximately 6 models are included and plateaus thereafter
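The balanced-sampling ensemble of Figure 7 can be sketched as follows. This is an illustration, not the authors' code: the features are synthetic, and logistic regression stands in for the paper's SVM/CNN components:

```python
# Sketch of the balanced-sampling ensemble: each component model sees
# all RCTs plus a different random sample of non-RCTs; the consensus
# score is the mean predicted probability. Synthetic data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_rct = rng.normal(1.0, 1.0, size=(100, 5))     # synthetic RCT feature vectors
X_non = rng.normal(-1.0, 1.0, size=(2000, 5))   # synthetic non-RCTs (majority)

def bagged_scores(X_new, n_models=6):
    scores = []
    for _ in range(n_models):
        idx = rng.choice(len(X_non), size=len(X_rct), replace=False)
        X = np.vstack([X_rct, X_non[idx]])      # balanced training sample
        y = np.array([1] * len(X_rct) + [0] * len(X_rct))
        model = LogisticRegression().fit(X, y)
        scores.append(model.predict_proba(X_new)[:, 1])
    return np.mean(scores, axis=0)              # consensus probability

score = bagged_scores(np.array([[1.5] * 5]))[0]
print(f"consensus P(RCT) = {score:.3f}")        # point lies in the RCT region
```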
Figure 8
Receiver operating characteristics curve: hybrid/ensembled models including use of the manually applied PT tag. The area bounded by the blue shaded area on the left‐hand plot is enlarged on the right to illustrate differences between models and conventional database filters. Note that the RCT PT tag has become more sensitive between 2009 (the McKibbon et al data9) and 2017 (the reanalysis conducted here), reflecting the late application of the tag to missed RCTs, including through data provided to PubMed by the Cochrane Collaboration11
Figure 9
PubMed PT information is used where present; where it is absent, the best‐performing text‐alone approach is used automatically, with a modest reduction in accuracy
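The Figure 9 fallback reduces to a simple decision rule. A sketch with illustrative names (record structure and threshold are assumptions, not the published software's API):

```python
# Sketch of the hybrid decision rule: trust the manually applied PT tag
# when present, otherwise fall back to the text-only model's probability.
def is_rct(record: dict, text_model_score: float, threshold: float = 0.5) -> bool:
    pt_tags = record.get("publication_types")
    if pt_tags is not None:                      # PT information present
        return "Randomized Controlled Trial" in pt_tags
    return text_model_score >= threshold         # text-alone fallback

print(is_rct({"publication_types": ["Randomized Controlled Trial"]}, 0.1))
print(is_rct({}, 0.9))  # no PT data: the classifier decides
```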


References

    1. Chalmers I, Enkin M, Keirse MJNC. Preparing and updating systematic reviews of randomized controlled trials of health care. Milbank Q. 1993;71(3):411‐437.
    2. Lefebvre C, Manheimer E, Glanville J. Searching for studies. In: Higgins J, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0. The Cochrane Collaboration; 2011.
    3. Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C, eds. Machine Learning: ECML‐98. Lecture Notes in Computer Science, vol 1398. Berlin, Heidelberg: Springer; 1998.
    4. McCallum A, Nigam K, et al. A comparison of event models for naive Bayes text classification. In: AAAI‐98 Workshop on Learning for Text Categorization. 1998;752:41-48.
    5. Goldberg Y. A primer on neural network models for natural language processing. arXiv [cs.CL]. 2015. http://arxiv.org/abs/1510.00726.
