Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

J Clin Epidemiol. 2021 May;133:140-151. doi: 10.1016/j.jclinepi.2020.11.003. Epub 2020 Nov 7.

James Thomas¹, Steve McDonald², Anna Noel-Storr³, Ian Shemilt⁴, Julian Elliott⁵, Chris Mavergames⁶, Iain J Marshall⁷

Affiliations

¹ EPPI-Centre, UCL Social Research Institute, University College London, London, UK. Electronic address: james.thomas@ucl.ac.uk.
² Cochrane Australia, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.
³ Radcliffe Department of Medicine, University of Oxford, Oxford, UK; Cochrane, London, UK.
⁴ EPPI-Centre, UCL Social Research Institute, University College London, London, UK.
⁵ Department of Infectious Diseases, Monash University and Alfred Hospital, Melbourne, Australia.
⁶ Cochrane, London, UK.
⁷ School of Population Health & Environmental Sciences, King's College London, London, UK.

PMID: 33171275
PMCID: PMC8168828
DOI: 10.1016/j.jclinepi.2020.11.003

James Thomas et al. J Clin Epidemiol. 2021 May.

Abstract

Objectives: This study developed, calibrated, and evaluated a machine learning classifier designed to reduce the workload of identifying studies for Cochrane systematic reviews.

Methods: A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the "Cochrane RCT Classifier"), with the algorithm trained using a data set of title-abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.
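The calibration step described in the Methods can be illustrated with a minimal sketch. It assumes each calibration record carries a classifier score and a binary manual label (1 = RCT), and it picks the highest score threshold that still reaches a target recall. The function names `threshold_for_recall` and `recall_precision` are hypothetical and not taken from the study; the authors' actual calibration procedure may differ.

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.99):
    """Return the highest threshold whose recall is at least target_recall.

    Records scoring at or above the threshold are treated as predicted RCTs.
    Illustrative only: not the study's own calibration code.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one positive (RCT) record")
    # Number of true RCTs that must be retrieved to reach the target recall.
    needed = math.ceil(target_recall * len(positives))
    return positives[needed - 1]


def recall_precision(scores, labels, threshold):
    """Compute (recall, precision) for a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

Lowering the threshold to capture the last few RCTs also admits many non-RCT records, which is why a recall target near 99% can coexist with low precision.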

Results: The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98-0.99) and precision of 0.08 (95% confidence interval 0.06-0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published.
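As a rough illustration of the figures above, the sketch below recomputes the headline recall from the reported counts and shows one simple way to attach a percentile bootstrap interval to a recall estimate. It is an assumption-laden simplification: the study's bootstrap validation appears to involve models generated from each bootstrap sample (see Fig. 3), whereas this sketch only resamples retrieved/missed flags for a fixed classifier. The names `recall_from_counts` and `bootstrap_recall_ci` are hypothetical.

```python
import random


def recall_from_counts(retrieved, missed):
    """Recall = retrieved / (retrieved + missed)."""
    return retrieved / (retrieved + missed)


def bootstrap_recall_ci(hits, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for recall (illustrative sketch).

    `hits` holds one 0/1 flag per true RCT (1 = retrieved by the classifier).
    """
    rng = random.Random(seed)
    n = len(hits)
    # Resample the flags with replacement and recompute recall each time.
    estimates = sorted(
        sum(rng.choices(hits, k=n)) / n for _ in range(n_boot)
    )
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Counts reported in the Results: 43,783 retrieved, 224 missed.
overall = recall_from_counts(43_783, 224)
print(f"{overall:.3f}")  # prints 0.995
```

Resampling only the true RCTs is enough for recall, because recall does not depend on how non-RCT records are classified; an interval for precision would need the full set of scored records.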

Conclusions: The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.

Keywords: Automation; Cochrane Library; Crowdsourcing; Information retrieval; Machine learning; Methods/methodology; Randomized controlled trials; Searching; Study classifiers; Systematic reviews.

Figures

**Fig. 1**
The Cochrane Evidence Pipeline workflow, depicting the flow of records from the centralized search service, through machine and crowd classification services to the CENTRAL database.

**Fig. 2**
Development and evaluation of the classifier, showing where the various data sets were used in the classifier development process.

**Fig. 3**
Calibration plot showing bootstrap estimates of predicted vs. observed probabilities of an article being an RCT in the Clinical Hedges data set (each blue point represents an estimate of a model generated from one bootstrap sample) and the performance of the final model (orange). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

**Fig. 4**
Distribution of classification scores for RCTs and non-RCTs in the Clinical Hedges data set. RCT, randomized controlled trial.

**Fig. 5**
RCTs “lost” by the classifier per 1,000 published, by year of publication, showing that the risk of “losing” a publication decreases over time.

Cited by

Noise or sound management in the neonatal intensive care unit for preterm or very low birth weight infants.
Sibrecht G, Wróblewska-Seniuk K, Bruschettini M. Cochrane Database Syst Rev. 2024 May 30;5(5):CD010333. doi: 10.1002/14651858.CD010333.pub4. PMID: 38813836. Review.
Nailing precision: a systematic review and meta-analysis of randomized controlled trials comparing piriformis and trochanteric entry points for femoral antegrade nailing.
Acevedo D, Suarez A, Checkley T, Fakhoury I, Reyes M, Constantinescu D, Hernandez GM. Arch Orthop Trauma Surg. 2024 Jun;144(6):2527-2538. doi: 10.1007/s00402-024-05359-6. Epub 2024 May 14. PMID: 38744693.
BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis.
Kartchner D, Al-Hussaini I, Turner H, Deng J, Lohiya S, Bathala P, Mitchell C. Int ACM SIGIR Conf Res Dev Inf Retr. 2023 Jul;2023:2913-2923. doi: 10.1145/3539618.3591897. Epub 2023 Jul 18. PMID: 38690157. Free PMC article.
Value of preclinical systematic reviews and meta-analyses in pediatric research.
Romantsik O, Bank M, Menon JML, Malhotra A, Bruschettini M. Pediatr Res. 2024 Apr 13. doi: 10.1038/s41390-024-03197-1. Online ahead of print. PMID: 38615075. Review.
Syndesmotic screws, unscrew them, or leave them? A systematic review and meta-analysis of randomized controlled trials.
Acevedo D, Suarez A, Kaur K, Checkley T, Jimenez P, MacMahon A, Vulcano E, Aiyer AA. J Orthop. 2024 Mar 22;54:136-142. doi: 10.1016/j.jor.2024.03.012. eCollection 2024 Aug. PMID: 38567192. Review.
