Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review
- PMID: 37819910
- PMCID: PMC10566734
- DOI: 10.1371/journal.pdig.0000347
Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review
Abstract
Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Unstructured data are often found in a format that lacks standardization and needs significant preprocessing and feature extraction efforts. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, which we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources and may limit the ability to fully leverage their potential for advancing health research and, ultimately, prevention, and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across literature, a comprehensive interdisciplinary summary of such challenges and possible solutions to facilitate their use in combination with structured data sources is missing. In this study, we report findings from a systematic narrative review on the seven most prevalent challenge areas connected with the digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies aiming combining unstructured data with existing data sources. Overall, the generality of reported unstructured data enrichment methods in the studies included in this review call for more systematic reporting of such methods to achieve greater reproducibility in future studies.
Copyright: © 2023 Sedlakova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
-
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. Cochrane Database Syst Rev. 2022. PMID: 36321557 Free PMC article.
-
Beyond the black stump: rapid reviews of health research issues affecting regional, rural and remote Australia.Med J Aust. 2020 Dec;213 Suppl 11:S3-S32.e1. doi: 10.5694/mja2.50881. Med J Aust. 2020. PMID: 33314144
-
The future of Cochrane Neonatal.Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12. Early Hum Dev. 2020. PMID: 33036834
-
Tuberculosis.In: Holmes KK, Bertozzi S, Bloom BR, Jha P, editors. Major Infectious Diseases. 3rd edition. Washington (DC): The International Bank for Reconstruction and Development / The World Bank; 2017 Nov 3. Chapter 11. In: Holmes KK, Bertozzi S, Bloom BR, Jha P, editors. Major Infectious Diseases. 3rd edition. Washington (DC): The International Bank for Reconstruction and Development / The World Bank; 2017 Nov 3. Chapter 11. PMID: 30212088 Free Books & Documents. Review.
-
Evidence Brief: The Effectiveness Of Mandatory Computer-Based Trainings On Government Ethics, Workplace Harassment, Or Privacy And Information Security-Related Topics [Internet].Washington (DC): Department of Veterans Affairs (US); 2014 May. Washington (DC): Department of Veterans Affairs (US); 2014 May. PMID: 27606391 Free Books & Documents. Review.
Cited by
-
BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices.Front Public Health. 2024 Apr 23;12:1392180. doi: 10.3389/fpubh.2024.1392180. eCollection 2024. Front Public Health. 2024. PMID: 38716250 Free PMC article.
References
-
- Unstructured Data—an overview | ScienceDirect Topics. [cited 21 Aug 2023]. https://www.sciencedirect.com/topics/computer-science/unstructured-data
-
- Hemingway H, Asselbergs FW, Danesh J, Dobson R, Maniadakis N, Maggioni A, et al.. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. European Heart Journal. 2018;39: 1481–1495. doi: 10.1093/eurheartj/ehx487 - DOI - PMC - PubMed
-
- Stephenson D, Alexander R, Aggarwal V, Badawy R, Bain L, Bhatnagar R, et al.. Precompetitive Consensus Building to Facilitate the Use of Digital Health Technologies to Support Parkinson Disease Drug Development through Regulatory Science. Digital biomarkers. 2020;4: 28–49. doi: 10.1159/000512500 - DOI - PMC - PubMed
Grants and funding
LinkOut - more resources
Full Text Sources
Miscellaneous