PLOS Digit Health. 2023 Oct 11;2(10):e0000347.
doi: 10.1371/journal.pdig.0000347. eCollection 2023 Oct.

Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review

Jana Sedlakova et al. PLOS Digit Health.

Abstract

Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are unstructured and often not readily accessible for research. Unstructured data often lack standardization and require significant preprocessing and feature extraction. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, a process we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources, which may limit the ability to fully leverage unstructured data for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across the literature, a comprehensive interdisciplinary summary of these challenges, and of possible solutions that facilitate the use of unstructured data in combination with structured data sources, is missing. In this study, we report findings from a systematic narrative review of the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies that aim to combine unstructured data with existing data sources. Overall, the generality of the unstructured data enrichment methods reported in the included studies calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. PRISMA flowchart.

Grants and funding

This work was supported by a Digital Society Initiative Health Community Grant to VVW (grant number not applicable). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
