A system for classifying disease comorbidity status from medical discharge summaries using automated hotspot and negated concept detection
- PMID: 19390099
- PMCID: PMC2705265
- DOI: 10.1197/jamia.M3095
Abstract
OBJECTIVE: Free-text clinical reports are an important part of patient care management and of the clinical documentation of patient disease and treatment status. Free-text notes are commonplace in medical practice, but remain an under-used source of information for clinical and epidemiological research, as well as for personalized medicine. The authors explore the challenges of automatically extracting information from clinical reports, using their submission to the Integrating Informatics with Biology and the Bedside (i2b2) 2008 Natural Language Processing Obesity Challenge Task.
DESIGN: A text mining system was developed for classifying patient comorbidity status based on the information contained in clinical reports. The authors' approach incorporates a variety of automated techniques, including hot-spot filtering, negated concept identification, zero-vector filtering, weighting by inverse class frequency, and error-correcting output codes with linear support vector machines.
MEASUREMENTS: Performance was evaluated in terms of the macroaveraged F1 measure.
RESULTS: The automated system performed well against manual expert rule-based systems, finishing fifth in the Challenge's intuitive task and 13th in the textual task.
CONCLUSIONS: The system demonstrates that effective comorbidity status classification by an automated system is possible.
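The macroaveraged F1 measure named under MEASUREMENTS can be illustrated with a minimal sketch. It computes F1 separately for each class and takes the unweighted mean, so rare classes count as much as frequent ones. The function name `macro_f1`, the label strings, and the toy data are assumptions made for illustration; this is not the Challenge's official scoring code, and the convention of scoring an absent class as 0 is also an assumption.

```python
from collections import Counter


def macro_f1(gold, predicted, labels=None):
    """Return the unweighted mean of per-class F1 scores.

    `gold` and `predicted` are equal-length sequences of class labels.
    `labels` optionally fixes which classes are averaged; by default
    every label seen in either sequence is used.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if labels is None:
        labels = sorted(set(gold) | set(predicted))
    if not labels:
        raise ValueError("at least one label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[g] += 1   # class g was missed

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); scored as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, invented for this example.
gold = ["Y", "Y", "N", "N", "N", "U"]
predicted = ["Y", "N", "N", "N", "N", "Y"]
score = macro_f1(gold, predicted)
print(round(score, 4))
```

In this toy run the single "U" document is misclassified, so its class F1 is 0 and pulls the average down by a full third of the total, even though five of the six documents belong to other classes. That sensitivity to small classes is what distinguishes macroaveraging from microaveraging.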
Figures
![Figure 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2705265/bin/590.S1067502709000723.gr1.gif)
![Figure 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2705265/bin/590.S1067502709000723.gr2.gif)
![Figure 3](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2705265/bin/590.S1067502709000723.gr3.gif)
![Figure 4](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2705265/bin/590.S1067502709000723.gr4.gif)
![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/2705265/bin/590.S1067502709000723.gr5.gif)