A system for classifying disease comorbidity status from medical discharge summaries using automated hotspot and negated concept detection

J Am Med Inform Assoc. 2009 Jul-Aug;16(4):590-5. doi: 10.1197/jamia.M3095. Epub 2009 Apr 23.

Kyle H Ambert¹, Aaron M Cohen

Affiliation

¹ Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA. ambertk@ohsu.edu

PMID: 19390099
PMCID: PMC2705265
DOI: 10.1197/jamia.M3095

Abstract

OBJECTIVE: Free-text clinical reports are an important part of patient care management and of the clinical documentation of disease and treatment status. Although free-text notes are commonplace in medical practice, they remain an underused source of information for clinical and epidemiological research, as well as for personalized medicine. The authors explore the challenges of automatically extracting information from clinical reports, using their submission to the Integrating Informatics with Biology and the Bedside (i2b2) 2008 Natural Language Processing Obesity Challenge Task.

DESIGN: A text mining system that classifies patient comorbidity status based on the information contained in clinical reports. The authors' approach incorporates a variety of automated techniques, including hot-spot filtering, negated concept identification, zero-vector filtering, weighting by inverse class frequency, and error-correcting output codes with linear support vector machines.

MEASUREMENTS: Performance was evaluated using the macro-averaged F1 measure.

RESULTS: The automated system performed well compared with systems built on manually crafted expert rules, finishing fifth in the Challenge's intuitive task and 13th in the textual task.

CONCLUSIONS: The system demonstrates that effective comorbidity status classification by an automated system is possible.
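Two of the techniques named in the abstract, weighting by inverse class frequency and error-correcting output codes (ECOC), can be illustrated compactly. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes four labels named Y, N, Q, and U (taken here to be the i2b2 textual-task labels), uses a hypothetical exhaustive 7-bit code matrix, and treats the outputs of the per-column linear SVMs as already-thresholded bits, so SVM training itself is omitted. The weight formula N / (K * n_c) is one common form of inverse class-frequency weighting and is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical exhaustive code matrix for four labels (assumed label names).
# Each column defines one binary problem a linear SVM would be trained on.
# Any two rows differ in 4 of the 7 bits, so one wrong binary prediction
# can still be decoded to the correct label.
CODE_MATRIX: Dict[str, List[int]] = {
    "Y": [1, 1, 1, 1, 1, 1, 1],
    "N": [0, 0, 0, 0, 1, 1, 1],
    "Q": [0, 0, 1, 1, 0, 0, 1],
    "U": [0, 1, 0, 1, 0, 1, 0],
}


def inverse_class_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by N / (K * n_c) so rare classes count as much as common ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits: Sequence[int], code: Dict[str, List[int]] = CODE_MATRIX) -> str:
    """Return the label whose codeword is closest (Hamming distance) to the predicted bits."""
    return min(code, key=lambda label: hamming(bits, code[label]))
```

For example, `ecoc_decode([1, 0, 1, 1, 0, 0, 1])` returns `"Q"`: the predicted bits differ from the Q codeword only in the first position, so a single misfiring binary classifier is corrected. With ten training documents labeled six N, three Y, and one Q, `inverse_class_frequency_weights` gives Q the largest weight (10/3) and N the smallest.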


Figures

**Figure 1**
Diagrammatic example of our automated hot-spot filtering procedure. In this example, the information gain associated with the word asthma identifies it as a hot-spot feature, so a 100-character window around it is extracted as the hot-spot passage and passed on to the tokenization and vector modeling steps.
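The hot-spot step in Figure 1 can be sketched in Python as follows. This is an illustrative approximation, not the authors' code: it scores a candidate term by the information gain of the binary feature "term occurs in the document" and, for terms chosen as hot spots, extracts a 100-character window centred on each occurrence. Centring the window, case-insensitive substring matching (so "asthma" also matches inside "asthmatic"), and all function names are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term: str, docs: Sequence[str], labels: Sequence[str]) -> float:
    """IG of the binary feature 'term occurs in the document' with respect to the labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    t = term.lower()
    with_t = [y for d, y in zip(docs, labels) if t in d.lower()]
    without_t = [y for d, y in zip(docs, labels) if t not in d.lower()]
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def hotspot_passages(text: str, hotspots: Sequence[str], width: int = 100) -> List[str]:
    """Return a window of about `width` characters centred on every hot-spot occurrence."""
    lower = text.lower()
    half = width // 2
    passages = []
    for term in hotspots:
        t = term.lower()
        start = lower.find(t)
        while start != -1:
            centre = start + len(t) // 2
            passages.append(text[max(0, centre - half):centre + half])
            start = lower.find(t, start + 1)
    return passages
```

On a toy collection where "asthma" appears in exactly the positive documents, its information gain equals the full label entropy (1 bit for a balanced two-class split), making it a natural hot-spot candidate. A document containing no hot-spot terms yields no passages at all, which is presumably the situation that zero-vector filtering has to handle.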

**Figure 2**
Macro-averaged F1 scores across comorbidities for cross-validation on the training document collection (black) and for training on the training collection and testing on the test collection (gray), for both the textual (top) and intuitive (bottom) tasks. Bars for which only one color is visible indicate that the difference between training and testing performance was not significant. Abbreviations: AST, Asthma; CAD, Coronary Artery Disease; CHF, Congestive Heart Failure; DEP, Depression; DIA, Diabetes; GST, Gallstones; GRD, Gastroesophageal Reflux Disease; GT, Gout; HCH, Hypercholesterolemia; HRT, Hypertension; HTR, Hypertriglyceridemia; OA, Osteoarthritis; OBS, Obesity; OSA, Obstructive Sleep Apnea; PVD, Peripheral Vascular Disease; VI, Venous Insufficiency.
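Figures 2 to 4 report macro-averaged F1. A minimal sketch of that measure, assuming per-class F1 is computed over the union of gold and predicted labels and then averaged with equal weight per class (the exact i2b2 scoring conventions are not restated on this page):

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    total = 0.0
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1
    return total / len(labels)
```

With gold labels `["N", "N", "N", "N", "Q"]` and a classifier that always predicts N, accuracy is 0.8 but macro-averaged F1 is 4/9 (about 0.44), because the rare Q class, which is never predicted, counts as much as N. This is why macro-averaging rewards systems that handle infrequent status classes.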

**Figure 3**
Macro-averaged F1 scores by comorbidity for 2-, 4-, and 8-way cross-validation on the combined training and test document collections, for both the textual (black) and intuitive (gray) tasks. Fewer folds leave less data for training in each fold, so 2-way cross-validation uses the smallest training sets. For most comorbidities, performance decreased with smaller training sets; for a few, it remained invariant.
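The training-set-size effect in Figure 3 follows from how k-way cross-validation partitions the data: each fold trains on roughly (k - 1)/k of the documents, so 2-way cross-validation trains on about half the collection and 8-way on about seven-eighths. A minimal sketch of the split, assuming contiguous, unshuffled, unstratified folds (the authors' fold assignment is not described here):

```python
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-way cross-validation over n items.

    Folds are contiguous and unshuffled purely for illustration; the first
    n % k folds receive one extra item.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For example, with 16 documents, each 2-way fold trains on 8, each 4-way fold on 12, and each 8-way fold on 14, while every document is used for testing exactly once.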

**Figure 4**
Macro-averaged F1 for the AutoHP (light gray), AutoHP+NegEx (dark gray), and None (black) preprocessing procedures across comorbidities, for the textual (top) and intuitive (bottom) classification tasks. AutoHP showed consistent improvements over the system with no preprocessing; adding NegEx provided only a small further improvement over AutoHP, and only for a few topics. *See Figure 2 for abbreviation definitions.*

**Figure 5**
Error rate for the plain NegEx regular-expression procedure (solid line) and the support vector machine (SVM)-enhanced procedure (dashed line) across comorbidities and window sizes, during 2-way cross-validation on the combined training and test document collections for the textual task. For all but one comorbidity, the SVM-enhanced Automated Negation Finder tended to extract fewer falsely negated terms (negated terms not actually associated with the negative class). No NegEx features were found for the Hypertriglyceridemia and Venous Insufficiency comorbidities.
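Figure 5 varies the window size used for negation detection. The sketch below is a heavily simplified, NegEx-style check, not Chapman et al.'s full algorithm and not the authors' Automated Negation Finder: it marks a concept as negated when one of a small, hypothetical set of pre-negation triggers appears within a fixed number of tokens before it. Real NegEx also handles pseudo-negations, post-negation triggers, and scope terminators.

```python
import re
from typing import Sequence

# Small, hypothetical subset of pre-negation triggers; the actual NegEx
# lexicon is far larger and also contains pseudo- and post-negation phrases.
TRIGGERS = ("no evidence of", "negative for", "denies", "without", "no")


def is_negated(sentence: str, concept: str, window: int = 5,
               triggers: Sequence[str] = TRIGGERS) -> bool:
    """True if a trigger occurs within `window` tokens before any mention of `concept`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    m = len(concept_tokens)
    if m == 0:
        return False
    for i in range(len(tokens) - m + 1):
        if tokens[i:i + m] != concept_tokens:
            continue
        # Only the `window` tokens immediately preceding the concept are in scope.
        scope = " ".join(tokens[max(0, i - window):i])
        if any(re.search(r"\b" + re.escape(t) + r"\b", scope) for t in triggers):
            return True
    return False
```

In "Patient denies chest pain or asthma.", the concept "asthma" is negated with a 5-token window, but shrinking the window to 2 tokens puts "denies" out of scope. Larger windows catch more true negations at the cost of more false ones, which is the trade-off the window-size axis of Figure 5 explores.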


Cited by

Comparison of nomogram and machine-learning methods for predicting the survival of non-small cell lung cancer patients.
Lei H, Li X, Ma W, Hong N, Liu C, Zhou W, Zhou H, Gong M, Wang Y, Wang G, Wu Y. Lei H, et al. Cancer Innov. 2022 Aug 30;1(2):135-145. doi: 10.1002/cai2.24. eCollection 2022 Aug. Cancer Innov. 2022. PMID: 38090651 Free PMC article.
Machine learning in rare disease.
Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Banerjee J, et al. Nat Methods. 2023 Jun;20(6):803-814. doi: 10.1038/s41592-023-01886-z. Epub 2023 May 29. Nat Methods. 2023. PMID: 37248386 Review.
Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.
Hong N, Wen A, Stone DJ, Tsuji S, Kingsbury PR, Rasmussen LV, Pacheco JA, Adekkanattu P, Wang F, Luo Y, Pathak J, Liu H, Jiang G. Hong N, et al. J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14. J Biomed Inform. 2019. PMID: 31622801 Free PMC article.
CLASH: Complementary Linkage with Anchoring and Scoring for Heterogeneous biomolecular and clinical data.
Nam Y, Kim M, Lee K, Shin H. Nam Y, et al. BMC Med Inform Decis Mak. 2016 Jul 25;16 Suppl 3(Suppl 3):72. doi: 10.1186/s12911-016-0315-2. BMC Med Inform Decis Mak. 2016. PMID: 27454118 Free PMC article.
Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource.
Perera G, Broadbent M, Callard F, Chang CK, Downs J, Dutta R, Fernandes A, Hayes RD, Henderson M, Jackson R, Jewell A, Kadra G, Little R, Pritchard M, Shetty H, Tulloch A, Stewart R. Perera G, et al. BMJ Open. 2016 Mar 1;6(3):e008721. doi: 10.1136/bmjopen-2015-008721. BMJ Open. 2016. PMID: 26932138 Free PMC article.


