Pre-test Prediction of Non-ischemic Cardiomyopathies using Time-Series EHR Data
Abstract
Clinical imaging is an important tool for diagnosing non-ischemic cardiomyopathies (NICM). However, accurate interpretation of imaging studies often requires readers to review patient histories, a time-consuming and tedious task. We propose using time-series analysis of longitudinal electronic health record (EHR) data to predict the most likely NICMs, serving as a concise proxy for the full patient record. Time-series-formatted EHR data preserves temporality information important for accurate prediction of disease. Specifically, we leverage ICD-10 codes and various recurrent neural network architectures for predictive modeling. We trained our models on a large cohort of NICM patients who underwent cardiac magnetic resonance imaging (CMR) and a smaller cohort who underwent echocardiography. Across all models, the proposed technique achieved good mean micro-averaged area under the curve (0.8357), F1 score (0.5708), and precision at 3 (0.8078) for CMR, but only moderate performance for transthoracic echocardiography (TTE), at 0.6938, 0.4399, and 0.5864, respectively. We show that our model has the potential to provide an accurate pre-test differential diagnosis, thereby potentially reducing the clerical burden on physicians.
Introduction
Non-ischemic cardiomyopathies (NICM) are a serious set of diseases afflicting the heart(1). The presentation and disease course are highly varied, even within a single etiology(2). Common to all NICMs are a high risk of heart failure and the potential need for heart transplantation. Early detection through clinical imaging is critical for effective patient management(3). However, accurate interpretation of imaging studies is at least partly dependent on having succinct and relevant patient history available at the time of interpretation(4). Assembling such information can be difficult given the long and potentially varied symptom histories associated with these diseases. Furthermore, the amount of data available in the electronic health record (EHR) is increasing with time. Therefore, reviewing patient history for such important information adds both clerical and standard-of-care responsibilities for readers, who often already face increasing workloads.
One method to push patient information to readers is summarization of clinical notes. Several groups have proposed methods to automatically produce discharge summaries(5,6), as discharge summaries provide a natural dataset from which to learn pertinent information. Alsentzer and Kim proposed an extractive model using a long short-term memory (LSTM) network to identify relevant entities to include in discharge notes, achieving a high F1 score of 0.88(5). More recently, Searle et al combined extractive summarization with abstractive summarization (producing free text) using pre-trained large language models to produce full notes(6). Unfortunately, the results fell well short of summarization in general-domain data, achieving F1 scores well below 0.5; this poor result signifies the difficulty of clinical free-text summarization. Similar techniques have also been applied to radiological ordering. For instance, Kalra et al recently used machine learning models to classify term frequency-inverse document frequency (TF-IDF) features of free-text orders into unique imaging protocols(7), achieving good accuracies as high as 0.84 for certain classes of protocols in their focused task. However, such automated ordering does not use the full clinical narrative and therefore cannot inform readers, at the point of interpretation, of information potentially pertinent to diagnosis. Although summarization of patient histories would be an ideal tool to solve this issue, current methods do not yet achieve satisfactory performance(8).
We propose to simplify this task by providing the reader a pre-test probability of disease as a concise proxy for the full clinical history using time-series models. Casting the problem this way offers three primary advantages: 1) it alleviates the need to create and annotate a dataset of relevant patient history, 2) it leverages a large amount of patient history information efficiently, and 3) it implicitly allows us to integrate temporal information into our models. Therefore, we use time-series EHR data of patient problem lists, encoded as ICD-10 codes, to predict the diagnosis rendered by the imaging study.
Time-series modeling of EHR data is a well-established field. Rahimian et al demonstrated that the use of machine learning and temporal information increased performance over traditional models for risk prediction using EHR data(9). Hidden Markov models have been widely applied to time-series modeling, although under the assumption that the probability of a change in the hidden state depends on the time between observations(10). Bayesian networks have also been used to model time series, under the assumption that the graphical model represents the conditional dependencies between a set of inputs(11). However, neither of these statistical frameworks is robust to the irregularly spaced events common in EHR data.
Recently, recurrent neural networks (RNNs) have been applied to a variety of temporal data analytics tasks in healthcare(12). Lipton et al first cast irregular data as a missing-data problem(13). They found that simple RNN architectures generalize well to time-series predictive tasks even without complex imputation strategies. Subsequent innovations with RNNs include architectural modifications to the RNN cells that explicitly learn the impact of missingness(14,15). Transformer frameworks have also been applied to time-series prediction; for example, Zerveas et al proposed a transformer-based framework that took first place on 12 popular datasets at the time of publication(16). Such approaches are particularly well suited to EHR data analysis due to their ability to capture deep hierarchical features and long-range dependencies.
In this work, we leverage various deep learning architectures to learn from sequential ICD-10 diagnosis codes in order to predict the final disease diagnosis from a cardiac imaging study. Our objective is to develop a model that informs readers of clinical images of the most likely disease diagnoses at the time of the imaging study. Such a deep learning architecture has the potential to aggregate a large amount of temporally sparse information while mitigating the temporal uncertainty associated with diagnosis codes. We demonstrate the ability of our proposed method to provide an accurate differential diagnosis, as abstracted from patient charts, in a cohort of patients undergoing transthoracic echocardiogram (TTE) and/or cardiac magnetic resonance imaging (CMR), both clinical imaging staples for diagnosing and prognosticating NICM.
Data and Methods
Data
This study was approved by the Cleveland Clinic institutional review board. Our data were drawn from multiple sites within a single hospital system. The overall distribution of sites is heavily skewed towards a single campus; therefore, we combined all data into a single comprehensive bucket, with the knowledge that clinical practice and coding standards may differ dramatically between sites. The dataset was constructed via convenience sampling. The NICM cohort was drawn from another study comprising adult patients who underwent a CMR exam between 2002 and 2021 at a Cleveland Clinic site.
All patients were reviewed for definitive diagnosis through chart review by a clinical research fellow using the relevant guidelines(17–22). Accuracy of the annotations was then confirmed by a level 3 board-certified cardiologist. Specifically, cardiac amyloidosis was determined through a characteristic pattern of late gadolinium enhancement on CMR, with a large subset of patients also having positive confirmatory testing(23,24). Hypertrophic cardiomyopathy (HCM) was determined through the CMR biomarkers of left ventricular wall thickness >15mm, absence of abnormal loading conditions, and absence of infiltrative cardiomyopathies(17). Diagnosis of ischemic cardiomyopathy (ICM) was determined by review of patient history for revascularization, myocardial infarction, or multi-vessel disease, together with ejection fraction <40%(25). Non-differentiated NICM was determined to be any patient suffering from heart failure without a specific etiology(26). Definitive diagnosis of cardiac sarcoidosis was determined by either positive histopathology for granulomatous inflammation and/or electrocardiographic abnormalities combined with reduced systolic function(21,27). Cases of suspected myocarditis were identified from CMR showing myocardial dysfunction and diffuse late gadolinium enhancement; all cases were then validated using endomyocardial biopsy(28). Dilated cardiomyopathy (DCM) was determined by a left ventricular end-diastolic volume index or diameter >2 and ejection fraction <50%(22). A patient could have multiple diagnoses at the same time (e.g. DCM stemming from ICM).
A total of 1738 CMR studies were included in this dataset: 756 NICM, 318 ICM, 231 cardiac amyloidosis (AMYL), 79 HCM, 238 sarcoidosis, 239 myocarditis, and 131 DCM. The mean age at the time of CMR was 56.57±15.40 years. Of the 1,742 patients, 574 were female and 1,168 were male.
In addition, we evaluated the applicability of this methodology to other clinical imaging modalities, given that the distribution of available longitudinal data will be very different; therefore, we also investigated TTE. We identified all patients in this cohort with an echocardiogram performed at one of the sites in the hospital system within 3 years of the CMR, under the assumption that the final diagnosis and disease severity would not have significantly changed in this time. This dataset includes a total of 330 TTE studies: 122 NICM, 64 ICM, 40 AMYL, 19 HCM, 52 sarcoidosis, 60 myocarditis, and 13 DCM.
For longitudinal analysis, we pulled all ICD-10 diagnosis codes associated with each patient/event. This resulted in a dataset of approximately 1.3 million individual codes across 2707 different diagnoses. Unsurprisingly, this data source has a long tail of rare codes, which greatly increases the sparsity of our feature space. Therefore, we removed all codes appearing in less than 1% of patients, reducing the number of unique code features to 186.
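This prevalence filter can be sketched as follows (a minimal stdlib-only sketch; function and variable names are ours, not the study's code):

```python
from collections import defaultdict

def filter_rare_codes(patient_codes, min_fraction=0.01):
    """Keep only ICD-10 codes appearing in at least `min_fraction` of patients.

    patient_codes: dict mapping patient_id -> set of ICD-10 codes.
    Returns the set of retained codes.
    """
    n_patients = len(patient_codes)
    counts = defaultdict(int)
    for codes in patient_codes.values():
        for code in set(codes):   # count each patient at most once per code
            counts[code] += 1
    return {c for c, n in counts.items() if n / n_patients >= min_fraction}
```

Applying this to the full cohort reduces the feature space from 2707 codes to the 186 retained here.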
Preprocessing
We formalize our problem as follows. A patient observation refers to a pair (∆ti, xi), in which i dictates the observation's temporal order, ∆ti is the difference in time between the observation and the index event, and xi ∈ {0,1}^p is a multi-hot vector over all p possible diagnosis codes in our dataset such that xi,j represents the presence of diagnosis j during observation i. Then, the vector Xl = [(∆t1, x1), ..., (∆tnl, xnl)] represents the time-series EHR of patient l, where nl is the number of observations for patient l. Associated with this is the multi-hot vector yl = [c1, c2, ..., c7] representing the cardiomyopathies at the index event for patient l.
Therefore, an imaging event is defined as an index event for which there is confirmation of at least one of the seven cardiomyopathies. The diagnosis codes occur at highly asynchronous points depending on the patient. We address this issue by normalizing the relevant time window and the time intervals represented by each observation point in the time-series. First, we restricted to diagnosis codes recorded within the 182 days preceding the given patient's index event. Second, we binned the observation interval into 7-day periods. For patients with data shorter than the 182-day period, we zero-padded (or NaN-padded, depending on the model architecture) the time-series. Lastly, we recognize that although each diagnosis code is recorded at a single discrete time, it often represents an ongoing disease state; therefore, last observation carried forward (LOCF) imputation was applied. The imputation also helps to mitigate sparsity. A schematic of the setup of our data is shown in Figure 1.
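The 182-day windowing, weekly binning, and LOCF steps can be sketched as follows (function and variable names are illustrative, not the study's code; we assume the retained codes have been mapped to feature columns):

```python
import numpy as np

def build_sequence(events, code_index, window_days=182, bin_days=7):
    """Bin diagnosis events into weekly multi-hot vectors with LOCF.

    events: list of (days_before_index, icd10_code) tuples.
    code_index: dict mapping each retained ICD-10 code to a feature column.
    Returns an array of shape (window_days // bin_days, len(code_index)).
    """
    n_bins = window_days // bin_days  # 26 weekly observation points
    seq = np.zeros((n_bins, len(code_index)), dtype=np.float32)
    for days_before, code in events:
        if code not in code_index or not (0 <= days_before < window_days):
            continue
        # bin 0 is the oldest week; the last bin is closest to the index event
        b = n_bins - 1 - int(days_before // bin_days)
        seq[b, code_index[code]] = 1.0
    # last observation carried forward: once recorded, a code persists
    seq = np.maximum.accumulate(seq, axis=0)
    return seq
```

Weeks with no events remain zero-padded; a NaN-padded variant would substitute NaN for the untouched bins before the LOCF pass.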
Our sequence data are summarized in Table 1. The average patient had a modest number of diagnoses, but these diagnoses were spread over a long time window, further incentivizing a time-series approach. The largest population in our dataset was patients with undifferentiated non-ischemic cardiomyopathy, serving as our control. While all classes are minority classes relative to the output space, none are infrequent enough to require further data manipulation.
Table 1:
Distribution of diseases, sequence length, and number of diagnosis per patient.
| Modality | Disease | Studies (before) | Avg. encounters (before) | Avg. unique codes (before) | Studies (after) | Avg. encounters (after) | Avg. unique codes (after) |
|---|---|---|---|---|---|---|---|
| CMR | Totals | 1870 | 5.0 | 9.43 | 1738 | 4.8 | 5.97 |
| | NICM | 795 | 4.8 | 9.00 | 756 | 3.9 | 5.56 |
| | ICM | 345 | 5.1 | 10.27 | 318 | 4.4 | 7.00 |
| | AMYL | 247 | 6.5 | 11.79 | 231 | 5.7 | 7.44 |
| | HCM | 83 | 4.0 | 7.36 | 79 | 3.3 | 5.11 |
| | Sarcoidosis | 249 | 5.3 | 9.92 | 238 | 4.3 | 5.88 |
| | Myocarditis | 277 | 4.4 | 8.59 | 239 | 3.8 | 5.51 |
| | DCM | 141 | 4.4 | 9.60 | 131 | 3.8 | 5.60 |
| Echo | Totals | 439 | 5.1 | 9.48 | 330 | 4.8 | 6.28 |
| | NICM | 169 | 4.5 | 8.43 | 122 | 3.8 | 5.42 |
| | ICM | 82 | 5.4 | 11.26 | 64 | 4.9 | 7.54 |
| | AMYL | 51 | 6.7 | 10.86 | 40 | 5.4 | 6.47 |
| | HCM | 20 | 3.0 | 5.95 | 19 | 3.0 | 4.63 |
| | Sarcoidosis | 64 | 6.3 | 11.67 | 52 | 5.2 | 7.24 |
| | Myocarditis | 86 | 4.4 | 8.20 | 60 | 3.8 | 5.51 |
| | DCM | 20 | 3.7 | 6.50 | 13 | 3.0 | 4.33 |
Models
We explored several time-series deep learning models, including variants of recurrent neural networks (RNNs) and transformer models. Specifically, we investigated simple RNNs, LSTMs, bidirectional GRUs, and transformers, all of which have been shown to be suitable for time-series EHR prediction(29,30). In contrast to traditional feedforward networks, these models have intrinsic structures for harnessing antecedent temporal data: simple RNNs process inputs sequentially, LSTMs and GRUs introduce gating mechanisms to regulate information flow, and transformers forego sequential processing in favor of parallel attention mechanisms. These primary architectures are referenced herein as RNN, LSTM, GRU, and TransformerModel, respectively. Our exploration also incorporated two additional model categories. 1) CellAttention models use their particular cell for feature abstraction, subsequently channeling the extracted features into a transformer encoder. This category comprises the TST, RNNAttention, LSTMAttention, and GRUAttention models; notably, the TST model's cell is linear, making it a PyTorch implementation of the work by Zerveas et al(16). 2) Conversely, TransformerCell models take the reverse approach: a transformer encoder feeds forward into the cell layers. We note that TransformerModel is effectively the TransformerCell analogue of the TST, routing the transformer encoder's outputs into a linear layer, which is the standard architecture of an attention-based transformer as originally introduced. The implementations of these models were sourced from tsai(31).
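As a concrete illustration, a CellAttention-style model of the kind described above might look like the following PyTorch sketch (layer sizes and head counts are illustrative only; the tuned hyperparameters and the tsai implementations are not reproduced here):

```python
import torch
import torch.nn as nn

class GRUAttention(nn.Module):
    """Sketch of a CellAttention-style model: a recurrent cell extracts
    per-step features, which are then passed through a transformer encoder.
    Hyperparameter values here are illustrative, not the tuned ones."""

    def __init__(self, n_codes=186, hidden=64, n_classes=7, n_heads=4):
        super().__init__()
        self.gru = nn.GRU(n_codes, hidden, batch_first=True)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, 26 weeks, n_codes)
        h, _ = self.gru(x)           # sequential feature abstraction
        h = self.encoder(h)          # attention over the 26 time steps
        return self.head(h[:, -1])   # multi-label logits at the index event
```

A TransformerCell model would simply reverse the order of the encoder and the recurrent cell.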
We compared the time-series models to a single-time-point random forest baseline using the last time point in the LOCF dataset as features. The randomForestSRC package was used specifically for its 'imbalanced' function, which handles the two-class imbalanced problem using a cost-weighted Bayes classifier(32). This was especially pertinent considering that each cardiomyopathy was characterized by a sparse number of positive instances. Consequently, a distinct univariate model was trained for each cardiomyopathy, and performance was measured globally over the aggregated multi-label predictions.
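A minimal Python analogue of this baseline substitutes scikit-learn's RandomForestClassifier with balanced class weights for the R randomForestSRC 'imbalanced' function (the two are not identical; this sketches only the one-forest-per-class setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_baseline(X_last, Y, n_classes=7):
    """Fit one cost-weighted binary forest per cardiomyopathy.

    X_last: (n_patients, n_codes) multi-hot features at the last LOCF
            time point (a single-time-point vector per patient).
    Y:      (n_patients, n_classes) multi-hot cardiomyopathy labels.
    """
    models = []
    for c in range(n_classes):
        rf = RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=0)
        rf.fit(X_last, Y[:, c])   # univariate model for cardiomyopathy c
        models.append(rf)
    return models
```

The multi-label prediction for a patient is then the concatenation of the seven per-class outputs.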
All experiments were conducted using a training/validation/testing split of 70/15/15. The models were exposed to the testing set only once hyperparameter tuning was finished. Hyperparameters were tuned using a grid search. Following training, probabilities were calibrated by fitting isotonic regression on the training data, and a classification threshold was determined for each cardiomyopathy by scanning 100 evenly spaced values from 0 to 1 and maximizing the F1 score over the training data. These thresholds were validated to improve metrics over the validation set, and were then used to obtain the final model results over the test set. For evaluation metrics, we focused on micro-averaged area under the receiver operating characteristic curve (AUC), micro-averaged F1 score, and precision at 3 (P@3). The P@3 metric reflects whether the expected classes fall within the three highest-probability classes. Finally, we evaluated the importance of LOCF with respect to our data structure by measuring the impact of this preprocessing step on the F1 score of the models for the CMR dataset.
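The per-class threshold search described above can be sketched as follows (a simplified single-class version with illustrative names; the isotonic-regression calibration step is omitted):

```python
import numpy as np

def best_threshold(y_true, y_prob, n_splits=100):
    """Pick the classification threshold maximizing F1 on training data,
    scanning `n_splits` evenly spaced candidates in [0, 1]."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0, 1, n_splits):
        pred = (y_prob >= t).astype(int)
        tp = int(((pred == 1) & (y_true == 1)).sum())
        fp = int(((pred == 1) & (y_true == 0)).sum())
        fn = int(((pred == 0) & (y_true == 1)).sum())
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

In the full pipeline this search is run once per cardiomyopathy on the calibrated training probabilities, and the seven resulting thresholds are checked against the validation set before test-set evaluation.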
Data preprocessing was done in R using the data.table 1.13 library(33). All experiments were coded in Python 3.9 using PyTorch 1.12(34). Each model was trained using the '1cycle' policy as implemented in tsai(35). All models were trained on a 32GB NVIDIA V100 GPU. The code will be made public on GitHub once accepted for publication.
Results
RNN models can provide pre-test probability of disease in CMR
Table 2 shows the classification outcomes of all models trained using CMR index events. The overall results were promising, with the best-performing model achieving an AUC of 0.8446, F1 of 0.5873, and P@3 of 0.8739. There was no clear best model, as all time-series models achieved AUCs between 0.8214 and 0.8463 and F1 scores between 0.5593 and 0.5873. The P@3 had a wider spread, between 0.7644 and 0.8739. The standard RNN model performed comparably to more complex models, although on average models with some form of attention mechanism achieved higher performance. For comparison, the random forest model produced a marginally higher AUC than the time-series models but posted much lower retrieval metrics.
Table 2:
Predictive performance for pre-CMR disease prediction.
Model | AUC | F1 | Recall | Prec | P@3 |
---|---|---|---|---|---|
RNN | 0.8287 | 0.5709 | 0.5066 | 0.6538 | 0.8198 |
LSTM | 0.8214 | 0.5620 | 0.5099 | 0.6260 | 0.7798 |
GRU | 0.8315 | 0.5694 | 0.5298 | 0.6154 | 0.7642 |
TransformerRNN | 0.8396 | 0.5736 | 0.5033 | 0.6667 | 0.8222 |
TransformerLSTM | 0.8453 | 0.5593 | 0.5000 | 0.6345 | 0.8011 |
TransformerGRU | 0.8446 | 0.5873 | 0.5066 | 0.6986 | 0.8739 |
Transformer | 0.8295 | 0.5719 | 0.5464 | 0.6000 | 0.7644 |
TST | 0.8383 | 0.5853 | 0.5397 | 0.6392 | 0.7778 |
RNNAttention | 0.8463 | 0.5651 | 0.5033 | 0.6441 | 0.8018 |
LSTMAttention | 0.8376 | 0.5693 | 0.4967 | 0.6667 | 0.8417 |
GRUAttention | 0.8302 | 0.5645 | 0.5000 | 0.6481 | 0.8391 |
Random Forest | 0.8559 | 0.4730 | 0.7119 | 0.3542 | 0.5342 |
Performance differs heavily by class
The average predictive ability of our longitudinal approach for predicting the broad spectrum of NICMs is moderate. However, there were significant differences in discriminative ability across individual etiologies, as presented in Table 3. There was a difference of 0.2267 in AUC between the most accurate disease class (ICM) and the least accurate (myocarditis). This vast difference is also reflected in the F1 scores, with a difference of 0.6130 between AMYL and DCM. The recall for myocarditis and DCM is also extremely low, which suggests difficulty predicting these diseases from the given data sources. The results roughly trend with the number of studies in each class (Table 1).
Table 3:
Median metrics by class for CMR.
Class | AUC | F1 | Recall | Prec |
---|---|---|---|---|
NICM | 0.7676 | 0.6694 | 0.6694 | 0.6747 |
ICM | 0.8617 | 0.6585 | 0.6098 | 0.7205 |
AMYL | 0.8216 | 0.6832 | 0.6023 | 0.7690 |
HCM | 0.8377 | 0.5433 | 0.4815 | 0.6583 |
Sarcoidosis | 0.8008 | 0.5167 | 0.4444 | 0.6125 |
Myocarditis | 0.6350 | 0.1177 | 0.0909 | 0.1835 |
DCM | 0.6791 | 0.0702 | 0.0417 | 0.2250 |
Results for TTEs are comparatively worse
We also developed models to provide pre-test disease predictions for TTEs. The overall model performance and median metrics by class are shown in Table 4 and Table 5, respectively. The models on average achieved 0.1419 lower AUC and 0.1309 lower F1 score than their CMR counterparts. The median metrics by class reflect this lower performance, with the models almost failing to identify sarcoidosis, HCM, myocarditis, and DCM.
Table 4:
Predictive performance for pre-echocardiogram disease prediction.
Model | AUC | F1 | Recall | Prec | P@3 |
---|---|---|---|---|---|
RNN | 0.6605 | 0.3659 | 0.3333 | 0.4054 | 0.5631 |
LSTM | 0.6792 | 0.4222 | 0.4222 | 0.4222 | 0.5931 |
GRU | 0.7140 | 0.4471 | 0.4222 | 0.4750 | 0.6323 |
TransformerRNN | 0.7598 | 0.5435 | 0.5556 | 0.5319 | 0.6671 |
TransformerLSTM | 0.7207 | 0.4396 | 0.4444 | 0.4348 | 0.5691 |
TransformerGRU | 0.7246 | 0.5376 | 0.5556 | 0.5208 | 0.6117 |
Transformer | 0.7005 | 0.4742 | 0.5111 | 0.4423 | 0.5974 |
TST | 0.7050 | 0.3810 | 0.3556 | 0.4103 | 0.5777 |
RNNAttention | 0.6416 | 0.3913 | 0.4000 | 0.3830 | 0.5111 |
LSTMAttention | 0.7015 | 0.4706 | 0.4444 | 0.5000 | 0.6477 |
GRUAttention | 0.6246 | 0.3656 | 0.3778 | 0.3542 | 0.4801 |
Random Forest | 0.7533 | 0.4878 | 0.4444 | 0.5405 | 0.6882 |
Table 5:
Median metrics by class for echocardiogram.
Class | AUC | F1 | Recall | Prec |
---|---|---|---|---|
NICM | 0.7143 | 0.5881 | 0.6786 | 0.5278 |
ICM | 0.6939 | 0.4143 | 0.5000 | 0.3542 |
AMYL | 0.6944 | 0.5714 | 0.4444 | 0.8000 |
HCM | 0.7518 | 0.3333 | 0.5000 | 0.2500 |
Sarcoidosis | 0.9662 | 0.2823 | 1.0000 | 0.1833 |
Myocarditis | 0.5268 | 0.2500 | 0.2222 | 0.2857 |
DCM | 0.5000 | 0.0000 | 0.0000 | 0.0000 |
Last observation carried forward is important for model accuracy
Additionally, model training was repeated without carrying observations forward. The evaluated metrics for the CMR cohort are displayed in Figure 2, which shows a consistent decrease in performance. The models utilizing attention heads for feature extraction fare better, as they naturally partition the feature space and reduce sparsity. In contrast, the models that rely on sequential feature extraction show greatly reduced results. The biggest differences occurred for the RNN, LSTM, and GRU, all of which had F1 scores under 0.500. This diminished performance may be attributed to compromised learning efficacy during training: their inability to efficiently navigate the feature space leads to overfitting prior to nearing global minima. Regardless, all models trained without LOCF were outperformed by their counterparts in Table 2.
Discussion
In this work, we demonstrated a deep learning-based time-series modeling paradigm for delivering pre-test disease predictions for CMR and echocardiography. Radiology exams are most useful when answering a specific clinical question(4); however, the quality of requisitions is often lacking(36). The overall accuracy was not perfect for any specific model, but the results are encouraging given that a clinician would not be expected to know the definitive diagnosis prior to the ordered imaging study. Rather, clinical guidelines leverage imaging to provide more definitive evidence of a specific diagnosis in each of these diseases. The relatively high P@3 suggests that this model could be used to augment clinical histories on radiological requisitions for the radiologists or cardiologists interpreting CMR or echocardiography studies.
Despite the good mean AUC, F1, and P@3 metrics of the models, there was a wide distribution of disease-specific measurements. ICM achieved the highest AUC among the disease classes, while myocarditis achieved the lowest. ICM often has a specific clinical course reflecting ischemic disease; there are several important clinical events, including myocardial infarction or stroke, which could be unique identifiers for this disease within this cohort. Similarly, patients with AMYL often have long diagnostic pathways(18,24). This results in a uniquely lengthy and densely filled data dimension compared to the other classes, as shown in Table 1.
On the other hand, our models had difficulty accurately detecting myocarditis and DCM. Myocarditis is often an acute event, meaning there is naturally less data associated with each case. Although some presentations are chronic or produce chronic problems, the volume of prior diagnostic codes for myocarditis is small compared to the other classes. The relative number of DCM patients in our dataset is even smaller, with only the HCM cohort of comparable size. The lower number of events and the acuity make detecting patterns difficult. However, these are not the only reasons for poor performance, as evidenced by HCM, which has fewer CMR studies than the other classes. We hypothesize that its good performance in CMR reflects the fact that HCM is often already suspected via other imaging tests; CMR is usually ordered as a confirmatory test(17). Conversely, for the other five cardiomyopathies, the models yielded commendable results with impressive AUC scores and a balanced trade-off between recall and precision. The elevated precision metrics underscore the propensity of these models to make accurate positive predictions.
Also reflecting the issue of limited data, disease prediction for echocardiography was significantly less accurate than for CMR. First, the TTE cohort was substantially smaller, comprising just 330 studies, which makes learning disease-associated patterns harder. Second, TTE is often one of the first cardiac imaging tests ordered when cardiac disease is suspected; by comparison, 79.5% of CMR cases had a cardiovascular-related ICD-10 code versus 73.8% for TTE. Therefore, there is often less information that can be leveraged for any kind of prediction task, and the approach may not be useful in emergent or outpatient referral situations. Rather, current clinical practice relies on nurses or technicians at the point of care to record useful clinical history. Changes to the way we record patient histories, whether via patient-provided information or more extensive documentation at the point of care, may be needed to better inform decision making through AI models.
Deep learning-based time-series models seem capable of learning the uncertainty of variables through time(37), unlike conventional single-time-point models, as evidenced by the poor predictive power of the random forest model. Regardless of clinical imaging modality, the sparsity of diagnostic codes is a consistent feature of healthcare datasets not often seen in other time-series prediction tasks. Consequently, both zero imputation and LOCF can and do introduce extensive errors into our time-series data.
The inherent sparsity of diagnostic codes presents a unique challenge in healthcare datasets, setting them apart from other time-series prediction tasks. One key reason is the temporal limitation of diagnostic codes: they are typically entered at a single point in time, with little or no follow-up information on whether conditions have resolved or continue to affect the patient. This lack of temporal information puts the reliability of diagnostic codes in question when they are used for time-series analysis.
Moreover, the very nature of healthcare practice adds another layer of complexity. Diagnosis codes are often generated only when a patient encounter is dedicated to a particular set of symptoms. For instance, a cardiologist might only record codes relevant to heart issues, even if the patient has multiple comorbidities. This means that the absence of a diagnostic code does not necessarily indicate the absence of disease. Additionally, our healthcare institution is a quaternary care center dealing primarily with severe illness. On one hand, this means we have a unique population of seriously ill patients; on the other, many patients do not have an established history of primary care here, reducing the available time-series data.
Incorporating EHR free text is one way to accommodate the complexity of healthcare. Generating diagnosis codes for billing purposes tends to minimize clinical nuance, often poorly reflecting the true disease state. Codes also introduce considerable variability in documentation practice between institutions, such as differing time lags, which may impact the timeliness of decision support in an ambulatory setting. Therefore, methods to augment codes with free text will be necessary to mitigate such variability.
Limitations
This study contains several limitations. First, our models were trained only on a cohort of patients with cardiomyopathies. Although clinical imaging is strongly recommended for patients with suspected cardiomyopathies, they represent only a small portion of the possible cardiac disease spectrum. This subset of diseases also tends to have long diagnostic pathways, which may influence the accuracy of our model. Further studies inclusive of additional cardiac diseases are warranted. Second, this cohort represents only positively identified patients, introducing a selection bias because our cohort was built from patients with a CMR and a positive diagnosis of one of several diseases. Future work should include patients who are suspected of disease but have negative findings. Finally, not all disease labels here have equally certain disease states. The AMYL and HCM labels were assigned only after tertiary testing such as myocardial biopsy, whereas the DCM label cannot be established as precisely because there are no strict guidelines regarding its diagnosis.
Conclusion
The evolving landscape of clinical diagnostics has brought with it an exponential increase in available data. Extracting actionable insights remains challenging, and the necessary tools remain immature. This study set out to leverage time-series analysis methodologies to improve clinical inference. We demonstrated that deep learning time-series models have the potential to greatly reduce reliance on often imprecise orders or manual review of patient history when interpreting clinical imaging studies, by providing the most likely diseases prior to the imaging study. However, such an approach is still limited by the realities of medical practice, where evidence of disease is highly variable due to disease acuity and the fragmented nature of medical data. Methods to incorporate free text and improve access to patient data would be beneficial for better AI decision support models for image interpretation.