An Explainable Artificial Intelligence-enabled ECG Framework for the Prediction of Subclinical Coronary Atherosclerosis

Changho Han; Dukyong Yoon

AMIA Jt Summits Transl Sci Proc. 2024; 2024: 535–544.

Published online 2024 May 31.

PMCID: PMC11141849

PMID: 38827057

Changho Han, MD, MS¹ and Dukyong Yoon, MD, PhD¹

Abstract

Coronary artery calcium (CAC) as assessed by computed tomography (CT) is a marker of subclinical coronary atherosclerosis. However, routine application of CAC scoring via CT is limited by high costs and accessibility. An electrocardiogram (ECG) is a widely-used, sensitive, cost-effective, non-invasive, and radiation-free diagnostic tool. Considering this, if artificial intelligence (AI)-enabled electrocardiograms (ECGs) could opportunistically detect CAC, it would be particularly beneficial for the asymptomatic or subclinical populations, acting as an initial screening measure, paving the way for further confirmatory tests and preventive strategies, a step ahead of conventional practices. With this aim, we developed an AI-enabled ECG framework that not only predicts a CAC score ≥400 but also offers a visual explanation of the associated potential morphological ECG changes, and tested its efficacy on individuals undergoing health checkups, a group primarily comprising healthy or subclinical individuals. To ensure broader applicability, we performed external validation at a separate institution.

Introduction

Atherosclerosis is a chronic, progressive inflammatory disease. Its natural history unfolds over an extended subclinical phase before manifesting symptoms. As coronary atherosclerosis advances, it results in the gradual narrowing of the coronary arteries, known as coronary heart disease (CHD). Often, the first presentation of CHD in previously asymptomatic patients is acute coronary syndrome (ACS). ACS results from plaque rupture, which triggers a thrombotic cascade, culminating in total or near-total occlusion of the coronary lumen¹.

In recent decades, the severe repercussions of ACS have intensified the focus on detecting early and subclinical atherosclerotic lesions. Early identification of subclinical atherosclerosis can guide preventive measures, encompassing lifestyle modifications and pharmacological interventions such as aspirin, antihypertensives, and lipid-lowering agents. As a result, various imaging techniques, both invasive and noninvasive, have been devised for early detection. Notably, coronary artery calcification (CAC) as assessed by computed tomography (CT) has gained significant attention². Numerous studies validate that CAC is indicative of subclinical atherosclerosis^3,4, and its ability to predict cardiovascular events has been extensively corroborated^5-7. Several guidelines advocate the utilization of CAC scoring for intermediate or borderline-risk individuals when a decision regarding statin therapy is ambiguous following traditional risk evaluation^8,9. However, the routine application of CAC scoring is constrained by factors such as the elevated cost associated with CT scans, potential radiation exposure, and limited accessibility.

An electrocardiogram (ECG) stands as a widely used diagnostic tool that is sensitive, cost-effective, non-invasive, and radiation-free. Increasingly, advanced artificial intelligence (AI) methodologies demonstrate the capability to discern nuanced patterns in ECGs that are not immediately evident through conventional interpretation, allowing for the identification of conditions previously deemed elusive via ECGs¹⁰. If AI technology can predict CAC by analyzing ECG data, individuals undergoing ECGs for routine health exams or other medical evaluations could gain important insights into their risks for potential CHD. Implementing AI-enhanced ECGs for opportunistic CAC detection could be especially impactful for asymptomatic or subclinical individuals, serving as a preliminary screening method prompting subsequent diagnostic procedures and preventative measures. Conversely, for patients already manifesting symptoms or diagnosed with CHD, comprehensive assessments surpassing the scope of ECGs are intrinsically essential, thus diminishing the relevance of AI-augmented ECG predictions for this demographic.

Thus, in this study, we developed an AI-enabled ECG framework that integrates CAC score prediction with visual explanation, and validated its efficacy on individuals undergoing health checkups, as these individuals are predominantly healthy or in a subclinical state. As a brief overview, we developed our AI model predicting CAC score ≥400 using an extensive dataset comprising over 194,000 ECGs annotated with CAC scores. We then tested the model on a health screening dataset comprising individuals who had both ECG and CT measurements of CAC as part of their health checkup evaluations. We then evaluated the model’s applicability in distinct yet potentially analogous populations by conducting external validation. Additionally, in order to provide model explainability and further enhance performance, we integrated the FactorECG technique, as detailed in studies by van de Leur et al. (2022) and Wouters et al. (2023)^11,12.

Methods

Ethics approval

The Institutional Review Boards (IRB) of Severance Hospital (SH) and Yongin Severance Hospital (YSH) approved this study and waived the requirement for informed consent because only anonymized data were used retrospectively (IRB no. 4-2022-1299 [SH], 9-2021-0023 and 9-2022-0183 [YSH]).

Data sources and labeling

The standard 12-lead ECG data and electronic medical records (EMR) from SH were utilized for AI model development and internal testing (Figure 1). The 12-lead ECG database from SH is sourced from the General Electric (GE) Healthcare MUSE^TM system and includes data from all departments of the hospital, including health checkups. This database consists of raw waveforms (one-dimensional ECG signal), measurement metrics like heart rate, PR interval, and QT interval, along with automatic ECG interpretations generated by the GE ECG machine. Each ECG recording has a duration of 10 seconds with sampling rates of either 500 Hz or 250 Hz. The GE ECG algorithm constructs and stores a median waveform for each ECG recording, spanning 1.2 seconds. This is achieved by aligning all QRS complexes of identical shape and deriving a representative QRS complex using the median voltage.

Figure 1.

Patient flow diagram (SH). SH: Severance Hospital, CAC: coronary artery calcium, CT: computed tomography

From the EMR database, we retrieved CT readings of heart-related scans conducted between November 2005 and August 2022 for patients aged 18 and above. Notably, CT scans undertaken during health checkups were available from December 2010 to August 2022. To extract the CAC scores from these CT readings, we employed regular expressions, using a comprehensive range of search terms such as “calcium score” and “CAC score”.
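
The regular-expression step can be illustrated with a minimal sketch. The pattern, the function name `extract_cac_score`, and the report snippets below are hypothetical: the study's full list of search terms and the wording of its CT reports are not given here, so this only shows the general idea of pulling a number that follows a term such as "calcium score" or "CAC score".

```python
import re

# Hypothetical pattern: matches "calcium score" or "CAC score", an optional
# connector (is, of, =, :), and then an integer or decimal number.
CAC_PATTERN = re.compile(
    r"(?:calcium\s+score|CAC\s+score)\s*(?:is|of|=|:)?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_cac_score(report_text):
    """Return the first CAC score found in a free-text CT report, or None."""
    match = CAC_PATTERN.search(report_text)
    return float(match.group(1)) if match else None

# Made-up report snippets, for illustration only.
reports = [
    "Coronary CT. Calcium score: 412.5 (Agatston).",
    "Total CAC score = 0. No significant stenosis.",
    "Chest CT without contrast. No coronary assessment.",
]
scores = [extract_cac_score(text) for text in reports]
```

Reports without a recognizable term yield `None` and would need a broader pattern or manual review.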

ECGs recorded during health checkups were extracted if a corresponding CAC measurement via CT was performed during the same visit. These ECGs were subsequently labeled with the corresponding CAC score. These ECGs were designated as the health screening hold-out test dataset. ECGs not recorded during health checkups were extracted if their recordings fell within a 30-day period surrounding the CAC measurements, either preceding or following them, and these ECGs were subsequently labeled with the respective CAC scores. In instances where multiple CAC measurements were taken within this 30-day period relative to an ECG, the ECG was labeled with the CAC score from the closest date. These ECGs constituted the model development dataset. ECGs bearing automatic interpretations that included any of the following phrases were excluded: “lead reversal,” suggesting potential lead misplacement; “poor quality,” signifying the presence of artifacts; and “pacemaker,” indicating the potential presence of an artificial pacemaker. To prevent data leakage and overestimation of performance, ECGs from patients present in the health screening test dataset were additionally excluded from the model development dataset. The model development dataset was then partitioned into training and validation datasets at an 80:20 ratio, while ensuring no overlap of patients between the two.
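
The labeling rule and the patient-level split can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, ties between equally distant CAC measurements are broken by list order, and the 80:20 split is applied to unique patients (the text states only that the ratio was 80:20 with no patient overlap).

```python
from datetime import date
import random

WINDOW_DAYS = 30  # window stated in the text

def label_ecg(ecg_date, cac_measurements):
    """Return the CAC score closest in time to the ECG, or None if none is within the window.

    cac_measurements: list of (measurement_date, cac_score) for the same patient.
    """
    in_window = [
        (abs((ecg_date - measured).days), score)
        for measured, score in cac_measurements
        if abs((ecg_date - measured).days) <= WINDOW_DAYS
    ]
    if not in_window:
        return None
    # min() keeps the first entry on ties (assumption: list order decides).
    return min(in_window, key=lambda pair: pair[0])[1]

def split_by_patient(patient_ids, train_fraction=0.8, seed=0):
    """Split unique patient IDs so that no patient appears in both sets."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    cut = int(len(unique_ids) * train_fraction)
    return set(unique_ids[:cut]), set(unique_ids[cut:])

# Made-up example data.
cacs = [(date(2020, 1, 1), 120.0), (date(2020, 1, 25), 450.0)]
label = label_ecg(date(2020, 1, 20), cacs)      # 5 days from the second scan
no_label = label_ecg(date(2020, 6, 1), cacs)    # outside both windows
train_ids, val_ids = split_by_patient(["p%d" % i for i in range(10)])
```

Splitting on patient identifiers rather than on individual ECGs is what keeps repeated recordings of one person from appearing on both sides of the split.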

We conducted external validation of our AI model using data from YSH health checkups (Figure 2). We retrieved CAC scores from CT reports produced during health checkups between April 2020 and August 2022. ECGs recorded during these health checkups were selected and labeled with the respective CAC scores if a corresponding CT scan from the same visit was available. Each ECG recording had a duration of 10 seconds, with a sampling rate set at 500 Hz. Each database entry also includes a median waveform, lasting 1.2 seconds. The exclusion criteria based on automatic interpretation phrases, as used for the SH database, were also applied to the ECGs from YSH.

Figure 2.

Patient flow diagram (YSH). YSH: Yongin Severance Hospital, CAC: coronary artery calcium, CT: computed tomography

Data preprocessing

ECGs with a sampling rate of 250 Hz underwent upsampling to 500 Hz using linear interpolation, ensuring a uniform 500 Hz rate across all ECGs. Each waveform was standardized with z-score normalization, bringing the mean to 0 and the standard deviation to 1. According to the Einthoven law and Goldberger equation, only two of the six limb leads (leads I, II, III, aVR, aVL, aVF) are needed to calculate the other four^13,14. Therefore, using any two limb leads provides the same information as all six. We thus used eight leads (leads I, II, V1-V6) from the 12 available as input.
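
A minimal sketch of these preprocessing steps is shown below. Several details are assumptions: the last sample is repeated so that the upsampled signal has exactly twice as many samples, the z-score uses the population standard deviation and is applied to one waveform at a time, and the function names are illustrative. The lead relations themselves are the standard ones (III = II − I, aVR = −(I + II)/2, aVL = I − II/2, aVF = II − I/2).

```python
import math

def upsample_2x_linear(signal):
    """Double the sampling rate by linear interpolation (e.g. 250 Hz to 500 Hz).

    Assumption: the final sample is repeated so the output has 2 * len(signal) samples.
    """
    out = []
    for i, value in enumerate(signal):
        following = signal[i + 1] if i + 1 < len(signal) else value
        out.append(value)
        out.append((value + following) / 2.0)
    return out

def zscore(signal):
    """Standardize to mean 0 and standard deviation 1 (undefined for a flat signal)."""
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))
    return [(v - mean) / std for v in signal]

def derive_limb_leads(lead_i, lead_ii):
    """Recover III, aVR, aVL and aVF from leads I and II."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [b - a for a, b in pairs],
        "aVR": [-(a + b) / 2.0 for a, b in pairs],
        "aVL": [a - b / 2.0 for a, b in pairs],
        "aVF": [b - a / 2.0 for a, b in pairs],
    }

upsampled = upsample_2x_linear([0.0, 2.0, 4.0])
standardized = zscore([1.0, 2.0, 3.0])
leads = derive_limb_leads([0.0, 1.0], [0.0, 2.0])
```

Because the four derived leads are linear combinations of I and II, dropping them removes no information from the input.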

For every training epoch, we randomly chose a distinct 2.5-second segment from the 10-second ECG, introducing slight variations in the data across epochs to emulate data augmentation effectively. For the internal validation, internal testing, and external validation datasets, the 10-second ECGs were segmented into four non-overlapping 2.5-second intervals, and all segments were evaluated for consistency.
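
The two segmentation modes can be sketched as below, assuming recordings are stored as a list of leads, each a list of samples at 500 Hz. The function names are hypothetical, and how the four per-segment outputs are combined at evaluation time is not specified in the text, so the sketch stops at producing the segments.

```python
import random

FS = 500                   # Hz, after upsampling
SEGMENT = int(2.5 * FS)    # 1250 samples, matching the 8 x 1250 input in Table 1

def random_crop(ecg, rng):
    """Training: pick one random 2.5 s window from a leads x samples recording."""
    n_samples = len(ecg[0])
    start = rng.randrange(0, n_samples - SEGMENT + 1)
    return [lead[start:start + SEGMENT] for lead in ecg]

def fixed_segments(ecg):
    """Evaluation: consecutive non-overlapping 2.5 s windows (four for a 10 s ECG)."""
    n_samples = len(ecg[0])
    return [
        [lead[start:start + SEGMENT] for lead in ecg]
        for start in range(0, n_samples - SEGMENT + 1, SEGMENT)
    ]

# Synthetic 8-lead, 10-second recording whose sample values equal their index.
ecg = [[float(i) for i in range(10 * FS)] for _ in range(8)]
crop = random_crop(ecg, random.Random(0))
segments = fixed_segments(ecg)
```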

AI model development

We utilized the raw waveforms of the ECGs as input and adopted the 1-dimensional variant of EfficientNet-B0 for our model architecture (Table 1)¹⁵. We trained our EfficientNet model without leveraging any pretrained weights.

Table 1.

Neural network architecture summary.

EfficientNet-B0				FactorECG
Stage	Operator	Output shape	Layers	Stage	Operator	Output shape	Layers
Input		8 × 1250		Input		8 × 600
1	Conv1d (k=3)	32 × 625	1	1	CausalConvolutionBlock	128 × 600	7
2	SepConv (k=3)	16 × 625	1	2	CausalConvolutionBlock	64 × 600	1
3	MBConv (k=3)	24 × 313	2	3	AvgPool	64	1
4	MBConv (k=5)	40 × 157	2	4: latent space	Linear, Softplus	μ: 48 σ: 48	1
5	MBConv (k=3)	80 × 79	3	5	Reparameterization	48	1
6	MBConv (k=5)	112 × 79	3	6	Linear	64	1
7	MBConv (k=5)	192 × 40	4	7	Linear	38400	1
8	MBConv (k=3)	320 × 40	1	8	Reshape	64 × 600	1
9	Conv1d (k=1)	1280 × 40	1	9	CausalConvolutionBlock	128 × 600	7
10	AvgPool	1280	1	10	CausalConvolutionBlock	8 × 600	1
11	Linear	2	1	11: output	Flatten, Linear, Softplus, Reshape	μ: 8 × 600, σ: 8 × 600	1
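
The temporal output lengths of the EfficientNet-B0 column can be cross-checked with a short calculation. The strides below are an assumption: they are those of the standard EfficientNet-B0 stages applied along the time axis with "same"-style padding, so each stride-2 stage maps a length L to ceil(L / 2). The FactorECG reshape (64 × 600 = 38400) is checked the same way.

```python
import math

# (operator, output channels, assumed stride) for stages 1-9.
STAGES = [
    ("Conv1d", 32, 2), ("SepConv", 16, 1), ("MBConv", 24, 2),
    ("MBConv", 40, 2), ("MBConv", 80, 2), ("MBConv", 112, 1),
    ("MBConv", 192, 2), ("MBConv", 320, 1), ("Conv1d", 1280, 1),
]

def output_shapes(input_length=1250):
    """Return (operator, channels, length) after each stage."""
    length = input_length
    shapes = []
    for name, channels, stride in STAGES:
        length = math.ceil(length / stride)
        shapes.append((name, channels, length))
    return shapes

shapes = output_shapes()
lengths = [length for _, _, length in shapes]
factor_ecg_flat = 64 * 600   # size of the Linear layer output before the Reshape
```

Under these assumptions a 1250-sample input shrinks to 40 time steps before global average pooling.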

Given that our dataset exhibited class imbalance—a known factor that can adversely affect classification performance—we implemented widely recognized techniques, such as oversampling of the minority class and undersampling of the majority class, to counterbalance its effects¹⁶. In each training epoch, we adjusted the training dataset by randomly oversampling the minority class and randomly undersampling the majority class, so that both classes were of equal size while preserving the original training dataset’s total size. Hyperparameter optimization was achieved through comprehensive empirical tests and grid search, leading us to select a batch size of 512, a learning rate of 0.01, and the Adam optimizer. The choice to deploy the EfficientNet-B0 architecture arose from these hyperparameter optimization trials: Among various network scales, kernel sizes, and strides of EfficientNet explored, the default 1-dimensional version of EfficientNet-B0 demonstrated superior performance. To guard against over-fitting, we implemented early stopping during training, contingent upon observed validation loss.
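
The per-epoch resampling can be sketched as follows. This is one plausible reading of the description, with assumptions labeled in the comments: the minority class is drawn with replacement, the majority class without replacement, and each class fills half of the original dataset size. The function name is hypothetical.

```python
import random

def balanced_epoch_indices(labels, rng):
    """Return resampled indices with equal class sizes and the original total size."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([positives, negatives], key=len)
    half = len(labels) // 2
    # Assumption: oversample the minority with replacement,
    # undersample the majority without replacement.
    sampled_minority = rng.choices(minority, k=half)
    sampled_majority = rng.sample(majority, k=len(labels) - half)
    indices = sampled_minority + sampled_majority
    rng.shuffle(indices)
    return indices

# Toy labels with 20% positives.
labels = [1] * 20 + [0] * 80
epoch_indices = balanced_epoch_indices(labels, random.Random(0))
n_positive = sum(labels[i] for i in epoch_indices)
```

Calling the function once per epoch with the same generator gives a different balanced subset each time, so the model sees different majority-class examples across epochs.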

To provide model explainability and further enhance performance, we integrated the FactorECG technique, as detailed in studies by van de Leur et al. (2022) and Wouters et al. (2023)^11,12. FactorECG presents an innovative approach that employs a variational auto-encoder (VAE) architecture to learn the intrinsic factors influencing median beat ECG morphology in an unsupervised manner (Table 1). The VAE comprises two primary components: the encoder, which translates the input ECG data into a condensed latent space, termed ECG factors, and the decoder, which interprets points from this latent space (ECG factors) to approximate the initial data space, aiming to reconstruct the original input data as closely as possible. The VAE’s training objective encompasses a balanced summation of two loss metrics with an appropriate ratio. The first loss metric (reconstruction loss), measures how well the decoded data matches the original data. The second loss metric (Kullback-Leibler Divergence loss) quantifies the deviation of the encoded distribution (ECG factors) from a predetermined distribution, typically a standard Gaussian. By decoding the ECG factors and delineating their impact on median beat ECG morphology, individual ECG factor interpretability becomes feasible. The unsupervised training nature of VAEs allows for capitalizing on expansive datasets and provides an automated method to unveil inherent data structures efficiently. In essence, FactorECG efficiently compresses any ECG to a set number of descriptive, independent factors and can also reproduce or create ECGs using these factors.

We trained FactorECG using the median waveforms from the entire standard 12-lead ECG database of YSH (Figure 2). We divided this dataset (approximately 222,000 ECGs) in a 9:1 ratio to create the training and validation sets for FactorECG development. We explored the essential hyperparameters outlined by van de Leur et al. (2022): the summation ratio (β) between the two loss components and the number of ECG factors¹¹. In our experimental setting, a β value of 16 and 48 ECG factors yielded the best FactorECG model in factor traversal assessments. Consequently, we adopted the model trained with these hyperparameters. To guard against over-fitting, we implemented early stopping during training, contingent upon observed validation loss.
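
The two-part VAE objective can be written out in a few lines. The sketch assumes a Gaussian negative log-likelihood as the reconstruction term (Table 1 lists μ and σ outputs for the decoder) and the usual β-VAE form in which β multiplies the Kullback-Leibler term; neither detail is confirmed by the text, and the function names are illustrative.

```python
import math

def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under independent Gaussians N(mu, sigma^2)."""
    return sum(
        0.5 * math.log(2 * math.pi * s * s) + (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mu, sigma)
    )

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return sum(
        0.5 * (m * m + s * s - 1.0 - math.log(s * s))
        for m, s in zip(mu, sigma)
    )

def beta_vae_loss(x, recon_mu, recon_sigma, z_mu, z_sigma, beta=16.0):
    """Reconstruction loss plus beta-weighted KL divergence (assumed weighting)."""
    return gaussian_nll(x, recon_mu, recon_sigma) + beta * kl_to_standard_normal(z_mu, z_sigma)

# A latent code that already matches the prior contributes no KL penalty.
kl_at_prior = kl_to_standard_normal([0.0] * 48, [1.0] * 48)
# Shifting one factor's mean to 1 adds 0.5 to the KL term, i.e. 8.0 to the loss at beta = 16.
loss_shifted = beta_vae_loss([0.0], [0.0], [1.0], [1.0], [1.0])
```

A larger β pushes the factors toward the independent standard Gaussian prior at the cost of reconstruction fidelity, which is the trade-off the hyperparameter search explores.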

Outcomes

We developed an EfficientNet model to predict CAC score ≥400, a binary classification task. This threshold was selected because CAC score ≥400 is clinically recognized as signifying a high risk of a cardiovascular event, providing a benchmark for early intervention and risk stratification^6,17. Our primary outcome was the performance of the EfficientNet model in the health screening test dataset and the health screening external validation dataset.

Subsequently, ECG factors from FactorECG were employed to construct an XGBoost model to predict a CAC score ≥400. This model was trained, validated, and tested using the same corresponding datasets as the EfficientNet model. Using SHapley Additive exPlanations (SHAP) analysis, we determined which ECG factors had the greatest impact on the prediction. We applied the SHAP method to the test dataset. For the interpretation of the top contributing ECG factors, we utilized a method termed “factor traversals”^11,12: By modulating the values of an individual ECG factor from -4.5 to 4.5, advancing in increments of 1.5 units, and then using the decoder part of the FactorECG to reconstruct the ECG, we were able to overlay these reconstructed ECGs on a single plot. This visualization allowed us to comprehend the variations in ECG morphology attributable to each individual ECG factor.
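
A factor traversal can be sketched as below. The decoder here is a toy stand-in, not the trained FactorECG decoder, and two details are assumptions: all other factors are held at zero during the traversal, and factor numbers in the text are 1-based (so factor 20 is index 19).

```python
N_FACTORS = 48

def traversal_values(low=-4.5, high=4.5, step=1.5):
    """Evenly spaced factor values from low to high, inclusive."""
    count = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(count)]

def traversal_latents(factor_index, n_factors=N_FACTORS):
    """One latent vector per traversal value; other factors stay at zero (assumption)."""
    latents = []
    for value in traversal_values():
        z = [0.0] * n_factors
        z[factor_index] = value
        latents.append(z)
    return latents

def toy_decoder(z):
    """Hypothetical placeholder for the trained decoder: returns an 8 x 600 median beat."""
    return [[sum(z)] * 600 for _ in range(8)]

values = traversal_values()
latents = traversal_latents(factor_index=19)          # factor 20, assuming 1-based numbering
reconstructions = [toy_decoder(z) for z in latents]   # these would be overlaid on one plot
```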

Furthermore, we evaluated whether incorporating the ECG factors from the VAE into the final linear layer of the EfficientNet could enhance its performance (i.e., the 48 ECG factors derived from the pretrained FactorECG are concatenated with the 1280 features from the AvgPool layer of the EfficientNet, resulting in a 1328-length vector that is passed to the final Linear layer [Table 1]).
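
The fusion step amounts to a concatenation followed by a linear layer, as the following dependency-free sketch shows. The weights and feature values are random placeholders, the helper names are hypothetical, and whether the encoder means or sampled factors are used is not stated in the text.

```python
import random

N_CNN_FEATURES = 1280   # AvgPool output of the EfficientNet (Table 1)
N_ECG_FACTORS = 48      # FactorECG latent size (Table 1)
N_CLASSES = 2

def linear(features, weights, bias):
    """Plain fully connected layer: one dot product per output unit."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]

def classify(cnn_features, ecg_factors, weights, bias):
    """Concatenate both feature sets and apply the final linear layer."""
    fused = list(cnn_features) + list(ecg_factors)
    return fused, linear(fused, weights, bias)

rng = random.Random(0)
cnn_features = [rng.random() for _ in range(N_CNN_FEATURES)]
ecg_factors = [rng.gauss(0.0, 1.0) for _ in range(N_ECG_FACTORS)]
weights = [
    [rng.uniform(-0.01, 0.01) for _ in range(N_CNN_FEATURES + N_ECG_FACTORS)]
    for _ in range(N_CLASSES)
]
bias = [0.0] * N_CLASSES
fused, logits = classify(cnn_features, ecg_factors, weights, bias)
```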

Performance evaluation and statistical analysis

We generated receiver operating characteristic (ROC) curves and precision-recall (PR) curves for our AI model, subsequently evaluating the area under the ROC curve (AUROC) and the area under the PR curve (AUPRC). We identified the optimal cutoff point in the validation dataset by maximizing Youden’s J statistic (sensitivity + specificity − 1). This cutoff was then applied to both the test and external validation datasets to compute metrics such as accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the F1 score.
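
Cutoff selection and the threshold-dependent metrics can be sketched in plain Python. The labels and scores are made up, the function names are illustrative, and a prediction is counted as positive when its score is at or above the threshold (an assumption about the tie convention).

```python
def confusion(y_true, scores, threshold):
    """Counts of true positives, false negatives, true negatives, false positives."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp, fn, tn, fp

def youden_cutoff(y_true, scores):
    """Threshold maximizing J = sensitivity + specificity - 1 over the observed scores."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp, fn, tn, fp = confusion(y_true, scores, threshold)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

def metrics(y_true, scores, threshold):
    """Metrics at a fixed cutoff, e.g. one chosen on the validation set."""
    tp, fn, tn, fp = confusion(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Made-up validation labels and model scores.
val_y = [0, 0, 0, 0, 1, 0, 1, 1]
val_s = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
cutoff, j = youden_cutoff(val_y, val_s)
val_metrics = metrics(val_y, val_s, cutoff)
```

Choosing the cutoff on the validation data and then freezing it for the test and external datasets avoids tuning the threshold on the data used to report performance.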

We compared dataset characteristics between the model development dataset, test dataset, and external validation dataset. We evaluated the normality of continuous data using the Shapiro-Wilk test, and given that none of the continuous data followed a normal distribution, we employed the Kruskal-Wallis test for comparisons. Categorical data were compared using the chi-square test. Statistical significance was set at P <0.05 for all tests.
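
As an illustration of the categorical comparison, the chi-square test for sex across the three datasets can be reproduced from the counts in Table 2. This is a sketch rather than the study's analysis code: it treats each ECG as an independent observation, and it uses the closed-form tail probability exp(−χ²/2), which is valid only for 2 degrees of freedom (a 2 × 3 table).

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Columns: model development, test, external validation (ECG counts from Table 2).
males = [108828, 8502, 427]
totals = [194963, 14242, 729]
table = [males, [n - m for n, m in zip(totals, males)]]
stat = chi_square_statistic(table)
p = p_value_df2(stat)
```

With these counts the p-value falls well below 0.001, in line with the P-value column of Table 2.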

Results

Dataset characteristics

Figure 1 outlines the patient flow for the model development and test dataset derived from SH. Between November 2005 and August 2022, a total of 78,549 CAC measurements via CT, not associated with health checkups, were performed on 69,657 distinct patients. Within a 30-day window surrounding these CAC measurements, 207,199 ECGs were extracted, pertaining to 59,825 patients. After applying the exclusion criteria, the model development dataset included 194,964 ECGs from 57,019 patients. Between December 2010 and August 2022, a total of 14,695 CT scans with CAC measurements, taken during health checkups, were performed on 13,394 distinct patients. ECGs were extracted if they were measured during the same visits, and after applying the exclusion criteria, the test dataset comprised 14,242 ECGs from 12,924 patients. Figure 2 illustrates the patient flow diagram for the external validation dataset from YSH. Between April 2020 and August 2022, 729 ECGs from 710 patients that were recorded during health checkups were extracted and labeled with the respective CAC scores.

Table 2 illustrates the characteristics of the datasets. The test dataset and the external validation dataset, having been extracted from health checkup data, represented a healthier spectrum of individuals, characterized by younger ages (53.1 ± 10.0 and 57.1 ± 10.7 vs. 61.8 ± 13.3), lower CAC scores (59.0 ± 212.2 and 81.5 ± 274.8 vs. 295.1 ± 920.9), and a lower proportion of CAC score ≥400 (3.8% and 6.0% vs. 17.8%), relative to the model development dataset.

Table 2.

Dataset characteristics.

	Model development dataset N = 194,963	Health screening test dataset N = 14,242	Health screening external validation dataset N = 729	P-Value
Number of patients	57,019	12,926	710
Sex, male	108,828 (55.8%)	8,502 (59.7%)	427 (58.6%)	<0.001
Age	61.8 ± 13.3	53.1 ± 10.0	57.1 ± 10.7	<0.001
CAC score	295.1 ± 920.9	59.0 ± 212.2	81.5 ± 274.8	<0.001
CAC score ≥ 400	34,637 (17.8%)	542 (3.8%)	44 (6.0%)	<0.001

Model performance and interpretation

Figure 3 displays the ROC curves of the models. Our EfficientNet model exhibited strong performance in predicting CAC score ≥400, with AUROCs of 0.780 and 0.807 in the validation and test datasets, respectively. The respective AUPRCs stood at 0.464 and 0.174. The ROC curves of the EfficientNet model when the ECG factors are incorporated into the last linear layer are labeled as “(+FactorECG)” in Figure 3. This integration resulted in a marginal improvement in performance, with AUROCs of 0.786 and 0.815 in the validation and test datasets, respectively. Table 3 presents the performance of the EfficientNet model when the Youden J index reached its maximum. The accuracies were 0.686 and 0.879, sensitivities were 0.737 and 0.451, specificities were 0.674 and 0.896, and PPVs were 0.331 and 0.146, while maintaining high NPVs of 0.922 and 0.976 for the validation and test datasets, respectively. The diminished PPV in the test dataset can likely be attributed to its reduced prevalence of CAC score ≥400.
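
The dependence of PPV on prevalence follows directly from Bayes' rule and can be checked with the reported figures. The sketch below plugs in the rounded test-set sensitivity and specificity from Table 3 and the test-set prevalence from Table 2 (542 of 14,242), so the results are approximate; the function names are illustrative.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

test_prevalence = 542 / 14242                  # about 3.8%
test_ppv = ppv(0.451, 0.896, test_prevalence)
test_npv = npv(0.451, 0.896, test_prevalence)
# Same operating point at the 17.8% prevalence of the model development dataset.
high_prevalence_ppv = ppv(0.451, 0.896, 0.178)
```

At the test-set prevalence the computed PPV and NPV agree with the Table 3 values to within rounding, and the same sensitivity and specificity give a several-fold higher PPV at 17.8% prevalence, which is the effect the paragraph describes.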

Figure 3.

ROC curves of the AI models.

Table 3.

Performance at maximum Youden J index.

Metric	Validation dataset	Test dataset	External validation dataset
Accuracy	0.686	0.879	0.824
Sensitivity	0.737	0.451	0.580
Specificity	0.674	0.896	0.839
PPV	0.331	0.146	0.188
NPV	0.922	0.976	0.969
F1 score	0.457	0.221	0.284

In the external validation, the model maintained its efficacy, achieving an AUROC of 0.779, AUPRC of 0.247, accuracy of 0.824, sensitivity of 0.580, specificity of 0.839, and a PPV of 0.188, while maintaining a high NPV of 0.969. This supports the model’s generalizability to an external institution.

The XGBoost model, constructed with ECG factors derived from FactorECG, demonstrated an AUROC of 0.705 in the validation dataset and 0.730 in the test dataset. Figure 4 shows the top 10 ECG factors with the highest feature importance as determined by the SHAP method in the test dataset. Figure 5 provides a visualization of the top three ECG factors (20, 48 and 36), using factor traversals. In Figure 4, we observe that higher values of ECG factor 20, which correspond to inferolateral T-wave inversion as depicted in Figure 5, are associated with an increased predicted risk. Similarly, lower values of ECG factor 48, which correspond to a longer PR interval and ST depression, and higher values of ECG factor 36, which correspond to an increased QRS amplitude and T-wave alterations, are associated with an increased predicted risk.

Figure 4.

SHAP summary plot.

Figure 5.

Factor traversals of important ECG factors. In each graph, the corresponding ECG factor is modulated from -4.5 (represented in blue) to 4.5 (represented in red), advancing in increments of 1.5 units. The line width diminishes as the absolute value of the ECG factor decreases, and the central black line represents an ECG factor value of zero.

Discussion

In this study, we developed an AI-enabled ECG framework that integrates CAC score prediction with visual explanations and tested its efficacy on individuals undergoing health checkups. Our EfficientNet model showed strong performance in predicting a CAC score ≥400 within the health screening test dataset, achieving an AUROC of 0.807, an accuracy of 0.879, and an NPV of 0.976. This performance was further validated using a health screening external validation dataset, where the model yielded an AUROC of 0.779, an accuracy of 0.824, and an NPV of 0.969. Using SHAP analysis of our XGBoost model, which was constructed based on ECG factors from FactorECG, we pinpointed the key ECG factors in the prediction. Additionally, we provided visual interpretations of these significant ECG factors through factor traversals. Furthermore, we showed that incorporating the ECG factors into the final linear layer of the EfficientNet marginally improved its performance.

The recent application of AI techniques to ECGs has facilitated the automatic classification or diagnosis of various cardiac diseases, such as arrhythmia and ischemia^18-20. Moreover, by applying deep convolutional neural networks to ECGs, numerous AI models have emerged that identify diseases and conditions previously undetectable through conventional ECG interpretation¹⁹. Importantly, many of these advanced AI models have demonstrated their effectiveness through rigorous prospective validations and clinical trials. For example, Attia et al. (2019) developed an AI-enabled ECG algorithm capable of identifying patients with atrial fibrillation during normal sinus rhythm, while Noseworthy et al. (2022) found in a prospective trial that AI-guided targeted screening for atrial fibrillation with ECGs resulted in a significant increase in atrial fibrillation detection rates, particularly among those classified as high-risk by the algorithm^21,22. Moreover, Attia et al. (2019) developed an AI-enabled ECG algorithm capable of identifying patients at a high likelihood of low ejection fraction, while Yao et al. (2021) found in a pragmatic randomized clinical trial that the use of this AI-powered clinical decision support tool significantly improved the early diagnosis of patients with low ejection fraction in routine primary care settings^23,24.

In this context, in a prior study, we pursued objectives similar to those of the current research, specifically developing an AI model to predict CAC using only ECGs²⁵. However, a major drawback of the previous study was that the model was neither developed nor validated in a truly subclinical or healthy population. Consequently, it remained uncertain whether the model is equally effective across a diverse range of individuals (including those who are subclinical or healthy), or if its performance is biased toward patients with advanced disease states, such as those already diagnosed with CHD or those undergoing CAC measurements via CT for specific indications (e.g., symptomatic individuals). Opportunistic CAC detection offered by AI-enhanced ECGs would be particularly beneficial for asymptomatic or subclinical populations, because it can act as an initial screening tool leading to confirmatory tests and preventative strategies. However, for those already exhibiting symptoms or diagnosed with CHD, detailed evaluations that go beyond ECGs, such as coronary CT scans, coronary angiograms or exercise stress tests, are inherently crucial, thereby reducing the significance of AI-based ECG insights for such a population. To address this limitation, in the current study, we tested our AI model on a health screening dataset comprising individuals who had both ECG and CT measurements of CAC as part of their health checkup evaluations. Health checkups embody the principle of preventive medicine and predominantly target asymptomatic and seemingly healthy individuals, serving as a proactive screening measure to detect and identify potential underlying diseases at their nascent stages, thereby facilitating timely intervention and management. The CT measurements of CAC during health checkups are not indication-based, implying that they are not performed due to the presence of concerning signs or symptoms but rather as a part of a screening to ensure health and wellness. 
The fact that our AI model was validated with high performance in this health screening cohort underscores its potential utility. With our model, there’s a newfound capability to detect previously unrecognized subclinical coronary artery disease, thereby offering an early risk stratification. This proactive approach could be pivotal in preventing catastrophic outcomes, such as ACS, in those who might otherwise remain undiagnosed until advanced stages of the disease.

Another limitation of our prior study was the absence of interpretability. We had not incorporated any methodology to elucidate the AI model’s predictions. To address this limitation, in the current study, we adopted the FactorECG technique that utilizes a VAE architecture to condense any ECG data into a predetermined number of descriptive, independent factors^11,12. This approach allowed us to mitigate the “black box” issue prevalent in traditional end-to-end deep learning techniques, enabling us to provide quantifiable visual interpretations of the temporal and morphological ECG changes linked to our prediction task. Through our factor traversal analysis, we found potential ECG changes such as inferolateral T-wave inversion, extended PR interval, ST depression, elevated QRS amplitude, and T-wave alterations that might be associated with CAC score ≥400. Rather than a single ECG factor playing a dominant role in the prediction, it is plausible that multiple ECG factors collectively influence the outcome. Hence, further consensus and discussion are necessary when interpreting factor traversal results. Analyzing factor traversal at the individual level is essential for providing tailored interpretations of ECG factors, taking into account each patient’s unique medical circumstances. Additionally, we showed that when these ECG factors were integrated into the training of EfficientNet (specifically in the last linear layer), there was a marginal improvement in its performance. Further research is needed on how to better integrate these ECG factors to optimize performance more effectively.

A further limitation of our prior study was the limited dataset size used for model training and validation. In our previous study, the dataset was considerably smaller, with only 8,178 ECGs used for training and validation. In contrast, the current research employed a much larger dataset, encompassing over 200,000 ECGs. The usage of such an expanded dataset can have multifaceted benefits for AI model development. Firstly, a larger dataset reduces the risk of overfitting, which means the model will likely generalize better to unseen data. Moreover, it allows for the representation of a broader range of variabilities in the ECGs, capturing subtle nuances and patterns that might be overlooked in smaller datasets²⁰. Furthermore, a comprehensive dataset enhances the robustness of the model, making it more reliable in diverse real-world settings. The increased confidence in model performance, stemming from a large dataset, can therefore facilitate more widespread clinical adoption and trust.

CHD is a leading cause of death globally. In 2020, approximately 20.5 million adults in the United States (US) were diagnosed with CHD, resulting in over 380,000 deaths². The estimated annual incidence of myocardial infarction in the US includes 605,000 new cases and 200,000 recurrent cases². Recently, there has been an increased focus on early detection and prevention of CHD. Of the various imaging modalities researched for this purpose, the role of CAC as assessed by CT in cardiovascular risk management is notably well-established². CAC serves as a marker for coronary artery atherosclerosis due to its strong association with atherosclerotic plaque formation. Many studies confirm that CAC is indicative of subclinical atherosclerosis^3,4. Particularly, Bergström et al. (2021) observed CAC scores in individuals without known CHD and discovered that those with a CAC score >400 all exhibited atherosclerosis, with 45.7% showing significant stenosis³. The predictive power of CAC for cardiovascular events has been widely corroborated^5-7. Specifically, patients with a CAC score of 1 to 100 exhibited a hazard ratio of 3.61 for a coronary event when compared to those with a CAC score of zero (p <0.001)⁶. Scores exceeding 100 had an even greater hazard ratio of 7.73 (p <0.001)⁶. Adjusting for factors such as age, sex, and traditional cardiovascular risk factors, asymptomatic individuals with a CAC score ≥1000 faced a 5-fold higher CHD mortality risk compared to those with CAC =0⁷. For individuals with a CAC score of 0, the 10-year coronary event risk is between 1.1-1.7%⁵. However, this risk increases to 22.5-28.6% for scores of 400 or higher and surges to 37.0% for scores beyond 1000⁵. Several guidelines recommend CAC scoring for those at intermediate or borderline risk when decisions regarding statin therapy remain uncertain after traditional risk assessment^8,9. 
Still, the consistent use of CAC scoring faces challenges such as the high costs of CT scans, potential radiation exposure, and limited accessibility. Our model’s capability to predict CAC using only ECGs, an affordable and ubiquitous screening tool, holds significant clinical value in identifying previously undetected subclinical CAC.

For model development, we used ECGs only if they were taken within a 30-day interval surrounding a CAC measurement. We chose this 30-day window because CAC progresses gradually, making significant changes within a month unlikely²⁶. While a longer time window might provide more samples, it also increases the potential for substantial variation in the CAC score. Thus, we chose a time window of 30 days to balance this trade-off.
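The pairing rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the record types `CacScan` and `Ecg`, the function `label_ecgs`, and the choice to treat the window as symmetric (30 days before or after the scan) and to use the nearest scan when several qualify are all assumptions made for the example. Only the 30-day window and the CAC score ≥400 target come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

WINDOW_DAYS = 30     # time window stated in the text
CAC_THRESHOLD = 400  # prediction target stated in the text (CAC score >= 400)


@dataclass(frozen=True)
class CacScan:  # hypothetical record type for illustration
    patient_id: str
    scan_date: date
    cac_score: float


@dataclass(frozen=True)
class Ecg:  # hypothetical record type for illustration
    patient_id: str
    ecg_date: date


def label_ecgs(ecgs, scans, window_days=WINDOW_DAYS):
    """Return (ecg, label) pairs for ECGs that have a CAC scan within the window.

    Assumptions: the window is symmetric (+/- window_days), and when several
    scans qualify, the one closest in time to the ECG supplies the label.
    """
    window = timedelta(days=window_days)

    # Group scans by patient so each ECG is only compared with its own scans.
    scans_by_patient = {}
    for scan in scans:
        scans_by_patient.setdefault(scan.patient_id, []).append(scan)

    labeled = []
    for ecg in ecgs:
        candidates = [
            s for s in scans_by_patient.get(ecg.patient_id, [])
            if abs(s.scan_date - ecg.ecg_date) <= window
        ]
        if not candidates:
            continue  # no CAC measurement close enough; this ECG is not used
        nearest = min(candidates, key=lambda s: abs(s.scan_date - ecg.ecg_date))
        labeled.append((ecg, int(nearest.cac_score >= CAC_THRESHOLD)))
    return labeled
```

Widening `window_days` in this sketch would admit more ECGs per scan, at the cost of labels that may no longer reflect the CAC score at the time of the recording, which is the trade-off discussed above.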

Our AI-enabled ECG framework has several strengths. First, it requires only ECGs, without additional clinical data, which optimizes its real-world applicability. Second, we included all ECGs, irrespective of medical anomalies like arrhythmia or ischemia, with exclusions only for lead misplacements, unwanted artifacts, and artificial pacemakers. This inclusion criterion ensures our model’s broad applicability. Third, our model demonstrated strong performance in external validation, indicating its reliability in diverse settings.

Our study has some limitations. First, our model cannot be considered a definitive test. However, it can serve as a preliminary screening tool guiding subsequent confirmatory tests, promoting early detection. Decisions should be tailored, integrating our model’s predictions with the patient’s unique medical context. Second, our model currently predicts only one CAC level (CAC score ≥400). Since different CAC levels signify varying cardiovascular risks, future work should aim to predict multiple thresholds or even the exact CAC score. Third, it remains uncertain whether our model’s predictions can act as independent risk factors for future cardiovascular events. Future research should evaluate this in healthy and subclinical individuals by incorporating longitudinal data and employing statistical methods such as Cox regression analysis. Fourth, the dataset used for FactorECG training might have been too small. In the studies by van de Leur et al. (2022) and Wouters et al. (2023), a total of 1,144,331 median beat ECGs from 251,473 unique patients were used for FactorECG training¹¹,¹². However, at the time we conducted our study, we only had access to approximately 222,000 ECGs for FactorECG training. Because of this relatively small training dataset, the interpretation of some ECG factors might have been ambiguous, and the performance of the XGBoost model built on ECG factors might have been relatively low. We plan to retrain FactorECG using a dataset that is at least ten times larger in future research. This will allow us to further explore model explainability. Moreover, we intend to investigate how the extracted ECG factors can be integrated with existing end-to-end algorithms, such as EfficientNet, to optimize performance.

Conclusion

In conclusion, we developed an AI-enabled ECG framework that integrates CAC score prediction with visual explanations and tested its efficacy on individuals undergoing health checkups. Our AI model reliably predicted a CAC score ≥400 within the health screening dataset. This reliability was further validated using a health screening external validation dataset, confirming the model’s adaptability. Through SHAP analysis, we identified critical ECG factors in the prediction, and we also offered visual interpretations using factor traversals. Further investigation is required to determine whether AI-enabled CAC predictions can act as independent risk factors for cardiovascular events in healthy and subclinical individuals.

References

1. Bentzon JF, Otsuka F, Virmani R, Falk E. Mechanisms of plaque formation and rupture. Circ Res. 2014 Jun 6;114(12):1852–66.

2. Tsao CW, Aday AW, Almarzooq ZI, Anderson CAM, Arora P, Avery CL, et al. Heart Disease and Stroke Statistics-2023 Update: A Report From the American Heart Association. Circulation. 2023 Feb 21;147(8):e93–621.

3. Bergström G, Persson M, Adiels M, Björnson E, Bonander C, Ahlström H, et al. Prevalence of Subclinical Coronary Artery Atherosclerosis in the General Population. Circulation. 2021 Sep 21;144(12):916–29.

4. Gatto L, Prati F. Subclinical atherosclerosis: how and when to treat it? Eur Heart J Suppl. 2020 Jun;22(Suppl E):E87–90.

5. Hecht HS. Coronary artery calcium scanning: past, present, and future. JACC Cardiovasc Imaging. 2015 May;8(5):579–96.

6. Detrano R, Guerci AD, Carr JJ, Bild DE, Burke G, Folsom AR, et al. Coronary calcium as a predictor of coronary events in four racial or ethnic groups. N Engl J Med. 2008 Mar 27;358(13):1336–45.

7. Peng AW, Mirbolouk M, Orimoloye OA, Osei AD, Dardari Z, Dzaye O, et al. Long-Term All-Cause and Cause-Specific Mortality in Asymptomatic Patients With CAC ≥1000: Results From the CAC Consortium. JACC Cardiovasc Imaging. 2020 Jan;13(1 Pt 1):83–93.

8. Grundy SM, Stone NJ, Bailey AL, Beam C, Birtcher KK, Blumenthal RS, et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA Guideline on the Management of Blood Cholesterol: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation. 2019 Jun 18;139(25):e1082–143.

9. Arnett DK, Blumenthal RS, Albert MA, Buroker AB, Goldberger ZD, Hahn EJ, et al. 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation. 2019 Sep 10;140(11):e596–646.

10. Yoon D, Jang JH, Choi BJ, Kim TY, Han CH. Discovering hidden information in biosignals from patients using artificial intelligence. Korean J Anesthesiol. 2020 Aug;73(4):275–84.

11. van de Leur RR, Bos MN, Taha K, Sammani A, Yeung MW, van Duijvenboden S, et al. Improving explainability of deep neural network-based electrocardiogram interpretation using variational auto-encoders. Eur Heart J Digit Health. 2022 Sep;3(3):390–404.

12. Wouters PC, van de Leur RR, Vessies MB, van Stipdonk AMW, Ghossein MA, Hassink RJ, et al. Electrocardiogram-based deep learning improves outcome prediction following cardiac resynchronization therapy. Eur Heart J. 2023 Feb 21;44(8):680–92.

13. Goldberger E. A simple, indifferent, electrocardiographic electrode of zero potential and a technique of obtaining augmented, unipolar, extremity leads. Am Heart J. 1942 Apr;23(4):483–92.

14. Einthoven W. Weiteres über das Elektrokardiogramm. Pflugers Arch. 1908 May;122(12):517–84.

15. Tan M, Le QV. EfficientNet: Rethinking model scaling for convolutional neural networks. 2019. Available from: https://arxiv.org/abs/1905.11946.

16. Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018 Oct;106:249–59.

17. Budoff MJ, Shaw LJ, Liu ST, Weinstein SR, Mosler TP, Tseng PH, et al. Long-term prognosis associated with coronary calcification: observations from a registry of 25,253 patients. J Am Coll Cardiol. 2007 May 8;49(18):1860–70.

18. Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun. 2020 Apr 9;11(1):1760.

19. Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021 Jul;18(7):465–78.

20. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019 Jan;25(1):65–9.

21. Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet. 2019 Sep 7;394(10201):861–7.

22. Noseworthy PA, Attia ZI, Behnken EM, Giblon RE, Bews KA, Liu S, et al. Artificial intelligence-guided screening for atrial fibrillation using electrocardiogram during sinus rhythm: a prospective non-randomised interventional trial. Lancet. 2022 Oct 8;400(10359):1206–12.

23. Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med. 2019 Jan;25(1):70–4.

24. Yao X, Rushlow DR, Inselman JW, McCoy RG, Thacher TD, Behnken EM, et al. Artificial intelligence-enabled electrocardiograms for identification of patients with low ejection fraction: a pragmatic, randomized clinical trial. Nat Med. 2021 May;27(5):815–9.

25. Han C, Kang KW, Kim TY, Uhm JS, Park JW, Jung IH, et al. Artificial Intelligence-Enabled ECG Algorithm for the Prediction of Coronary Artery Calcification. Front Cardiovasc Med. 2022 Apr 6;9:849223.

26. Shen YW, Wu YJ, Hung YC, Hsiao CC, Chan SH, Mar GY, et al. Natural course of coronary artery calcium progression in Asian population with an initial score of zero. BMC Cardiovasc Disord. 2020 May 6;20(1):212.

Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of American Medical Informatics Association