Randomized Controlled Trial
Circ Res. 2017 Oct 13;121(9):1092-1101.
doi: 10.1161/CIRCRESAHA.117.311312. Epub 2017 Aug 9.

Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis

Bharath Ambale-Venkatesh et al. Circ Res.

Abstract

Rationale: Machine learning may be useful to characterize cardiovascular risk, predict outcomes, and identify biomarkers in population studies.

Objective: To test the ability of random survival forests, a machine learning technique, to predict 6 cardiovascular outcomes in comparison to standard cardiovascular risk scores.

Methods and results: We included participants from the MESA (Multi-Ethnic Study of Atherosclerosis). Baseline measurements were used to predict cardiovascular outcomes over 12 years of follow-up. MESA was designed to study progression of subclinical disease to cardiovascular events in participants who were initially free of cardiovascular disease. All 6814 participants from MESA, aged 45 to 84 years, from 4 ethnicities and 6 centers across the United States, were included. Seven hundred thirty-five variables from imaging and noninvasive tests, questionnaires, and biomarker panels were obtained. We used the random survival forests technique to identify the top-20 predictors of each outcome. Imaging, electrocardiography, and serum biomarkers featured heavily on the top-20 lists, as opposed to traditional cardiovascular risk factors. Age was the most important predictor for all-cause mortality. Fasting glucose levels and carotid ultrasonography measures were important predictors of stroke. Coronary artery calcium score was the most important predictor of coronary heart disease and of the combined outcome of all atherosclerotic cardiovascular disease. Left ventricular structure and function and cardiac troponin-T were among the top predictors for incident heart failure. Creatinine, age, and ankle-brachial index were among the top predictors of atrial fibrillation. TNF-α (tumor necrosis factor-α) and IL (interleukin)-2 soluble receptors and NT-proBNP (N-Terminal Pro-B-Type Natriuretic Peptide) levels were important across all outcomes. The random survival forests technique outperformed established risk scores, with higher prediction accuracy (Brier score decreased by 10%-25%).

Conclusions: Machine learning in conjunction with deep phenotyping improves the accuracy of cardiovascular event prediction in an initially asymptomatic population. These methods may lead to greater insights on subclinical disease markers without a priori assumptions of causality.

Clinical trial registration: URL: http://www.clinicaltrials.gov. Unique identifier: NCT00005487.

Keywords: atrial fibrillation; cardiovascular disease; coronary heart disease; heart failure; machine learning; mortality; stroke.
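
The abstract reports accuracy as a relative decrease in the Brier score. As a rough illustration of what that metric measures, the sketch below computes a Brier score at a single fixed time horizon and the relative reduction between two sets of predictions. It is a simplified assumption-laden example, not the study's procedure: it ignores censoring (survival analyses often use a censoring-weighted variant, and the abstract does not say which form was used), and the function names and toy data are hypothetical.

```python
def brier_score(event_observed, predicted_risk):
    """Mean squared difference between predicted event probability and the 0/1 outcome.

    Simplification (assumption): every participant's status at the horizon is
    known, so no censoring weights are applied.
    """
    if len(event_observed) != len(predicted_risk):
        raise ValueError("inputs must have the same length")
    if not event_observed:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risk, event_observed)]
    return sum(squared_errors) / len(squared_errors)


def relative_reduction(baseline_score, candidate_score):
    """Fractional decrease of candidate_score relative to baseline_score."""
    return (baseline_score - candidate_score) / baseline_score


# Hypothetical toy data: 1 = event by the horizon, 0 = no event.
events = [1, 0, 0, 1, 0]
risk_score_pred = [0.5, 0.5, 0.5, 0.5, 0.5]  # an uninformative baseline
forest_pred = [0.8, 0.2, 0.1, 0.6, 0.3]      # sharper, better-calibrated predictions

baseline = brier_score(events, risk_score_pred)
candidate = brier_score(events, forest_pred)
reduction = relative_reduction(baseline, candidate)
```

Lower Brier scores indicate better predictions, so a positive `reduction` means the candidate model improved on the baseline.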


Figures

Figure 1. A flowchart describing the general framework of the study
Models were built using the training dataset, and the test dataset was used for computing the C-index and the Brier Score shown in Table 4.
Figure 2. Plots showing Lowess curves (for continuous variables) and box plots (for categorical variables) of the survival probability vs variable values for the top-5 predictors for each of the outcomes at 12 years
The y-axis represents survival probability calculated from the RF-20 algorithm (range: 0 to 1). The x-axis spans the range (or categories) of the variable of interest. Abbreviations: NT pro-BNP = N-terminal pro-Brain Natriuretic peptide, TNF-α SR = tumor necrosis factor-α soluble receptor, IL2 SR = interleukin-2 soluble receptor, CAC = coronary artery calcium score, LVESV = left ventricle end-systolic volume, ABI = ankle-brachial index. Units for each variable: NT pro-BNP, pg/ml; TNF-α SR, pg/ml; IL2 SR, pg/ml; CAC, Agatston units; cardiac troponin T, ng/ml; ABI, ratio; age, years; fasting glucose, mg/dl.
Figure 3. Plots showing the variable importance for each of the 735 variables used in analysis
The color of the dots represents the category or type of measurement. The legend on the right lists the phenotype categories in the left-to-right order used on the individual plots. Variable importance is measured using the minimum depth of the maximal subtree, with lower values representing greater importance of the corresponding variable. Abbreviations: NT pro-BNP = N-terminal pro-Brain Natriuretic peptide, TNF-α SR = tumor necrosis factor-α soluble receptor, IL2 SR = interleukin-2 soluble receptor, CAC = coronary artery calcium score, ABI = ankle-brachial index, IMT = intima media thickness, SBP = systolic blood pressure.
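
To make the minimum-depth importance measure concrete, here is a small sketch that finds, for each variable, the depth of its split closest to the root of a tree and averages that depth over a forest. Everything here is illustrative: the tuple-based tree representation, the variable names, and the toy forest are hypothetical, and assigning the tree's height to a variable that never splits in a tree is one possible convention assumed for this example, not necessarily the one used in the study.

```python
from collections import deque

# A tree node is (split_variable, left_child, right_child); a leaf is None.


def tree_height(node):
    """Number of split levels on the longest root-to-leaf path."""
    if node is None:
        return 0
    _, left, right = node
    return 1 + max(tree_height(left), tree_height(right))


def minimal_depths(tree):
    """Map each variable to the depth of its split closest to the root (root = 0)."""
    depths = {}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if node is None:
            continue
        variable, left, right = node
        # Breadth-first order guarantees the first visit is the shallowest split.
        depths.setdefault(variable, depth)
        queue.append((left, depth + 1))
        queue.append((right, depth + 1))
    return depths


def forest_minimal_depth(forest, variables):
    """Average minimal depth per variable; lower means more important.

    Assumption: a variable unused in a tree is penalized with that tree's height.
    """
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        depths = minimal_depths(tree)
        penalty = tree_height(tree)
        for v in variables:
            totals[v] += depths.get(v, penalty)
    return {v: total / len(forest) for v, total in totals.items()}


# Hypothetical two-tree forest.
tree_a = ("cac", ("age", None, None), ("nt_probnp", ("age", None, None), None))
tree_b = ("age", ("cac", None, None), None)
importance = forest_minimal_depth([tree_a, tree_b], ["cac", "age", "nt_probnp"])
```

In this toy forest, `cac` and `age` split near the root in both trees and so receive a lower average depth than `nt_probnp`, which is absent from the second tree.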
Figure 4. The concordance index for each of the models tested over time
The full models (models with all 735 variables) did not converge for the LASSO-Cox, AIC-Cox and the Cox PHM models, and hence are not shown here. The prediction ability of conventional risk scores for heart failure (MESA HF risk score), cardiovascular disease (AHA/ASCVD risk score) and coronary heart disease (Framingham CHD risk score) are also shown (yellow curve). In general, the C-index for all variables decreased over time.
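
For readers unfamiliar with the concordance index plotted in Figure 4, the following sketch computes a Harrell-style C-index for right-censored data: among comparable pairs, it counts how often the participant with the earlier event also received the higher predicted risk. This is a minimal illustration under stated assumptions, not the study's implementation; the function name and the toy data are hypothetical, and tied risks are counted as half-concordant by convention.

```python
def concordance_index(times, events, risk):
    """Harrell-style C-index for right-censored survival data.

    times  : follow-up time for each participant
    events : 1 if the event was observed, 0 if censored
    risk   : predicted risk (higher = event expected sooner)
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored participant cannot anchor a comparable pair
        for j in range(n):
            # Comparable: i had an observed event strictly before j's follow-up ended.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: two observed events, two censored participants.
times = [2, 5, 7, 10]
events = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.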
