Sci Rep. 2023 Sep 13;13(1):15139.
doi: 10.1038/s41598-023-42338-0.

Machine learning approaches for biomarker discovery to predict large-artery atherosclerosis


Ting-Hsuan Sun et al. Sci Rep.

Abstract

Large-artery atherosclerosis (LAA) is a leading cause of cerebrovascular disease. However, diagnosing LAA is costly and requires assessment by trained specialists. Many metabolites have been identified as biomarkers of specific traits, but findings on which biomarkers are suitable for predicting LAA have been inconsistent. In this study, we propose a new method that integrates multiple machine learning algorithms with a feature selection method to handle multidimensional data. Among the six machine learning models, the logistic regression (LR) model exhibited the best prediction performance: with 62 features incorporated, it achieved an area under the receiver operating characteristic curve (AUC) of 0.92 on the external validation set. In this model, LAA was well predicted by clinical risk factors, including body mass index, smoking, and medications for controlling diabetes, hypertension, and hyperlipidemia, as well as by metabolites involved in aminoacyl-tRNA biosynthesis and lipid metabolism. In addition, we identified 27 features shared by the five adopted models that could provide good results. Using only these 27 features, the LR model achieved an AUC of 0.93. Our study demonstrates the effectiveness of combining machine learning algorithms with recursive feature elimination and cross-validation for biomarker identification. Moreover, we show that using shared features can yield more reliable correlations than relying on any single model, which can be valuable for the future identification of LAA.
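The abstract credits recursive feature elimination with cross-validation (RFECV) for the feature subsets behind its results. The general technique can be sketched in plain Python as below; this is an illustration under stated assumptions, not the authors' pipeline. Assumed here: a hand-rolled, unregularized gradient-descent logistic regression ranks features by the absolute value of their weights, one feature is dropped per step, the score is AUC on stratified validation folds (k = 5 by default; Figure 5 mentions tenfold cross-validation, which would be k = 10), and every function name (`fit_logreg`, `auc`, `stratified_folds`, `rfe_path`, `rfecv`) is hypothetical.

```python
import math
import random

def fit_logreg(X, y, lr=0.1, epochs=300):
    """Unregularized logistic regression fitted by batch gradient descent (illustrative only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))           # avoid overflow in exp()
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def rfe_path(X, y, features):
    """Yield (subset, (w, b)) from all features down to one, dropping the smallest |weight| each step."""
    subset = list(features)
    while subset:
        w, b = fit_logreg([[row[j] for j in subset] for row in X], y)
        yield list(subset), (w, b)
        subset.pop(min(range(len(subset)), key=lambda j: abs(w[j])))

def rfecv(X, y, k=5):
    """Return (selected feature indices, {subset size: mean validation AUC})."""
    d = len(X[0])
    totals = [0.0] * (d + 1)
    for val in stratified_folds(y, k):
        val_set = set(val)
        tr = [i for i in range(len(y)) if i not in val_set]
        # Elimination runs on the training fold only; the held-out fold just scores each subset.
        for subset, (w, b) in rfe_path([X[i] for i in tr], [y[i] for i in tr], range(d)):
            scores = [b + sum(wj * X[i][j] for wj, j in zip(w, subset)) for i in val]
            totals[len(subset)] += auc([y[i] for i in val], scores)
    mean_auc = {m: totals[m] / k for m in range(1, d + 1)}
    best_m = max(mean_auc, key=mean_auc.get)  # ties go to the smallest subset
    # Final elimination on all data, stopped at the best subset size.
    for subset, _ in rfe_path(X, y, range(d)):
        if len(subset) == best_m:
            return subset, mean_auc

# Synthetic example: feature 0 separates the classes, features 1-4 are pure noise.
rng = random.Random(1)
X, y = [], []
for _ in range(80):
    label = rng.randint(0, 1)
    X.append([rng.gauss(1.5 if label else -1.5, 1.0)] + [rng.gauss(0, 1) for _ in range(4)])
    y.append(label)
selected, curve = rfecv(X, y)
print("selected features:", selected)
print("mean CV AUC by subset size:", {m: round(a, 3) for m, a in curve.items()})
```

In this scheme, elimination is repeated inside every training fold so that the validation folds never influence which features are dropped; the subset size with the highest mean validation AUC then fixes how far a final elimination run on all the data should go. scikit-learn's `RFECV` class packages the same idea for any estimator that exposes coefficients or feature importances.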


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Flowchart of the ML models used to predict LAA. AUC area under the receiver operating characteristic curve; CV cross-validation; LAA large-artery atherosclerosis; ML machine learning; RFECV recursive feature elimination with cross-validation; SVM support vector machine; XGBoost extreme gradient boosting.
Figure 2
The flowchart of recursive feature elimination with cross-validation (RFECV) method.
Figure 3
Receiver operating characteristic curves for the 6 machine learning models evaluated with the external validation set using 3 sets of input features: (A) clinical factors, (B) metabolites, and (C) combination of clinical factors and metabolites. SVM support vector machine; XGBoost extreme gradient boosting.
Figure 4
RFECV curves for the 6 adopted ML models. The red dotted line marks the number of features required to attain the highest AUC value. AUC area under the receiver operating characteristic curve; ML machine learning; RFECV recursive feature elimination with cross-validation; SVM support vector machine; XGBoost extreme gradient boosting.
Figure 5
Feature selection using the RFECV method for the LR algorithm (62 features): (A) receiver operating characteristic curves for tenfold cross-validation on the training set, (B) receiver operating characteristic curves on the external validation set, and (C) confusion matrix for the external validation set. FN false negative; FP false positive; LR logistic regression; NPV negative predictive value; RFECV recursive feature elimination with cross-validation; TN true negative; TP true positive.
Figure 6
Comparison of features shared among 5 machine learning models. SVM support vector machine; XGBoost extreme gradient boosting.
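The abstract and this figure describe features shared among five models. One simple way to compute such an overlap is a set intersection over each model's selected features, sketched below under the assumption that every model's RFECV output is available as a set of feature names. The model keys and feature names are hypothetical placeholders, not the study's selections.

```python
from collections import Counter

def shared_features(selections):
    """Return the features selected by every model (set intersection)."""
    sets = [set(s) for s in selections.values()]
    return set.intersection(*sets) if sets else set()

def selection_counts(selections):
    """Count how many models selected each feature (useful for partial-overlap views)."""
    return Counter(f for s in selections.values() for f in set(s))

# Hypothetical per-model selections; names are placeholders, not the study's features.
selections = {
    "model_1": {"f1", "f2", "f3", "f4"},
    "model_2": {"f1", "f2", "f4", "f5"},
    "model_3": {"f1", "f2", "f4"},
}
common = shared_features(selections)
counts = selection_counts(selections)
print(sorted(common))  # ['f1', 'f2', 'f4']
```

The shared set can then be used as the input feature list when retraining a single model, which is how the abstract describes evaluating the 27 shared features with LR.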
Figure 7
Performance of the 6 predictive models when using the 27 shared features for training: (A) receiver operating characteristic curves of the five models for the external validation set and (B) confusion matrix for the external validation set when using the LR model. FN false negative; FP false positive; LR logistic regression; NPV negative predictive value; SVM support vector machine; TN true negative; TP true positive; XGBoost extreme gradient boosting.
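Figures 5C and 7B summarize external-validation performance as confusion matrices labelled with TP, FP, TN, FN, and NPV. The standard definitions behind such panels can be computed as below; this is a generic sketch assuming LAA is coded as the positive class (1), and the toy labels are invented for illustration, not taken from the study.

```python
def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = LAA, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def confusion_metrics(tp, fp, tn, fn):
    """Summary statistics commonly read off a 2x2 confusion matrix."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value (precision)
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }

# Toy labels, not data from the study.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 1)
print(confusion_metrics(*counts))
```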
