- 1 Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo University, Ningbo, China
- 2 Health Science Center, Ningbo University, Ningbo, China
Background: Cardiovascular disease (CVD) has emerged as a global public health concern. Identifying and preventing subclinical atherosclerosis (SCAS), an early indicator of CVD, is critical for improving cardiovascular outcomes. This study aimed to construct interpretable machine learning models for predicting SCAS risk in type 2 diabetes mellitus (T2DM) patients.
Methods: This study included 3084 T2DM individuals who received health care at Zhenhai Lianhua Hospital, Ningbo, China, from January 2018 to December 2022. The least absolute shrinkage and selection operator combined with random forest-recursive feature elimination were used to screen for characteristic variables. Linear discriminant analysis, logistic regression, Naive Bayes, random forest, support vector machine, and extreme gradient boosting were employed in constructing risk prediction models for SCAS in T2DM patients. The area under the receiver operating characteristic curve (AUC) was employed to assess the predictive capacity of the model through 10-fold cross-validation. Additionally, the SHapley Additive exPlanations were utilized to interpret the best-performing model.
Results: The percentage of SCAS was 38.46% (n=1186) in the study population. Fourteen variables, including age, white blood cell count, and basophil count, were identified as independent risk factors for SCAS. Nine predictors, including age, albumin, and total protein, were screened for the construction of risk prediction models. After validation, the random forest model exhibited the best clinical predictive value in the training set with an AUC of 0.729 (95% CI: 0.709-0.749), and it also demonstrated good predictive value in the internal validation set [AUC: 0.715 (95% CI: 0.688-0.742)]. The model interpretation revealed that age, albumin, total protein, total cholesterol, and serum creatinine were the top five variables contributing to the prediction model.
Conclusion: The construction of SCAS risk models based on the Chinese T2DM population contributes to its early prevention and intervention, which would reduce the incidence of adverse cardiovascular prognostic events.
1 Introduction
Type 2 diabetes mellitus (T2DM) is a metabolic disorder characterized by insulin resistance and relative insulin deficiency. In recent years, the prevalence of T2DM has increased steadily, making it a serious public health issue. Updated estimates for 2021 showed that about 10.5% of the global population had T2DM, and this figure is predicted to increase to 12.2% by 2045 (1). Cardiovascular disease (CVD) is the leading cause of death and disability in T2DM (2, 3). Studies have shown that the risk of CVD in patients with T2DM is two to four times higher than in individuals without diabetes (4, 5). Atherosclerosis (AS), the predominant pathophysiologic process in CVD, may begin early in life and remain latent and asymptomatic for extended periods before progressing to advanced stages. Subclinical atherosclerosis (SCAS) serves as an early indicator of atherosclerotic burden, and its timely recognition can help slow down or prevent the progression to CVD (6). Therefore, the early identification and effective management of SCAS in individuals with T2DM are crucial strategies to mitigate progression to overt CVD, thereby improving life expectancy and quality of life.
Diagnostic methods for SCAS include angiography, intravascular ultrasound, carotid ultrasound (CUS), computed tomography (CT), and magnetic resonance imaging. Measuring carotid intima-media thickness (CIMT) and coronary artery calcification (CAC) using CUS and CT has become the mainstay for assessing SCAS, owing to their noninvasive and easily accessible nature (7, 8). However, large-scale use of CUS and CT would strain medical resources and increase costs. Thus, establishing an assessment tool capable of screening individuals at high risk for SCAS without the need for imaging examinations is of great significance.
In recent years, artificial intelligence (AI) and machine learning (ML) have increasingly been utilized in the healthcare field (9). Several studies currently employ ML methods to research SCAS. For example, Sánchez-Cabo et al. (10) developed a SCAS risk prediction model for young asymptomatic individuals using four ML algorithms, demonstrating good clinical predictive value with an area under the receiver operating characteristic curve (AUC) of 0.890. Additionally, Núñez et al. (11) used ML methods to identify circulating proteins that can predict SCAS, also showing good clinical predictive value with an AUC of 0.730. However, there are few reports on the risk prediction models for SCAS in T2DM patients. The purpose of this study was to establish SCAS risk prediction models based on interpretable machine learning algorithms, contributing to the early identification of SCAS and guiding appropriate prevention and interventions.
2 Methods
2.1 Participants
This study enrolled 3140 T2DM individuals who had sought medical care through outpatient visits, inpatient admissions, and routine physical examinations at Zhenhai Lianhua Hospital in Ningbo, China, from January 2018 to December 2022. The sample size for this study adhered to the rule of 10 events per variable (12). The demographic data, comorbidities, complications, and biochemical parameters were obtained by questionnaires and laboratory tests. Inclusion criteria: participants aged ≥ 18 years who self-reported T2DM, were undergoing pharmacological treatment for T2DM, or met the diagnostic criteria for T2DM. These criteria include fasting blood glucose (FBG) levels of ≥ 7.0 mmol/L, 2-hour blood glucose levels of ≥ 11.1 mmol/L, or a glycated hemoglobin level of ≥ 6.5% (13). Exclusion criteria: individuals with other forms of diabetes mellitus, concurrent coronary heart disease or cerebral infarction, acute complications related to diabetes mellitus, malignant tumors, severe liver and kidney function abnormalities, or pregnancy. SCAS was defined as CIMT > 1.0 mm and/or the presence of plaque without clinical manifestations (14). Participants with more than 20% missing data were excluded (n=56), and missing values for those with less than 20% missing data were filled by multiple imputation (Supplementary Figure 1). Ultimately, 3084 T2DM patients were included in this study. The study’s flow diagram is depicted in Figure 1.
Figure 1 Flow diagram of the study. T2DM, type 2 diabetes mellitus; SHAP, Shapley Additive exPlanations.
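The exclusion step and the events-per-variable rule described in Section 2.1 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that the 20% threshold applies to each participant record, that a record with exactly 20% missing values is kept, and that missing values are represented as `None`. The function names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_fraction(record: Record) -> float:
    """Fraction of fields in one participant record that are missing (None)."""
    if not record:
        return 0.0
    return sum(value is None for value in record.values()) / len(record)


def drop_sparse_records(records: List[Record], max_missing: float = 0.20) -> List[Record]:
    """Keep records whose missing fraction does not exceed max_missing.

    Assumption: a record with exactly max_missing missing values is retained.
    """
    return [r for r in records if missing_fraction(r) <= max_missing]


def max_candidate_predictors(n_events: int, events_per_variable: int = 10) -> int:
    """Largest number of candidate predictors allowed by an events-per-variable rule."""
    return n_events // events_per_variable
```

With the 1186 SCAS events in this cohort, `max_candidate_predictors(1186)` gives 118, so the rule of 10 events per variable is comfortably met by the variables listed in Section 2.2.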
2.2 Clinical baseline data
Participants’ general characteristics include gender, age, body mass index, and blood pressure (both systolic and diastolic measurements). Blood cell counts comprise white blood cell count (WBC), neutrophil count, eosinophil count, basophil count (BASO), lymphocyte count (LYC), red blood cell count, hemoglobin, red blood cell distribution width, mean red blood cell volume (MCV), platelet count, platelet distribution width (PDW), and mean platelet volume (MPV). Biochemical indicators encompass total cholesterol (TC), triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), FBG, total protein (TP), albumin (ALB), aspartate aminotransferase, alanine aminotransferase, gamma-glutamyl transpeptidase (GGT), serum uric acid (SUA), and serum creatinine (SCR).
2.3 Statistical analysis
The Kolmogorov-Smirnov test was used to assess the normality of the sample distribution. Normal continuous variables were expressed as means (standard deviation, SD), non-normal continuous variables as median (interquartile range, IQR), and categorical variables as frequency (percentage, %). Between-group analyses involved independent samples t-tests for normal continuous variables, Mann-Whitney U tests for non-normal continuous variables, and chi-square tests for categorical variables. Box plots were used to elucidate the relationship between various metabolic parameters [including atherogenic index of plasma (AIP), Castelli risk index (CRI), metabolic score for insulin resistance (METS-IR), and triglyceride-glucose (TyG) index] and SCAS. These parameters were calculated as follows, where TG denotes triglycerides and BMI denotes body mass index: AIP = Log(TG/HDL); CRI = TC/HDL; METS-IR = Ln((2 * FBG + TG) * BMI)/(Ln(HDL)); TyG = Ln[(TG * FBG)/2]. Multivariate logistic regression identified independent risk factors for SCAS. Restricted cubic spline was employed to analyze the dose-response relationship between AIP and SCAS.
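The four metabolic indices defined above can be computed as in the following sketch. It transcribes the formulas exactly as printed, including the parenthesization of METS-IR, which readers should check against the original definition of that index before reuse. Two points are assumptions because the text does not state them: "Log" is taken to be the base-10 logarithm, and the inputs are taken to be supplied in whatever units each index was defined for. The function names are hypothetical.

```python
import math


def aip(tg: float, hdl: float) -> float:
    """Atherogenic index of plasma: Log(TG/HDL). Base 10 is an assumption."""
    return math.log10(tg / hdl)


def cri(tc: float, hdl: float) -> float:
    """Castelli risk index: TC/HDL."""
    return tc / hdl


def mets_ir(fbg: float, tg: float, bmi: float, hdl: float) -> float:
    """METS-IR transcribed as printed: Ln((2 * FBG + TG) * BMI) / Ln(HDL).

    Note: Ln(HDL) is zero when HDL equals 1, so the chosen unit matters.
    """
    return math.log((2 * fbg + tg) * bmi) / math.log(hdl)


def tyg(tg: float, fbg: float) -> float:
    """Triglyceride-glucose index: Ln[(TG * FBG)/2]."""
    return math.log(tg * fbg / 2)
```

Because three of the four indices are logarithms of ratios or products, they are sensitive to the measurement unit, so unit conversion should happen before these functions are called.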
Least absolute shrinkage and selection operator (LASSO) regression combined with random forest-recursive feature elimination (RF-RFE) was used to screen for characteristic variables. Six ML methods, including linear discriminant analysis (LDA), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGboost), were used for model construction. The primary parameters used to evaluate the effectiveness of risk prediction models included accuracy, sensitivity, specificity, precision, recall, and the F1 score. AUC was utilized to assess the models’ predictive ability. Calibration curves and the Brier score were used to assess calibration capability, while decision curve analysis (DCA) was employed to evaluate clinical applicability. Additionally, Shapley Additive exPlanations (SHAP) was used to interpret the best predictive model.
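The threshold-based performance parameters listed above all derive from the four cells of a confusion matrix. The sketch below shows those standard definitions; the class and function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text. Sensitivity and recall are the same quantity, so they share one value.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionMatrix:
    tp: int  # SCAS predicted, SCAS present
    fp: int  # SCAS predicted, SCAS absent
    tn: int  # no SCAS predicted, SCAS absent
    fn: int  # no SCAS predicted, SCAS present


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division; returns 0.0 for an empty denominator (a convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Standard metrics computed from one confusion matrix."""
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)
    specificity = _ratio(cm.tn, cm.tn + cm.fp)
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    return {
        "accuracy": _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "sensitivity": sensitivity,
        "recall": sensitivity,  # identical to sensitivity by definition
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```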
All statistical analyses were conducted using Python (https://www.python.org/, version: 3.9.0) and R (https://cran.r-project.org/, version: 4.1.3). All tests were two-sided and P < 0.05 was deemed statistically significant.
3 Results
3.1 Clinical baseline information of the study population
A total of 3084 participants were enrolled in this study, comprising 1898 individuals with T2DM without SCAS, and 1186 individuals with T2DM with SCAS. The percentage of SCAS in the T2DM population was found to be as high as 38.46%. The median age of participants was 56 years (IQR: 49-61). Participants in the SCAS group were older, with a median age of 58 years (IQR: 53-62), compared to 54 years (IQR: 46-60) in the control group. The male proportion was similar in both groups (74.6% in the SCAS group vs. 73.8% in the control group, P > 0.05). Additionally, statistically significant differences were observed between the groups in terms of routine blood tests, lipid and glucose levels, and liver and kidney function (P < 0.05). The baseline clinical characteristics of the study population are presented in Table 1.
The AIP, CRI, METS-IR, and TyG index are metabolism-related parameters commonly used in the diagnosis and risk assessment of metabolism-related diseases (15–18). The current study showed that three metabolism-related parameters, including AIP, CRI, and TyG, were significantly higher in the SCAS group than in the control group (P < 0.05) (Figure 2).
Figure 2 Association of four metabolism-related parameters with risk of SCAS. AIP, atherogenic index of plasma; CRI, Castelli risk index; TyG, triglyceride-glucose; METS-IR, metabolic score for insulin resistance; SCAS, subclinical atherosclerosis.
3.2 Independent risk factors
Nineteen potential risk factors associated with SCAS were initially screened by univariate analysis (P < 0.05) (Table 1). To check for multicollinearity, we calculated the variance inflation factor (VIF) for each variable and considered variables with a VIF below 10 to exhibit low multicollinearity (Supplementary Figure 2). Afterward, we performed backward stepwise logistic regression analysis based on the Akaike information criterion to filter and remove multicollinear variables. Ultimately, fifteen variables were included in the multivariate logistic regression analysis, and fourteen of them, including Age, WBC, BASO, and LYC, were identified as independent risk factors for SCAS (P < 0.05) (Figure 3).
Figure 3 Multivariate logistic regression analysis of subclinical atherosclerosis. WBC, white blood cell count; BASO, basophil count; LYC, lymphocyte count; RDW, red blood cell distribution width; MCV, mean red blood cell volume; PDW, platelet distribution width; MPV, mean platelet volume; HDL, high-density lipoprotein; FBG, fasting blood glucose; TP, total protein; GGT, gamma-glutamyl transpeptidase; SUA, serum uric acid; SCR, serum creatinine; AIP, atherogenic index of plasma.
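The VIF screening described in Section 3.2 can be illustrated with the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below is dependency-free for portability and is not the authors' implementation; in practice a statistics library would normally be used. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def correlation_matrix(columns: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix of predictors given as equal-length columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

For two predictors with correlation r, this reduces to 1/(1 - r^2), so a VIF of 10 corresponds to a predictor that is about 90% explained by the others.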
Based on the independent risk factors, we proceeded to explore the correlation between the variables (Figure 4). From the correlation analysis, we observed a negative correlation between AIP and Age (r = -0.24, P < 0.01), MCV (r = -0.13, P < 0.01), and HDL (r = -0.69, P < 0.01). Additionally, positive correlations were observed between AIP and WBC (r = 0.14, P < 0.01), GGT (r = 0.28, P < 0.01), and SUA (r = 0.27, P < 0.01).
Figure 4 Correlation analysis between the variables. MCV, mean red blood cell volume; HDL, high-density lipoprotein; PDW, platelet distribution width; MPV, mean platelet volume; FBG, fasting blood glucose; BASO, basophil count; AIP, atherogenic index of plasma; WBC, white blood cell count; LYC, lymphocyte count; GGT, gamma-glutamyl transpeptidase; SCR, serum creatinine; SUA, serum uric acid; TP, total protein. *P < 0.05; **P < 0.01.
To further assess the clinical applicability of AIP, we evaluated its diagnostic performance and its dose-response relationship with SCAS. The receiver operating characteristic analysis (Figure 5A) revealed that, although AIP holds promise as a potential biomarker for SCAS, its diagnostic value on its own was limited (AUC: 0.535). The dose-response analysis (Figure 5B) demonstrated a linear relationship between AIP and the risk of SCAS (P-overall < 0.001, P-non-linear = 0.319), with a significant increase in risk observed when AIP was greater than 0.625.
Figure 5 Receiver operating characteristic (ROC) curve and dose-response relationship between AIP and subclinical atherosclerosis. (A) ROC curve; (B) Dose-response relationship. AIP, atherogenic index of plasma.
3.3 Construction of risk prediction models
The study population was divided into training and internal validation sets at a 6:4 ratio. The basic characteristics of the participants in the two sets did not differ (Table 2). LASSO is a dimensionality reduction method that screens predictors by adding a penalty function that shrinks some regression coefficients to zero (19). RF-RFE is a recursive backward feature elimination method that evaluates the importance of variables and progressively removes the least important ones, ultimately identifying the optimal number of features (20). In the training set, LASSO combined with RF-RFE was applied to screen the most characteristic variables for SCAS (Figures 6A, B). Subsequently, the variables selected by both algorithms were used as predictors for constructing the SCAS risk prediction models, which included Age, FBG, TC, HDL, LDL, TP, ALB, SUA, and SCR (Figure 6C). To determine the optimal risk prediction model, six machine learning algorithms, namely LDA, LR, NB, RF, SVM, and XGboost, were employed to construct risk prediction models.
Figure 6 Screening of characteristic predictors. (A) Characteristic variables screening based on LASSO (lambda: 1SE); (B) Characteristic variables screening based on RF-RFE; (C) LASSO combined RF-RFE. LASSO, least absolute shrinkage and selection operator; SE, standard error; RF-RFE, random forest-recursive feature elimination.
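The 6:4 partition described in Section 3.3 can be sketched as follows. The text does not say whether the split was stratified by outcome, so stratification is an assumption made here to keep the SCAS proportion similar in both sets; the function name, the seed, and the rounding rule are also illustrative choices.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], train_fraction: float = 0.6, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (training indices, validation indices), split within each class."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)  # rounding rule is a choice
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return sorted(train), sorted(valid)
```

Feature screening would then be run on the training indices only, as the text describes, so that the validation set plays no part in choosing predictors.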
3.4 Validation of risk prediction models
Within the training set, 10-fold cross-validation was employed to evaluate the predictive value of the models and showed that the RF model had the best clinical predictive value [AUC: 0.729 (95% CI: 0.709-0.749)], followed by the SVM model [AUC: 0.720 (0.705-0.735)] (Figure 7A). In the internal validation set, the RF model also demonstrated a good clinical predictive value [AUC: 0.715 (95% CI: 0.688-0.742)] (Figure 7B). Furthermore, a comprehensive comparison of other clinical performance parameters, such as sensitivity and specificity, was conducted among the prediction models (Table 3). The RF model performed well across these parameters in the training set. The confusion matrix of the six machine learning models in the training set is shown in Figure 8.
Figure 7 Receiver operating characteristic curve. (A) Training set; (B) Internal validation set. LDA, linear discriminant analysis; LR, logistic regression; NB, Naive Bayes; RF, random forest; SVM, support vector machine; XGboost, extreme gradient boosting.
Figure 8 The confusion matrix of the six machine learning models in the training set. (A) Linear discriminant analysis; (B) Logistic regression; (C) Naive Bayes; (D) Random forest; (E) Support vector machine; (F) Extreme gradient boosting.
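The AUC values reported in Section 3.4 measure the probability that a randomly chosen participant with SCAS receives a higher predicted risk than a randomly chosen participant without it. The sketch below computes the AUC from that pairwise definition and builds fold indices for k-fold cross-validation. It is an illustration under assumptions: the quadratic-time pairwise loop is chosen for clarity, the fold assignment is a simple shuffled round-robin, and how the authors derived their confidence intervals is not stated, so none is computed. Function names are hypothetical.

```python
import random
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]
```

In each round, one fold serves as the held-out set and the remaining folds are used for fitting; the per-fold AUCs are then summarized.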
The calibration curve visually displays the fit of the risk prediction models. As shown in Figure 9, except for the XGboost and NB models, the predicted values of the other models closely match the theoretical values, demonstrating good clinical calibration.
Figure 9 Calibration curve. (A) Training set; (B) Internal validation set. LDA, linear discriminant analysis; LR, logistic regression; NB, Naive Bayes; RF, random forest; SVM, support vector machine; XGboost, extreme gradient boosting.
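The calibration assessment above rests on two quantities: the Brier score, which is the mean squared difference between predicted probability and observed outcome, and the points of a calibration curve, which compare mean predicted risk with observed event rate within bins. A minimal sketch follows; the equal-width binning and the default of ten bins are assumptions, since the text does not describe how the curves in Figure 9 were constructed, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_points(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted risk, observed event rate, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            points.append((mean_predicted, observed_rate, len(members)))
    return points
```

A well-calibrated model yields points close to the diagonal, where mean predicted risk equals the observed rate, and a lower Brier score.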
DCA was used to assess the clinical applicability of the predictive models by relating the benefits and harms of acting on their predictions across different decision thresholds. In the training set, all six ML models showed good clinical applicability (Figure 10A). Further, we calculated the risk threshold probability for the RF prediction model in the internal validation set, which showed that the RF model was clinically beneficial at threshold probabilities between 2% and 70% (Figure 10B).
Figure 10 Decision curve analysis. (A) Training set; (B) Internal validation set. LDA, linear discriminant analysis; LR, logistic regression; NB, Naive Bayes; RF, random forest; SVM, support vector machine; XGboost, extreme gradient boosting.
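Decision curve analysis plots net benefit against the threshold probability at which a clinician would act. The sketch below uses the standard net benefit definition, true positives per participant minus false positives per participant weighted by the odds of the threshold, together with the "treat all" reference strategy (the "treat none" strategy has a net benefit of zero). It is an illustration rather than the authors' code, and the function names are hypothetical.

```python
from typing import Sequence


def _check_threshold(threshold: float) -> None:
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")


def net_benefit(labels: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    _check_threshold(threshold)
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of acting on every participant regardless of predicted risk."""
    _check_threshold(threshold)
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, which is how a beneficial threshold range is read off the curve.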
3.5 Interpretation of risk prediction model
Based on the aforementioned analysis, the RF prediction model performed well in both the training and internal validation sets: it had the highest clinical predictive value in the training set [AUC: 0.729 (95% CI: 0.709-0.749)] and outperformed the other models in terms of accuracy, sensitivity, recall, and F1 score. Therefore, we selected the RF model as the optimal prediction model for further interpretation. SHAP is a widely used method for interpreting predictive models in the field of ML; it explains a model by computing the contribution (Shapley value) of each characteristic predictor (21). Figure 11A depicts the contribution degree of the characteristic predictors to the prediction model, with the top five variables being Age, ALB, TP, TC, and SCR. Moreover, we observed that higher values of Age, TC, and SCR correspond to higher SHAP values and increased disease risk, whereas higher values of ALB and TP result in smaller SHAP values and reduced disease risk (Figure 11B).
Figure 11 Feature importance of random forest model. (A) The importance ranking of the features according to the mean absolute SHAP value; (B) The effect of features on the outcome of the model. ALB, albumin; TP, total protein; TC, total cholesterol; SCR, serum creatinine; FBG, fasting blood glucose; SUA, serum uric acid; LDL, low-density lipoprotein; HDL, high-density lipoprotein; SHAP, Shapley Additive exPlanation.
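The Shapley values underlying the SHAP interpretation can be illustrated with an exact, brute-force computation for a small model. This sketch is not the algorithm used by SHAP software for tree ensembles, which relies on faster model-specific methods; it enumerates every coalition of features, which is feasible only for a handful of predictors. Representing an "absent" feature by a single baseline value is an assumption made here for simplicity, and the function name is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(instance)

    def coalition_value(present: set) -> float:
        mixed = [instance[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability weight of a coalition of this size preceding feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                gain = coalition_value(present | {i}) - coalition_value(present)
                contributions[i] += weight * gain
    return contributions
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is what allows per-feature SHAP values to be read as an additive breakdown of risk.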
4 Discussion
This study included a total of 3084 T2DM individuals, of whom 1186 had SCAS. Multivariate logistic regression analysis identified 14 variables, such as Age, WBC, BASO, and LYC (P < 0.05) as independent risk factors for SCAS in T2DM patients. LASSO combined with RF-RFE algorithms revealed nine characteristic variables, including Age, FBG, TC, HDL, LDL, TP, ALB, SUA, and SCR, as predictors for the SCAS risk model. Six ML models were developed and validated for clinical performance. Ultimately, the RF model exhibited the highest clinical predictive value in the training set [AUC: 0.729 (0.709-0.749)] and outperformed in accuracy, sensitivity, recall, and F1 score. The SHAP interpretation of the RF model revealed that Age, ALB, TP, TC, and SCR were the top five variables that made the most significant contributions to the predictive model.
In this study, the percentage of SCAS in the T2DM population was 38.46%, lower than the 43.68% reported by Hashimoto et al. in a Japanese T2DM population (22), which might be related to differences in region and sample size. Multiple studies have demonstrated an association between the TyG index and the incidence of CVD, coronary artery stenosis, stroke, and AS (23, 24). A meta-analysis has revealed that an elevated TyG index is associated with SCAS and arterial stiffness in the adult population (25). Notably, the I-Lan Longitudinal Aging Study identified an association between the TyG index and SCAS in non-diabetic individuals, but not in those with diabetes (26). Consistent with this finding, our study also found no statistically significant association between the TyG index and SCAS in the T2DM population. AIP has emerged as a novel predictive biomarker for CVD. Associations have been identified between elevated AIP levels and increased incidences of CAC and AS (27, 28). In this study, we observed that for every 0.1-unit increase in AIP, the odds of SCAS increased by 31% [OR: 1.310 (1.201-1.401)]. However, the receiver operating characteristic curve indicated limited diagnostic value for AIP alone (AUC: 0.535).
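An odds ratio reported per fixed increment, such as the OR of 1.310 per 0.1-unit increase in AIP above, can be converted into a percentage change in odds or rescaled to a different increment. The sketch below shows that arithmetic. The rescaling assumes the log-odds are linear in the predictor, as in a standard logistic regression term, and the function names are hypothetical.

```python
def odds_change_percent(odds_ratio: float) -> float:
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def rescale_odds_ratio(odds_ratio: float, from_step: float, to_step: float) -> float:
    """Odds ratio for a different increment, assuming linear log-odds."""
    return odds_ratio ** (to_step / from_step)
```

Under that linearity assumption, `rescale_odds_ratio(1.310, 0.1, 0.2)` is simply 1.310 squared, because odds ratios multiply across successive increments rather than add.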
Age, PDW, MPV, SUA, and GGT were observed as independent risk factors for SCAS, consistent with previous studies (29–33). Inflammation-related markers such as WBC, BASO, and LYC were also found to be independent risk factors for SCAS. Long-term studies have shown that AS has a complex pathogenesis, primarily attributed to lipoprotein retention in the arterial wall and chronic inflammation (34, 35). Hyperglycemia leads to increased inflammasome activity, upregulated nucleotide-binding oligomerization domain-like receptor 3, and ultimately elevated levels of the pro-inflammatory cytokines interleukin-1β and interleukin-18 (36). Our findings support the view that SCAS in T2DM involves chronic inflammation. Dyslipidemia is a well-established independent risk factor for CVD. In our study, we observed that HDL is an independent risk factor for SCAS. While early research consistently demonstrated an inverse correlation between HDL levels and CVD risk (37, 38), more recent studies have unveiled a non-linear, U-shaped relationship, with very high HDL levels associated with cardiovascular mortality (39, 40).
Optimizing approaches for early diagnosis of SCAS and providing earlier and more precise interventions are crucial to reducing adverse cardiovascular events. Currently, CUS and CT examinations are the primary methods for screening SCAS, but their widespread use strains medical resources and increases costs, particularly in low-income countries with limited resources. In recent years, with the growing demand for high-quality healthcare, AI has become a powerful tool in clinical medicine. ML, as a branch of AI, can analyze large datasets, find complex patterns, and generate insights that contribute to early disease diagnosis, drug discovery, and risk prediction (41, 42). For instance, a study based on electronic health records used ML to generate an in-silico marker for coronary artery disease (CAD) that can non-invasively quantify AS and risk of death on a continuous spectrum, and identify underdiagnosed individuals (43). In addition, Ninomiya et al. (44) developed ML models to predict 5-year all-cause mortality in patients with CAD and assessed ML’s benefit in guiding decision-making between percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG). The results showed that the hybrid gradient boosting model was the most effective for predicting 5-year all-cause mortality (C-index of 0.78) and that ML is feasible and effective for identifying individuals who benefit from CABG or PCI. In this study, we have developed risk prediction models for SCAS in T2DM patients based on interpretable machine learning methods that could contribute to the early identification of high-risk individuals.
Our study carries significant clinical importance. This may be one of the first studies to perform SCAS risk prediction in the T2DM population using interpretable ML methods. As a chronic condition, SCAS is challenging to reverse once it develops, which underscores the value of early prevention over later treatment. This prediction model enables the identification of individuals at high risk of SCAS within the T2DM population, providing a valuable advantage for early disease prevention. Moreover, the prediction model could not only benefit medically underdeveloped regions but also inform the clinical decisions of physicians, thus contributing to the optimization of healthcare resources.
This study has certain unavoidable limitations. Firstly, the study population was limited to a specific region, which might impact the generalizability of the prediction model. Secondly, the collection of clinical data lacked comprehensiveness, which may have led to the omission of potential predictors. Thirdly, the risk prediction model has only undergone validation using internal datasets, necessitating further validation with external datasets. In future studies, we will conduct a long-term follow-up study and collaborate with multiple centers to further revise and improve the model.
5 Conclusions
In summary, the development, validation, and interpretation of the SCAS risk prediction model in a Chinese T2DM population have significant implications for the reduction and prevention of adverse cardiovascular events.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.
Ethics statement
The study protocol adhered to the Declaration of Helsinki and received approval from the Ethics Committee of the Affiliated Hospital of Medical School, Ningbo University (renamed The First Affiliated Hospital of Ningbo University in March 2023), Ningbo, China (KY20220607). Informed consent was obtained from all participants, and the study data were anonymized.
Author contributions
XT: Data curation, Formal analysis, Methodology, Visualization, Writing – original draft. YS: Data curation, Formal analysis, Methodology, Writing – review & editing. GH: Conceptualization, Data curation, Methodology, Validation, Visualization, Writing – review & editing. YM: Conceptualization, Data curation, Funding acquisition, Supervision, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This project was supported by the Ningbo Natural Science Foundation (2018A610248 and 2022J233), Ningbo Medical and Health Leading Academic Discipline Project (2022-F24), Zhejiang Medicine and Health Technology Project (2018ZH029 and 2020KY871), Major Project for Science and Technology Innovation 2025 (2019B10035), Ningbo Social Development (2019C50080), and Ningbo Social Welfare Research (2022S047).
Acknowledgments
We thank Zhongwei Zhu of Ningbo Zhenhai Lianhua Hospital for his long-term support of this study.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2024.1332982/full#supplementary-material
References
1. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract (2022) 183:109119. doi: 10.1016/j.diabres.2021.109119
2. Einarson TR, Acs A, Ludwig C, Panton UH. Prevalence of cardiovascular disease in type 2 diabetes: a systematic literature review of scientific evidence from across the world in 2007-2017. Cardiovasc Diabetol (2018) 17:83. doi: 10.1186/s12933-018-0728-6
3. Tsao CW, Aday AW, Almarzooq ZI, Anderson CAM, Arora P, Avery CL, et al. Heart disease and stroke statistics-2023 update: A report from the American heart association. Circulation (2023) 147:e93–e621. doi: 10.1161/cir.0000000000001123
4. Sarwar N, Gao P, Seshasai SR, Gobin R, Kaptoge S, Di Angelantonio E, et al. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular disease: A collaborative meta-analysis of 102 prospective studies. Lancet (2010) 375:2215–22. doi: 10.1016/s0140-6736(10)60484-9
5. Gregg EW, Sattar N, Ali MK. The changing face of diabetes complications. Lancet Diabetes Endocrinol (2016) 4:537–47. doi: 10.1016/s2213-8587(16)30010-9
6. Singh SS, Pilkerton CS, Shrader CD Jr., Frisbee SJ. Subclinical atherosclerosis, cardiovascular health, and disease risk: Is there a case for the Cardiovascular Health Index in the primary prevention population? BMC Public Health (2018) 18:429. doi: 10.1186/s12889-018-5263-6
Keywords: subclinical atherosclerosis, type 2 diabetes mellitus, independent risk factors, interpretable machine learning, prediction model
Citation: Tusongtuoheti X, Shu Y, Huang G and Mao Y (2024) Predicting the risk of subclinical atherosclerosis based on interpretable machine models in a Chinese T2DM population. Front. Endocrinol. 15:1332982. doi: 10.3389/fendo.2024.1332982
Received: 04 November 2023; Accepted: 07 February 2024; Published: 27 February 2024.
Edited by:
Maria Pompea Antonia Baldassarre, G. d’Annunzio University of Chieti and Pescara, Italy
Reviewed by:
Amirmohammad Khalaji, Tehran University of Medical Sciences, Iran
Liu Ouyang, Georgia State University, United States
Copyright © 2024 Tusongtuoheti, Shu, Huang and Mao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Guoqing Huang; Yushan Mao