AUTHOR=Zhang Naiwen, Guo Xiaolong, Yu Xiaxia, Tan Zhen, Cai Feiyue, Dai Ping, Guo Jing, Dan Guo

TITLE=An ensemble model for predicting dyslipidemia using 3-years continuous physical examination data

JOURNAL=Frontiers in Physiology

VOLUME=Volume 15 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2024.1464744

DOI=10.3389/fphys.2024.1464744

ISSN=1664-042X

ABSTRACT=Dyslipidemia has emerged as a significant clinical risk, and its associated complications, including atherosclerosis and ischemic cerebrovascular disease, pose a grave threat to human well-being. Accurately predicting the onset of dyslipidemia is therefore of paramount importance. This study aims to use ensemble learning to build a machine learning model for predicting dyslipidemia. The study included three consecutive years of physical examination data from 2,479 participants and used the data from the first two years to predict whether each participant would develop dyslipidemia in the third year. Feature selection was performed using statistical methods and an analysis of the mutual information between features. Five machine learning models, namely support vector machine (SVM), logistic regression (LR), random forest (RF), K-nearest neighbor (KNN) and extreme gradient boosting (XGBoost), served as base learners for the ensemble model. The model was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). Experimental results show that the ensemble model achieves superior performance across several metrics, with an AUC of 0.88 ± 0.01 (P < 0.001) that exceeds the base learners by margins of 0.04 to 0.20. Calibration curves and DCA also indicated good predictive performance. Furthermore, this study explored the minimal feature set necessary for accurate prediction and found that only the top 12 features were required for dependable outcomes. Among them, HbA1c and CEA are key indicators for model construction. Our results suggest that the proposed ensemble model has good predictive performance and the potential to become an effective tool for personal health management.