AUTHOR=Zhang Naiwen, Guo Xiaolong, Yu Xiaxia, Tan Zhen, Cai Feiyue, Dai Ping, Guo Jing, Dan Guo TITLE=An ensemble model for predicting dyslipidemia using 3-years continuous physical examination data JOURNAL=Frontiers in Physiology VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2024.1464744 DOI=10.3389/fphys.2024.1464744 ISSN=1664-042X ABSTRACT=

Background

Dyslipidemia has emerged as a significant clinical risk, and its associated complications, including atherosclerosis and ischemic cerebrovascular disease, pose a grave threat to human health. Accurately predicting the onset of dyslipidemia is therefore of great importance. This study aims to use ensemble learning to build a machine learning model for predicting dyslipidemia.

Methods

This study included three consecutive years of physical examination data from 2,479 participants and used the data from the first two years to predict whether each participant would develop dyslipidemia in the third year. Feature selection was conducted through statistical methods and analysis of the mutual information between features. Five machine learning models, namely support vector machine (SVM), logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), were used as base learners to construct the ensemble model. The model was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA).
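As an illustration of two of the evaluation measures named above, the following minimal sketch computes AUC as the fraction of positive-negative pairs ranked correctly, and the net benefit that underlies a decision curve at a single threshold probability. It is not the authors' code: the function names `auc_score` and `net_benefit`, the threshold of 0.5, and the toy labels and predicted risks are all assumptions made for this example.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is ranked
    above a random negative case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging everyone whose predicted risk is at or
    above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy labels (1 = dyslipidemia in year three) and predicted risks,
# invented purely for illustration; they are not data from the study.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]

auc = auc_score(y_true, y_prob)
nb = net_benefit(y_true, y_prob, threshold=0.5)
print(f"AUC = {auc:.3f}, net benefit at 0.5 = {nb:.3f}")
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the "treat all" and "treat none" strategies. The pairwise AUC loop is quadratic in the sample size, which is acceptable for a sketch but would normally be replaced by a rank-based computation.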

Results

The ensemble model outperformed the base learners across several metrics, achieving an AUC of 0.88 ± 0.01 (P < 0.001) and surpassing the base learners by margins of 0.04 to 0.20. Calibration curves and DCA also indicated good predictive performance. Furthermore, this study explored the minimal feature set necessary for accurate prediction and found that only the top 12 features were required for dependable outcomes. Among them, HbA1c and CEA were key indicators for model construction.

Conclusions

Our results suggest that the proposed ensemble model has good predictive performance and the potential to become an effective tool for personal health management.