
ORIGINAL RESEARCH article

Front. Physiol.
Sec. Computational Physiology and Medicine
Volume 15 - 2024 | doi: 10.3389/fphys.2024.1464744

An ensemble model for predicting dyslipidemia using 3-years continuous physical examination data

Provisionally accepted
Guo Dan 1*, Naiwen Zhang 1, Xiaolong Guo 1, Xiaxia Yu 1, Zhen Tan 2, Feiyue Cai 2, Ping Dai 2, Jing Guo 2
  • 1 Shenzhen University, Shenzhen, China
  • 2 Shenzhen University General Hospital, Shenzhen, China

The final, formatted version of the article will be published soon.

    Dyslipidemia has emerged as a significant clinical risk. Its complications, including atherosclerosis and ischemic cerebrovascular disease, pose a serious threat to human health, so accurately predicting the onset of dyslipidemia is of great importance. This study aims to use ensemble learning to build a machine learning model for predicting dyslipidemia. The study included three consecutive years of physical examination data from 2,479 participants and used the data from the first two years to predict whether each participant would develop dyslipidemia in the third year. Feature selection was performed using statistical methods and an analysis of the mutual information between features. Five machine learning models, namely support vector machine (SVM), logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), served as base learners for the ensemble model. The models were evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). The ensemble model outperformed the base learners on several metrics, achieving an AUC of 0.88 ± 0.01 (P < 0.001), which exceeds the AUCs of the base learners by margins of 0.04 to 0.20. The calibration curves and DCA also indicated good predictive performance. Furthermore, this study explored the minimal feature set needed for accurate prediction and found that the top 12 features alone were sufficient for dependable results. Among them, HbA1c and CEA were key indicators for model construction. These results suggest that the proposed ensemble model has good predictive performance and has the potential to become an effective tool for personal health management.
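The abstract says features were screened partly by the mutual information between features, but it does not give the estimator or the cut-off. The following is therefore only a minimal sketch under stated assumptions: feature values are already discretized, I(X; Y) is estimated in bits from frequency counts (a plug-in estimate), and the helper `drop_redundant` with its 0.5-bit threshold is hypothetical. A feature is kept only if its mutual information with every previously kept feature stays below the threshold.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in bits from two equal-length
    sequences of discretized feature values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def drop_redundant(features: Dict[str, Sequence[int]],
                   threshold: float = 0.5) -> List[str]:
    """Hypothetical greedy filter: walk features in the given priority order
    and keep one only if its MI with every already-kept feature is
    below `threshold` bits."""
    kept: List[str] = []
    for name, values in features.items():
        if all(mutual_information(values, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy, already-discretized features (placeholder names, not study variables).
features = {
    "feat_a": [0, 0, 1, 1],
    "feat_b": [0, 0, 1, 1],  # exact copy of feat_a: I = 1 bit, so redundant
    "feat_c": [0, 1, 0, 1],  # independent of feat_a: I = 0 bits
}
print(drop_redundant(features))  # prints ['feat_a', 'feat_c']
```

In practice continuous laboratory values would need binning (or a nearest-neighbor MI estimator) before this kind of count-based estimate is meaningful.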
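The abstract does not state how the outputs of the five base learners are combined (for example, voting or stacking). As one possible scheme, the sketch below assumes simple soft voting, averaging each learner's predicted probability of dyslipidemia, and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The helper names `soft_vote` and `auc` and the toy probabilities are hypothetical.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-sample positive-class probabilities
    from several base learners (assumed soft-voting scheme)."""
    if not prob_lists:
        raise ValueError("need at least one base learner")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative),
    counting ties as 1/2 (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = develops dyslipidemia in year 3, 0 = does not.
labels = [0, 0, 1, 1]
learner_a = [0.10, 0.40, 0.35, 0.80]
learner_b = [0.20, 0.30, 0.60, 0.70]
ensemble = soft_vote([learner_a, learner_b])
print(auc(labels, learner_a), auc(labels, learner_b), auc(labels, ensemble))
# prints: 0.75 1.0 1.0
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is acceptable for a cohort of about 2,500 participants; a rank-based formula scales better for larger data sets.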
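Decision curve analysis plots a model's net benefit against a threshold probability p_t, the predicted risk at which one would act. The sketch below uses the standard net-benefit definition, NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), together with the "treat all" reference curve (the "treat none" curve is identically 0). The function names and toy data are illustrative assumptions, not the study's code.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float], pt: float) -> float:
    """Net benefit of acting on everyone whose predicted risk is >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Reference curve: act on every participant regardless of the model."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Toy data: a decision curve repeats this over a grid of thresholds.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

At p_t = 0.5 the toy model's net benefit is 0.25 versus 0 for "treat all", so across that threshold the model would be preferred over both reference strategies.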

    Keywords: Dyslipidemia, prediction, physical examination data, machine learning, Ensemble model

    Received: 15 Jul 2024; Accepted: 11 Oct 2024.

    Copyright: © 2024 Dan, Zhang, Guo, Yu, Tan, Cai, Dai and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Guo Dan, Shenzhen University, Shenzhen, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.