
ORIGINAL RESEARCH article

Front. Appl. Math. Stat.
Sec. Mathematical Biology
Volume 11 - 2025 | doi: 10.3389/fams.2025.1490104
This article is part of the Research Topic Advances in Mathematical Biology and Medicine: Modeling, Analysis, and Numerical Solutions

Diabetes Risk Prediction Model for Community Health Check-up Population Based on Random Forest Algorithm

Provisionally accepted
Lingxia Wu*, Hongbin Gong, Limei Fang, Chunhua Zhang, Hao Wang
  • Ninghai Road Community Health Service Center, Nanjing, China

The final, formatted version of the article will be published soon.

    Purpose: To construct a diabetes risk prediction model for community health check-up populations using the random forest algorithm. Methods: This study included 5,825 individuals who underwent health check-ups at our hospital from January 2023 to December 2023. Diabetes was diagnosed on the basis of medical history surveys and the results of glycated hemoglobin (HbA1c), oral glucose tolerance tests (OGTT), and fasting plasma glucose (FPG). The check-up population was partitioned into training and test sets in a 70:30 ratio. Features were selected from the training set with the Lasso algorithm, after which a random forest prediction model was established; the test set was used to assess the model's predictive capacity. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Finally, calibration curves were used to assess the model's goodness of fit, and decision curves to assess its predictive value. Results: Of the 5,825 individuals screened, 1,144 had previously diagnosed or newly diagnosed diabetes, a prevalence of 19.64%. A diabetes mellitus (DM) prediction model was developed using the random forest algorithm. For the training and test sets, respectively, the AUC was 0.742 (95% CI: 0.723-0.780) and 0.702 (95% CI: 0.673-0.731), the sensitivity was 0.725 and 0.688, the specificity was 0.603 and 0.607, and the accuracy was 0.627 and 0.623. The DeLong test gave D = 4.2493 (P = 0.062), indicating no statistically significant difference in AUC between the two sets. In addition, the calibration curves showed that the predicted curve closely followed the ideal curve, indicating good agreement between predicted and observed risk.
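The abstract reports AUC, sensitivity, specificity and accuracy without showing how they relate. The minimal sketch below, in plain Python with no third-party packages, computes AUC in its rank (Mann-Whitney) form and the three threshold-based metrics. The function names, the 0.5 cutoff and the four-patient toy data are hypothetical illustrations, not the authors' code or data. It also checks the identity accuracy = sensitivity × prevalence + specificity × (1 - prevalence): plugging in the reported sensitivities and specificities with the overall prevalence of 1,144/5,825 reproduces the reported accuracies, assuming both sets have roughly the overall prevalence.

```python
# Minimal sketch (standard library only): the discrimination metrics named in the
# abstract. Function names, the cutoff and the toy data are hypothetical.


def auc_rank(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, cutoff):
    """Sensitivity, specificity and accuracy when score >= cutoff predicts diabetes."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = s >= cutoff
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)


def accuracy_from(sens, spec, prevalence):
    """Accuracy is the prevalence-weighted average of sensitivity and specificity."""
    return sens * prevalence + spec * (1 - prevalence)


# Toy example (hypothetical scores and labels).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc_rank(toy_scores, toy_labels))                # 0.75
print(threshold_metrics(toy_scores, toy_labels, 0.5))  # (0.5, 0.5, 0.5)

# Consistency check with the reported figures (1,144 of 5,825 had diabetes).
prevalence = 1144 / 5825
print(round(prevalence * 100, 2))                         # 19.64
print(round(accuracy_from(0.725, 0.603, prevalence), 3))  # 0.627 (training)
print(round(accuracy_from(0.688, 0.607, prevalence), 3))  # 0.623 (test)
```

Because accuracy is dominated by specificity when prevalence is about 20%, the modest specificities (about 0.60) explain why accuracy stays near 0.62 to 0.63 even though sensitivity is higher.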
Decision curve analysis (DCA) for both sets showed that, across threshold probabilities of 5% to 40%, the random forest model provided a moderate additional net benefit in predicting the probability of DM. Conclusion: The prevalence of diabetes in the community health check-up population is relatively high, and the random forest-based prediction model is effective in predicting diabetes risk, with good calibration and a favorable net benefit.
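Decision curve analysis compares a model's net benefit with the default strategies of treating everyone or no one. A minimal sketch of the standard net-benefit formula, NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability, is shown below. The ten-patient risks and outcomes are invented for illustration, and the 5% to 40% grid simply mirrors the threshold range reported for this model; none of it is the study's data.

```python
# Minimal sketch (standard library only) of decision curve analysis net benefit.
# The toy risks and outcomes below are invented for illustration.


def net_benefit(probs, labels, pt):
    """Net benefit of intervening when predicted risk >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Reference strategy: treat everyone (every patient counted as test-positive)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical predicted risks and observed diabetes status (prevalence 20%).
toy_probs = [0.8, 0.6, 0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.02]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Threshold grid 5%..40%, the range over which the abstract reports net benefit.
for pt in [i / 100 for i in range(5, 45, 5)]:
    model = net_benefit(toy_probs, toy_labels, pt)
    treat_all = net_benefit_treat_all(toy_labels, pt)
    print(f"pt={pt:.2f}  model={model:+.3f}  treat-all={treat_all:+.3f}  treat-none=+0.000")
```

A model adds net benefit at a threshold when its value lies above both the treat-all curve and zero (treat-none). In the toy data this holds at pt = 0.25, where the model gives about 0.167 versus about -0.067 for treat-all.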

    Keywords: health check-up population, diabetes risk prediction model, model building, random forest algorithm, HbA1c

    Received: 02 Sep 2024; Accepted: 29 Jan 2025.

    Copyright: © 2025 Wu, Gong, Fang, Zhang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Lingxia Wu, Ninghai Road Community Health Service Center, Nanjing, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.