ORIGINAL RESEARCH article

Front. Public Health
Sec. Children and Health
Volume 12 - 2024 | doi: 10.3389/fpubh.2024.1414046

Robust identification key predictors of short-and long-term weight status in children and adolescents by machine learning

Provisionally accepted
  • 1 The University of Hong Kong, Pokfulam, Hong Kong, SAR China
  • 2 Department of Health (Hong Kong), Hong Kong, Hong Kong, SAR China

The final, formatted version of the article will be published soon.

    Early identification of children and adolescents at high risk of weight problems is crucial for implementing timely preventive measures. While machine learning (ML) techniques have shown promise in addressing this complex challenge with high-dimensional data, feature selection is vital for identifying the key predictors that can facilitate effective and targeted interventions. This study aims to use a feature selection process to identify a robust and minimal set of predictors that can aid in the early prediction of short- and long-term weight problems in children and adolescents. We used demographic, physical, and psychological well-being predictors to model weight status (normal, underweight, overweight, and obese) for 1-, 3-, and 5-year periods. To select the most influential features, we employed four feature selection methods, (1) Chi-Square test, (2) Information Gain, (3) Random Forest, and (4) eXtreme Gradient Boosting (XGBoost), with six ML approaches. The stability of the feature selection methods was assessed by Jaccard's index, Spearman's rank correlation, and Pearson's correlation. Model performance was evaluated using several metrics, including accuracy and macro- and micro-averaged AUC. A total of 3,862,820 student-visits were included in this population-based study, with a mean age of 11.6 years (SD=3.64) for the training set and 10.8 years (SD=3.50) for the temporal test set. From the initial set of 38 predictors, we identified 6, 9, and 13 features for 1-, 3-, and 5-year predictions, respectively, using the best-performing feature selection method, the Chi-Square test, in XGBoost models. These feature sets demonstrated excellent stability and achieved prediction accuracies of 0.82, 0.73, and 0.70; macro-AUCs of 0.94, 0.86, and 0.83; and micro-AUCs of 0.96, 0.93, and 0.92 for the 1-, 3-, and 5-year prediction windows, respectively. Weight, height, sex, total self-esteem score, and age were consistently the most influential predictors across all prediction windows. Additionally, several psychological and social well-being predictors showed relatively high importance in long-term weight status prediction. We demonstrate the potential of machine learning in identifying key predictors of weight status in children and adolescents. While traditional anthropometric measures remain important, psychological and social well-being factors also emerge as crucial predictors, potentially informing targeted interventions to address childhood and adolescent weight problems.
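
    As an illustration of the stability assessment mentioned in the abstract, the following minimal sketch computes Jaccard's index between feature subsets and averages it over all pairs of selection runs. The abstract does not describe how the authors applied the index, so the pairwise averaging, the function names `jaccard` and `mean_pairwise_jaccard`, and the example subsets are all assumptions made for illustration, not the study's procedure or data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty subsets count as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard index over every pair of selected-feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets from three selection runs (not study data).
runs = [
    {"weight", "height", "sex", "age"},
    {"weight", "height", "sex", "self_esteem"},
    {"weight", "height", "sex", "age"},
]
stability = mean_pairwise_jaccard(runs)
```

    A value near 1 indicates that repeated runs select nearly the same features, while a value near 0 indicates little overlap. Jaccard's index compares subset membership only; rank-based measures such as Spearman's correlation additionally compare the ordering of feature importances.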

    Keywords: child, obesity, machine learning, feature selection, feature stability

    Received: 08 Apr 2024; Accepted: 03 Sep 2024.

    Copyright: © 2024 LIU, LENG, WU, CHAU, CHUNG and FONG. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Daniel Yee Tak FONG, The University of Hong Kong, Pokfulam, Hong Kong, SAR China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.