ORIGINAL RESEARCH article

Front. Endocrinol.
Sec. Systems Endocrinology
Volume 15 - 2024 | doi: 10.3389/fendo.2024.1450317

Advancing Non-Alcoholic Fatty Liver Disease Prediction: A Comprehensive Machine Learning Approach Integrating SHAP Interpretability and Multi-Cohort Validation

Provisionally accepted
  • 1 Department of Gastroenterology and Hepatology, Guizhou Aerospace Hospital, Zunyi, China
  • 2 Technology Innovation Center, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
  • 3 Zunyi Medical University, Zunyi, Guizhou Province, China

The final, formatted version of the article will be published soon.

    Non-alcoholic fatty liver disease (NAFLD) is a major global health challenge that often goes undiagnosed because of suboptimal screening tools. Advances in machine learning (ML) offer potential improvements in predictive diagnostics by leveraging complex clinical datasets. We used a comprehensive dataset from the Dryad database for model development and training, and performed external validation using data from the National Health and Nutrition Examination Survey (NHANES) 2017-2020 cycles. Seven distinct ML models were developed and evaluated. Additionally, we employed the SHapley Additive exPlanations (SHAP) method to enhance the interpretability of the models, allowing a detailed understanding of how each variable contributes to the predicted outcome. A total of 14,913 participants were eligible for this study. Among the seven models, the light gradient boosting machine achieved the highest performance, with an area under the receiver operating characteristic curve of 0.90 in the internal validation set and 0.81 in the external NHANES validation cohort. It also achieved an accuracy of 87%, a sensitivity of 92.9%, and an F1 score of 0.92. The key predictive variables identified were alanine aminotransferase, gamma-glutamyl transpeptidase, triglyceride glucose-waist circumference, metabolic score for insulin resistance, and HbA1c, all of which are strongly associated with the metabolic dysfunction integral to NAFLD progression. The integration of ML with SHAP interpretability provides a robust predictive tool for NAFLD, supporting early identification and potential management of the disease. The model's high accuracy and its generalizability to an external cohort highlight its clinical utility, though future work should incorporate longitudinal data and lifestyle factors to further refine risk assessment.
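The performance measures reported above (area under the ROC curve, accuracy, sensitivity, and F1 score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, and the AUC is computed through its pairwise-ranking interpretation rather than with any particular library.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), precision and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = NAFLD, 0 = no NAFLD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]   # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that the AUC depends only on how the scores rank cases, whereas accuracy, sensitivity, and F1 all depend on the chosen decision threshold, which is one reason the two kinds of measure are reported separately.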

    Keywords: non-alcoholic fatty liver disease, machine learning, SHAP interpretability, light gradient boosting machine, predictive model

    Received: 17 Jun 2024; Accepted: 18 Sep 2024.

    Copyright: © 2024 Yang, Lu and Ran. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Bo Yang, Department of Gastroenterology and Hepatology, Guizhou Aerospace Hospital, Zunyi, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.