ORIGINAL RESEARCH article

Front. Endocrinol.
Sec. Clinical Diabetes
Volume 15 - 2024 | doi: 10.3389/fendo.2024.1444282
This article is part of the Research Topic Artificial Intelligence for Diabetes Related Complications and Metabolic Health

Enhancing Type 2 Diabetes Mellitus Prediction by Integrating Metabolomics and Tree-Based Boosting Approaches

Provisionally accepted
  • 1 Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya, Türkiye
  • 2 King Khalid University, Abha, Saudi Arabia
  • 3 Department of Anesthesiology and Reanimation, Faculty of Medicine, Inonu University, Malatya, Türkiye
  • 4 Department of Physiology, College of Medicine, King Khalid University, Abha, Saudi Arabia
  • 5 Department of Teacher Education, NLA Høgskolen, Linstows gate 3, 0166 Oslo, Norway

The final, formatted version of the article will be published soon.

    Background: Type 2 diabetes mellitus (T2DM) is a global health problem characterized by insulin resistance and hyperglycemia. Early detection and accurate prediction of T2DM are crucial for effective management and prevention. This study explores the integration of machine learning (ML) and explainable artificial intelligence (XAI) approaches based on metabolomics panel data to identify biomarkers and develop predictive models for T2DM.

    Methods: Metabolomics data from patients with T2DM (n = 31) and healthy controls (n = 34) were analyzed for biomarker discovery (mostly amino acids, fatty acids, and purines) and T2DM prediction. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression to enhance model accuracy and interpretability. Three advanced tree-based ML algorithms (KTBoost: Kernel-Tree Boosting; XGBoost: eXtreme Gradient Boosting; NGBoost: Natural Gradient Boosting) were used to predict T2DM from these biomarkers. The SHapley Additive exPlanations (SHAP) method was used to explain the effects of the metabolomics biomarkers on the model's predictions.

    Results: The study identified multiple metabolites associated with T2DM, and LASSO feature selection highlighted important biomarkers. KTBoost [accuracy: 0.938, CI: (0.880-0.997); sensitivity: 0.971, CI: (0.847-0.999); area under the curve (AUC): 0.965, CI: (0.937-0.994)] outperformed the other models in predicting T2DM from complex metabolomics data. According to the SHAP analysis of the KTBoost model, high levels of the metabolites phenyllactate (pla) and taurine, as well as low concentrations of cysteine, L-aspartate, and L-cysteate, are strongly associated with the presence of T2DM.

    Conclusion: The integration of metabolomics profiling and XAI offers a promising approach to predicting T2DM. The use of tree-based algorithms, in particular KTBoost, provides a robust framework for analyzing complex datasets and improves the prediction accuracy of T2DM onset. Future research should focus on validating these biomarkers and models in larger, more diverse populations to solidify their clinical utility.
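
    As a rough check on how an interval like the one reported for KTBoost accuracy can arise, the sketch below computes a normal-approximation (Wald) confidence interval for a proportion. It rests on assumptions not stated in the abstract: that accuracy was evaluated over all 65 participants (31 + 34), that 61 of them were classified correctly (a count inferred by rounding 0.938 × 65), and that a 95% Wald interval was used. The function name `wald_interval` is illustrative only.

```python
import math


def wald_interval(successes, n, z=1.959964):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.959964 corresponds to a two-sided 95% interval.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Assumption: all 65 participants (31 T2DM + 34 controls) were scored.
n_total = 31 + 34
# Assumption: the number of correct predictions is inferred from the
# reported accuracy of 0.938; it is not given in the abstract.
n_correct = round(0.938 * n_total)

acc, lo, hi = wald_interval(n_correct, n_total)
print(f"accuracy = {acc:.3f}, 95% CI = ({lo:.3f}-{hi:.3f})")
# prints: accuracy = 0.938, 95% CI = (0.880-0.997)
```

    Under these assumptions the result agrees with the reported accuracy interval to three decimals, but this agreement does not establish which interval method or validation scheme the authors used. With only 65 samples, the Wald interval is also known to be unreliable for proportions close to 1, so an exact or Wilson interval may be preferable in practice.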

    Keywords: type 2 diabetes, metabolomics, machine learning, explainable artificial intelligence, biomarkers, predictive modeling

    Received: 05 Jun 2024; Accepted: 28 Oct 2024.

    Copyright: © 2024 Arslan, Yagin, Algarni, Karaaslan, AL-Hashem and Ardigò. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Fatma Hilal Yagin, Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya, Türkiye
    Luca P. Ardigò, Department of Teacher Education, NLA Høgskolen, Linstows gate 3, 0166 Oslo, Norway

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.