ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 7 - 2024 | doi: 10.3389/frai.2024.1421751

Analyzing Classification and Feature Selection Strategies for Diabetes Prediction across Diverse Diabetes Datasets

Provisionally accepted
  • 1 VIT University, Vellore, Tamil Nadu, India
  • 2 Mayo Clinic, Rochester, Minnesota, United States
  • 3 University of Dubai, Dubai, United Arab Emirates

The final, formatted version of the article will be published soon.

    In healthcare and medicine, combining extensive medical datasets with machine learning (ML) models presents a significant opportunity to transform diagnostics, treatment, and patient care. This paper focuses on identifying the most effective ML models for diabetes prediction and the features most critical to that prediction. Prediction performance is analyzed using a variety of ML models, such as Random Forest (RF), XGBoost (XGB), Linear Regression (LR), Gradient Boosting (GB), and Support Vector Machine (SVM), across numerous medical datasets. Feature importance is studied using filter-based and wrapper-based techniques and Explainable Artificial Intelligence (Explainable AI). Using Explainable AI techniques, specifically Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), the decision-making process of the models is made transparent, thereby bolstering trust in AI-driven decisions. Features identified by RF among the wrapper-based techniques and by the Chi-square test among the filter-based techniques are shown to enhance prediction performance. Both approaches assign considerable importance to features such as age, family history of diabetes, polyuria, polydipsia, and high blood pressure, which are strongly associated with diabetes. The models attain precision and recall values of up to 0.9 in predicting diabetes, which could improve patient care and treatment strategies.
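
    The filter-based Chi-square ranking and the precision and recall metrics mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy records, the feature names chosen from the abstract, and the helper functions `chi_square` and `precision_recall` are hypothetical and are not the authors' data or implementation. The statistic shown is the standard Pearson Chi-square test of independence on a contingency table, which may differ from the scoring function a particular library uses.

```python
from collections import Counter


def chi_square(feature, label):
    """Pearson chi-square statistic between two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))   # joint counts per (feature, label) cell
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(label)             # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy records (1 = present, 0 = absent); not data from the study.
toy_features = {
    "polyuria":   [1, 1, 1, 1, 0, 0, 0, 0],
    "polydipsia": [1, 1, 1, 0, 1, 0, 0, 0],
    "gender":     [1, 0, 1, 0, 1, 0, 1, 0],
}
diabetes = [1, 1, 1, 1, 0, 0, 0, 0]

# Filter-based selection: score each feature independently of any model,
# then rank from most to least associated with the label.
scores = {name: chi_square(values, diabetes) for name, values in toy_features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

# Treat a single feature as a naive predictor to show how the metrics are computed.
precision, recall = precision_recall(diabetes, toy_features["polydipsia"])
```

    A wrapper-based method differs in that it scores subsets of features by training a model (for example a Random Forest) on each candidate subset, rather than scoring each feature on its own as the filter above does.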

    Keywords: machine learning, diabetes prediction, explainable AI, filter, wrapper, feature selection

    Received: 22 Apr 2024; Accepted: 24 Jul 2024.

    Copyright: © 2024 Kaliappan, Kumar I J, S, T, Singh, Vera-Garcia, Himeur, Mansoor, Atalla and Srinivasan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Yassine Himeur, University of Dubai, Dubai, United Arab Emirates

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.