ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 7 - 2024 | doi: 10.3389/frai.2024.1473837

Prediction of outpatient rehabilitation patient preferences and optimization of graded diagnosis and treatment based on XGBoost machine learning algorithm

Provisionally accepted
  • Shenzhen Second People's Hospital, Shenzhen, Guangdong Province, China

The final, formatted version of the article will be published soon.

    The Department of Rehabilitation Medicine is key to improving patients' quality of life. With the rising burden of chronic disease and an aging population, outpatient facilities need to improve their efficiency and resource allocation. This study analyzes the treatment preferences of outpatient rehabilitation patients, using patient data and a rehabilitation grading tool to build predictive models, with the goal of improving patient visit efficiency and optimizing resource allocation. Data were collected from 38 Chinese institutions and covered 4244 patients visiting outpatient rehabilitation clinics. Data processing was carried out in Python. The Pandas library was used for data cleaning and preprocessing of 68 categorical and 12 continuous variables, including handling of missing values, normalization, and encoding. The data were split into an 80% training set and a 20% test set using the Scikit-learn library, so that model performance could be evaluated on data not used for training and overfitting could be detected. XGBoost, random forest, and logistic regression were compared using metrics including accuracy and receiver operating characteristic (ROC) curves. The SMOTE technique from the imbalanced-learn library was used to address class imbalance during model training. The model was refined using confusion matrix and feature importance analysis, and partial dependence plots (PDP) were used to analyze the key influencing factors. XGBoost achieved the highest overall accuracy, 80.21%, with high precision and recall in Category 1. Random forest showed similar overall accuracy. Logistic regression had significantly lower accuracy, indicating difficulty with nonlinear relationships in the data. Key factors included distance to the institution, arrival time, hospital stay, and conditions such as cardiovascular, pulmonary, oncological, and orthopedic diseases. The tiered diagnosis tool helped doctors recommend institutions based on rehabilitation grading.
Ensemble methods such as XGBoost performed well on this complex dataset, and addressing class imbalance improved performance. Understanding patient preferences can help healthcare policymakers optimize resources, improve service quality, and enhance patient satisfaction.
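
The split-then-oversample step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library rather than Scikit-learn or imbalanced-learn, the split is stratified by class (an assumption the abstract does not state), the SMOTE-style interpolation is a simplified version of the technique, and the toy data and all function names are hypothetical.

```python
import random
from math import dist


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (features, label) pairs so each class keeps the same test share."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def smote_oversample(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: interpolate between a minority point and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 50 samples, 10 of them in the minority class (label 1).
samples = [((float(i), float(i % 3)), 1 if i % 5 == 0 else 0) for i in range(50)]

train, test = stratified_split(samples, test_fraction=0.2)

# Oversample the training set only; the test set keeps its original distribution.
minority = [x for x, y in train if y == 1]
majority = [x for x, y in train if y == 0]
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_train = train + [(x, 1) for x in synthetic]
```

Oversampling only the training portion keeps synthetic points out of the evaluation data, so test metrics such as accuracy, precision, and recall still reflect the original class distribution.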

    Keywords: XGBoost, Machine learning algorithm, Rehabilitation patient, Graded diagnosis and treatment, Treatment preferences

    Received: 16 Aug 2024; Accepted: 24 Dec 2024.

    Copyright: © 2024 Fan, Ye, Gao, Xue, Zhang, Xu and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Yulong Wang, Shenzhen Second People's Hospital, Shenzhen, Guangdong Province, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.