The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Med.
Sec. Infectious Diseases: Pathogenesis and Therapy
Volume 12 - 2025
doi: 10.3389/fmed.2025.1521660
Clinical validation and optimization of machine learning models for early prediction of sepsis
Provisionally accepted
- 1 The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong Province, China
- 2 Guangzhou AID Cloud Technology, Guangzhou, China
Sepsis is a global health threat with high incidence and mortality. Early prediction of sepsis onset can drive effective interventions and improve patient outcomes. Methods: Data were collected retrospectively from a cohort of 2,329 adult patients with positive bacterial cultures at a tertiary hospital in China between October 1, 2019 and September 30, 2020. Thirty-six clinical features were selected as model inputs. We trained models to predict sepsis using five machine learning (ML) methods: logistic regression, decision tree, random forest (RF), multi-layer perceptron, and light gradient boosting. We evaluated the five ML models using the area under the ROC curve (AUC), accuracy, F1-score, sensitivity, and specificity. Data from a second cohort of 2,286 patients, collected between October 1, 2020 and April 1, 2022, were used to validate the model that performed best on the internal validation set. The Shapley additive explanations (SHAP) method was applied to evaluate feature importance and explain this model's predictions. Results: Of the five ML models developed, the RF model demonstrated the best performance in terms of AUC (0.818), F1-score (0.38), and sensitivity (0.746). The RF model also achieved a comparable AUC (0.771) in the external validation set. The SHAP method identified procalcitonin, albumin, prothrombin time, and sex as the important variables contributing to the prediction of sepsis. Discussion: The RF model we developed showed the greatest potential for early prediction of sepsis in admitted patients and could aid clinicians in their decision-making. Our findings also suggest that male patients with bacterial infections and high procalcitonin levels, lower albumin levels, or prolonged prothrombin times are more likely to develop sepsis.
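For readers less familiar with the evaluation metrics named in the abstract, the following minimal sketch shows how sensitivity, specificity, accuracy, F1-score, and AUC can be computed for a binary sepsis classifier. It is illustrative only and is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced here, and none of the numbers come from the study.

```python
from __future__ import annotations

from typing import Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Metrics that depend on a fixed decision threshold (1 = sepsis, 0 = no sepsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on sepsis cases
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-sepsis cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 sepsis cases, 5 non-sepsis cases.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 on the predicted probability.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(threshold_metrics(toy_labels, toy_preds))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

AUC is computed from the raw scores and does not depend on the threshold, whereas sensitivity, specificity, accuracy, and F1-score all change when the threshold moves. Because F1 combines precision with sensitivity, a model can show fairly high sensitivity and a much lower F1-score when its precision is low.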
Keywords: sepsis, machine learning, artificial intelligence, prediction model, infectious disease
Received: 02 Nov 2024; Accepted: 14 Jan 2025.
Copyright: © 2025 Liu, Li, Liu, Luo, Dong, Ouyang, He, Xia and Xiao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Jinyu Xia, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
Fei Xiao, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.