AUTHOR=Wang Xingchen , Zhu Tianqi , Xia Minghong , Liu Yu , Wang Yao , Wang Xizhi , Zhuang Lenan , Zhong Danfeng , Zhu Jun , He Hong , Weng Shaoxiang , Zhu Junhui , Lai Dongwu TITLE=Predicting the Prognosis of Patients in the Coronary Care Unit: A Novel Multi-Category Machine Learning Model Using XGBoost JOURNAL=Frontiers in Cardiovascular Medicine VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2022.764629 DOI=10.3389/fcvm.2022.764629 ISSN=2297-055X ABSTRACT=Background

Early prediction and classification of prognosis is essential for patients in the coronary care unit (CCU). We applied a machine learning (ML) model using the eXtreme Gradient Boosting (XGBoost) algorithm to prognosticate CCU patients and compared XGBoost with traditional classification models.

Methods

CCU patients' data were extracted from the MIMIC-III v1.4 clinical database, and divided into four groups based on the time to death: <30 days, 30 days−1 year, 1–5 years, and ≥5 years. Four classification models, including XGBoost, naïve Bayes (NB), logistic regression (LR), and support vector machine (SVM) were constructed using the Python software. These four models were tested and compared for accuracy, F1 score, Matthews correlation coefficient (MCC), and area under the curve (AUC) of the receiver operating characteristic curves. Subsequently, Local Interpretable Model-Agnostic Explanations method was performed to improve XGBoost model interpretability. We also constructed sub-models of each model based on the different categories of death time and compared the differences by decision curve analysis. The optimal model was further analyzed using a clinical impact curve. At last, feature ablation curves of the XGBoost model were conducted to obtain the simplified model.

Results

Overall, 5360 CCU patients were included. Compared to NB, LR, and SVM, the XGBoost model showed better accuracy (0.663, 0.605, 0.632, and 0.622), micro-AUCs (0.873, 0.811, 0.841, and 0.818), and MCC (0.337, 0.317, 0.250, and 0.182). In subgroup analysis, the XGBoost model had a better predictive performance in acute myocardial infarction subgroup. The decision curve and clinical impact curve analyses verified the clinical utility of the XGBoost model for different categories of patients. Finally, we obtained a simplified model with thirty features.

Conclusions

For CCU physicians, the ML technique by XGBoost is a potential predictive tool in patients with different conditions, and it may contribute to improvements in prognosis.