AUTHOR=Hong Wandong , Zhou Xiaoying , Jin Shengchun , Lu Yajing , Pan Jingyi , Lin Qingyi , Yang Shaopeng , Xu Tingting , Basharat Zarrin , Zippi Maddalena , Fiorino Sirio , Tsukanov Vladislav , Stock Simon , Grottesi Alfonso , Chen Qin , Pan Jingye TITLE=A Comparison of XGBoost, Random Forest, and Nomograph for the Prediction of Disease Severity in Patients With COVID-19 Pneumonia: Implications of Cytokine and Immune Cell Profile JOURNAL=Frontiers in Cellular and Infection Microbiology VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/cellular-and-infection-microbiology/articles/10.3389/fcimb.2022.819267 DOI=10.3389/fcimb.2022.819267 ISSN=2235-2988 ABSTRACT=Background and Aims

The aim of this study was to apply machine learning models and a nomogram to differentiate critically ill from non-critically ill COVID-19 pneumonia patients.

Methods

Clinical symptoms and signs, laboratory parameters, cytokine profiles, and immune cell data of 63 COVID-19 pneumonia patients were retrospectively reviewed. Outcomes were followed up until March 12, 2020. Logistic regression (LR), Random Forest, and XGBoost models were developed. Model performance was measured by the area under the receiver operating characteristic curve (AUC).
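
As an illustration of the performance measure named above, the AUC can be read as the probability that a randomly chosen critically ill patient receives a higher predicted score than a randomly chosen non-critically ill patient, with ties counted as one half. The following is a minimal sketch of that definition in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for critically ill, 0 for non-critically ill (hypothetical coding).
    scores: model outputs, where higher means more likely critically ill.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-based formula gives the same value more efficiently for larger samples.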

Results

Univariate analysis revealed differences between critically and non-critically ill patients in levels of interleukin-6, interleukin-10, T cells, CD4+ T cells, and CD8+ T cells. Interleukin-10, with an AUC of 0.86, was the most useful predictor of critical illness in patients with COVID-19 pneumonia. Ten variables (respiratory rate, neutrophil count, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4+ T cells, interleukin-6, and interleukin-10) were used as candidate predictors for the LR, Random Forest (RF), and XGBoost models. The coefficients from the LR model were used to build a nomogram. The RF and XGBoost methods suggested that interleukin-10 and interleukin-6 were the most important variables for predicting severity of illness. The mean AUCs for the LR, RF, and XGBoost models were 0.91, 0.89, and 0.93, respectively (in two-fold cross-validation). Individualized predictions by the XGBoost model were explained with a local interpretable model-agnostic explanations (LIME) plot.
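
To illustrate how LR coefficients can be turned into a nomogram, the sketch below rescales each predictor's contribution to the linear predictor onto a 0 to 100 point axis, anchored on the predictor with the largest possible effect, and then maps total points back to a predicted probability. This is one common construction, not necessarily the one the authors used. All coefficients, value ranges, the intercept, and the names `coefs`, `ranges`, `points`, and `probability_from_points` are hypothetical and are not taken from the study.

```python
import math

# Hypothetical LR coefficients, intercept, and predictor ranges (not study values).
coefs = {"il10": 0.30, "il6": 0.02}
ranges = {"il10": (0.0, 20.0), "il6": (0.0, 100.0)}
intercept = -4.0


def max_effect(name):
    """Largest change in the linear predictor this variable can produce."""
    lo, hi = ranges[name]
    return abs(coefs[name]) * (hi - lo)


# The variable with the largest effect spans the full 0-100 point axis.
scale = 100.0 / max(max_effect(name) for name in coefs)


def reference(name):
    """Value that scores 0 points: the end of the range with the lowest risk."""
    lo, hi = ranges[name]
    return lo if coefs[name] >= 0 else hi


def points(name, value):
    """Nomogram points for one predictor value."""
    return abs(coefs[name]) * abs(value - reference(name)) * scale


def probability_from_points(total_points):
    """Map total points back to a predicted probability via the logistic function."""
    baseline = intercept + sum(coefs[n] * reference(n) for n in coefs)
    linear_predictor = baseline + total_points / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patient.
patient = {"il10": 10.0, "il6": 50.0}
total = sum(points(name, value) for name, value in patient.items())
prob = probability_from_points(total)
print(round(total, 1), round(prob, 3))
```

Because the point scale is a linear rescaling of the linear predictor, the probability read from the total-points axis matches the probability the LR model would give directly for the same patient.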

Conclusions

XGBoost exhibited the highest discriminatory performance for predicting critical illness in patients with COVID-19 pneumonia. These findings suggest that the nomogram and the visualized interpretation provided by the LIME plot could be useful in the clinical setting. Additionally, interleukin-10 could serve as a useful predictor of critical illness in patients with COVID-19 pneumonia.