A Comparison of XGBoost, Random Forest, and Nomograph for the Prediction of Disease Severity in Patients With COVID-19 Pneumonia: Implications of Cytokine and Immune Cell Profile

Hong, Wandong; Zhou, Xiaoying; Jin, Shengchun; Lu, Yajing; Pan, Jingyi; Lin, Qingyi; Yang, Shaopeng; Xu, Tingting; Basharat, Zarrin; Zippi, Maddalena; Fiorino, Sirio; Tsukanov, Vladislav; Stock, Simon; Grottesi, Alfonso; Chen, Qin; Pan, Jingye

doi:10.3389/fcimb.2022.819267

ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol., 12 April 2022

Sec. Clinical Microbiology

Volume 12 - 2022 | https://doi.org/10.3389/fcimb.2022.819267

This article is part of the Research Topic Insights In Clinical Microbiology: 2021


Wandong Hong¹*†

Zarrin Basharat³†

Vladislav Tsukanov⁶

Simon Stock⁷

Alfonso Grottesi⁸

Qin Chen⁹

Jingye Pan⁹*

¹Department of Gastroenterology and Hepatology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
²School of the First Clinical Medical Sciences, Wenzhou Medical University, Wenzhou, China
³Jamil-ur-Rahman Center for Genome Research, Dr. Panjwani Centre for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
⁴Unit of Gastroenterology and Digestive Endoscopy, Sandro Pertini Hospital, Rome, Italy
⁵Internal Medicine Unit, Budrio Hospital, Bologna, Italy
⁶Department of Gastroenterology, Scientific Research Institute of Medical Problems of the North, Krasnoyarsk, Russia
⁷Department of Surgery, World Mate Emergency Hospital, Battambang, Cambodia
⁸Unit of General Surgery, Sandro Pertini Hospital, Rome, Italy
⁹Department of Intensive Care Unit, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China

Background and Aims: The aim of this study was to apply machine learning models and a nomogram to differentiate critically ill from non-critically ill COVID-19 pneumonia patients.

Methods: Clinical symptoms and signs, laboratory parameters, cytokine profile, and immune cellular data of 63 COVID-19 pneumonia patients were retrospectively reviewed. Outcomes were followed up until Mar 12, 2020. A logistic regression function (LR model), Random Forest, and XGBoost models were developed. The performance of these models was measured by area under receiver operating characteristic curve (AUC) analysis.

Results: Univariate analysis revealed differences between critically and non-critically ill patients in levels of interleukin-6, interleukin-10, T cells, CD4⁺ T cells, and CD8⁺ T cells. Interleukin-10, with an AUC of 0.86, was the most useful single predictor of critically ill patients with COVID-19 pneumonia. Ten variables (respiratory rate, neutrophil counts, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4⁺ T cells, interleukin-6, and interleukin-10) were used as candidate predictors for the LR, Random Forest (RF), and XGBoost models. The coefficients from the LR model were used to build a nomogram. The RF and XGBoost methods suggested that interleukin-10 and interleukin-6 were the most important variables for predicting severity of illness. The mean AUCs for the LR, RF, and XGBoost models were 0.91, 0.89, and 0.93, respectively (in two-fold cross-validation). Individualized predictions by the XGBoost model were explained with a local interpretable model-agnostic explanations (LIME) plot.

Conclusions: XGBoost exhibited the highest discriminatory performance for prediction of critically ill patients with COVID-19 pneumonia. It is inferred that the nomogram and visualized interpretation with LIME plot could be useful in the clinical setting. Additionally, interleukin-10 could serve as a useful predictor of critically ill patients with COVID-19 pneumonia.

Highlights

1. XGBoost exhibited the highest discriminatory performance for prediction of critically ill patients with COVID-19 pneumonia.

2. The nomogram and visualized interpretation with LIME plot could be useful in the clinical setting.

3. Interleukin-10 is a useful predictor of critically ill patients with COVID-19 pneumonia.

Introduction

Coronavirus disease 2019 (COVID-19) is a newly recognized illness caused by the highly contagious severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has spread rapidly around the world in the last two years (Hong et al., 2021). As of February 28, 2022 (based on the WHO statistics), over 430 million confirmed cases and over 5.7 million deaths have been recorded (2022). COVID-19 causes a spectrum of presentations, ranging from asymptomatic infection through mild illness to severe pneumonia. Our previous study indicated that 34.9% of patients with viral pneumonia developed critical illness and required admission to the ICU. They either had a fraction of inspired oxygen (FiO2) of at least 60% during hospitalization or required mechanical ventilation (Hong et al., 2021). Delayed presentation of symptoms increases the risk of mortality and the need for high-intensity healthcare (Suliman et al., 2021). In Wuhan, China, 28-day mortality was reported to be 61.5% among critically ill patients, with an average interval of 7 days between ICU admission and death (Yang et al., 2020). Early identification of critical illness provides an opportunity for timely intervention and may thus prevent more complicated, protracted, and less successful hospital admissions (Suliman et al., 2021).

Anurag et al. validated the Pneumonia Severity Index (PSI)/PORT score, the Confusion, Urea, Respiratory rate, Blood pressure, 65 years of age and older (CURB-65) score, and the Severe Community-Acquired Pneumonia (SCAP) scoring system in COVID-19 pneumonia for prediction of disease severity and 14-day mortality (Anurag and Preetam, 2021). However, in that study severe COVID-19 pneumonia was defined by a PSI/PORT score >130, CURB-65 score ≥3, or SCAP score ≥10 (Anurag and Preetam, 2021). San et al. classified disease severity according to the interim guidance of the World Health Organization (San et al., 2021). They suggested that predicting the high-risk group with the Brescia-COVID Respiratory Severity Scale (BRCSS) and quick SOFA (qSOFA) may improve clinical outcomes in COVID-19 patients (San et al., 2021). Bats et al. defined severity as an arterial oxygen saturation (SaO2) of less than 90% on room air or the need for ≥4 L/min oxygen therapy (O2) to obtain a SaO2 ≥94%, and developed a COVID-19 severity risk score for use upon hospital admission (Bats et al., 2021). By enrolling patients both with and without pneumonia and using the definition of severity of COVID-19 recommended by the National Health Commission of China, Liang et al. developed a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19 (Liang et al., 2020a). Using the same definition of severity as Liang et al. (2020a), Zhang et al. developed a score consisting of age, WBCs, neutrophil count, glomerular filtration rate, and myoglobin for prediction of disease severity in COVID-19 (Zhang et al., 2020). A nomogram is a mathematical model that allows for individualized and evidence-based risk estimation, facilitating management-based decision-making. Feng et al. divided patients into three types (moderate, severe, and critical) and reported that a nomogram based on chest CT and clinical characteristics could predict disease progression in COVID-19 pneumonia patients much earlier (Feng et al., 2020). Li et al. reported that, using the definition of severity of COVID-19 recommended by the National Health Commission of China, a nomogram consisting of a CT-based radiomics signature could be used for predicting severe COVID-19 pneumonia (Li et al., 2021). Nomograms have also been applied in COVID-19 to predict mortality (Ji et al., 2020) and assess survival (Dong et al., 2021). In summary, different studies used different definitions of disease severity and different inclusion criteria. Few developed clinical and laboratory prediction scores to identify critical illness in patients with CT-confirmed COVID-19 pneumonia.

Machine learning (ML) methods such as deep learning, extreme gradient boosting (XGBoost), and random forest (RF) focus on how computers learn from data before being applied to real settings. ML methods are useful for developing robust risk models and redefining patient classes (Deo, 2015). Accordingly, ML has been widely applied to the clinical diagnosis, prediction, and classification of patients with COVID-19 (Mottaqi et al., 2021). Previously, Liang et al. developed a deep learning-based survival Cox model for early triage of critically ill COVID-19 patients (with and without X-ray abnormality) (Liang et al., 2020b). Deep learning has also been used to build a predictive model for identifying natural molecules as potential inhibitors of the SARS-CoV-2 main protease (Joshi et al., 2021). The XGBoost algorithm has been shown to outperform other techniques for various sets of features in a variety of settings. Yan et al. used the XGBoost algorithm to identify lactic dehydrogenase (LDH), lymphocyte count, and C-reactive protein (CRP) as predictors of the mortality of individual patients (Yan et al., 2020). Wang et al. applied XGBoost to build a model based on clinical and laboratory parameters for predicting in-hospital mortality in patients with COVID-19 (Wang et al., 2020b). Liu et al. developed an XGBoost-based clinical model consisting of lymphocyte percentage, lactic dehydrogenase, neutrophil count, and D-dimer on admission for predicting critical illness risk in hospitalized patients with COVID-19 pneumonia (Liu et al., 2021). Ryan et al. reported that an XGBoost-based algorithm is a useful predictive tool for anticipating mortality in COVID-19, pneumonia, and mechanically ventilated patients (Ryan et al., 2020).

In addition, most of the existing scores were developed based only on clinical and laboratory features. As the severity of COVID-19 pneumonia is clearly associated with multifactorial responses, the use of only clinical and laboratory features may miss important information from other risk factors. Cytokine storm plays an important role in severe COVID-19 pneumonia (Hu et al., 2021). In the most severe cases, the prognosis can be markedly worsened by the hyperproduction of proinflammatory cytokines, such as interleukin-6 (IL-6) and TNF-α, preferentially targeting lung tissue (Costela-Ruiz et al., 2020). However, immune cells such as B lymphocytes, T cells, CD4⁺ T and CD8⁺ T cells, and cytokines such as IL-10 are rarely included in these scores. Hence, further studies are required to develop scoring systems that incorporate both cytokine profiles and immune cell data for prediction of critically ill patients with COVID-19 pneumonia.

The first aim of this study was to develop and compare an extreme gradient boosting (XGBoost) model, an RF model, and a conventional LR model (presented as a nomogram) based on clinical and laboratory data, immune cells, and cytokine profiles for prediction of critically ill patients with COVID-19 pneumonia. The second aim was to evaluate the role of immune cells and cytokine profile as potential predictors of the severity of COVID-19 pneumonia.

Material and Methods

Study Design, Subject Selection and Ethics

We conducted a post-hoc analysis of a previously reported retrospective cohort study in the First Affiliated Hospital of Wenzhou Medical University in mainland China (Hong et al., 2021). All patients with confirmed COVID-19 pneumonia between January 2020 and March 2020 were eligible for inclusion in this study. A confirmed case of COVID-19 was defined as a positive result on a real-time reverse-transcriptase–polymerase-chain-reaction (RT-PCR) assay of nasal and pharyngeal swab specimens (Hong et al., 2021). Exclusion criteria included lack of pneumonia and unavailability of chest computed tomography scans.

Definition of Severity

Patients with COVID-19 pneumonia were defined as critically ill if they were admitted to the intensive care unit (ICU) and required mechanical ventilation or had a fraction of inspired oxygen (FiO2) of at least 60% (Kumar et al., 2009; Hong et al., 2021).

Data Collection and Follow Up

Epidemiological data, clinical symptoms and signs, laboratory parameters, cytokine profile, and immune cell data on admission were obtained from electronic medical records using data collection forms. These data included blood chemical analysis, liver and renal function testing, glucose and coagulation testing, creatine kinase, B-type natriuretic peptide, C-reactive protein, procalcitonin, IL-2, IL-4, IL-6, IL-10, and tumor necrosis factor (TNF)-α, as well as B lymphocyte, T cell, CD4⁺ T, and CD8⁺ T cell counts. All patients were followed up until March 12, 2020 (Hong et al., 2021). We used LR and machine learning models to differentiate critically ill from non-critically ill patients with COVID-19 pneumonia.

Statistical Analysis

There were missing values in D-dimer, B-type natriuretic peptide levels, cytokine profiles, and immune cells. To handle this issue, missing values were imputed using Multiple Imputation by Chained Equations (MICE) when performing LR and ML analysis (Royston, 2005). MICE has emerged as one of the principal statistical approaches for dealing with missing data. The missing values were replaced by estimated plausible values to create a “complete” dataset (Royston, 2005).

Categorical values were described as count and proportions and compared by the χ² test or Fisher’s exact test (Hong et al., 2020). According to the results of Shapiro–Wilk test, continuous values were expressed by mean ± SD or median and Inter Quartile Range (IQR) and compared using Student’s t-test or the Wilcoxon non-parametric test. All the variables, found to be different between critically ill and non-critically ill patients on univariate analysis, underwent receiver operating characteristic (ROC) curve analysis to identify the valuable single index predictor of critically ill patients with COVID-19 pneumonia. Then, only variables with the area under the receiver operating characteristic curve (AUC) >0.7 were used as potential predictors for critically ill patients having COVID-19 pneumonia (Hong et al., 2017). In addition, an exploratory variable importance analysis was also performed using both XGBoost and RF method to evaluate the role of different variables in prediction of critical illness. In XGBoost method, SHapley Additive exPlanations (SHAP) summary plot was used to quantify the variable importance of each variable, and SHAP force plot was used to explain the individual predictions, respectively (Deshmukh and Merchant, 2020). In the RF method, the importance of each variable was subsequently measured by calculating how much reduction each variable offers when they were added to the RF model using mean decreased accuracy and Gini (Gong et al., 2020).
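The AUC screening step described above can be illustrated with a minimal sketch. It uses the rank-based (Mann-Whitney) definition of the AUC, that is, the probability that a randomly chosen critically ill patient has a higher marker value than a randomly chosen non-critically ill patient. The function name `auc_rank` and the marker values are hypothetical and are not data from this study; the study itself performed this analysis in R and STATA.

```python
def auc_rank(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker values for illustration only, NOT study data.
critical = [10.8, 12.4, 7.9, 15.1, 4.2]
non_critical = [3.5, 4.2, 2.9, 5.0, 3.1, 8.3]

auc = auc_rank(critical, non_critical)
keep_as_candidate = auc > 0.7  # screening rule stated in the text
print(round(auc, 3), keep_as_candidate)
```

A marker that passes this threshold is retained only as a candidate predictor; whether it enters the final model is decided by the subsequent multivariable steps.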

Risk models were developed using conventional statistical method (forward-conditional step-wise LR), traditional machine learning algorithm (RF), and current state-of-the-art boosting algorithm utilized for gradient boosted decision trees (XGBoost). An RF model is a collection of many decision tree models, each of which is characterized by a tree-like structure (Gong et al., 2020). A gradient boosting ML algorithm (XGBoost) was employed for a binary classification task based on the presence or absence of critically ill patients with COVID-19 pneumonia (Al’aref et al., 2020).
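To make the "mean decrease Gini" importance measure concrete, the following sketch computes the Gini impurity reduction achieved by a single candidate split in one decision tree. In an RF, such reductions are accumulated over every split that uses a given variable and averaged across trees. The function names and the toy values are assumptions introduced for illustration; this is not the implementation used in the study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity reduction from splitting the node at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical marker values and outcomes (1 = critically ill), NOT study data.
marker = [2, 3, 4, 9, 11, 14]
outcome = [0, 0, 0, 1, 1, 1]

print(gini_decrease(marker, outcome, threshold=5))  # split separates the classes
print(gini_decrease(marker, outcome, threshold=3))  # a less informative split
```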

We randomly held out two patients for individualized prediction; the remaining 61 patients were used to develop the prediction models. When building and tuning the prediction models, we used two-fold cross-validation as the resampling strategy to limit overfitting. The training set was divided into two equal-sized sub-samples, so that each sub-sample was used once for training and once for testing. The analysis was therefore repeated two times (folds). The AUC was calculated for each of the two analyses, using only the respective test data. Finally, the mean AUC with 95% CI, as well as the area under the precision recall curve and the area under the precision recall gain curve, were calculated and compared (Saito and Rehmsmeier, 2015). Since the incidence of critical illness in patients with COVID-19 pneumonia was high (34.9%), we selected the best cut-off point (detected where the number of true positives was the highest with sensitivity >90%). This was done by selecting a threshold value at a point where the longest increase in the specificity of the slope declines. The sensitivity, specificity, accuracy, and F-score, which is the harmonic mean of recall and precision, were also calculated and compared (Saito and Rehmsmeier, 2015). To overcome the black box problem of XGBoost output and improve its interpretability, the LIME plot was used to explain the individualized prediction.
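The resampling scheme described above can be sketched as follows. The helper names (`two_fold_indices`, `mean_cv_score`) and the `fit`/`score` callables are hypothetical placeholders for whatever model and metric are being evaluated; they are not the functions used in the study. Note that 61 patients cannot be split into two exactly equal halves, so this sketch produces folds of 31 and 30.

```python
import random


def two_fold_indices(n, seed=0):
    """Shuffle 0..n-1 and cut the order into two near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    half = (n + 1) // 2
    return order[:half], order[half:]


def mean_cv_score(n, fit, score, seed=0):
    """Train on one fold, score on the other, swap roles, and average.

    `fit(train_indices)` returns a model; `score(model, test_indices)`
    returns a number such as the AUC on the held-out fold.
    """
    fold_a, fold_b = two_fold_indices(n, seed)
    results = []
    for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
        model = fit(train)
        results.append(score(model, test))
    return sum(results) / len(results), results


fold_a, fold_b = two_fold_indices(61)
print(len(fold_a), len(fold_b))
```

Because each fold's score is computed only on patients the model has not seen, the averaged value is a less optimistic estimate than the score on the training data itself.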

For the LR analysis, the conditional probabilities for stepwise entry and removal of a factor were 0.05 and 0.06, respectively (Hong et al., 2019). Based on the results of LR, an equation model and nomogram were developed to predict critical illness associated with COVID-19 pneumonia. Model calibration was assessed by the Hosmer–Lemeshow goodness-of-fit test. Odds ratios (OR) were calculated with 95% CI. Multicollinearity was considered to be significant if the largest variance inflation factor exceeded 10 (Hong et al., 2020).
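The variance inflation factor (VIF) criterion can be illustrated for the simplest case of two predictors, where VIF = 1 / (1 − r²) and r is their Pearson correlation. With more predictors, r² is replaced by the R² obtained from regressing each predictor on all the others, which this sketch does not implement. The function names and example values are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for a pair of predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical values, NOT study data.
predictor_1 = [1, 2, 3, 4, 5]
predictor_2 = [2, 1, 4, 3, 5]

vif = vif_two_predictors(predictor_1, predictor_2)
print(round(vif, 2), vif > 10)  # flag as collinear only above the threshold of 10
```

Under this pairwise simplification, a VIF above 10 corresponds to r² above 0.9, that is, to two variables that carry almost the same information.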

A two-tailed P-value of less than 0.05 was considered statistically significant. All statistical analyses were performed using R and STATA software. The data flow diagram of our study is shown in Supplementary Figure 1.

Results

Clinical Characteristics

A total of 63 hospitalized patients with confirmed COVID-19 pneumonia were enrolled in this study. Baseline clinical and laboratory findings of all patients on admission have been described before (Hong et al., 2021). In summary, out of the 63 patients, 22 (34.9%) required high-flow nasal cannula or higher-level oxygen support measures to correct hypoxemia during their hospital stay and were classified as critically ill patients. The remaining 41 patients were identified as non-critically ill. The mean age of the patients was 55.9 ± 15.3 years. Among these, 41 (65.1%) patients were men. The mean time from onset of symptoms to hospital admission was 6.9 ± 3.7 days. The most frequent symptoms at the onset of illness were fever and cough (98.4% and 61.94%, respectively). Of the clinical characteristics and laboratory findings, respiratory rate, leukocyte and neutrophil counts, and levels of aspartate transaminase, albumin, serum procalcitonin, D-dimer, and B-type natriuretic peptide were useful predictors of critically ill patients with COVID-19 pneumonia, having an AUC of more than 0.7 (Hong et al., 2021). Most patients had increased IL-6 and IL-10 and decreased CD4⁺ T cells. The median values of these variables in all patients are shown in Table 1.

Table 1. Baseline characteristics of studied variables in the patients (on admission).

Cytokine and Immune Cells

As for the cytokine profiles and immune cells, univariate analysis revealed that, in comparison to the non-critically ill patients, patients with critical illness had higher levels of IL-6 and IL-10, as well as lower levels of T cells, CD4⁺ T cells, and CD8⁺ T cells (Figure 1) (Hong et al., 2021). There was no significant difference between the groups with respect to IL-2, IL-4, tumor necrosis factor alpha (TNF-α), and B lymphocytes. Among these, T cells (AUC: 0.72 ± 0.09), CD4⁺ T cells (AUC: 0.72 ± 0.08), IL-6 (AUC: 0.85 ± 0.06), and IL-10 (AUC: 0.86 ± 0.06) were useful predictors of critically ill patients with COVID-19 pneumonia, with an AUC of more than 0.7 (Figure 2) (Hong et al., 2021).

Figure 1. Comparison of cytokine profile and immune cells between critically and non-critically ill patients exhibiting COVID-19 pneumonia.

Figure 2. Forest plot for accuracy of IL-10 and T cells in predicting critical illness related to COVID-19 pneumonia. Each marker is plotted as an area under the curve (AUC) of the receiver operating characteristic curve, with a 95% confidence interval.

Exploratory Variable Importance Analysis

Leukocyte and T cells were not included in further analysis because of strong multicollinearity. Therefore, the ten variables (respiratory rate, neutrophil counts, aspartate transaminase, albumin, serum procalcitonin, D-dimer and B-type natriuretic peptide, CD4⁺ T cells, IL-6 and IL-10) were used for machine learning models. Based on the RF analysis, IL-10 was the most important predictor of critical illness in patients with COVID-19 pneumonia, followed by IL-6 and serum procalcitonin (Figure 3). SHAP summary plot revealed the relative importance of each feature in the XGBoost analysis. IL-10, IL-6, and CD4⁺ T cells were the three most important features (Figure 4).

Figure 3. Variable importance plot using RF model for the critically ill COVID-19 pneumonia patients. IL-10 and IL-6 were the most important variables in determining critical illness by either mean decrease accuracy or by mean decrease Gini.

Figure 4. SHAP summary plot for all the variables contributing to the XGBoost model prediction for critically ill COVID-19 pneumonia patients. This shows the ranking of features and their impact on the model output. The horizontal axis shows the corresponding SHAP value of the feature. A positive SHAP value contributes to the prediction of critically ill COVID-19 pneumonia patients and vice versa.
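The meaning of a SHAP value can be illustrated with a minimal sketch that computes exact Shapley values for one patient by enumerating all feature subsets. This is a simplification: features outside a subset are replaced by a fixed baseline value, whereas SHAP for tree models such as XGBoost uses a dedicated tree algorithm and averages over the training data. The function name, the toy scoring function, and all numbers are assumptions for illustration, not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature subsets.

    Features outside a subset are set to their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(subset):
        x = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical risk score, NOT the fitted XGBoost model."""
    return 0.04 * x["il6"] + 0.1 * x["il10"] - 0.002 * x["cd4"]


patient = {"il6": 74.65, "il10": 10.8, "cd4": 167.0}
reference = {"il6": 10.0, "il10": 5.0, "cd4": 400.0}  # assumed baseline values
phi = shapley_values(toy_score, patient, reference)
print({k: round(v, 3) for k, v in phi.items()})
```

The contributions always sum to the difference between the prediction for the patient and the prediction at the baseline, which is why positive values push the prediction towards critical illness and negative values push it away.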

Development and Comparison of Prediction Models

The same ten variables (respiratory rate, neutrophil counts, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4⁺ T cells, IL-6 and IL-10) were used for multivariable logistic analysis. Forward stepwise LR identified the following three independent variables as predictive of critical illness in patients with COVID-19 pneumonia: aspartate transaminase (OR = 1.03, 95% CI 1.01, 1.05, P = 0.026), B-type natriuretic peptide (OR = 1.02, 95% CI 1.01, 1.03, P = 0.011), and IL-6 (OR = 1.04, 95% CI 1.02, 1.06, P <0.001). An LR model was developed to predict critically ill patients with COVID-19 pneumonia as follows: logit(P) = −5.25 + 0.031 × aspartate transaminase (U/L) + 0.016 × B-type natriuretic peptide (pg/ml) + 0.046 × IL-6 (pg/ml), where P is the predicted probability of critical illness. The coefficients from the LR model were used to build a nomogram for the prediction of critical illness (Figure 5). The Hosmer–Lemeshow goodness-of-fit test was not significant (P = 0.4), suggesting that our prediction model fits the actual data well.
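As a worked illustration of the LR equation, the sketch below converts the linear predictor into a probability with the logistic function. The coefficients are those printed in the text; the input values are the admission values reported in this article for the two illustrative patients (cases 19 and 49). The dictionary keys and function name are hypothetical, and because the published coefficients are rounded, the results are approximate.

```python
import math

# Intercept and per-unit log-odds coefficients as printed in the text.
INTERCEPT = -5.25
COEF = {"ast_u_per_l": 0.031, "bnp_pg_per_ml": 0.016, "il6_pg_per_ml": 0.046}


def predicted_probability(patient):
    """Logistic transform of the linear predictor (log-odds)."""
    logit = INTERCEPT + sum(COEF[k] * patient[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-logit))


# Admission values reported in the article for its two example patients.
case_19 = {"ast_u_per_l": 83, "bnp_pg_per_ml": 19, "il6_pg_per_ml": 74.65}
case_49 = {"ast_u_per_l": 84, "bnp_pg_per_ml": 15, "il6_pg_per_ml": 2.46}

for label, case in (("case 19", case_19), ("case 49", case_49)):
    print(label, round(predicted_probability(case), 2))

# Each odds ratio is the exponential of the corresponding coefficient.
odds_ratios = {k: math.exp(v) for k, v in COEF.items()}
print({k: round(v, 2) for k, v in odds_ratios.items()})
```

Under these rounded coefficients the equation gives a probability of roughly 0.74 for case 19 and roughly 0.09 for case 49. The two patients have nearly identical aspartate transaminase and B-type natriuretic peptide values, so the difference is driven almost entirely by IL-6. The nomogram encodes the same calculation graphically.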

Figure 5. Nomogram predicting the probability of critically ill COVID-19 related pneumonia patients. To obtain the nomogram-predicted probability, patient values have been plotted on each axis. A vertical line to the point axis depicts attributes for each variable value. Summing the points for all variables and obtaining the sum for the point line leads to assessment of the individual probability of critically ill COVID-19 related pneumonia patients.

When we compared the prediction models in two-fold cross-validation, the mean AUC of ROC curve analysis for the LR, RF, and XGBoost models for the prediction of critical illness was 0.91, 0.89, and 0.93, respectively (Figure 6). The XGBoost model (0.82) also achieved a higher mean area under the precision recall curve than the LR (0.81) and RF (0.75) models (Figure 7). The area under the precision recall gain curve for the XGBoost, LR, and RF models was 0.53, 0.49, and 0.43, respectively (Figure 8).

Figure 6. Mean receiver operator characteristic (ROC) curves for the XGBoost, RF model, and LR model.

Figure 7. Precision recall curves for the XGBoost, RF model, and LR model.

Figure 8. Precision recall gain curves for the XGBoost, RF model, and LR model.

The XGBoost model achieved a sensitivity of 90.5%, specificity of 87.5%, diagnostic accuracy of 88.5%, and F-score of 84.4%. By comparison, at a similar sensitivity (90.1% and 90.5%, respectively), the RF and LR models achieved lower specificity, diagnostic accuracy, and F-score (Table 2).
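The relationship between these four metrics can be checked with a short sketch. The confusion-matrix counts below are an assumption: they are one set of counts for 61 patients that happens to reproduce the XGBoost percentages quoted above, and they are not reported in the text. The function name is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and F-score from confusion counts."""
    sensitivity = tp / (tp + fn)          # recall among critically ill patients
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Assumed counts (inferred, not reported): 21 critically ill, 40 non-critically ill.
xgb = diagnostic_metrics(tp=19, fn=2, tn=35, fp=5)
print({name: round(100 * value, 1) for name, value in xgb.items()})
```

Because the F-score is the harmonic mean of precision and sensitivity, it falls when specificity drops at a fixed sensitivity, which is consistent with the lower F-scores reported for the RF and LR models.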

Table 2. Diagnostic values of various models implemented for differentiating critically ill patients with COVID-19 pneumonia.

Explanation of XGBoost Model Results: Individualized Prediction

To clarify the model prediction for individual patients, the LIME plot shows two typical predictions made by the XGBoost model, in which one is for critically ill and the other for non-critically ill patients with COVID-19 pneumonia (Figure 9). The length of the bar for each feature indicates the importance (weight) of that feature in making the prediction. A longer bar, therefore, indicates a feature that contributes more towards or against the prediction.

Figure 9. LIME plot explanation of two typical predictions, showing the main contributing features behind the model prediction. The length of the color bar represents the amount of contribution from the corresponding feature.

For example, the first case (case 19) is a critically ill patient who was correctly classified. This patient had a respiratory rate of 21 breaths/min, albumin = 32 mg/dl, aspartate transaminase = 83 U/L, neutrophil = 2.98 (10⁹/L), procalcitonin = 0.133 ng/ml, B-type natriuretic peptide = 19 pg/ml, D-dimer = 1.88 mg/L, CD4⁺ T = 167/ul, IL-6 = 74.65 pg/ml, and IL-10 = 10.8 pg/ml. The high IL-6 and decreased CD4⁺ T cells are the main factors supporting the prediction of critical illness, outweighing opposing factors such as normal B-type natriuretic peptide and IL-10.

The second case (case 49) is a non-critically ill patient with COVID-19 pneumonia who was correctly classified. This patient had a respiratory rate of 12 breaths/min, albumin = 33.7 mg/dl, aspartate transaminase = 84 U/L, neutrophil = 2.11 (10⁹/L), procalcitonin = 0.63 ng/ml, B-type natriuretic peptide = 15 pg/ml, D-dimer = 0.72 mg/L, CD4⁺ T = 257/ul, IL-6 = 2.46 pg/ml, and IL-10 = 3.5 pg/ml. The normal IL-10 and IL-6 are the main factors supporting the prediction of non-critical illness, outweighing opposing factors such as decreased albumin.
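The idea behind the LIME weights can be sketched as a locally weighted linear surrogate: the patient's values are perturbed, the black-box model is queried on the perturbed points, and a linear model is fitted with more weight on points close to the patient. This is a simplified illustration under stated assumptions, not the LIME package (which additionally discretizes tabular features and selects a subset of them). The function names, the kernel, the per-feature scales, and the toy model are all hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lime_weights(predict, instance, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Slopes of a locally weighted linear surrogate around `instance`.

    Each slope is the local change in the prediction per one `scale` of
    change in that feature; order follows `instance`.
    """
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # standardized offsets
        x = [instance[j] + z[j] * scales[j] for j in range(d)]
        rows.append([1.0] + z)  # intercept column plus features
        targets.append(predict(x))
        weights.append(math.exp(-sum(v * v for v in z) / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    return solve_linear(xtwx, xtwy)[1:]


def toy_model(x):
    """Hypothetical stand-in for the XGBoost classifier (IL-6, CD4+ T)."""
    il6, cd4 = x
    return 1.0 / (1.0 + math.exp(-(0.05 * il6 - 0.01 * cd4)))


print(lime_weights(toy_model, [74.65, 167.0], scales=[10.0, 50.0]))
```

A positive slope corresponds to a bar supporting the critical illness prediction and a negative slope to a bar opposing it, while the magnitude plays the role of the bar length in Figure 9.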

Discussion

IL-10 can be produced by many different myeloid and lymphoid cells and is produced in especially large quantities by T helper 2 (Th2) cells during COVID-19 infection (Huang et al., 2020). It serves as an anti-inflammatory cytokine by suppressing macrophages and dendritic cells (DCs), thereby limiting T helper 1 (Th1) and Th2 effector responses (Couper et al., 2008). Premature IL-10 production during a virulent infection can permit overwhelming infection. Conversely, it may lead to severe tissue damage when produced too late during an avirulent infection (Couper et al., 2008). A recent study proposed that dramatic early proinflammatory IL-10 elevation may play a pathological role in COVID-19 severity, as its pro-inflammatory or anti-inflammatory effects differ depending on the course of disease (Lu et al., 2021). Increasing evidence indicates that the elevation of IL-10 is correlated with the severity of COVID-19 (Han et al., 2020; Huang et al., 2020; Wang et al., 2020a; Zhao et al., 2020; Lu et al., 2021). Our study indicated the importance of IL-6 and IL-10 in both the RF variable importance analysis (Figure 3) and the SHAP summary plot of the XGBoost method (Figure 4). The results confirm that IL-10 is the most important variable for the prediction of critical illness in patients with COVID-19 pneumonia. In addition, based on ROC analysis, IL-10 (AUC = 0.86) could be a useful single predictor of critically ill patients with COVID-19 pneumonia (Figure 2). Critically ill patients with COVID-19 pneumonia are those who need high-flow nasal cannula or higher-level oxygen support measures to correct hypoxemia. They are always observed to have pulmonary fibrotic changes on CT scans, ranging from fibrosis associated with pneumonia to severe lung injury, which results in hypoxemia (Shi et al., 2020).
Several in vivo and in vitro studies have demonstrated that IL-10 has an anti-fibrotic function in pancreatic, liver, and bleomycin-induced lung fibrosis (Thompson et al., 1998; Demols et al., 2002; Shamskhou et al., 2019). Therefore, it is speculated that IL-10 may play an anti-inflammatory and anti-fibrotic role in critically ill patients with COVID-19 pneumonia.

IL-6 is a pleiotropic cytokine secreted by myeloid cells following immune challenge or tissue injury (Yousif et al., 2021). It has a pro-inflammatory function but also has anti-inflammatory, pro-resolution, and regenerative properties (Mcelvaney et al., 2021). Production of IL-6 helps promote resistance to different pathogens and the maintenance of tissue homeostasis, but the overproduction causes chronic inflammatory disorders and severe hyperinflammation (Jones and Hunter, 2021). Several studies have reported that serum level of IL-6 is significantly elevated in the setting of severe COVID-19 disease (Coomes and Haghbayan, 2020; Cummings et al., 2020; Huang et al., 2020; Leisman et al., 2020). Moreover, the use of tocilizumab, a blocker of IL-6 receptor (IL-6R), has been recommended for severe cases of COVID-19 (Huang et al., 2020; Ruan et al., 2020; Wu et al., 2020a; Angriman et al., 2021; Galván-Román et al., 2021; Mcelvaney et al., 2021). IL-6 is also reported as one of the good predictors of progression and severity in patients with COVID-19 (Guirao et al., 2020; Liu et al., 2020; Broman et al., 2021; Ren et al., 2021). In addition, it is suggested that an elevated level of IL-6 is an important predictor of patients with severe COVID-19 needing ventilator support (Galván-Román et al., 2021). Therefore, IL-6 may be an effective marker of both disease severity and decision making in the clinical management of patients. As expected, our study suggests IL-6 (OR = 1.04, 95% CI 1.02, 1.06) is independently associated with critical illness in patients with COVID-19 pneumonia (Figure 5).

Aspartate aminotransferase (AST) is an aminotransferase that mainly exists in the liver and catalyzes the transfer of an amino group from aspartate to α-ketoglutarate (Kwo et al., 2017). AST is normally present in the cytoplasm, but it is released into the serum after cell damage (Abd Rashid et al., 2021). Therefore, it is used to assess liver condition. Recent studies have reported that critically ill patients with COVID-19 pneumonia manifest elevated AST levels (Zahedi et al., 2021). Among indicators of liver injury, elevated AST has shown the strongest association with mortality risk (Lei et al., 2020). Padmaprakash et al. have demonstrated that AST is a significant predictor of COVID-19 mortality and that an elevated AST level is a valid indicator of COVID-19 pneumonia severity (Padmaprakash et al., 2022). Elevated AST levels have been independently associated with adverse clinical outcomes in COVID-19 patients, including admission to the ICU, use of invasive mechanical ventilation, and death (Yip et al., 2021). AST at admission has been demonstrated to be an independent predictor of COVID-19 mortality, and it is essential to monitor AST in hospitalized patients (Ding et al., 2021). As expected, our LR model suggested that AST (OR = 1.03, 95% CI 1.01, 1.05) could be a predictive marker of critically ill patients with COVID-19 pneumonia (Figure 5).

Brain natriuretic peptide (BNP) is a 32-amino-acid cardiac natriuretic peptide hormone, which is strongly upregulated in cardiac failure and locally in the area surrounding a myocardial infarction (Hall, 2004). Previous studies have highlighted that COVID-19 is a complex disease that targets many organs and is an independent risk factor for acute myocardial infarction, promoting the release of BNP (Katsoularis et al., 2021). Emerging data suggest that cardiac injury, manifested by cardiac biomarker elevation, is detected in a sizeable proportion of COVID-19 patients and is associated with adverse outcomes and increased mortality (Qin et al., 2020). Stefanini et al. suggested that concomitant elevation of both BNP and troponin I serves as a strong independent predictor of all-cause mortality (OR 3.24) (Stefanini et al., 2020). Our study suggested that BNP (OR = 1.02, 95% CI 1.01, 1.03, P = 0.011) was independently associated with the development of critical illness in patients with COVID-19 pneumonia (Figure 5).

CD4⁺ T cells are instrumental as activators of both the innate and adaptive arms of the immune system (Ruterbusch et al., 2020). As critical protectors against infectious diseases, they can assist in humoral responses, indirectly activate macrophages, and directly suppress inflammation (Miller and Mitchell, 1969; Parish and Liew, 1972; Jandinski et al., 1976). Rydyznski Moderbacher et al. suggested that SARS-CoV-2-specific CD4⁺ T cells are strongly associated with COVID-19 disease severity (Rydyznski Moderbacher et al., 2020). Oja et al. reported that CD4⁺ T-cell responses were qualitatively impaired in critically ill patients with COVID-19 (Oja et al., 2020). Our study suggested that, in comparison with non-critically ill patients, patients with critical illness had lower levels of CD4⁺ T cells (Figure 1). The SHAP summary plot produced with the XGBoost method suggested that CD4⁺ T cells play an important role in predicting critical illness (Figure 4).

A nomogram is a two-dimensional graphical tool, consisting of several proportionally scaled lines, that can be used to predict the probability of an outcome (Rahman et al., 2021). It quantifies the risk of clinical events simply and intuitively (Iasonos et al., 2008; Jin et al., 2017). It is a quantitative and practical prediction tool and could provide clinicians with an easy-to-use method to predict severe pneumonia in COVID-19 patients (Feng et al., 2020). Wu et al. established a nomogram model consisting of seven variables (age, lymphocyte, CRP, LDH, creatine kinase, urea and calcium) for severity risk prediction of COVID-19 pneumonia and classified COVID-19 patients into low-risk, medium-risk, and high-risk groups (Wu et al., 2020b). Nomograms that incorporate different factors may have different clinical value. Ding et al. suggested that the prognosis of COVID-19 patients can be accurately predicted by a nomogram incorporating abnormal AST and D-bilirubin levels along with other individual signs at admission (Ding et al., 2021). Our study suggested that a nomogram based on the LR model, consisting of IL-6, AST, and BNP, achieved an excellent AUC of 0.91 for prediction of critically ill patients with COVID-19 pneumonia in two-fold cross-validation (Figure 6). Compared with other studies (Wu et al., 2020b; Ding et al., 2021), our nomogram was simpler to calculate because only three variables were needed (Figure 5). The points for each variable can be determined by referring vertically to the dotted line at the bottom. The points for the individual variables are then added to give the total score, and the probability of severe COVID-19 pneumonia is read from the line corresponding to that total score.
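The scoring procedure described above can be sketched in code. A nomogram is a graphical rendering of the LR linear predictor: each variable's contribution (coefficient × value) is rescaled to points, the points are summed, and the total is mapped back to a probability through the logistic function. The sketch below relies on assumptions throughout. The AST and BNP coefficients are taken as ln(OR) from the odds ratios quoted in this Discussion, whereas the IL-6 coefficient, the intercept, the axis ranges, and the example patient are invented for illustration and are not the fitted values behind Figure 5.

```python
import math

# Coefficients on the log-odds scale. AST and BNP use ln(OR) from the text;
# the IL-6 coefficient and the intercept are HYPOTHETICAL placeholders.
COEF = {"IL6": 0.02, "AST": math.log(1.03), "BNP": math.log(1.02)}
INTERCEPT = -4.0  # hypothetical
# HYPOTHETICAL axis ranges (min, max) for each variable line of the nomogram.
RANGE = {"IL6": (0.0, 100.0), "AST": (0.0, 200.0), "BNP": (0.0, 300.0)}

# The variable with the largest possible contribution spans 0 to 100 points.
MAX_EFFECT = max(abs(COEF[v]) * (RANGE[v][1] - RANGE[v][0]) for v in COEF)

def points(var: str, value: float) -> float:
    """Points for one variable: its log-odds contribution rescaled to the points axis."""
    lo, _ = RANGE[var]
    return 100.0 * COEF[var] * (value - lo) / MAX_EFFECT

def probability(patient: dict) -> float:
    """Map the total points back to a predicted probability via the logistic function."""
    total = sum(points(v, x) for v, x in patient.items())
    baseline = INTERCEPT + sum(COEF[v] * RANGE[v][0] for v in COEF)
    linear_predictor = baseline + total * MAX_EFFECT / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))

patient = {"IL6": 40.0, "AST": 60.0, "BNP": 120.0}  # hypothetical patient
total_points = sum(points(v, x) for v, x in patient.items())
print(f"total points = {total_points:.1f}, predicted probability = {probability(patient):.3f}")
```

Because the points are a linear rescaling of the log-odds, the probability returned here is identical to evaluating the LR equation directly; the nomogram changes only how the calculation is presented.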

Compared with other ML methods, XGBoost is resistant to overfitting in datasets with imbalanced feature/outcome ratios and offers hyperparameters that allow tuning for imbalanced datasets (Vaid et al., 2021). With a SHAP summary plot, the importance of each variable can be quantified and explained. SHAP values are a game-theoretic approach to model interpretability that explains global model structure by combining the local explanations for each prediction (Vaid et al., 2021). XGBoost has been used to predict respiratory failure within 48 h, as well as morbidity and mortality, in patients hospitalized with COVID-19 (Pan et al., 2020; Bolourani et al., 2021; Wang et al., 2021). AlJame et al. (2021) used RF and XGBoost to distinguish COVID-19 patients from other patients, while Montomoli et al. (2021) used XGBoost to predict the change in SOFA score over a five-day span in COVID-19 patients admitted to the ICU. Feng et al. (2021) compared RF and XGBoost with several other methods for predicting mortality in COVID-19 patients and found XGBoost to be the superior ML method. Iwendi et al. (2020) used RF to analyze COVID-19 deaths with respect to gender, age, and geography, and reported more deaths among males, the Wuhan population, and people aged between 20 and 70 years.
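To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hand-written model, without using the `shap` library or XGBoost. The toy risk function, its coefficients, the feature values, and the baseline are assumptions made for illustration and are unrelated to the fitted model in this study. Practical SHAP implementations use faster tree-specific algorithms and average over a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial

def toy_model(x: dict) -> float:
    """HYPOTHETICAL risk score with an interaction term (not the study's model)."""
    return 0.03 * x["AST"] + 0.02 * x["BNP"] - 0.001 * x["CD4"] + 0.0001 * x["AST"] * x["BNP"]

def shapley_values(model, instance: dict, baseline: dict) -> dict:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    features = list(instance)
    n = len(features)

    def value(coalition: set) -> float:
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(subset) | {f}) - value(set(subset)))
        phi[f] = total
    return phi

instance = {"AST": 80.0, "BNP": 150.0, "CD4": 200.0}  # hypothetical patient
baseline = {"AST": 30.0, "BNP": 50.0, "CD4": 600.0}   # hypothetical reference values
phi = shapley_values(toy_model, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for this patient and the prediction at the baseline, which is the local additivity property that allows per-patient explanations to be aggregated into a global summary plot.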

Our study suggested that, when the performance of the XGBoost model was compared with that of the RF and LR models, XGBoost (AUC = 0.93) exhibited the highest discriminatory performance, followed by the LR (AUC = 0.91) and RF (AUC = 0.89) models (Figure 6). The area under the precision-recall curve and the area under the precision-recall-gain curve showed similar results (Figures 7, 8). The XGBoost model achieved a sensitivity of 90.5%, specificity of 87.5%, diagnostic accuracy of 88.5% and F-score of 84.4%, considerably higher than those of the nomogram and RF models (Table 2). ML models are sophisticated and hard for clinicians to comprehend, and are therefore used less often in clinical practice (Ou et al., 2020). We have provided a visual illustration of the implemented models to help clinicians understand the importance of the different models and features. The results of XGBoost have been explained with a LIME plot, which makes the individualized prediction easy to understand (Figure 9).
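For readers less familiar with the metrics reported in Table 2, the sketch below shows how sensitivity, specificity, accuracy, and F-score follow from a 2×2 confusion matrix. The counts are hypothetical: they were chosen only because they reproduce the percentages quoted above for XGBoost, and they are not taken from the study data.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall for the critically ill class
    specificity = tn / (tn + fp)   # recall for the non-critically ill class
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }

# HYPOTHETICAL counts, selected only to be consistent with the reported percentages.
metrics = classification_metrics(tp=19, fn=2, tn=35, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The F-score is the harmonic mean of precision and sensitivity, so it penalizes false positives that sensitivity alone ignores; this is one reason it is reported alongside accuracy when the two outcome classes are unevenly sized.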

To the best of our knowledge, this is the first study in the literature to implement and compare XGBoost, RF, and LR (presented as a nomogram) models based on clinical and laboratory data and on immune cell and cytokine profiles for the differentiation of critically ill from non-critically ill patients with COVID-19 pneumonia. While global measures such as accuracy are useful, they cannot explain why a model made a specific prediction. We therefore used a LIME plot to explain the output of the XGBoost model. In addition, cytokine profile and immune cell data were evaluated as potential predictors of the severity of COVID-19 pneumonia. Our study has limitations, and there is room for further improvement. First, it was a retrospective study from a single center. Second, the small sample size carries an intrinsic risk of overfitting, although we used two-fold cross-validation as the resampling method to limit it. Third, only patients with pneumonia were enrolled, so our results may not be applicable to patients without pneumonia. Fourth, given that the proposed ML method is purely data-driven, our model may perform differently when applied to other datasets (Yan et al., 2020). Our XGBoost approach needs further model training, validation, and optimization before clinical application because patients in this study were enrolled from a single tertiary referral center. However, the findings are interesting and warrant further research. In future, application of deep learning models to our data would be interesting. Apart from classification of patients suffering from COVID-19, our protocol could be applied to subtype various cancers and could be extended to other viral diseases as well. Combining more methods, and comparing deep learning and unsupervised algorithms, could also be interesting. The findings could be useful for doctors in prioritizing patient treatment and could form part of decision support systems to obtain useful predictors and improve clinical outcomes.

In conclusion, the comparative statistics showed that XGBoost had the highest discriminatory performance for prediction of critically ill patients with COVID-19 pneumonia. The nomogram and visualized interpretation with the LIME plot could also be useful in the clinical setting. Additionally, we identified IL-10 as a useful predictor of critical illness in patients with COVID-19 pneumonia, a finding that is supported by the previously available literature.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

This study protocol was approved by the Ethics Committee of the First Affiliated Hospital of Wenzhou Medical University. The committee decided to waive the need for written informed consent from the participants studied in this analysis as the data were analyzed retrospectively and anonymously.

Author Contributions

WH conceived the study and carried out the majority of the work. WH, GC, and JYeP participated in data collection. WH, ZB, XZ, SJ, YL, JYiP, QL, and SY conducted data analysis and drafted the manuscript. TX, ZB, MZ, SF, VT, SS and AG helped to finalize the manuscript. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding

This work was supported by the Zhejiang Medical and Health Science and Technology Plan Project (Number: 2022KY886), the Wenzhou Science and Technology Bureau (Number: Y2020010), and the Wenzhou Key Technology Breakthrough Program on Prevention and Treatment for COVID-19 Epidemic (Number: ZG2020012).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We are very thankful to Professor Hemant Goyal (the Chief Gastroenterology Fellow of The Wright Center for Graduate Medical Education, Mercer University School of Medicine) for his help.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcimb.2022.819267/full#supplementary-material

Supplementary Figure 1 | Data flow diagram of this study.

References

Abd Rashid, N., Halim, A. S., Teoh, S. L., Budin, S. B., Hussan, F., Adib Ridzuan, N. R., et al. (2021). The Role of Natural Antioxidants in Cisplatin-Induced Hepatotoxicity. BioMed. Pharmacother. 144, 112328. doi: 10.1016/j.biopha.2021.112328

Al’aref, S. J., Maliakal, G., Singh, G., Van Rosendael, A. R., Ma, X., Xu, Z., et al. (2020). Machine Learning of Clinical Variables and Coronary Artery Calcium Scoring for the Prediction of Obstructive Coronary Artery Disease on Coronary Computed Tomography Angiography: Analysis From the CONFIRM Registry. Eur. Heart J. 41, 359–367. doi: 10.1093/eurheartj/ehz565

Aljame, M., Imtiaz, A., Ahmad, I., Mohammed, A. (2021). Deep Forest Model for Diagnosing COVID-19 From Routine Blood Tests. Sci. Rep. 11, 16682. doi: 10.1038/s41598-021-95957-w

Angriman, F., Ferreyro, B. L., Burry, L., Fan, E., Ferguson, N. D., Husain, S., et al. (2021). Interleukin-6 Receptor Blockade in Patients With COVID-19: Placing Clinical Trials Into Context. Lancet Respir. Med. 9, 655–664. doi: 10.1016/S2213-2600(21)00139-9

Anurag, A., Preetam, M. (2021). Validation of PSI/PORT, CURB-65 and SCAP Scoring System in COVID-19 Pneumonia for Prediction of Disease Severity and 14-Day Mortality. Clin. Respir. J. 15, 467–471. doi: 10.1111/crj.13326

World Health Organization (2022). Coronavirus Disease (COVID-19) Pandemic [Online]. Available at: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports.

Bats, M. L., Rucheton, B., Fleur, T., Orieux, A., Chemin, C., Rubin, S., et al. (2021). Covichem: A Biochemical Severity Risk Score of COVID-19 Upon Hospital Admission. PloS One 16, e0250956. doi: 10.1371/journal.pone.0250956

Bolourani, S., Brenner, M., Wang, P., Mcginn, T., Hirsch, J., Barnaby, D., et al. (2021). A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation. J. Med. Internet Res. 23, e24246. doi: 10.2196/24246

Broman, N., Rantasärkkä, K., Feuth, T., Valtonen, M., Waris, M., Hohenthal, U., et al. (2021). IL-6 and Other Biomarkers as Predictors of Severity in COVID-19. Ann. Med. 53, 410–412. doi: 10.1080/07853890.2020.1840621

Coomes, E. A., Haghbayan, H. (2020). Interleukin-6 in Covid-19: A Systematic Review and Meta-Analysis. Rev. Med. Virol. 30, 1–9. doi: 10.1002/rmv.2141

Costela-Ruiz, V. J., Illescas-Montes, R., Puerta-Puerta, J. M., Ruiz, C., Melguizo-Rodriguez, L. (2020). SARS-CoV-2 Infection: The Role of Cytokines in COVID-19 Disease. Cytokine Growth Factor Rev. 54, 62–75. doi: 10.1016/j.cytogfr.2020.06.001

Couper, K. N., Blount, D. G., Riley, E. M. (2008). IL-10: The Master Regulator of Immunity to Infection. J. Immunol. 180, 5771–5777. doi: 10.4049/jimmunol.180.9.5771

Cummings, M. J., Baldwin, M. R., Abrams, D., Jacobson, S. D., Meyer, B. J., Balough, E. M., et al. (2020). Epidemiology, Clinical Course, and Outcomes of Critically Ill Adults With COVID-19 in New York City: A Prospective Cohort Study. Lancet 395, 1763–1770. doi: 10.1016/S0140-6736(20)31189-2

Demols, A., Van Laethem, J. L., Quertinmont, E., Degraef, C., Delhaye, M., Geerts, A., et al. (2002). Endogenous Interleukin-10 Modulates Fibrosis and Regeneration in Experimental Chronic Pancreatitis. Am. J. Physiol. Gastrointest. Liver Physiol. 282, G1105–G1112. doi: 10.1152/ajpgi.00431.2001

Deo, R. C. (2015). Machine Learning in Medicine. Circulation 132, 1920–1930. doi: 10.1161/CIRCULATIONAHA.115.001593

Deshmukh, F., Merchant, S. S. (2020). Explainable Machine Learning Model for Predicting GI Bleed Mortality in the Intensive Care Unit. Am. J. Gastroenterol. 115, 1657–1668. doi: 10.14309/ajg.0000000000000632

Ding, Z. Y., Li, G. X., Chen, L., Shu, C., Song, J., Wang, W., et al. (2021). Association of Liver Abnormalities With In-Hospital Mortality in Patients With COVID-19. J. Hepatol. 74, 1295–1302. doi: 10.1016/j.jhep.2020.12.012

Dong, Y. M., Sun, J., Li, Y. X., Chen, Q., Liu, Q. Q., Sun, Z., et al. (2021). Development and Validation of a Nomogram for Assessing Survival in Patients With COVID-19 Pneumonia. Clin. Infect. Dis. 72, 652–660. doi: 10.1093/cid/ciaa963

Feng, C., Kephart, G., Juarez-Colunga, E. (2021). Predicting COVID-19 Mortality Risk in Toronto, Canada: A Comparison of Tree-Based and Regression-Based Machine Learning Methods. BMC Med. Res. Method. 21, 267. doi: 10.1186/s12874-021-01441-4

Feng, Z., Yu, Q., Yao, S., Luo, L., Zhou, W., Mao, X., et al. (2020). Early Prediction of Disease Progression in COVID-19 Pneumonia Patients With Chest CT and Clinical Characteristics. Nat. Commun. 11, 4968. doi: 10.1038/s41467-020-18786-x

Galván-Román, J. M., Rodríguez-García, S. C., Roy-Vallejo, E., Marcos-Jiménez, A., Sánchez-Alonso, S., Fernández-Díaz, C., et al. (2021). IL-6 Serum Levels Predict Severity and Response to Tocilizumab in COVID-19: An Observational Study. J. Allergy Clin. Immunol. 147, 72–80.e78. doi: 10.1016/j.jaci.2020.09.018

Gong, J., Ou, J., Qiu, X., Jie, Y., Chen, Y., Yuan, L., et al. (2020). A Tool for Early Prediction of Severe Coronavirus Disease 2019 (COVID-19): A Multicenter Study Using the Risk Nomogram in Wuhan and Guangdong, China. Clin. Infect. Dis. 71, 833–840. doi: 10.1093/cid/ciaa443

Guirao, J. J., Cabrera, C. M., Jiménez, N., Rincón, L., Urra, J. M. (2020). High Serum IL-6 Values Increase the Risk of Mortality and the Severity of Pneumonia in Patients Diagnosed With COVID-19. Mol. Immunol. 128, 64–68. doi: 10.1016/j.molimm.2020.10.006

Hall, C. (2004). Essential Biochemistry and Physiology of (NT-Pro)BNP. Eur. J. Heart Fail 6, 257–260. doi: 10.1016/j.ejheart.2003.12.015

Han, H., Ma, Q., Li, C., Liu, R., Zhao, L., Wang, W., et al. (2020). Profiling Serum Cytokines in COVID-19 Patients Reveals IL-6 and IL-10 are Disease Severity Predictors. Emerg. Microbes Infect. 9, 1123–1130. doi: 10.1080/22221751.2020.1770129

Hong, W., Chen, Q., Qian, S., Basharat, Z., Zimmer, V., Wang, Y., et al. (2021). Critically Ill vs. Non-Critically Ill Patients With COVID-19 Pneumonia: Clinical Features, Laboratory Findings, and Prediction. Front. Cell Infect. Microbiol. 11, 550456. doi: 10.3389/fcimb.2021.550456

Hong, W., Lin, S., Zippi, M., Geng, W., Stock, S., Basharat, Z., et al. (2017). Serum Albumin Is Independently Associated With Persistent Organ Failure in Acute Pancreatitis. Can. J. Gastroenterol. Hepatol. 2017, 5297143. doi: 10.1155/2017/5297143

Hong, W., Tang, H., Dong, X., Hu, S., Yan, Y., Basharat, Z., et al. (2019). Prevalence of Helicobacter Pylori Infection in a Third-Tier Chinese City: Relationship With Gender, Age, Birth-Year and Survey Years. Microbiota Health Dis Off. J. Eur. Helicobacter. Microbiota Study Group 1, 1–12. doi: 10.26355/mhd_201911_150

Hong, W., Zimmer, V., Basharat, Z., Zippi, M., Stock, S., Geng, W., et al. (2020). Association of Total Cholesterol With Severe Acute Pancreatitis: A U-Shaped Relationship. Clin. Nutr. 39, 250–257. doi: 10.1016/j.clnu.2019.01.022

Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., et al. (2020). Clinical Features of Patients Infected With 2019 Novel Coronavirus in Wuhan, China. Lancet 395, 497–506. doi: 10.1016/S0140-6736(20)30183-5

Hu, B., Huang, S., Yin, L. (2021). The Cytokine Storm and COVID-19. J. Med. Virol. 93, 250–256. doi: 10.1002/jmv.26232

Iasonos, A., Schrag, D., Raj, G. V., Panageas, K. S. (2008). How to Build and Interpret a Nomogram for Cancer Prognosis. J. Clin. Oncol. 26, 1364–1370. doi: 10.1200/JCO.2007.12.9791

Iwendi, C., Bashir, A. K., Peshkar, A., Sujatha, R., Chatterjee, J. M., Pasupuleti, S., et al. (2020). COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm. Front. Public Health 8, 357. doi: 10.3389/fpubh.2020.00357

Jandinski, J., Cantor, H., Tadakuma, T., Peavy, D. L., Pierce, C. W. (1976). Separation of Helper T Cells From Suppressor T Cells Expressing Different Ly Components. I. Polyclonal Activation: Suppressor and Helper Activities are Inherent Properties of Distinct T-Cell Subclasses. J. Exp. Med. 143, 1382–1390. doi: 10.1084/jem.143.6.1382

Jin, C., Cao, J., Cai, Y., Wang, L., Liu, K., Shen, W., et al. (2017). A Nomogram for Predicting the Risk of Invasive Pulmonary Adenocarcinoma for Patients With Solitary Peripheral Subsolid Nodules. J. Thorac. Cardiovasc. Surg. 153, 462–469.e461. doi: 10.1016/j.jtcvs.2016.10.019

Ji, D., Zhang, D., Xu, J., Chen, Z., Yang, T., Zhao, P., et al. (2020). Prediction for Progression Risk in Patients With COVID-19 Pneumonia: The CALL Score. Clin. Infect. Dis. 71, 1393–1399. doi: 10.1093/cid/ciaa414

Jones, S. A., Hunter, C. A. (2021). Is IL-6 a Key Cytokine Target for Therapy in COVID-19? Nat. Rev. Immunol. 21, 337–339. doi: 10.1038/s41577-021-00553-8

Joshi, T., Pundir, H., Sharma, P., Mathpal, S., Chandra, S. (2021). Predictive Modeling by Deep Learning, Virtual Screening and Molecular Dynamics Study of Natural Compounds Against SARS-CoV-2 Main Protease. J. Biomol. Struct. Dyn. 39, 6728–6746. doi: 10.1080/07391102.2020.1802341

Katsoularis, I., Fonseca-Rodríguez, O., Farrington, P., Lindmark, K., Fors Connolly, A. M. (2021). Risk of Acute Myocardial Infarction and Ischaemic Stroke Following COVID-19 in Sweden: A Self-Controlled Case Series and Matched Cohort Study. Lancet 398, 599–607. doi: 10.1016/S0140-6736(21)00896-5

Kumar, A., Zarychanski, R., Pinto, R., Cook, D. J., Marshall, J., Lacroix, J., et al. (2009). Critically Ill Patients With 2009 Influenza A(H1N1) Infection in Canada. JAMA 302, 1872–1879. doi: 10.1001/jama.2009.1496

Kwo, P. Y., Cohen, S. M., Lim, J. K. (2017). ACG Clinical Guideline: Evaluation of Abnormal Liver Chemistries. Am. J. Gastroenterol. 112, 18–35. doi: 10.1038/ajg.2016.517

Lei, F., Liu, Y. M., Zhou, F., Qin, J. J., Zhang, P., Zhu, L., et al. (2020). Longitudinal Association Between Markers of Liver Injury and Mortality in COVID-19 in China. Hepatology 72, 389–398. doi: 10.1002/hep.31301

Leisman, D. E., Ronner, L., Pinotti, R., Taylor, M. D., Sinha, P., Calfee, C. S., et al. (2020). Cytokine Elevation in Severe and Critical COVID-19: A Rapid Systematic Review, Meta-Analysis, and Comparison With Other Inflammatory Syndromes. Lancet Respir. Med. 8, 1233–1244. doi: 10.1016/S2213-2600(20)30404-5

Liang, W., Liang, H., Ou, L., Chen, B., Chen, A., Li, C., et al. (2020a). Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19. JAMA Intern. Med. 180, 1081–1089. doi: 10.1001/jamainternmed.2020.2033

Liang, W., Yao, J., Chen, A., Lv, Q., Zanin, M., Liu, J., et al. (2020b). Early Triage of Critically Ill COVID-19 Patients Using Deep Learning. Nat. Commun. 11, 3543. doi: 10.1038/s41467-020-17280-8

Liu, F., Li, L., Xu, M., Wu, J., Luo, D., Zhu, Y., et al. (2020). Prognostic Value of Interleukin-6, C-Reactive Protein, and Procalcitonin in Patients With COVID-19. J. Clin. Virol. 127, 104370. doi: 10.1016/j.jcv.2020.104370

Liu, Q., Pang, B., Li, H., Zhang, B., Liu, Y., Lai, L., et al. (2021). Machine Learning Models for Predicting Critical Illness Risk in Hospitalized Patients With COVID-19 Pneumonia. J. Thorac. Dis. 13, 1215–1229. doi: 10.21037/jtd-20-2580

Li, L., Wang, L., Zeng, F., Peng, G., Ke, Z., Liu, H., et al. (2021). Development and Multicenter Validation of a CT-Based Radiomics Signature for Predicting Severe COVID-19 Pneumonia. Eur. Radiol. 31, 7901–7912. doi: 10.1007/s00330-021-07727-x

Lu, L., Zhang, H., Dauphars, D., He, Y. (2021). A Potential Role of Interleukin 10 in COVID-19 Pathogenesis. Trends Immunol. 42, 3–5. doi: 10.1016/j.it.2020.10.012

Mcelvaney, O., Curley, G., Rose-John, S., Mcelvaney, N. (2021). Interleukin-6: Obstacles to Targeting a Complex Cytokine in Critical Illness. Lancet Respir. Med. 9, 643–654. doi: 10.1016/S2213-2600(21)00103-X

Miller, J. F., Mitchell, G. F. (1969). Thymus and Antigen-Reactive Cells. Transplant. Rev. 1, 3–42. doi: 10.1111/j.1600-065X.1969.tb00135.x

Montomoli, J., Romeo, L., Moccia, S., Bernardini, M., Migliorelli, L., Berardini, D., et al. (2021). Machine Learning Using the Extreme Gradient Boosting (XGBoost) Algorithm Predicts 5-Day Delta of SOFA Score at ICU Admission in COVID-19 Patients. J. Intensive Med. 1, 110–116. doi: 10.1016/j.jointm.2021.09.002

Mottaqi, M. S., Mohammadipanah, F., Sajedi, H. (2021). Contribution of Machine Learning Approaches in Response to SARS-CoV-2 Infection. Inform. Med. Unlocked 23, 100526. doi: 10.1016/j.imu.2021.100526

Oja, A. E., Saris, A., Ghandour, C. A., Kragten, N., Hogema, B. M., Nossent, E. J., et al. (2020). Divergent SARS-CoV-2-Specific T- and B-Cell Responses in Severe But Not Mild COVID-19 Patients. Eur. J. Immunol. 50, 1998–2012. doi: 10.1002/eji.202048908

Ou, C., Liu, J., Qian, Y., Chong, W., Zhang, X., Liu, W., et al. (2020). Rupture Risk Assessment for Cerebral Aneurysm Using Interpretable Machine Learning on Multidimensional Data. Front. Neurol. 11, 570181. doi: 10.3389/fneur.2020.570181

Padmaprakash, K. V., Thareja, S., Raman, N., Sowmya Karantha, C., Muthukrishnan, J., Vardhan, V. (2022). Does Transaminitis Predict Severity and Mortality in COVID 19 Patients? J. Clin. Exp. Hepatol. doi: 10.1016/j.jceh.2022.01.004

Pan, P., Li, Y., Xiao, Y., Han, B., Su, L., Su, M., et al. (2020). Prognostic Assessment of COVID-19 in the Intensive Care Unit by Machine Learning Methods: Model Development and Validation. J. Med. Internet Res. 22, e23128. doi: 10.2196/23128

Parish, C. R., Liew, F. Y. (1972). Immune Response to Chemically Modified Flagellin. 3. Enhanced Cell-Mediated Immunity During High and Low Zone Antibody Tolerance to Flagellin. J. Exp. Med. 135, 298–311. doi: 10.1084/jem.135.2.298

Qin, J., Cheng, X., Zhou, F., Lei, F., Akolkar, G., Cai, J., et al. (2020). Redefining Cardiac Biomarkers in Predicting Mortality of Inpatients With COVID-19. Hypertens. (Dallas Tex 1979) 76, 1104–1112. doi: 10.1161/HYPERTENSIONAHA.120.15528

Rahman, T., Khandakar, A., Hoque, M. E., Ibtehaz, N., Kashem, S. B., Masud, R., et al. (2021). Development and Validation of an Early Scoring System for Prediction of Disease Severity in COVID-19 Using Complete Blood Count Parameters. IEEE Access 9, 120422–120441. doi: 10.1109/ACCESS.2021.3105321

Ren, X., Wang, X., Ge, Z., Cui, S., Chen, Z. (2021). Clinical Features and Corresponding Immune Function Status of Recurrent Viral Polymerase Chain Reaction Positivity in Patients With COVID-19: A Meta-Analysis and Systematic Review. Int. J. Immunopathol. Pharmacol. 35, 20587384211027679. doi: 10.1177/20587384211027679

Royston, P. (2005). Multiple Imputation of Missing Values: Update of Ice. Stata J. 5, 527–536. doi: 10.1177/1536867X0500500404

Ruan, Q., Yang, K., Wang, W., Jiang, L., Song, J. (2020). Clinical Predictors of Mortality Due to COVID-19 Based on an Analysis of Data of 150 Patients From Wuhan, China. Intensive Care Med. 46, 846–848. doi: 10.1007/s00134-020-05991-x

Ruterbusch, M., Pruner, K. B., Shehata, L., Pepper, M. (2020). In Vivo CD4(+) T Cell Differentiation and Function: Revisiting the Th1/Th2 Paradigm. Annu. Rev. Immunol. 38, 705–725. doi: 10.1146/annurev-immunol-103019-085803

Ryan, L., Lam, C., Mataraso, S., Allen, A., Green-Saxena, A., Pellegrini, E., et al. (2020). Mortality Prediction Model for the Triage of COVID-19, Pneumonia, and Mechanically Ventilated ICU Patients: A Retrospective Study. Ann. Med. Surg. (Lond.) 59, 207–216. doi: 10.1016/j.amsu.2020.09.044

Rydyznski Moderbacher, C., Ramirez, S. I., Dan, J. M., Grifoni, A., Hastie, K. M., Weiskopf, D., et al. (2020). Antigen-Specific Adaptive Immunity to SARS-CoV-2 in Acute COVID-19 and Associations With Age and Disease Severity. Cell 183, 996–1012.e1019. doi: 10.1016/j.cell.2020.09.038

Saito, T., Rehmsmeier, M. (2015). The Precision-Recall Plot is More Informative Than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PloS One 10, e0118432. doi: 10.1371/journal.pone.0118432

San, I., Gemcioglu, E., Baser, S., Yilmaz Cakmak, N., Erden, A., Izdes, S., et al. (2021). Brescia-COVID Respiratory Severity Scale (BRCSS) and Quick SOFA (qSOFA) Score are Most Useful in Showing Severity in COVID-19 Patients. Sci. Rep. 11, 21807. doi: 10.1038/s41598-021-01181-x

Shamskhou, E. A., Kratochvil, M. J., Orcholski, M. E., Nagy, N., Kaber, G., Steen, E., et al. (2019). Hydrogel-Based Delivery of Il-10 Improves Treatment of Bleomycin-Induced Lung Fibrosis in Mice. Biomaterials 203, 52–62. doi: 10.1016/j.biomaterials.2019.02.017

Shi, H., Han, X., Jiang, N., Cao, Y., Alwalid, O., Gu, J., et al. (2020). Radiological Findings From 81 Patients With COVID-19 Pneumonia in Wuhan, China: A Descriptive Study. Lancet Infect. Dis. 20, 425–434. doi: 10.1016/S1473-3099(20)30086-4

Stefanini, G. G., Chiarito, M., Ferrante, G., Cannata, F., Azzolini, E., Viggiani, G., et al. (2020). Early Detection of Elevated Cardiac Biomarkers to Optimise Risk Stratification in Patients With COVID-19. Heart 106, 1512–1518. doi: 10.1136/heartjnl-2020-317322

Suliman, L. A., Abdelgawad, T. T., Farrag, N. S., Abdelwahab, H. W. (2021). Validity of ROX Index in Prediction of Risk of Intubation in Patients With COVID-19 Pneumonia. Adv. Respir. Med. 89, 1–7. doi: 10.5603/ARM.a2020.0176

Thompson, K., Maltby, J., Fallowfield, J., Mcaulay, M., Millward-Sadler, H., Sheron, N. (1998). Interleukin-10 Expression and Function in Experimental Murine Liver Inflammation and Fibrosis. Hepatology 28, 1597–1606. doi: 10.1002/hep.510280620

Vaid, A., Chan, L., Chaudhary, K., Jaladanki, S. K., Paranjpe, I., Russak, A., et al. (2021). Predictive Approaches for Acute Dialysis Requirement and Death in COVID-19. Clin. J. Am. Soc. Nephrol. 16, 1158–1168. doi: 10.2215/CJN.17311120

Wang, F., Hou, H., Luo, Y., Tang, G., Wu, S., Huang, M., et al. (2020a). The Laboratory Tests and Host Immunity of COVID-19 Patients With Different Severity of Illness. JCI Insight 5 (10), e137799. doi: 10.1172/jci.insight.137799

Wang, J., Liu, W., Chen, X., Mcrae, M., Mcdevitt, J., Fenyö, D. (2021). Predictive Modeling of Morbidity and Mortality in Patients Hospitalized With COVID-19 and its Clinical Implications: Algorithm Development and Interpretation. J. Med. Internet Res. 23, e29514. doi: 10.2196/29514

Wang, K., Zuo, P., Liu, Y., Zhang, M., Zhao, X., Xie, S., et al. (2020b). Clinical and Laboratory Predictors of In-Hospital Mortality in Patients With Coronavirus Disease-2019: A Cohort Study in Wuhan, China. Clin. Infect. Dis. 71, 2079–2088. doi: 10.1093/cid/ciaa538

Wu, C., Chen, X., Cai, Y., Xia, J., Zhou, X., Xu, S., et al. (2020a). Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease 2019 Pneumonia in Wuhan, China. JAMA Intern. Med. 180, 934–943. doi: 10.1001/jamainternmed.2020.0994

Wu, G., Yang, P., Xie, Y., Woodruff, H. C., Rao, X., Guiot, J., et al. (2020b). Development of a Clinical Decision Support System for Severity Risk Prediction and Triage of COVID-19 Patients at Hospital Admission: An International Multicentre Study. Eur. Respir. J. 56 (2), 2001104. doi: 10.1183/13993003.01104-2020

Yang, X., Yu, Y., Xu, J., Shu, H., Xia, J., Liu, H., et al. (2020). Clinical Course and Outcomes of Critically Ill Patients With SARS-CoV-2 Pneumonia in Wuhan, China: A Single-Centered, Retrospective, Observational Study. Lancet Respir. Med. 8, 475–481. doi: 10.1016/S2213-2600(20)30079-5

Yan, L., Zhang, H.-T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., et al. (2020). An Interpretable Mortality Prediction Model for COVID-19 Patients. Nat. Mach. Intell. 2, 283–288. doi: 10.1038/s42256-020-0180-7

Yip, T. C., Lui, G. C., Wong, V. W., Chow, V. C., Ho, T. H., Li, T. C., et al. (2021). Liver Injury is Independently Associated With Adverse Clinical Outcomes in Patients With COVID-19. Gut 70, 733–742. doi: 10.1136/gutjnl-2020-321726

Yousif, A. S., Ronsard, L., Shah, P., Omatsu, T., Sangesland, M., Bracamonte Moreno, T., et al. (2021). The Persistence of Interleukin-6 is Regulated by a Blood Buffer System Derived From Dendritic Cells. Immunity 54, 235–246.e235. doi: 10.1016/j.immuni.2020.12.001

Zahedi, M., Yousefi, M., Abounoori, M., Malekan, M., Tajik, F., Heydari, K., et al. (2021). The Interrelationship Between Liver Function Test and the Coronavirus Disease 2019: A Systematic Review and Meta-Analysis. Iran J. Med. Sci. 46, 237–255. doi: 10.30476/ijms.2021.87555.1793

Zhang, C., Qin, L., Li, K., Wang, Q., Zhao, Y., Xu, B., et al. (2020). A Novel Scoring System for Prediction of Disease Severity in COVID-19. Front. Cell Infect. Microbiol. 10, 318. doi: 10.3389/fcimb.2020.00318

Zhao, Y., Qin, L., Zhang, P., Li, K., Liang, L., Sun, J., et al. (2020). Longitudinal COVID-19 Profiling Associates IL-1RA and IL-10 With Disease Severity and RANTES With Mild Disease. JCI Insight 5 (13), e139834. doi: 10.1172/jci.insight.139834

Keywords: COVID-19, infection, pneumonia, severity, critically ill, predictor, machine learning

Citation: Hong W, Zhou X, Jin S, Lu Y, Pan J, Lin Q, Yang S, Xu T, Basharat Z, Zippi M, Fiorino S, Tsukanov V, Stock S, Grottesi A, Chen Q and Pan J (2022) A Comparison of XGBoost, Random Forest, and Nomograph for the Prediction of Disease Severity in Patients With COVID-19 Pneumonia: Implications of Cytokine and Immune Cell Profile. Front. Cell. Infect. Microbiol. 12:819267. doi: 10.3389/fcimb.2022.819267

Received: 21 November 2021; Accepted: 07 March 2022;
Published: 12 April 2022.

Edited by:

Max Maurin, Université Grenoble Alpes, France

Reviewed by:

Joseph Bamidele Awotunde, University of Ilorin, Nigeria
Molka Rekik, Prince Sattam Bin Abdulaziz University, Saudi Arabia

Copyright © 2022 Hong, Zhou, Jin, Lu, Pan, Lin, Yang, Xu, Basharat, Zippi, Fiorino, Tsukanov, Stock, Grottesi, Chen and Pan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wandong Hong, eGhuay1od2RAMTYzLmNvbQ==; Jingye Pan, c3R1ZHlwYW5qaW5neWVAc2luYS5jb20=

†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
