Early Prediction Model for Critical Illness of Hospitalized COVID-19 Patients Based on Machine Learning Techniques

Fu, Yacheng; Zhong, Weijun; Liu, Tao; Li, Jianmin; Xiao, Kui; Ma, Xinhua; Xie, Lihua; Jiang, Junyi; Zhou, Honghao; Liu, Rong; Zhang, Wei

doi:10.3389/fpubh.2022.880999

ORIGINAL RESEARCH article

Front. Public Health, 24 May 2022

Sec. Infectious Diseases: Epidemiology and Prevention

Volume 10 - 2022 | https://doi.org/10.3389/fpubh.2022.880999

This article is part of the Research Topic: Biomarkers and Early Warning Scores: The Time for High-Precision Emergency Medicine

Yacheng Fu^1,2^†

Weijun Zhong^1,2^†

Tao Liu³

Jianmin Li⁴

Kui Xiao⁵

Xinhua Ma⁶

Lihua Xie⁷

Junyi Jiang^1,2^

Honghao Zhou^1,2^

Rong Liu^1,2^*

Wei Zhang^1,2^*

¹Department of Clinical Pharmacology, Xiangya Hospital, Central South University, Changsha, China
²National Clinical Research Center for Geriatric Disorders, Changsha, China
³Shenzhen Center for Chronic Disease Control, Shenzhen, China
⁴Department of Pulmonary and Critical Care Medicine, Hunan Provincial People's Hospital, The First Affiliated Hospital of Hunan Normal University, Changsha, China
⁵Department of Pulmonary and Critical Care Medicine, The Second Xiangya Hospital, Central South University, Changsha, China
⁶Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
⁷B7 Department, Zhongfa District of Tongji Hospital, Tongji Medical, Huazhong University of Science and Technology, Wuhan, China

Motivation: The sudden deterioration of patients with novel coronavirus disease 2019 (COVID-19) into critical illness is a matter of great concern. Early identification and effective triaging of patients at high risk of developing critical COVID-19 illness upon admission can aid in improving patient care, increasing the cure rate, and mitigating the burden on the medical care system. This study applied and extended classical least absolute shrinkage and selection operator (LASSO) logistic regression to objectively identify clinical determinants and risk factors for the early identification of patients at high risk of progression to critical illness at the time of hospital admission.

Methods: In this retrospective multicenter study, data of 1,929 patients with COVID-19 were assessed. The association between laboratory characteristics measured at admission and critical illness was screened with logistic regression. LASSO logistic regression was utilized to construct predictive models for estimating the risk that a patient with COVID-19 will develop a critical illness.

Results: The development cohort consisted of 1,363 patients with COVID-19, of whom 133 (9.8%) developed critical illness. Univariate logistic regression analysis revealed that 28 variables were prognostic factors for critical COVID-19 illness (p < 0.05). Elevated CK-MB, neutrophils, PCT, α-HBDH, D-dimer, LDH, glucose, PT, APTT, RDW (SD and CV), fibrinogen, and AST were predictors for the early identification of patients at high risk of progression to critical illness. Lymphopenia; thrombopenia; low proportions of basophils and eosinophils; low red blood cell count, hematocrit, hemoglobin concentration, and blood platelet count; and decreased levels of K, Na, albumin, albumin to globulin ratio, and uric acid were clinical determinants associated with the development of critical illness at the time of hospital admission. The risk score accurately predicted critical illness in the development cohort [area under the curve (AUC) = 0.83, 95% CI: 0.78–0.86] and in the external validation cohort (n = 566, AUC = 0.84).

Conclusion: A risk prediction model based on laboratory findings of patients with COVID-19 was developed for the early identification of patients at high risk of progression to critical illness. This cohort study identified 28 indicators associated with critical illness in patients with COVID-19. The risk model might help clinicians treat critical illness as early as possible and allow for optimized use of medical resources.

Introduction

The coronavirus disease 2019 (COVID-19) pandemic is spreading worldwide. COVID-19 is a communicable disease caused by severe acute respiratory syndrome coronavirus 2. As of 14 February 2022, the WHO had reported 412,044,520 confirmed COVID-19 cases globally, with an average mortality rate of 1.4%. The clinical spectrum of COVID-19 infection ranges from asymptomatic infection and mild upper respiratory tract illness to critical illness (1). It has been reported that about 5% of patients with COVID-19 infection experience rapid deterioration from the onset of symptoms into critical illness (2), with a mortality rate of 61.5% among critical cases within 28 days of hospital admission (3). Treatment of patients with critical illness places great pressure on medical services and, in particular, leads to shortages of intensive care resources. Therefore, early identification and effective triaging of patients at high risk of developing critical COVID-19 illness upon admission can aid in improving patient care, increasing the cure rate, and mitigating the burden on the medical care system.

The risk factors for critical illness are not well understood. Previous reports have identified that older age, organ dysfunction, neutrophilia, preexisting concurrent cardiovascular or cerebrovascular diseases, coagulopathy, amounts of CD3+CD8+ T cells, and elevated D-dimer levels are associated with the development of acute respiratory distress syndrome and increased mortality risk (1, 4–9). A limited number of publications have identified chest radiographic abnormality, older age, hemoptysis, dyspnea, unconsciousness, number of comorbidities, cancer history, neutrophil-to-lymphocyte ratio (10), lactate dehydrogenase (LDH), and direct bilirubin as risk factors associated with the development of critical illness (11, 12). Clinical scores for predicting which patients with COVID-19 will develop critical illness were developed with these 10 factors (11, 12) and showed good discrimination. In addition, Schalekamp et al. developed an integrated model based on patient history, laboratory markers, and chest radiography at hospital admission to predict critical illness (13). However, in these models, some diagnoses of co-existing illnesses and symptoms came from patients' self-reports at admission, which might lead to recall bias.

Mathematical modeling with appropriate inputs can make predictions about the dynamics and control of infectious diseases. A series of mathematical models have been developed on the transmission dynamics and control of COVID-19 or the SARS-CoV-2 virus in different regions (14–24), including Wuhan, Italy, and the USA. In this retrospective multicenter study, we applied and extended classical least absolute shrinkage and selection operator (LASSO) logistic regression for the early identification of patients at high risk of progression to critical illness. We systematically analyzed the accessible laboratory findings of 1,929 patients with confirmed COVID-19 and clear prognostic information in 32 hospitals in Hubei and Hunan provinces of China and identified robust and meaningful factors associated with critical illness. The laboratory findings were measured objectively. A risk prediction model was constructed with LASSO logistic regression to help identify, at the time of hospital admission, patients who are at high risk of developing critical illness. This model aims to distinguish patients at imminent risk of critical illness, thereby optimizing the allocation of limited healthcare resources and potentially lowering the mortality rate.

Methods

Data Collection

This study was approved by the Institute of Clinical Pharmacology, Central South University. Given the urgent need to collect and analyze data on this emerging pathogen, the ethics committee of the Institute of Clinical Pharmacology, Central South University granted a waiver of written informed consent from study participants. Medical records of hospitalized patients with COVID-19 diagnosed in 31 hospitals in China (4 hospitals in Hubei Province and 27 hospitals in Hunan Province) were collected. All patients who were diagnosed with COVID-19 by positive high-throughput sequencing or real-time reverse-transcription PCR (RT-PCR) assay of nasal and pharyngeal swab specimens were screened. Our study enrolled all adult inpatients (≥18 years old) who were hospitalized for COVID-19 and had an explicit outcome with respect to critical illness. The data were cross-checked by experienced respiratory clinicians. All patients with data on clinical status at hospitalization (laboratory findings, critical illness, and discharge status) were included.

Clinical Outcome

The outcome of this study was critical illness, defined as a composite of invasive ventilation, admission to the intensive care unit (ICU), or death of patients with COVID-19 (25). The follow-up time was calculated from the first day of hospitalization to the date of death or discharge, or the censoring date (12 April 2020 for the development cohort and 11 June 2020 for the validation cohort).

Potential Predictive Variables

Demographic variables and laboratory findings of patients at hospital admission were collected as potential predictive variables. Demographic variables included age and gender. Laboratory findings were taken from the first measurement within 2 days after admission. Laboratory indexes with complete measurements for more than 50% of the patients in the development cohort were collected: hematologic (hematocrit, basophils, eosinophils, lymphocytes, monocytes, neutrophils, mean corpuscular volume, hemoglobin concentration, coefficient of variation [CV] and SD of red blood cell volume distribution width [RDW], blood platelet count, thrombocytocrit, red blood cell, and white blood cells), biochemical [levels of glucose, K, Na, total Ca, Cl, total protein, lactate dehydrogenase (LDH), glutamic-pyruvic transaminase, creatine kinase, aspartate transaminase (AST), creatine kinase muscle-brain isoform (CK-MB), creatinine, ureophil, albumin, globulin, albumin to globulin ratio, and glomerular filtration rate (GFR)], coagulation function indexes [levels of D-dimer and fibrinogen, activated partial thromboplastin time (APTT), and prothrombin time (PT)], infection-related indices [levels of C-reactive protein (CRP), procalcitonin (PCT), and alpha hydroxybutyrate dehydrogenase (α-HBDH)], and the level of uric acid. For the complete laboratory findings and the corresponding ratio of missing values, please refer to Supplementary Table 1.

Statistical Analysis

Continuous variables were presented as mean (SD) or median [interquartile range (IQR)], and categorical variables as n (%).

A total of 1,255 patients hospitalized with COVID-19 in the development cohort were included for variable selection. To assess the association between the quantitative laboratory findings described above and the occurrence of critical illness, a univariate logistic regression analysis was conducted. Since the odds ratio (OR) is interpreted per unit change, ORs were standardized across variables with different ranges by applying logistic regression to the dichotomous outcome (1 = with the occurrence of critical illness and 0 = without the occurrence of critical illness), with the quartiles of each of the 38 laboratory findings modeled as a continuous variable (<25% quartile = 1; ≥25% and <50% quartile = 2; ≥50% quartile and <75% quartile = 3; and ≥75% quartile = 4). The association between the occurrence of critical illness and age (≥55 vs. <55 years) was also evaluated.
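
The quartile coding and the univariate fit described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study: the toy laboratory values and outcomes are invented, the function names `quartile_codes` and `fit_univariate_logistic` are hypothetical, and the Newton-Raphson fit stands in for a standard logistic regression routine.

```python
import math
import statistics

def quartile_codes(values):
    """Code each value as 1-4 by quartile (<Q1 = 1, ..., >=Q3 = 4)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return [1 + (v >= q1) + (v >= q2) + (v >= q3) for v in values]

def fit_univariate_logistic(x, y, n_iter=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of X'WX (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented toy data: 20 admission values of one laboratory finding and the
# outcome (1 = critical illness, 0 = no critical illness).
lab_values = [100 + 10 * k for k in range(20)]
outcome = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]

codes = quartile_codes(lab_values)
b0, b1 = fit_univariate_logistic(codes, outcome)
odds_ratio = math.exp(b1)  # OR per one-quartile increase
print(f"OR per quartile increase: {odds_ratio:.2f}")
```

Because every variable is coded on the same 1 to 4 scale, the resulting ORs are comparable across laboratory findings regardless of their original units.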

The 28 covariates that were statistically significant (p < 0.05) in the univariate logistic analysis were selected as candidates for developing the risk score for critical illness. A total of 1,064 patients with at least 80% data completeness for these 28 variables were used for model establishment. For these 1,064 patients, we applied predictive mean matching to impute numeric features (laboratory findings) with the “mice” package in R.
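
As an illustration of the completeness criterion only (the imputation step itself is not reproduced here), a filter of this kind might look like the sketch below. The record layout, the variable names, and the helper functions are assumptions made for the example, and missing values are represented as `None`.

```python
def completeness(record, variables):
    """Fraction of the listed variables that are present (not None) in a record."""
    present = sum(record.get(name) is not None for name in variables)
    return present / len(variables)

def filter_complete(records, variables, threshold=0.8):
    """Keep records with at least `threshold` completeness across `variables`."""
    return [r for r in records if completeness(r, variables) >= threshold]

# Hypothetical candidate variables and patient records (values are invented).
variables = ["LDH", "CRP", "glucose", "albumin", "lymphocytes"]
records = [
    {"id": "A", "LDH": 250, "CRP": 12.0, "glucose": 5.4, "albumin": 38, "lymphocytes": 1.2},
    {"id": "B", "LDH": 410, "CRP": None, "glucose": 7.9, "albumin": 31, "lymphocytes": 0.6},
    {"id": "C", "LDH": None, "CRP": None, "glucose": 6.1, "albumin": 35, "lymphocytes": 0.9},
]

kept = filter_complete(records, variables)
print([r["id"] for r in kept])  # patients with at least 80% of variables measured
```

Patient B, with 4 of 5 variables measured, sits exactly at the 80% threshold and is retained; the remaining gap would then be filled by imputation.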

Prediction models were developed with the LASSO logistic regression, support vector regression (SVR), artificial neural network (ANN), regression tree (RT), and multivariate adaptive regression splines (MARS) machine learning techniques. We used the “glmnet” (14) package for LASSO, the “e1071” package for SVR, the “RSNNS” package for ANN, the “rpart” package for RT, and the “earth” package for MARS. Default parameters were used. L1-penalized (LASSO) regression with 1,000-fold cross-validation for internal validation was utilized. LASSO logistic regression is a logistic regression model that penalizes the absolute size of the regression coefficients according to the value of λ. When the LASSO regression coefficients are estimated, some unimportant coefficients can be reduced exactly to 0, which achieves variable screening. In contrast to the ridge regression model, the penalty term in LASSO regression is based on absolute values, namely, L1 regularization. The estimates of weaker factors shrink toward zero as the penalty grows, so that only the strongest predictors are left in the model. We selected the most predictive covariates at the value of λ that met the minimum criterion in cross-validation. Subsequently, variables identified by LASSO regression analysis were used to construct the risk score with their coefficients:

\text{Risk Score (RS)} = \sum_{i=1}^{n} \left( \text{Value}_{i} \times \text{Coe}_{i} \right) \qquad (1)

where n stands for the number of prognostic variables in the model; Value_i is the original value of variable_i; and Coe_i is the estimated coefficient of Value_i in the LASSO logistic regression model. The probability of developing critical illness was calculated with the following formula: probability = exp (RS)/[1+ exp(RS)].
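
A minimal sketch of Equation (1) and the probability formula is given below. The coefficients and patient values are invented for illustration and are not the fitted values from the study; the sketch follows Equation (1) literally, so no separate intercept term is added.

```python
import math

def risk_score(values, coefficients):
    """Equation (1): sum of each variable's value times its LASSO coefficient."""
    return sum(values[name] * coef for name, coef in coefficients.items())

def probability(rs):
    """exp(RS) / [1 + exp(RS)], written in the equivalent form 1 / (1 + exp(-RS))."""
    return 1.0 / (1.0 + math.exp(-rs))

# Hypothetical coefficients and admission values (NOT taken from the paper).
coefficients = {"LDH": 0.004, "albumin": -0.05}
patient = {"LDH": 300.0, "albumin": 30.0}

rs = risk_score(patient, coefficients)
prob = probability(rs)
print(f"RS = {rs:.2f}, probability = {prob:.3f}")
```

A positive coefficient raises the score as the laboratory value rises, a negative coefficient lowers it, and a score of 0 corresponds to a probability of 0.5.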

We used receiver operating characteristic (ROC) curves to compare the sensitivity and specificity of scores generated with different machine learning techniques. The abscissa and ordinate of a ROC curve are the false-positive rate and the true-positive rate, respectively. Each point on the ROC curve corresponds to the sensitivity and specificity obtained at one classification threshold. By comparing the false-positive and true-positive rates, ROC curves show the performance of a classification model at all classification thresholds. The area under the receiver operating characteristic curve (AUROC), namely, the entire two-dimensional area underneath the ROC curve, was used as the measure of discrimination. AUROC shows how capable the model is of distinguishing between classes. The larger the AUROC value, the better the model is at separating the classes. The R package “ROCR” was utilized for the calculation of the AUROC.
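
To make the construction concrete, the sketch below builds the ROC points for a small invented set of scores and integrates them with the trapezoidal rule. It is written in plain Python for illustration and is not the “ROCR” implementation; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) at every distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move all observations tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented predicted probabilities and true outcomes (1 = critical illness).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

points = roc_points(scores, labels)
area = area_under_curve(points)
print(f"AUROC = {area:.2f}")
```

An AUROC of 0.5 corresponds to random ranking of patients, and 1.0 to perfect separation of those who do and do not develop critical illness.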

To explore temporal changes in laboratory findings during hospitalization, differences between critical illness groups during follow-up in laboratory findings were estimated from linear mixed models with R package “nlme.”

Details of the samples used at each stage of statistical analysis are depicted in Figure 1. All statistical analyses were conducted with R software (version 3.6.2, R Foundation), and p-values were computed from two-tailed tests of statistical significance with a type I error rate of 5%.

FIGURE 1

Figure 1. Study flowchart detailing which samples were utilized at each phase of statistical analysis. COVID-19, coronavirus disease 2019.

External Model Validation

To validate the generalizability of the risk scores, we used an independent cohort from hospitals in Hunan province including 566 patients. We collected the same variables required for calculating the risk score from the validation cohort and cross-checked them. The 432 patients with at least 80% data completeness of the 28 variables used for model development were selected. The laboratory findings were imputed and the risk score was calculated as described for the development cohort. To assess the discriminative ability, the AUCs were evaluated.

Results

Characteristics of the Cohorts

The development cohort included 1,363 patients from 4 hospitals in Hubei, of whom 133 (9.8%) eventually developed critical illness. The median follow-up time for patients was 14 days. The mean (SD) age of patients in this cohort was 57.84 (16.29) years; 634 patients (46.52%) were men. The validation cohort included 566 patients with a mean (SD) age of 45.94 (15.33) years; 291 (51.41%) were men. The median follow-up time for patients was 13 days. Critical illness eventually developed in 28 (4.24%) of these patients.

Prognostic Factors of Critical Illness

A total of 39 features were tested for associations with critical illness in the development cohort with univariate logistic regression analysis. The results for the 1,255 patients showed that 28 variables were prognostic factors for critical COVID-19 illness (p < 0.05, Table 1, Figure 2). The odds of critical illness were higher in patients older than 65 years. Laboratory results showed that elevated CK-MB, neutrophils, PCT, α-HBDH, D-dimer, LDH, glucose, PT, APTT, RDW (SD and CV), fibrinogen, and AST were associated with critical illness. Compared with the non-critical illness group, patients in the critical illness group showed lymphopenia and thrombopenia; had low proportions of basophils and eosinophils; had low red blood cell count, hematocrit, hemoglobin concentration, and blood platelet count; and had decreased levels of K, Na, albumin, albumin to globulin ratio, and uric acid.

TABLE 1

Table 1. Laboratory characteristics among patients who did not or did develop critical illness in the development cohort.

FIGURE 2

Figure 2. Prognostic associations of clinical characteristics and laboratory findings in the development dataset. Unadjusted ORs (boxes) and corresponding 95% CIs (horizontal lines) for variables associated with the development of critical illness are represented. Box size is inversely proportional to the standard error of OR. The variables are stratified as quartiles. OR, odds ratio. CI, confidence interval.

Longitudinal Observations of Laboratory Variables

To determine the major clinical features that appeared during COVID-19 disease progression, the dynamic changes in the 28 clinical laboratory parameters that were measured within 2 days after hospital admission and associated with critical illness, including hematological and biochemical parameters, were recorded from day 3 to day 25 after hospital admission. The temporal changes in laboratory findings during hospitalization were explored (Figure 3). Baseline lymphocyte count was significantly lower in patients with critical illness than in those without. Levels of CRP, D-dimer, LDH, and glucose were clearly elevated in the critical illness group compared with the non-critical illness group throughout the clinical course in the development dataset. Furthermore, compared with the non-critical illness group, neutrophils, α-HBDH, and globulin were increased in the critical illness group, while eosinophils and albumin were decreased.

FIGURE 3

Figure 3. Temporal changes in laboratory findings from illness onset in patients hospitalized with COVID-19. Temporal changes in neutrophils (A), lymphocytes (B), eosinophils (C), D-dimer (D), alpha hydroxybutyrate dehydrogenase (E), lactate dehydrogenase (F), C-reactive protein (G), albumin (H), and glucose (I) in the development dataset are presented. Differences between patients with and without critical illness are shown with p-values calculated with linear mixed models. The dashed lines in black and red show the lower and upper normal limits of each laboratory finding.

Construction of the Risk Models and their Performances

A total of 28 variables determined at hospital admission and associated with critical illness (Figure 2) were included in the model development. Prediction models were constructed using LASSO logistic regression, SVR, ANN, RT, and MARS, and their performance was evaluated by ROC analysis (Figure 4). Although the predictive ability of ANN and SVR in the development cohort was better than that of the other algorithms, the models built with LASSO logistic regression and ANN outperformed the other algorithms in the validation dataset (Figure 4D). We selected the LASSO logistic regression model for its high predictive power and interpretability. In LASSO regression, after excluding irrelevant and redundant features (Figures 4A,B), 21 features remained for LASSO regression analysis, including age, whether the patient took ARB drugs, and blood test results: lymphocytes, neutrophils, blood platelet, thrombocytocrit, RDW (CV and SD), hematocrit, hemoglobin concentration, AST, CK-MB, albumin, LDH, glucose, K, Na, CRP, PCT, PT, APTT, fibrinogen, and uric acid. The risk score was constructed based on the coefficients from the LASSO logistic model (Table 2) and then converted into a probability with the formulas presented in the Methods section. By internal bootstrap validation with 100 resamples, the mean AUC based on data from the development cohort was 0.83 (95% CI, 0.78–0.86) (Figure 4C). Variables utilized in the risk score for the validation cohort are shown in Table 3. The accuracy of the COVID risk score in the validation cohort was similar to that in the development cohort, with an AUC of 0.84 (Figure 4D).
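
The bootstrap summary of the AUC can be illustrated with the sketch below. The text does not state how the confidence interval was derived, so the percentile interval used here is an assumption, as are the invented scores, the fixed random seed, and the function names.

```python
import random

def auc_rank(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(scores, labels, n_boot=100, seed=0):
    """Mean AUC and an approximate 95% percentile interval over n_boot resamples."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # sample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # a resample needs both classes
            aucs.append(auc_rank([scores[i] for i in idx], ys))
    aucs.sort()
    mean_auc = sum(aucs) / n_boot
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[int(0.975 * n_boot) - 1]
    return mean_auc, lower, upper

# Invented risk probabilities for 16 patients (1 = developed critical illness).
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60,
          0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

mean_auc, lower, upper = bootstrap_auc(scores, labels)
print(f"mean AUC = {mean_auc:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

With only 100 resamples the interval endpoints are coarse; the sketch is meant to show the procedure, not to reproduce the reported values.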

FIGURE 4

Figure 4. Feature selection using the least absolute shrinkage and selection operator (LASSO) logistic regression model. (A) LASSO coefficient profiles of the 29 baseline features. (B) Tuning parameter (λ) selection in the LASSO model used 1,000-fold cross-validation via minimum criteria. Receiver operating characteristic curves for the performance of different machine learning techniques in distinguishing patients with COVID-19 who developed critical illness from those who did not, in the training cohort (C) and validation cohort 1 (D), respectively. AUC, area under the receiver operating characteristic curve. The true positive rate represents model sensitivity, whereas the false positive rate is one minus the specificity.

TABLE 2

Table 2. Coefficients of LASSO logistic regression model for predicting development of critical illness in 1,064 patients hospitalized with COVID-19 in the development dataset.

TABLE 3

Table 3. Laboratory characteristics of patients with COVID-19 in validation cohort.

Discussion

Early identification of patients with COVID-19 at risk of progression to critical illness will aid in better patient management and effective usage of healthcare resources. In this study, we found that older age, higher levels of laboratory indexes such as CRP, LDH, and glucose, and lower levels of laboratory indexes such as lymphocytes and albumin on admission were associated with higher probabilities of critical COVID-19 illness. In addition, a clinical risk score based on LASSO logistic regression was developed to predict the development of critical illness in patients with COVID-19 with satisfactory accuracy according to AUC (0.83). Generally, the 21 variables required for estimating the probability of developing critical illness can be easily obtained from routine tests at hospital admission. The robustness and applicability of the risk score were confirmed in the independent validation dataset (AUC = 0.84).

Univariate analyses revealed that factors including age, neutrophils, D-dimer, LDH, CRP, glucose, APTT, fibrinogen, AST, and several other biochemical parameters were associated with critical illness. In addition, the dynamic profile of the significant laboratory findings was tracked. Levels of LDH, D-dimer, glucose, CRP, α-HBDH, and globulin were higher in the critical illness group than in the non-critical illness group, whereas eosinophil counts and albumin were lower. A prediction model for critical illness was developed with 21 predictors that were found to be independently correlated with critical illness via multivariate LASSO logistic regression analysis. Previous studies have found several of these variables to be prognostic factors for patients with COVID-19. It has been reported that elderly patients were more commonly critically ill with COVID-19 (3, 26, 27) and had a higher probability of death (28, 29). Modelli and colleagues revealed that the 28-day fatality rate was associated with increasing age, hypertension, cardiovascular disease, and higher body mass index (17), in agreement with the previous work.

Lymphopenia, leukocytosis (with increased absolute neutrophil counts), eosinopenia, neutrophilia, and increased CRP and PCT, which reflect a persistent state of inflammation (30), may be related to cytokine storm and cellular immune deficiency induced by virus invasion (27, 31). Zhou et al. found lower lymphocyte counts and higher LDH in patients who died from COVID-19 (1). Injured alveolar epithelial cells could lead to the infiltration of lymphocytes, resulting in persistent lymphopenia (32, 33). Lymphopenia is a common characteristic in patients with COVID-19 and might play an important role in the disease process (34, 35). Zhang et al. noted that 53% of patients admitted with COVID-19 had eosinopenia on the day of hospital admission (36). Calabrese et al. reported that lymphocyte and platelet counts were the most important features able to stratify patients into different clinical clusters (37). Ewan et al. demonstrated that risk stratification was improved by blood and physiological parameters (C-reactive protein, neutrophil/lymphocyte ratio, and neutrophil count) measured at hospital admission (20). Such findings are consistent with this work. A higher level of LDH is an indication of the activity and severity of idiopathic pulmonary fibrosis and is one of the most important prognostic biomarkers of lung injury (37). LDH was reported to be higher in patients with severe COVID-19 and in those who received ICU treatment than in patients with mild disease and non-ICU patients (27, 30, 38, 39), and it is utilized as a valuable prognostic predictor (40, 41). In addition, patients with elevated CK-MB levels on hospital admission were at significantly increased risk of critical illness. Li and colleagues found that cardiac injury (elevated LDH and CK-MB levels) was associated with severe disease or ICU admission and death in patients with COVID-19 (42).
Increased PT and APTT, and decreased blood platelet count, thrombocytocrit, and fibrinogen, which reflect coagulation activation, might be associated with the sustained inflammatory response. Banoei et al. noted that prothrombin and lactate were the most differentiating biochemical markers in the mortality prediction model (18).

Since hyperglycemia is harmful to the management of inflammation and viremia, the association between the level of glucose and critical illness in COVID-19 viral infections is not surprising. Based on big data analysis of a cohort of 7,337 COVID-19 cases, Zhu et al. revealed that diabetics with better-controlled blood glucose had a lower risk of death than diabetics with poorly controlled blood glucose (43). Banoei and colleagues demonstrated that disease, coronary artery disease, dementia, age > 65, and altered mental status were the topmost differentiating mortality predictors (22).

Previous studies have identified that 15–53% of cases reported abnormal levels of AST during disease progression (44–47). In a study conducted by Huang et al. (48), the elevation of AST was found in 8 (62%) of 13 patients in the ICU compared with 7 (25%) of 28 COVID-19 infected cases who did not need ICU care. Abnormal liver tests occur in most hospitalized patients with COVID-19 and may be associated with ICU admission, mechanical ventilation (48), and death (28, 48). Liver damage (decreased albumin and increased globulin) in patients with COVID-19 infections might be associated with the direct effect of the viral infection of liver cells, drug hepatotoxicity, or immune-mediated inflammation (37), such as cytokine storm and pneumonia-associated hypoxia.

Prediction models for the dynamics and control of COVID-19 infection showed broad similarities with the features retained in our models, particularly regarding aging, hypertension, CRP, LDH, prothrombin, lactate, and neutrophil levels (14–24). The main advantage of LASSO logistic regression is that a variable with a large parameter estimate is shrunk toward a smaller value, while a variable with a small parameter estimate is shrunk to exactly 0. The parameter estimation of the LASSO analysis is continuous, which makes it suitable for model selection with high-dimensional data.
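
The shrinkage behaviour described above is captured by the soft-thresholding operator, which coordinate-descent LASSO solvers such as glmnet apply in each coefficient update. The sketch below shows the operator on its own, with invented unpenalized estimates and an arbitrary λ; it illustrates the mechanism and is not a full LASSO fit.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical unpenalized coefficient estimates and penalty (illustrative only).
estimates = {"strong": 2.0, "moderate": -0.8, "weak": 0.3}
lam = 0.5

shrunk = {name: soft_threshold(z, lam) for name, z in estimates.items()}
for name, value in shrunk.items():
    print(f"{name}: {estimates[name]:+.2f} -> {value:+.2f}")
```

The strong and moderate estimates are pulled toward zero by λ and stay in the model, while the weak estimate is set to zero and drops out, which is how LASSO performs variable selection.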

In the development dataset, we found that the discriminative abilities of SVR, ANN, RT, and MARS outperformed that of LASSO logistic regression as evaluated by AUCs. However, in the independent validation dataset, the predictive ability of LASSO logistic regression was the best among all algorithms, and we therefore selected it. The observation that the models incorporating the highest level of non-linearity displayed better in-sample prediction but yielded worse out-of-sample performance may be explained by over-fitting of the ANN, RT, MARS, and SVR algorithms (45). The linear model used in LASSO logistic regression performed worse in-sample but generated the best out-of-sample predictions.

There are inevitably limitations in our retrospective study. The primary one is the incomplete laboratory findings in the electronic database and the lack of CT images, which decrease the statistical power of the LASSO logistic regression model. Therefore, important information might have been missed, and further prospective studies are required. However, our model has a certain tolerance to missing data, as high performance measured by AUC was achieved on the development and external validation datasets for samples missing up to 20% of the predictors. Second, since the algorithms we tried are purely data-driven, the performance of these models may vary if they are developed with different datasets. We believe that more accurate models can be obtained as more datasets become available. Third, the data for risk probability development and validation are from two provinces of China, which could limit the generalizability of the risk model. Further studies on different populations around the world with larger patient cohorts are needed to validate our findings.

Conclusion

In summary, this study identified 28 indicators (such as age, LDH, CRP, and lymphocytes) associated with critical illness in patients with COVID-19. The longitudinal laboratory variables were explored. A risk score to estimate the risk of developing critical illness among patients with COVID-19 was developed based on 21 variables independently associated with critical illness and commonly measured on hospital admission. The risk model may be especially valuable for early detection of and intervention in critical COVID-19 illness, thereby improving clinical strategies against COVID-19, optimizing the use of healthcare resources, and potentially reducing mortality in patients with COVID-19.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author Contributions

YF: conceptualization and writing. WZho: resources and data curation. TL, JL, KX, XM, LX, and JJ: resources. HZ: supervision. RL: project administration and supervision. WZha: funding acquisition. All authors contributed to the article and approved the submitted version.

Funding

This study was supported by the National Scientific Foundation of China (Nos. 81874329, 81573511, and 81522048).

Conflict of Interest

YF was employed by Cofoe Medical Technology Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We thank all study participants, their families, medical staff, and participating hospitals for their involvement and support in this study. We are grateful to the High Performance Computing Center of Central South University for assistance with the computations.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2022.880999/full#supplementary-material

References

1. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. (2020) 395:1054–62. doi: 10.1016/S0140-6736(20)30566-3

2. Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. (2020) 323:1239–42. doi: 10.1001/jama.2020.2648

3. Yang X, Yu Y, Xu J, Shu H, Xia J, Liu H, et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med. (2020) 8:475–81. doi: 10.1016/S2213-2600(20)30079-5

4. Wu C, Chen X, Cai Y, Xia J, Zhou X, Xu S, et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern Med. (2020) 180:1–11. doi: 10.1001/jamainternmed.2020.0994

5. Du RH, Liang LR, Yang CQ, Wang W, Cao TZ, Li M, et al. Predictors of mortality for patients with COVID-19 pneumonia caused by SARS-CoV-2: a prospective cohort study. Eur Respir J. (2020) 55:2000524. doi: 10.1183/13993003.00524-2020

6. Siddiqi HK, Mehra MR. COVID-19 illness in native and immunosuppressed states: a clinical-therapeutic staging proposal. J Heart Lung Transplant. (2020) 39:405–7. doi: 10.1016/j.healun.2020.03.012

7. Zhao X, Li Y, Ge Y, Shi Y, Lv P, Zhang J, et al. Evaluation of nutrition risk and its association with mortality risk in severely and critically ill COVID-19 patients. JPEN J Parenter Enteral Nutr. (2021) 45:32–42. doi: 10.1002/jpen.1953

8. Alhasan K, Shalaby M, Temsah MH, Aljamaan F, Shagal R, AlFaadhel T, et al. Factors that influence mortality in critically ill patients with SARS-CoV-2 infection: a multicenter study in the Kingdom of Saudi Arabia. Healthcare. (2021) 9:1608. doi: 10.3390/healthcare9121608

9. Jakob CEM, Mahajan UM, Oswald M, Stecher M, Schons M, Mayerle J, et al. Prediction of COVID-19 deterioration in high-risk patients at diagnosis: an early warning score for advanced COVID-19 developed by machine learning. Infection. (2021) 50:359–70. doi: 10.1007/s15010-021-01656-z

10. Liu J, Liu Y, Xiang P, Pu L, Xiong H, Li C, et al. Neutrophil-to-lymphocyte ratio predicts critical illness patients with 2019 coronavirus disease in the early stage. J Transl Med. (2020) 18:206. doi: 10.1186/s12967-020-02374-0

11. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. (2020) 180:1081–9. doi: 10.1001/jamainternmed.2020.2033

12. Liang W, Yao J, Chen A, Lv Q, Zanin M, Liu J, et al. Early triage of critically ill COVID-19 patients using deep learning. Nat Commun. (2020) 11:3543. doi: 10.1038/s41467-020-17280-8

13. Schalekamp S, Huisman M, Dijk RA, Boomsma MF, Freire Jorge PJ, Boer WS, et al. Model-based prediction of critical illness in hospitalized patients with COVID-19. Radiology. (2021) 298:E46–54. doi: 10.1148/radiol.2020202723

14. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. (2010) 33:1–22. doi: 10.18637/jss.v033.i01

15. Kamran F, Tang S, Otles E, McEvoy S, Saleh N, Gong J, et al. Early identification of patients admitted to hospital for covid-19 at risk of clinical deterioration: model development and multisite external validation study. BMJ. (2022) 376:e068576. doi: 10.1136/bmj-2021-068576

16. Wanga B, Mondal J, Samui P, Chatterjee AN, Yusuf A. Effect of an antiviral drug control and its variable order fractional network in host COVID-19 kinetics. Eur Phys J Spec Top. (2022) 1–15. doi: 10.1140/epjs/s11734-022-00454-4

17. Modelli LG, Sandes-Freitas TV, Requiao-Moura LR, Viana LA, Cristelli MP, Garcia VD, et al. Development and validation of a simple web-based tool for early prediction of COVID-19-associated death in kidney transplant recipients. Am J Transplant. (2022) 22:610–25. doi: 10.1111/ajt.16807

18. Chatterjee AN, Ahmad B. A fractional-order differential equation model of COVID-19 infection of epithelial cells. Chaos Solitons Fractals. (2021) 147:110952. doi: 10.1016/j.chaos.2021.110952

19. Mondal J, Samui P, Chatterjee N. Dynamical demeanour of SARS-CoV-2 virus undergoing immune response mechanism in COVID-19 pandemic. Eur Phys J Spec Top. (2022). doi: 10.1140/epjs/s11734-022-00437-5

20. Rai RK, Khajanchi S, Tiwari PK, Venturino E, Misra AK. Impact of social media advertisements on the transmission dynamics of COVID-19 pandemic in India. J Appl Math Comput. (2021). doi: 10.1007/s12190-021-01507-y

21. Chatterjee N, Fahad B, Muqrin A, Jayanta M, Ilyas K. SARS-CoV-2 infection with lytic and non-lytic immune responses: a fractional order optimal control theoretical study. Results Phys. (2021) 26:104260. doi: 10.1016/j.rinp.2021.104260

22. Banoei MM, Dinparastisaleh R, Zadeh AV, Mirsaeidi M. Machine-learning-based COVID-19 mortality prediction model and identification of patients at low and high risk of dying. Crit Care. (2021) 25:328. doi: 10.1186/s13054-021-03749-5

23. Chatterjee N, Fahad B. A model for SARS-CoV-2 infection with treatment. Comput Math Methods Med. (2020) 1352982. doi: 10.1101/2020.04.24.20077958

24. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. (2020) 395:507–13. doi: 10.1016/S0140-6736(20)30211-7

25. Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA. (2020) 323:1061–9. doi: 10.1001/jama.2020.1585

26. Ruan Q, Yang K, Wang W, Jiang L, Song J. Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care Med. (2020) 46:846–8. doi: 10.1007/s00134-020-05991-x

27. Chen R, Liang W, Jiang M, Guan W, Zhan C, Wang T, et al. Risk factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in China. Chest. (2020) 158:97–105.

28. Hu C, Li J, Xing X, Gao J, Zhao S, Xing L, et al. The effect of age on the clinical and immune characteristics of critically ill patients with COVID-19: a preliminary report. PLoS ONE. (2021) 16:e0248675. doi: 10.1371/journal.pone.0248675

29. Bajwa EK, Khan UA, Januzzi JL, Gong MN, Thompson BT, Christiani DC. Plasma C-reactive protein levels are associated with improved outcome in ARDS. Chest. (2009) 136:471–80. doi: 10.1378/chest.08-2413

30. Li F, Li W, Farzan M, Harrison SC. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science. (2005) 309:1864–8. doi: 10.1126/science.1116480

31. Ge XY, Li JL, Yang XL, Chmura AA, Zhu G, Epstein JH, et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. (2013) 503:535–8. doi: 10.1038/nature12711

32. Chan JF, Yuan S, Kok KH, To KK, Chu H, Yang J, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet. (2020) 395:514–23. doi: 10.1016/S0140-6736(20)30154-9

33. Ziadi A, Hachimi A, Admou B, Hazime R, Brahim I, Douirek F, et al. Lymphopenia in critically ill COVID-19 patients: a predictor factor of severity and mortality. Int J Lab Hematol. (2021) 43:e38–40. doi: 10.1111/ijlh.13351

34. Zhang J, Dong X, Cao Y, Yuan Y, Yang Y, Yan Y, et al. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy. (2020) 75:1730–41. doi: 10.1111/all.14238

35. Calabrese F, Pezzuto F, Boscolo A, Lunardi F, Giraudo C, et al. Machine learning-based analysis of alveolar and vascular injury in SARS-CoV-2 acute respiratory failure. J Pathol. (2021) 254:173–84. doi: 10.1002/path.5653

36. Chen T, Wu D, Chen H, Yan W, Yang D, Chen G, et al. Clinical characteristics of 113 deceased patients with coronavirus disease 2019: retrospective study. BMJ. (2020) 368:m1295. doi: 10.1136/bmj.m1295

37. Drent M, Cobben NA, Henderson RF, Wouters EF, Dieijen-Visser M. Usefulness of lactate dehydrogenase and its isoenzymes as indicators of lung damage or inflammation. Eur Respir J. (1996) 9:1736–42. doi: 10.1183/09031936.96.09081736

38. Yan L, Zhang HT, Goncalves J, Xiao Y, Wang M, Guo Y, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. (2020) 2:283–8. doi: 10.1038/s42256-020-0180-7

39. Zhang S, Guo M, Duan L, Wu F, Hu G, Wang Z, et al. Development and validation of a risk factor-based system to predict short-term survival in adult hospitalized patients with COVID-19: a multicenter, retrospective, cohort study. Crit Care. (2020) 24:438. doi: 10.1186/s13054-020-03123-x

40. Mo P, Xing Y, Xiao Y, Deng L, Zhao Q, Wang H, et al. Clinical characteristics of refractory COVID-19 pneumonia in Wuhan, China. Clin Infect Dis. (2021) 73:e4208–13. doi: 10.1093/cid/ciaa270

41. Li X, Pan X, Li Y, An N, Xing Y, Yang F, et al. Cardiac injury associated with severe disease or ICU admission and death in hospitalized patients with COVID-19: a meta-analysis and systematic review. Crit Care. (2020) 24:468. doi: 10.1186/s13054-020-03183-z

42. Zhu L, She ZG, Cheng X, Qin JJ, Zhang XJ, Cai J, et al. Association of blood glucose control and outcomes in patients with COVID-19 and pre-existing type 2 diabetes. Cell Metab. (2020) 31:1068–77.e3. doi: 10.1016/j.cmet.2020.04.021

43. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. (2020) 395:497–506. doi: 10.1016/S0140-6736(20)30183-5

44. Shi H, Han X, Jiang N, Cao Y, Alwalid O, Gu J, et al. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. Lancet Infect Dis. (2020) 20:425–34. doi: 10.1016/S1473-3099(20)30086-4

45. Eastin C, Eastin T. Clinical characteristics of coronavirus disease 2019 in Gansu province. J Emerg Med. (2020) 58:711–2. doi: 10.21037/apm-20-887

46. Cai Q, Huang D, Yu H, Zhu Z, Xia Z, Su Y, et al. COVID-19: Abnormal liver function tests. J Hepatol. (2020) 73:566–74. doi: 10.1016/j.jhep.2020.04.006

47. Fan Z, Chen L, Li J, Cheng X, Yang J, Tian C, et al. Clinical features of COVID-19-related liver functional abnormality. Clin Gastroenterol Hepatol. (2020) 18:1561–6. doi: 10.1016/j.cgh.2020.04.002

48. Peng Y, Nagata MH. An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data. Chaos Solitons Fractals. (2020) 139:110055. doi: 10.1016/j.chaos.2020.110055

Keywords: COVID-19, risk factors, critical illness, machine learning, LASSO regression

Citation: Fu Y, Zhong W, Liu T, Li J, Xiao K, Ma X, Xie L, Jiang J, Zhou H, Liu R and Zhang W (2022) Early Prediction Model for Critical Illness of Hospitalized COVID-19 Patients Based on Machine Learning Techniques. Front. Public Health 10:880999. doi: 10.3389/fpubh.2022.880999

Received: 22 February 2022; Accepted: 13 April 2022;
Published: 24 May 2022.

Edited by:

Subhas Khajanchi, Presidency University, India

Reviewed by:

Amar Nath Chatterjee, K.L.S. College, India
Fahad Al Basir, Asansol Girls' College, India

Copyright © 2022 Fu, Zhong, Liu, Li, Xiao, Ma, Xie, Jiang, Zhou, Liu and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rong Liu, liuronghyw@csu.edu.cn; Wei Zhang, csuzhangwei@csu.edu.cn

^†These authors have contributed equally to this work
