A Machine-Learning Approach for Dynamic Prediction of Sepsis-Induced Coagulopathy in Critically Ill Patients With Sepsis

Zhao, Qin-Yu; Liu, Le-Ping; Luo, Jing-Chao; Luo, Yan-Wei; Wang, Huan; Zhang, Yi-Jie; Gui, Rong; Tu, Guo-Wei; Luo, Zhe

doi:10.3389/fmed.2020.637434

ORIGINAL RESEARCH article

Front. Med., 21 January 2021

Sec. Intensive Care Medicine and Anesthesiology

Volume 7 - 2020 | https://doi.org/10.3389/fmed.2020.637434

This article is part of the Research Topic Clinical Application of Artificial Intelligence in Emergency and Critical Care Medicine, Volume I.

Qin-Yu Zhao¹,²†

Le-Ping Liu¹†

Jing-Chao Luo³†

Yan-Wei Luo¹

Huan Wang³

Yi-Jie Zhang³

Rong Gui¹*

Guo-Wei Tu³*

Zhe Luo³,⁴*

¹Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
²College of Engineering and Computer Science, Australian National University, Canberra, ACT, Australia
³Department of Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
⁴Department of Critical Care Medicine, Xiamen Branch, Zhongshan Hospital, Fudan University, Xiamen, China

Background: Sepsis-induced coagulopathy (SIC) is associated with an increased mortality rate and poorer prognosis in septic patients.

Objectives: Our study aimed to develop and validate machine-learning models to dynamically predict the risk of SIC in critically ill patients with sepsis.

Methods: Machine-learning models were developed and validated based on two public databases: the Medical Information Mart for Intensive Care (MIMIC)-IV and the eICU Collaborative Research Database (eICU-CRD). Dynamic prediction involved evaluating the risk of SIC on each day after the diagnosis of sepsis, and 15 candidate machine-learning models were compared. The best model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter tuning using the Bayesian Optimization Algorithm. A compact model was developed based on 15 features selected according to their importance and clinical availability. These two models were compared with Logistic Regression and SIC scores in terms of SIC prediction.

Results: Of 11,362 patients in MIMIC-IV included in the final cohort, a total of 6,744 (59%) patients developed SIC during sepsis. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869; 95% CI: 0.850–0.886). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 (95% CI: 0.832–0.872), while the AUCs of Logistic Regression and SIC scores were 0.746 (95% CI: 0.735–0.755) and 0.709 (95% CI: 0.687–0.733), respectively. A cohort of 35,252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in the external validation were 0.842 (95% CI: 0.837–0.846) and 0.803 (95% CI: 0.798–0.809), respectively, which were still larger than those of Logistic Regression (0.660; 95% CI: 0.653–0.667) and SIC scores (0.752; 95% CI: 0.747–0.757). Prediction results were illustrated by SHapley Additive exPlanations (SHAP) values, which made our models clinically interpretable.

Conclusions: We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores.

Introduction

Sepsis, defined as life-threatening organ dysfunction caused by a dysregulated host response to infection, remains the leading cause of mortality in critically ill patients (1, 2). Coagulopathy is one of the major complications of sepsis, leading to a higher risk of thrombosis, worsening organ failure, and an increased mortality rate (3–6). However, the benefit of anticoagulant therapies has not been confirmed in septic patients (7, 8). Recent observational studies and subgroup analyses of large-scale randomized controlled trials suggested that anticoagulant therapies might significantly reduce mortality risk and improve outcomes in septic patients with coagulopathy (9–12). In contrast, anticoagulant therapies should be avoided in patients without coagulopathy because of the increased risk of bleeding with no survival benefit (11, 13). Furthermore, some drugs commonly administered in septic patients, such as linezolid and vancomycin, may alter coagulation function through various mechanisms and should be used with caution in patients at high risk of coagulopathy (14). These findings highlight the need for timely identification of coagulopathy in septic patients.

Sepsis-induced coagulopathy (SIC) criteria were developed by members of the Scientific and Standardization Committee (SSC) on Disseminated Intravascular Coagulation (DIC) of the International Society on Thrombosis and Haemostasis (ISTH) in 2017 (15) (Supplementary Table 1). The criteria form a scoring system designed to identify patients with “sepsis and coagulation disorders,” and SIC is defined as a score ≥ 4. The mortality rate increased as the SIC score rose and exceeded 30% at a score of 4 (15). Compared with DIC, SIC is more closely aligned with the updated Sepsis-3 criteria (1, 16). In addition, observational evidence has shown that SIC preceded DIC in most cases (17, 18). As a result, the 2019 guideline recommended that septic patients with thrombocytopenia (platelet count <150 × 10⁹/L) be screened first using the SIC diagnostic criteria and then using the ISTH DIC diagnostic criteria (16). However, the SIC score mainly serves as a diagnostic system, and reliable predictive tools for SIC are still lacking in clinical practice.

In recent years, new machine-learning algorithms have made it possible to predict disease events dynamically from large and complex clinical data. Advanced machine-learning models can fit high-order relationships between covariates and outcomes and therefore excel in the analysis of complex signals in data-rich environments (19–22). The aims of this study were to develop and validate machine-learning models for the early dynamic prediction of SIC, and to assess the risk features by interpreting the final model.

Materials and Methods

Source of Data

We conducted this retrospective study based on two large critical care databases: the Medical Information Mart for Intensive Care (MIMIC)-IV (23) and the eICU Collaborative Research Database (eICU-CRD) (24). The MIMIC-IV database is an updated version of MIMIC-III and contains comprehensive, high-quality data on patients admitted to intensive care units (ICUs) at the Beth Israel Deaconess Medical Center between 2008 and 2019. The other database, eICU-CRD, is a multicenter database comprising de-identified health data associated with over 200,000 admissions to ICUs across the United States between 2014 and 2015. One author (QZ) obtained access to both databases and was responsible for data extraction. The study was reported according to the recommendations of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (25).

Selection of Participants

In MIMIC-IV, patients who fulfilled the definition of sepsis between 2008 and 2019 were included. According to the Sepsis-3 criteria, sepsis was defined as a suspected infection combined with an acute increase in Sequential Organ Failure Assessment (SOFA) score ≥ 2 (1). Patients with prescriptions of antibiotics and sampling of bodily fluids for microbiological culture were considered to have suspected infection. In line with previous research, when the antibiotic was given first, the microbiological sample must have been collected within 24 h; when the microbiological sampling occurred first, the antibiotic must have been administered within 72 h (26). Hourly SOFA was evaluated based on the clinical and laboratory data. In eICU-CRD, microbiology data were not well populated due to the limited availability of microbiology interfaces; instead, infection was identified according to documented diagnosis.
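The infection-window rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' extraction code: it assumes each antibiotic/culture pair is given as two timestamps, takes the earlier event as the suspected-infection time (a common convention, assumed here), and reduces the Sepsis-3 organ-dysfunction check to comparing a baseline and a current SOFA score. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Windows taken from the text: culture within 24 h after an antibiotic,
# or antibiotic within 72 h after a culture.
CULTURE_AFTER_ABX = timedelta(hours=24)
ABX_AFTER_CULTURE = timedelta(hours=72)


def suspected_infection_time(abx_time: datetime, culture_time: datetime) -> Optional[datetime]:
    """Return the suspected-infection time if the pair satisfies the window rule, else None.

    The earlier of the two events is used as the infection time (assumed convention).
    """
    if abx_time <= culture_time:
        # Antibiotic given first: the culture must follow within 24 h.
        within_window = culture_time - abx_time <= CULTURE_AFTER_ABX
    else:
        # Culture taken first: the antibiotic must follow within 72 h.
        within_window = abx_time - culture_time <= ABX_AFTER_CULTURE
    return min(abx_time, culture_time) if within_window else None


def meets_sepsis3(sofa_baseline: int, sofa_current: int) -> bool:
    """Sepsis-3 organ dysfunction: an acute SOFA increase of at least 2 points."""
    return sofa_current - sofa_baseline >= 2
```

The text does not specify the time window around the infection time in which the SOFA increase is assessed, so that step is left out of the sketch.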

Only patients who were older than 18 years and stayed in the ICU for more than 24 h were included. No patients were excluded due to missing values. We made no attempt to estimate the sample size of the study; instead, all eligible patients in MIMIC-IV and eICU-CRD were included to maximize the statistical power of the predictive model.

Outcome (SIC)

For every day on which a patient fulfilled the sepsis definition, we annotated the patient's current coagulation state according to the SIC criteria, as recommended (16). Specifically, the worst daily values of the SIC-related indicators were extracted, and the SIC score was then calculated for each day. A patient was annotated as SIC positive on a given day if his or her SIC score was ≥ 4 on that day.
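A sketch of the daily annotation step, assuming a long-format pandas table with one row per measurement and the hypothetical columns `stay_id`, `day`, `inr`, `platelets`, and `sofa4` (the summed respiratory, cardiovascular, hepatic, and renal SOFA sub-scores). The point thresholds follow our reading of the published SIC criteria (PT-INR >1.2 scores 1 and >1.4 scores 2; platelets <150 scores 1 and <100 scores 2, in 10⁹/L; the SOFA sub-score sum is capped at 2 points); Supplementary Table 1 remains the authoritative definition. The "worst" daily value is taken as the highest INR, the lowest platelet count, and the highest SOFA value of the day.

```python
import pandas as pd


def sic_score(inr: float, platelets: float, sofa4: int) -> int:
    """SIC score from worst daily values (thresholds assumed, see lead-in).

    Missing values are not handled here; real data need an explicit rule.
    """
    inr_points = 2 if inr > 1.4 else (1 if inr > 1.2 else 0)
    platelet_points = 2 if platelets < 100 else (1 if platelets < 150 else 0)
    sofa_points = min(sofa4, 2)  # SOFA contribution is capped at 2 points
    return inr_points + platelet_points + sofa_points


def label_sic_days(measurements: pd.DataFrame) -> pd.DataFrame:
    """Aggregate to worst daily values and flag days with an SIC score >= 4."""
    daily = (
        measurements.groupby(["stay_id", "day"], as_index=False)
        .agg(inr=("inr", "max"), platelets=("platelets", "min"), sofa4=("sofa4", "max"))
    )
    daily["sic_score"] = [
        sic_score(i, p, s) for i, p, s in zip(daily["inr"], daily["platelets"], daily["sofa4"])
    ]
    daily["sic"] = daily["sic_score"] >= 4
    return daily
```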

Predictors of SIC

Clinical and laboratory variables were extracted during sepsis; for variables with multiple measurements, average values were used. For the prediction of SIC, 88 variables were collected (Supplementary Table 2), including patient characteristics (age, gender, ethnicity, admission type), vital signs (respiratory rate, blood pressure, heart rate, SpO₂, and temperature), laboratory data (blood gas, routine blood analysis, liver function, renal function, and coagulation profile), transfusion (red blood cells, platelets, and fresh frozen plasma), and urine output. Comorbidities were also collected based on the recorded International Classification of Diseases (ICD)-9 and ICD-10 codes, including hypertension, diabetes mellitus, chronic obstructive pulmonary disease, congestive heart failure, myocardial infarction, chronic kidney disease, leukemia, stroke, cancer, and liver disease. Lastly, data on medications (such as heparin, antibiotics, and vasopressors), continuous renal replacement therapy (CRRT), and mechanical ventilation (MV) were collected.

Statistical Analysis

Baseline characteristics on the first sepsis day were compared between SIC and non-SIC groups in MIMIC-IV. Values are presented as the means [standard deviations] (if normal) or medians [interquartile ranges] (if non-normal) for continuous variables, and total numbers [percentages] for categorical variables. Comparisons were made using the Student t-test or rank-sum test for continuous variables, and the Chi-square test or Fisher's exact test for categorical variables, as appropriate.
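The test-selection logic described above can be sketched with SciPy. How normality was judged is not stated in the text; the Shapiro-Wilk test used here is an assumption, as is the expected-count-below-5 rule for switching from the chi-square test to Fisher's exact test (applied here only to 2×2 tables). The Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Student t-test if both groups look normal (Shapiro-Wilk, assumed), else rank-sum."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    both_normal = stats.shapiro(a)[1] > alpha and stats.shapiro(b)[1] > alpha
    if both_normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "rank-sum", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def compare_categorical(table):
    """Chi-square test; Fisher's exact test for a 2x2 table with any expected count < 5."""
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        return "fisher", stats.fisher_exact(table)[1]
    return "chi-square", p
```

With cohorts of this size, the Shapiro-Wilk test rejects normality for almost any real clinical variable, so in practice most continuous comparisons would fall through to the rank-sum test.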

As shown in Figure 1A, our model generated a continuous prediction score from the 88 variables described above on each day on which a patient met the sepsis definition. The score estimated the risk of SIC on the following day. No prediction was made on days when SIC criteria were already fulfilled; once a patient recovered from SIC, prediction resumed if sepsis was still present. No imputation was applied for the advanced boosting methods, which handle missing values automatically; for the other models, missing values were imputed with the median for continuous variables and the mode for categorical variables. As shown in Figure 1B, we first compared the prediction performance of 15 algorithms using the PyCaret Python package (version 1.0.0), an open-source automated machine-learning workflow. The assessment used 10-fold cross-validation: accuracy and area under the receiver operating characteristic curve (AUC) were calculated on each fold and pooled to evaluate each model. The algorithm with the highest accuracy and the largest AUC was selected. We then performed fine-grained hyperparameter tuning of the selected model using the Bayesian Optimization Algorithm, an efficient constrained global optimization tool, implemented with the bayes_opt Python package (version 1.2.0) (27). The optimized model was the best model for SIC prediction in this study and was defined as the full model.
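The dynamic sampling scheme (predict on non-SIC sepsis days, label each sample with the next day's SIC status, pause while SIC is present, resume after recovery) can be sketched with pandas. The column names are hypothetical, and dropping days without a recorded following day (for example, the last ICU day or a gap in the record) is our assumption; the text does not say how such days were handled.

```python
import pandas as pd


def build_next_day_samples(daily: pd.DataFrame) -> pd.DataFrame:
    """Turn daily records into next-day SIC prediction samples.

    `daily` has one row per ICU stay and day, with boolean columns `sepsis`
    and `sic` plus predictor columns (hypothetical schema).
    """
    daily = daily.sort_values(["stay_id", "day"]).reset_index(drop=True)
    following = daily.groupby("stay_id")[["day", "sic"]].shift(-1)
    # Only label a day if the record for day + 1 exists (assumption).
    has_next_day = following["day"] == daily["day"] + 1
    # Predict on sepsis days without SIC; days after recovery become eligible again.
    eligible = daily["sepsis"] & ~daily["sic"] & has_next_day
    samples = daily.loc[eligible].drop(columns=["sic"]).copy()
    samples["label"] = following.loc[eligible, "sic"].astype(bool)
    return samples
```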

FIGURE 1

Figure 1. Schematic illustration of the study design. (A) Design of dynamic prediction in our study. Daily assessment was performed from the time when sepsis was diagnosed. If SIC criteria were not fulfilled, the risk of SIC the next day was predicted by our model. Prediction stopped when SIC was diagnosed, and restarted when patients recovered from SIC. (B) Schematic illustration of model development. We compared the discrimination of 15 machine-learning models using 10-fold cross-validation. The model with the best accuracy and greatest AUC was chosen. Fine-grained hyperparameter adjustment was performed using Bayesian Optimization. Fifteen features were selected according to their SHAP values and clinical availability. A compact model was developed based on the selected features. Lastly, these two models were validated in eICU-CRD. ICU, intensive care unit; SIC, sepsis-induced coagulopathy; SHAP, SHapley Additive exPlanations; MIMIC-IV, Medical Information Mart for Intensive Care-IV; C.V., cross-validation; eICU-CRD, the eICU Collaborative Research Database.

The effects of features on prediction scores were measured using the SHapley Additive exPlanations (SHAP) Python package (version 0.32.1), which assesses the importance of each feature with a game-theoretic approach on the validation set (28). We selected 15 features that were highly important and as easy as possible to collect in the clinical setting (Supplementary Table 2). A compact model was then trained for SIC prediction based on the selected features. Although this model was not as accurate as the full model, it might be more practical in clinical settings.
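Ranking features by SHAP magnitude can be sketched as below, assuming a precomputed SHAP matrix of shape samples × features (for a binary CatBoost model this could come from `shap.TreeExplainer(model).shap_values(X_valid)`). Sorting by the mean absolute SHAP value gives the same order as the sum of magnitudes used in the summary plot. The `unavailable` argument is a hypothetical stand-in for the clinical-availability judgment, which in the study was a manual decision rather than a computed rule.

```python
import numpy as np


def rank_features_by_shap(shap_values, feature_names, top_k=15, unavailable=()):
    """Return the top_k (name, mean |SHAP|) pairs, skipping features in `unavailable`."""
    importance = np.abs(np.asarray(shap_values)).mean(axis=0)  # mean magnitude per feature
    order = np.argsort(-importance, kind="stable")              # most important first
    kept = [i for i in order if feature_names[i] not in unavailable]
    return [(feature_names[i], float(importance[i])) for i in kept[:top_k]]
```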

External validation of the full and compact models was performed in eICU-CRD. The median and 95% confidence intervals of AUC were calculated using the Bootstrap Resampling technique with 1,000 iterations. Conventional Logistic Regression and the SIC scoring system were assessed to predict the risk of SIC and were compared with our models in both internal and external validations.
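A minimal sketch of the bootstrap procedure for the median AUC and its 95% confidence interval, using scikit-learn's `roc_auc_score`. The percentile interval and the choice to skip resamples that contain only one class (where AUC is undefined) are assumptions; the text specifies only 1,000 bootstrap iterations.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc(y_true, y_score, n_boot=1000, seed=0):
    """Median AUC and percentile 95% CI over bootstrap resamples."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)  # resample patients-days with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample has a single class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    low, median, high = np.percentile(aucs, [2.5, 50, 97.5])
    return median, (low, high)
```

For example, `bootstrap_auc(y_valid, model.predict_proba(X_valid)[:, 1])` would return the median AUC and interval for one model; `model`, `X_valid`, and `y_valid` are placeholders.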

All analyses were performed using Python (version 3.7.6), and p < 0.01 was considered statistically significant.

Results

Baseline Characteristics

As shown in Figure 2, of 12,381 septic patients in MIMIC-IV, 11,362 were included in the final cohort. A total of 6,744 patients developed SIC during sepsis, and 4,618 did not. A cohort of 35,252 septic patients in eICU-CRD was included as the external validation dataset.

FIGURE 2

Figure 2. Flow chart of patient selection.

Variable values on the first day of sepsis in MIMIC-IV were analyzed, and characteristics were compared between groups (Supplementary Table 3). Compared with the non-SIC group, the SIC group had a higher rate of comorbidities, higher SAPS-II scores (44 [35, 54] vs. 37 [30, 45]; p < 0.001), higher SOFA scores (6 [4, 9] vs. 4 [3, 5]; p < 0.001), longer prothrombin time (PT) (16.9 [14.3, 21.8] vs. 13.0 [11.9, 14.1] s; p < 0.001), lower urine output (790 [300, 1,545] vs. 1,205 [605, 2,015] mL; p < 0.001), higher rates of linezolid use (2.9 vs. 1.7%; p < 0.001), vancomycin use (55.6 vs. 46.0%; p < 0.001), CRRT (5.0 vs. 0.6%; p < 0.001), vasopressor use (46.8 vs. 23.2%; p < 0.001) and MV (50.3 vs. 40.6%; p < 0.001), and higher 28-day mortality (27.0 vs. 10.8%; p < 0.001). The length of hospital stay was also longer in the SIC group than in the non-SIC group (14.4 [7.9, 26.7] vs. 10.9 [6.5, 19.5] days; p < 0.001).

Comparison of 15 Models

Daily data were extracted, yielding 16,183 prediction samples in MIMIC-IV. Of these, 1,489 (9.2%) were labeled positive (SIC the next day) and 14,694 negative (still non-SIC the next day). The prediction performances of the various models are listed in Table 1. Logistic Regression had acceptable performance (accuracy: 0.908; AUC: 0.746). Ensemble learning algorithms, such as Categorical Boosting (CatBoost) (accuracy: 0.913; AUC: 0.841), Light Gradient Boosting (accuracy: 0.912; AUC: 0.835) and Random Forest Classifier (accuracy: 0.909; AUC: 0.760), had better accuracy and larger AUCs than the others. The CatBoost model had the most powerful discrimination for predicting SIC risk, and we optimized this model in the next step.

TABLE 1

Table 1. Performance of different models in internal validation.

Full and Compact Models

Fifteen iterations of Bayesian optimization were performed. The hyperparameter search domains and final settings are listed in Supplementary Table 4. The optimized CatBoost model had the greatest AUC in our study (0.869; 95% CI: 0.850–0.886). SHAP values were calculated and are plotted in Figure 3. The summary plot sorts features by the sum of SHAP value magnitudes over all samples and shows the distribution of the impact that each feature has on the full model output. As shown, the coagulation profile (platelet, International Normalized Ratio, PT) and renal function indicators (urine output, creatinine) are the most important features for distinguishing the SIC and non-SIC groups. Fifteen features were selected based on their SHAP values and clinical availability. The compact CatBoost model was built based on the selected features. It had a slightly smaller AUC (0.854; 95% CI: 0.832–0.872), but is considered more practical in clinical practice. The medians and 95% confidence intervals of AUCs are plotted in Figure 4 to compare the discrimination of different methods in MIMIC-IV. As shown, our two models outperformed conventional Logistic Regression (0.746; 95% CI: 0.735–0.755) and the SIC scoring system (0.709; 95% CI: 0.687–0.733) in terms of SIC prediction.
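The study ran Bayesian optimization with the bayes_opt package; the loop below is a from-scratch sketch of the same idea (a Gaussian-process surrogate from scikit-learn with an expected-improvement acquisition function, chosen here for illustration), not that package's implementation. In the study, `objective` would map a hyperparameter vector to a cross-validated AUC of a CatBoost model; the search bounds, the random candidate sampling, and the number of initial points are assumptions, with `n_iter=15` loosely matching the fifteen iterations reported above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    sigma = np.maximum(sigma, 1e-12)
    improvement = mu - best - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_optimize(objective, bounds, n_init=5, n_iter=15, n_candidates=2000, seed=0):
    """Maximize `objective` over a box; bounds = [(low, high), ...]."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    X = rng.uniform(low, high, size=(n_init, len(low)))  # random initial design
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=seed)
    for _ in range(n_iter):
        gp.fit(X, y)  # surrogate model of the objective
        candidates = rng.uniform(low, high, size=(n_candidates, len(low)))
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]
```

For instance, a two-dimensional search over a learning rate and a tree depth would pass `bounds=[(0.01, 0.3), (4, 10)]` and round the depth inside `objective` before fitting the model.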

FIGURE 3

Figure 3. Distribution of the impact each feature had on the full model output estimated using the SHapley Additive exPlanations (SHAP) values. The plot sorts features by the sum of SHAP value magnitudes over all samples. The color represents the feature value (red high, blue low). The x axis measures the impact on the model output (right positive, left negative). Taking platelet count as an example, red points (high platelet counts) lie on the left, whereas blue points (low platelet counts) lie on the right. This means that prediction scores are larger, indicating a higher risk of SIC, when patients have a low platelet count. PT, prothrombin time; INR, international normalized ratio; SIC, sepsis-induced coagulopathy; SIC platelet, platelet term in the SIC score; SOFA, sequential organ failure assessment; PTT, Partial Thromboplastin Time; BMI, body mass index; MAP, mean arterial pressure; WBC, white blood cell count; RDW, red cell distribution width; MV, mechanical ventilation.

FIGURE 4

Figure 4. AUCs of four predictive methods in internal (MIMIC-IV) and external (eICU-CRD) validations. AUCs of our two models, Logistic Regression and SIC scores were assessed using the Bootstrap Resampling technique with 1,000 iterations. The heights of the bars represent the median AUCs, while the error bars represent the 95% confidence intervals. Full, the full model; Comp, the compact model; LR, Logistic Regression; SIC, the sepsis-induced coagulopathy criteria; AUC, area under receiver operating characteristic curve; MIMIC-IV, Medical Information Mart for Intensive Care-IV; eICU-CRD, the eICU Collaborative Research Database.

Prediction Performance in eICU-CRD

The results of external validation are shown in Figure 4 (full model: 0.842; 95% CI: 0.837–0.846; compact model: 0.803; 95% CI: 0.798–0.809). In eICU-CRD, the SIC scoring system had better predictive power (0.752; 95% CI: 0.747–0.757) than in MIMIC-IV, but its AUC was still lower than those of our two models (p < 0.001), while Logistic Regression had the poorest generalization ability (0.660; 95% CI: 0.653–0.667). The sensitivity and specificity of the four predictive methods are summarized in Table 2.
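Table 2 reports sensitivity and specificity, but the text does not state how the decision threshold was chosen. As one common option, shown here purely as an assumption, the threshold can be set at the maximum of Youden's J statistic (sensitivity + specificity − 1) on the development ROC curve and then applied unchanged to the external cohort. The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_operating_point(y_true, y_score):
    """Threshold maximizing Youden's J, with its sensitivity and specificity."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    k = int(np.argmax(tpr - fpr))
    return float(thresholds[k]), float(tpr[k]), float(1 - fpr[k])


def sens_spec(y_true, y_score, threshold):
    """Sensitivity and specificity when predicting SIC for score >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    predicted = np.asarray(y_score) >= threshold
    sensitivity = (predicted & y_true).sum() / y_true.sum()
    specificity = (~predicted & ~y_true).sum() / (~y_true).sum()
    return float(sensitivity), float(specificity)
```

A typical use would be `thr, _, _ = youden_operating_point(y_mimic, scores_mimic)` followed by `sens_spec(y_eicu, scores_eicu, thr)`, where all four arrays are placeholders.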

TABLE 2

Table 2. Performance of the final models and SIC scores in internal and external validations.

Model performance in different patient cohorts in eICU-CRD is shown in Figure 5. The two models had the greatest AUCs for patients with APACHE-IV scores between 81 and 100, patients younger than 65 years, and patients admitted to the NICU or SICU. The two models maintained good performance across four regions of the United States. In addition, the two models had better discrimination when sepsis had lasted for several days. A similar sub-cohort analysis was also performed in MIMIC-IV (Supplementary Figure 1).

FIGURE 5

Figure 5. Model performance in different patient cohorts in eICU-CRD. Different validation sets were derived based on APACHE-IV (A), age (B), region of the United States (C), ethnicity (D), time since sepsis onset (E) and unit type (F). AUC of the full and the compact models in each set was measured using the Bootstrap Resampling technique. The colored area represents 95% confidence intervals. Full, the full model; Comp, the compact model; AUC, area under receiver operating characteristic curve; APACHE-IV, Acute Physiology and Chronic Health Evaluation-IV; CICU, cardiac intensive care unit; CSICU, cardiac surgical intensive care unit; CTICU, cardiothoracic intensive care unit; MICU, medical intensive care unit; NICU, neuro intensive care unit; SICU, surgical intensive care unit.

Model Interpretation

The SHAP summary plot in Figure 3 provides an overview of the impact of features on the final models. Additionally, the prediction results for two specific instances are explained in Figure 6. The bars in red and blue represent risk factors and protective factors, respectively; longer bars represent greater feature importance. In the example in Figure 6A, although the patient's coagulation profile was normal, she had poor circulatory status, with a high serum lactate level and vasopressor administration. The model correctly predicted that she would have SIC the next day. In the example in Figure 6B, the patient's condition was milder, and our model predicted a low risk.

FIGURE 6

Figure 6. Explanation of the prediction results for specific instances. The base value (−3.33) is the average output of the predictive model; the output values are the predicted SIC risks. The bars in red and blue represent risk factors and protective factors, respectively; longer bars mean greater feature importance. These values are raw model outputs before their final transformation into probabilities, and therefore they are not equal to the final predicted probabilities. This figure shows the explanation for a high-risk instance (A) and a low-risk instance (B). RDW, red cell distribution width; PT, prothrombin time; WBC, white blood cell count; PTT, Partial Thromboplastin Time; INR, international normalized ratio; MAP, mean arterial pressure; BMI, body mass index.
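Assuming the plotted values are log-odds, as is usual for a binary CatBoost classifier trained with log-loss, SHAP's additivity (local accuracy) means the raw output f(x) equals the base value plus the sum of the bars, and the probability follows from the logistic function, p = 1 / (1 + exp(−f(x))). On that assumption, the base value of −3.33 alone corresponds to a probability of about 0.035. The function name below is hypothetical.

```python
import math


def shap_output_to_probability(base_value, shap_contributions):
    """Convert a SHAP explanation in (assumed) log-odds space to a probability.

    Raw output f(x) = base value + sum of SHAP values; p = 1 / (1 + exp(-f(x))).
    """
    raw = base_value + sum(shap_contributions)
    return 1.0 / (1.0 + math.exp(-raw))


print(round(shap_output_to_probability(-3.33, []), 3))  # base value alone → 0.035
```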

Website-Based Tool

A website-based tool was established for clinicians to use the compact model, http://www.aimedicallab.com/tool/aiml-sicrisk.html. The SIC risk in the following day can be assessed by using this tool, and interpretation of the prediction result in the instance level will be shown to the user.

Discussion

To the best of our knowledge, this is the first attempt to apply machine-learning models to the dynamic prediction of SIC. Our study developed and validated two variants of dynamic machine-learning models, providing an accurate predictive tool for SIC in patients with sepsis.

In this study, we reconfirmed that coagulopathy worsens the clinical outcomes of septic patients (15). As shown in Supplementary Table 3, SIC was associated with a higher mortality rate and a longer hospital/ICU stay. In addition, SIC patients more often received advanced antibiotics (linezolid and vancomycin), implying a more severe infection. On the other hand, these drugs may themselves alter coagulation function through various mechanisms (29, 30). Early identification of septic patients at high risk of coagulopathy is therefore of great importance.

Currently, there is a lack of reliable tools for the early prediction of coagulopathy in septic patients. Our study demonstrated that gradient boosting algorithms, such as CatBoost, Light Gradient Boosting and Extreme Gradient Boosting, predicted SIC with higher accuracy than the other algorithms. In short, gradient boosting is a powerful machine-learning technique that iteratively trains a weak learner (e.g., a decision tree) to fit the residuals of the previous models (31). CatBoost, one of the gradient boosting algorithms, showed the greatest AUC in our study, partly because of two main advantages. First, it handles categorical features natively during training rather than requiring them to be processed beforehand (32). This means that categorical features do not need to be encoded, and a CatBoost model can be developed from raw data. Second, it uses a new scheme to calculate leaf values when selecting the tree structure, which helps to reduce overfitting, a major problem that constrains the generalization ability of machine-learning models (32).
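The residual-fitting idea can be made concrete with a minimal binary gradient-boosting loop built on scikit-learn regression trees. This is a textbook sketch under log-loss (each tree fits y − p, the negative gradient with respect to the log-odds, with a fixed learning rate and no Newton leaf refinement); it omits CatBoost's ordered boosting, native categorical handling, and symmetric trees, so it illustrates the algorithm family rather than CatBoost itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Binary gradient boosting: each shallow tree fits the residuals y - p."""
    y = np.asarray(y, dtype=float)
    base_rate = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(base_rate / (1 - base_rate))  # start from the base-rate log-odds
    f = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - sigmoid(f)  # negative gradient of the log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        f += learning_rate * tree.predict(X)  # shrunken step toward the residuals
        trees.append(tree)
    return f0, learning_rate, trees


def predict_proba(model, X):
    """Sum the base log-odds and all shrunken tree outputs, then apply the sigmoid."""
    f0, learning_rate, trees = model
    f = f0 + learning_rate * sum(tree.predict(X) for tree in trees)
    return sigmoid(f)
```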

In this study, we developed two variants of CatBoost models that can identify patients at high risk of SIC and provide clinical decision-makers with more information. As shown in Figure 5, our models had comparable AUCs across different patient cohorts, suggesting that machine-learning models trained on large datasets can generalize well.

In general, models based on more variables have better discrimination but worse clinical usability. Therefore, in our study, two model variants were developed for different application scenarios. The full model predicted SIC based on 88 clinical variables and achieved the highest AUC in this study. In the external validation, the full model maintained good discrimination with only a slight reduction in AUC. However, collecting 88 variables to apply this model is difficult, so the full model is recommended for hospitals with a well-designed clinical data system. By contrast, the compact model was trained on 15 selected variables and was designed to be as practical as possible while preserving accuracy. In addition, a website tool was developed to help clinicians use the compact model in clinical practice. After the user logs on to the website and enters the values of the 15 variables, the compact model returns the prediction result together with its interpretation.

By interpreting the full model, we found that many clinical variables can help to indicate the risk of SIC. The coagulation profile was the most important predictor of SIC, followed by renal function indicators (urine output and creatinine). As shown in Figure 3, patients with poorer renal function (lower urine output and higher serum creatinine) tended to have a higher risk of SIC. Body mass index (BMI), vital signs (heart rate and mean arterial pressure), laboratory tests (such as lactate and white blood cell count), the use of MV and vasopressors, and SAPS-II scores can also help assess the risk of SIC. In addition, prediction results can be explained at the instance level, as shown in Figure 6, which makes our model clinically interpretable.

Several limitations of this study should be considered. Firstly, only septic adults in ICUs were included; hospitalized patients with sepsis outside the ICU were not analyzed. In addition, given the immaturity of the coagulation system in children, especially newborns, more research is needed on SIC in children with sepsis. Secondly, our models identify patients at high risk of SIC but do not indicate who will benefit from anticoagulant therapy; it is still up to clinicians to decide whether to administer anticoagulant agents. However, progression from sepsis to severe coagulopathy is a continuous process driven by a coagulation disorder. Early and accurate prediction of SIC can give clinicians more time to adjust treatment strategies and makes it possible to study the potential effect of anticoagulant therapy at an early stage. Thirdly, this is a retrospective observational study. Missing data and input errors exist, despite the very high quality of the MIMIC-IV and eICU-CRD databases; therefore, prospective validation is still required. Compared with septic shock, for which advances in recent years have led to significant survival improvements, there is still a long way to go in the diagnosis and management of sepsis-associated coagulopathy.

Conclusions

In conclusion, the present study developed two variants of the CatBoost model, which can discriminate septic patients who would and would not develop SIC.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at: https://mimic-iv.mit.edu/; https://eicu-crd.mit.edu/.

Ethics Statement

The study was an analysis of two third-party anonymized publicly available databases with pre-existing institutional review board (IRB) approval.

Author Contributions

Q-YZ, L-PL, and J-CL: conception and design. RG, G-WT, and ZL: administrative support. Q-YZ: collection and assembly of data. Q-YZ and L-PL: data analysis and interpretation. All authors: manuscript writing and final approval of manuscript.

Funding

This article was supported by grants from the Research Funds of Shanghai Municipal Health Commission (2019ZB0105), Natural Science Foundation of Shanghai (20ZR1411100), Program of Shanghai Academic/Technology Research Leader (20XD1421000), National Natural Science Foundation of China (82070085), Clinical Research Funds of Zhongshan Hospital (2020ZSLC38 and 2020ZSLC27), and Smart Medical Care of Zhongshan Hospital (2020ZHZS01).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank the Massachusetts Institute of Technology and the Beth Israel Deaconess Medical Center for the MIMIC project. We also would like to thank the Philips eICU Research Institute and Philips Healthcare for their contribution to the eICU-CRD project.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2020.637434/full#supplementary-material

Supplementary Figure 1. Model performance in different patient cohorts in MIMIC-IV.

Supplementary Figure 2. Model interpretation of the full model in eICU-CRD.

Supplementary Figure 3. Model interpretation of the compact model in eICU-CRD.

Supplementary Table 1. Sepsis-induced coagulopathy (SIC) criteria.

Supplementary Table 2. Predictors extracted in MIMIC-IV and eICU-CRD.

Supplementary Table 3. Baseline characteristics between the SIC and non-SIC groups in the MIMIC-IV cohort.

Supplementary Table 4. Hyperparameter search domain in Bayesian optimization and final settings.

Supplementary Table 5. Results of logistic regression.

References

1. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. (2016) 315:801–10. doi: 10.1001/jama.2016.0287

2. Martin GS, Mannino DM, Eaton S, Moss M. The epidemiology of sepsis in the United States from 1979 through 2000. N Engl J Med. (2003) 348:1546–54. doi: 10.1056/NEJMoa022139

3. Lyons PG, Micek ST, Hampton N, Kollef MH. Sepsis-associated coagulopathy severity predicts hospital mortality. Crit Care Med. (2018) 46:736–42. doi: 10.1097/CCM.0000000000002997

4. Levi M, van der Poll T. Coagulation and sepsis. Thromb Res. (2017) 149:38–44. doi: 10.1016/j.thromres.2016.11.007

PubMed Abstract | CrossRef Full Text | Google Scholar

5. Levi M, Ten Cate H. Disseminated intravascular coagulation. N Engl J Med. (1999) 341:586–92. doi: 10.1056/NEJM199908193410807

6. Zhao H, Cai X, Liu N, Zhang Z. Thromboelastography as a tool for monitoring blood coagulation dysfunction after adequate fluid resuscitation can predict poor outcomes in patients with septic shock. J Chin Med Assoc. (2020) 83:674–7. doi: 10.1097/JCMA.0000000000000345

7. Allingstrup M, Wetterslev J, Ravn FB, Moller AM, Afshari A. Antithrombin III for critically ill patients. Cochrane Database Syst Rev. (2016) 2:CD005370. doi: 10.1002/14651858.CD005370.pub3

8. Warren BL, Eid A, Singer P, Pillay SS, Carl P, Novak I, et al. Caring for the critically ill patient. High-dose antithrombin III in severe sepsis: a randomized controlled trial. JAMA. (2001) 286:1869–78. doi: 10.1001/jama.286.15.1869

9. Dhainaut JF, Yan SB, Joyce DE, Pettila V, Basson B, Brandt JT, et al. Treatment effects of drotrecogin alfa (activated) in patients with severe sepsis with or without overt disseminated intravascular coagulation. J Thromb Haemost. (2004) 2:1924–33. doi: 10.1111/j.1538-7836.2004.00955.x

10. Iba T, Gando S, Thachil J. Anticoagulant therapy for sepsis-associated disseminated intravascular coagulation: the view from Japan. J Thromb Haemost. (2014) 12:1010–9. doi: 10.1111/jth.12596

11. Kienast J, Juers M, Wiedermann CJ, Hoffmann JN, Ostermann H, Strauss R, et al. Treatment effects of high-dose antithrombin without concomitant heparin in patients with severe sepsis with or without disseminated intravascular coagulation. J Thromb Haemost. (2006) 4:90–7. doi: 10.1111/j.1538-7836.2005.01697.x

12. Umemura Y, Yamakawa K, Ogura H, Yuhara H, Fujimi S. Efficacy and safety of anticoagulant therapy in three specific populations with sepsis: a meta-analysis of randomized controlled trials. J Thromb Haemost. (2016) 14:518–30. doi: 10.1111/jth.13230

13. Umemura Y, Yamakawa K. Optimal patient selection for anticoagulant therapy in sepsis: an evidence-based proposal from Japan. J Thromb Haemost. (2018) 16:462–4. doi: 10.1111/jth.13946

14. Aster RH, Bougie DW. Drug-induced immune thrombocytopenia. N Engl J Med. (2007) 357:580–7. doi: 10.1056/NEJMra066469

15. Iba T, Di Nisio M, Levy JH, Kitamura N, Thachil J. New criteria for sepsis-induced coagulopathy (SIC) following the revised sepsis definition: a retrospective analysis of a nationwide survey. BMJ Open. (2017) 7:e017046. doi: 10.1136/bmjopen-2017-017046

16. Iba T, Levy JH, Warkentin TE, Thachil J, van der Poll T, Levi M, et al. Diagnosis and management of sepsis-induced coagulopathy and disseminated intravascular coagulation. J Thromb Haemost. (2019) 17:1989–94. doi: 10.1111/jth.14578

17. Iba T, Arakawa M, Di Nisio M, Gando S, Anan H, Sato K, et al. Newly proposed sepsis-induced coagulopathy precedes International Society on Thrombosis and Haemostasis overt-disseminated intravascular coagulation and predicts high mortality. J Intensive Care Med. (2020) 35:643–9. doi: 10.1177/0885066618773679

18. Iba T, Arakawa M, Levy JH, Yamakawa K, Koami H, Hifumi T, et al. Sepsis-induced coagulopathy and Japanese Association for Acute Medicine DIC in coagulopathic patients with decreased antithrombin and treated by antithrombin. Clin Appl Thromb Hemost. (2018) 24:1020–6. doi: 10.1177/1076029618770273

19. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. (2018) 319:1317–8. doi: 10.1001/jama.2017.18391

20. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. (2019) 23:112. doi: 10.1186/s13054-019-2411-z

21. Zhang Z. Predictive analytics in the era of big data: opportunities and challenges. Ann Transl Med. (2020) 8:68. doi: 10.21037/atm.2019.10.97

22. Ge H, Pan Q, Zhou Y, Xu P, Zhang L, Zhang J, et al. Lung mechanics of mechanically ventilated patients with COVID-19: analytics with high-granularity ventilator waveform data. Front Med. (2020) 7:541. doi: 10.3389/fmed.2020.00541

23. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. (2000) 101:E215–20. doi: 10.1161/01.CIR.101.23.e215

24. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci Data. (2018) 5:180178. doi: 10.1038/sdata.2018.178

25. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. (2015) 13:1. doi: 10.1186/s12916-014-0241-z

26. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. (2016) 315:762–74. doi: 10.1001/jama.2016.0288

27. Wu J, Chen X-Y, Zhang H, Xiong L-D, Lei H, Deng S-H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electron Sci Technol. (2019) 17:26–40. doi: 10.11989/JEST.1674-862X.80904120

28. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. (2020) 2:56–67. doi: 10.1038/s42256-019-0138-9

29. Kishor K, Dhasmana N, Kamble SS, Sahu RK. Linezolid induced adverse drug reactions - an update. Curr Drug Metab. (2015) 16:553–9. doi: 10.2174/1389200216666151001121004

30. Mohammadi M, Jahangard-Rafsanjani Z, Sarayani A, Hadjibabaei M, Taghizadeh-Ghehi M. Vancomycin-induced thrombocytopenia: a narrative review. Drug Saf. (2017) 40:49–59. doi: 10.1007/s40264-016-0469-y

31. Zhang Z, Zhao Y, Canes A, Steinberg D, Lyashevska O, written on behalf of AMEB-DCTCG. Predictive analytics with gradient boosting in clinical medicine. Ann Transl Med. (2019) 7:152. doi: 10.21037/atm.2019.03.29

32. Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data. (2020) 7:94. doi: 10.1186/s40537-020-00369-8

Keywords: sepsis-induced coagulopathy, dynamic prediction, machine learning, Logistic Regression, external validation, model interpretation

Citation: Zhao Q-Y, Liu L-P, Luo J-C, Luo Y-W, Wang H, Zhang Y-J, Gui R, Tu G-W and Luo Z (2021) A Machine-Learning Approach for Dynamic Prediction of Sepsis-Induced Coagulopathy in Critically Ill Patients With Sepsis. Front. Med. 7:637434. doi: 10.3389/fmed.2020.637434

Received: 03 December 2020; Accepted: 30 December 2020;
Published: 21 January 2021.

Edited by:

Zhongheng Zhang, Sir Run Run Shaw Hospital, China

Reviewed by:

Hamza Rayes, University of Cincinnati, United States
Anastasia N. Kotanidou, National and Kapodistrian University of Athens, Greece

Copyright © 2021 Zhao, Liu, Luo, Luo, Wang, Zhang, Gui, Tu and Luo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rong Gui, aguirong@163.com; Guo-Wei Tu, tu.guowei@zs-hospital.sh.cn; Zhe Luo, luo.zhe@zs-hospital.sh.cn

†These authors have contributed equally to this work and share first authorship.

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
