Predicting cerebral edema in patients with spontaneous intracerebral hemorrhage using machine learning

Xu, Jiangbao; Yuan, Cuijie; Yu, Guofeng; Li, Hao; Dong, Qiutong; Mao, Dandan; Zhan, Chengpeng; Yan, Xinjiang

doi:10.3389/fneur.2024.1419608

ORIGINAL RESEARCH article

Front. Neurol., 03 October 2024

Sec. Neurocritical and Neurohospitalist Care

Volume 15 - 2024 | https://doi.org/10.3389/fneur.2024.1419608

This article is part of the Research Topic Artificial Intelligence in Acute Neurology

Jiangbao Xu¹

Cuijie Yuan¹

Guofeng Yu¹

Hao Li²,³

Qiutong Dong²,³

Dandan Mao¹

Chengpeng Zhan¹

Xinjiang Yan¹*

¹The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital, Quzhou, China
²Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, China
³Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, China

Background: The early prediction of cerebral edema changes in patients with spontaneous intracerebral hemorrhage (SICH) may facilitate earlier interventions and result in improved outcomes. This study aimed to develop and validate machine learning models to predict cerebral edema changes within 72 h, using readily available clinical parameters, and to identify relevant influencing factors.

Methods: An observational study was conducted between April 2021 and October 2023 at the Quzhou Affiliated Hospital of Wenzhou Medical University. After preprocessing the data, the study population was randomly divided into training and internal validation cohorts in a 7:3 ratio (training: N = 150; validation: N = 65). The most relevant variables were selected using Support Vector Machine Recursive Feature Elimination (SVM-RFE) and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms. The predictive performance of random forest (RF), gradient-boosted decision tree (GBDT), logistic regression (LR), and XGBoost models was evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), accuracy, F1-score, precision, recall (sensitivity), and specificity. Feature importance was calculated, and the SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) methods were employed to explain the top-performing model.

Results: A total of 86 (40.0%) patients developed cerebral edema changes. In the validation cohort, GBDT achieved a higher AUC than LR and RF, at 0.654 (95% CI: 0.611–0.699) compared with 0.578 for LR (95% CI: 0.535–0.623; DeLong p = 0.197) and 0.624 for RF (95% CI: 0.588–0.687; DeLong p = 0.236), although these differences were not statistically significant. XGBoost performed similarly, with an AUC of 0.660 (95% CI: 0.611–0.711; DeLong p = 0.963). In the training set, however, GBDT outperformed XGBoost, with an AUC of 0.603 ± 0.100 compared with 0.575 ± 0.096. SHAP analysis revealed that serum sodium, HDL, subarachnoid hemorrhage volume, sex, and left basal ganglia hemorrhage volume were the top five most important features for predicting cerebral edema changes in the GBDT model.

Conclusion: The GBDT model demonstrated the best overall performance in predicting 72-h changes in cerebral edema. It has the potential to assist clinicians in identifying high-risk patients and guiding clinical decision-making.

Introduction

Spontaneous intracerebral hemorrhage (SICH) is a prevalent subtype of stroke, with a mortality rate significantly higher than ischemic stroke. Approximately 20–30% of SICH patients die within 3 months (1–3). The high incidence and mortality rates pose a significant threat to public health (4, 5). Cerebral edema, a common complication of SICH, involves the accumulation of excess water in the brain tissues adjacent to the hemorrhage. This can lead to severe consequences, including compromised blood flow, intracranial pressure shifts, and neuronal damage (6, 7). Timely identification of edema development and its influencing factors is crucial for optimizing patient care, allocating resources effectively, and reducing healthcare costs. Cerebral edema typically appears within 24–72 h after bleeding, peaks 2–7 days later, and can persist for up to 2 weeks. To monitor edema progression, patients with SICH undergo head CT scans at admission, 24 and 72 h post-admission. Subsequent scans may be ordered based on clinical changes. This study aimed to develop and validate machine learning models capable of predicting changes in cerebral edema within the first 72 h following SICH (8).

Over the past few years, advances in imaging omics have refined the use of CT scans for evaluating brain edema (9). In addition, machine learning algorithms have demonstrated significant promise in predicting medical outcomes and complications, aiding clinicians in making informed decisions and enhancing patient care (10–13). These advancements inform the development of accurate and reliable prognostic models to identify patients at risk of severe cerebral edema, enabling healthcare providers to implement targeted preventive strategies and interventions.

This study sought to develop and validate a machine learning-based prognostic model that could evaluate the progression patterns and influencing factors of cerebral edema, considering various patient attributes and clinical determinants. We aimed to compare the predictive accuracy and clinical utility of different machine learning algorithms. Ultimately, our goal was to provide clinicians with valuable tools for the early identification of patients at risk of severe cerebral edema, enabling the implementation of targeted preventive strategies to reduce its prevalence.

Methods

This study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines.

Data source

This observational study was conducted between April 2021 and October 2023 at the Quzhou Affiliated Hospital of Wenzhou Medical University. The study protocol was approved by the hospital’s ethical board (reference number: LW2023-163) and adhered to the principles of the Declaration of Helsinki. All patients provided informed consent through their relatives, and no patient data were used in a way that could pose a risk to them. Patients with SICH were included if they met the following criteria: (1) admission within 72 h after first-ever stroke; (2) SICH confirmed by head computed tomography (CT) scan; (3) hospitalization within 24 h after the onset of stroke symptoms; and (4) age of 18 years or greater. The exclusion criteria were as follows: (1) secondary brain bleeding as a result of congenital or acquired coagulation abnormalities, hemorrhagic transformation of cerebral infarction, moyamoya disease, cerebral aneurysm, and arteriovenous malformation or tumor; (2) primary intraventricular bleeding; (3) presence of previous neurological diseases, such as brain tumors and severe head trauma; and (4) coexistence with severe systemic diseases, for example, malignancies, immune deficiency syndromes, and severe heart, liver, lung, or kidney dysfunction.

A total of 215 patients presented to the emergency department with suspected SICH, which was confirmed by head CT scans. All CT scans were conducted following the radiology department’s protocol by radiologists blinded to clinical information. To ensure data relevance, we collected 51 basic patient characteristics at admission (detailed in Supporting Material 1). In addition, we gathered 38 imaging characteristics, including clinical review times at 6, 24, 72 h, and subsequent hours until the absence of hematoma and edema was confirmed by two senior doctors (Supporting Material 1).

The cerebral edema volume was calculated using two methods: (1) Image-based analysis: The boundaries of hematoma and edema were delineated on CT scans using image browser measurement software. The hematoma volume was calculated by summing the areas of each layer, while the edema volume was determined by subtracting the hematoma volume from the combined volume of hematoma and surrounding brain edema. (2) Formula-based analysis (verification): The hematoma length and width in the maximum plane were measured as A and B, respectively, and the thickness C was calculated from the number of layers. The hematoma volume was calculated as ABC/2, with each layer counted as a full layer if its hematoma area was >75% of that in the maximum plane, as half a layer if it was 25–75%, and excluded if it was <25%. The edema volume was determined as the difference between the combined volume of blood and edema and the hematoma volume. Absolute hematoma and edema volumes were assessed at each time point. The primary outcome measure was defined as an increase in cerebral edema volume between baseline and repeat imaging of more than 6 mL or a relative increase of >33% within 72 h (14–17). To optimize statistical power and minimize bias, multiple imputation using random forests was employed to impute missing values. The imputed data were then randomly split, with stratification, into training (N = 150) and validation (N = 65) cohorts in a 7:3 ratio.
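The formula-based estimate and the outcome definition described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the measurement software used in the study: the function names, the input format (per-layer hematoma area as a fraction of the largest layer), and the choice of centimeters for A, B, and layer thickness (so that the result is in mL) are all hypothetical.

```python
def abc2_volume_ml(a_cm, b_cm, layer_fractions, layer_thickness_cm):
    """Estimate hematoma volume with the ABC/2 rule.

    layer_fractions: for each CT layer, the hematoma area expressed as a
    fraction of the area on the largest layer (assumed input format).
    """
    weighted_layers = 0.0
    for frac in layer_fractions:
        if frac > 0.75:
            weighted_layers += 1.0   # counted as a full layer
        elif frac >= 0.25:
            weighted_layers += 0.5   # counted as half a layer
        # layers below 25% are excluded
    c_cm = weighted_layers * layer_thickness_cm
    return a_cm * b_cm * c_cm / 2.0


def edema_volume_ml(combined_ml, hematoma_ml):
    """Edema volume = (hematoma + edema) volume minus hematoma volume."""
    return combined_ml - hematoma_ml


def edema_expanded(baseline_ml, followup_ml, abs_ml=6.0, rel=0.33):
    """Outcome flag: absolute increase > 6 mL or relative increase > 33%."""
    increase = followup_ml - baseline_ml
    if increase > abs_ml:
        return True
    return baseline_ml > 0 and increase / baseline_ml > rel


# Toy values only, not study data.
volume = abc2_volume_ml(4.0, 3.0, [0.2, 0.5, 1.0, 0.9, 0.3], 0.5)
flag = edema_expanded(baseline_ml=10.0, followup_ml=14.0)
```

In the toy call, two layers count fully and two count as half, so C is 3 layers × 0.5 cm; the second call is flagged because a 4 mL increase on a 10 mL baseline exceeds the 33% relative threshold even though it is below 6 mL.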

Feature selection

To reduce the risk of selecting spurious variables, we applied a stepwise variable selection procedure to the training cohort to identify the most relevant predictors for constructing a predictive model. First, pairwise Pearson’s correlation matrices were used to assess the collinearity of clinical variables. Collinearity occurs when two or more predictors are strongly correlated (r > 0.8), which complicates the evaluation of each variable’s unique contribution to the outcome; for each collinear pair, we retained the more readily available variable and removed the other. Subsequently, we used the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVM-RFE algorithm in a two-step process. LASSO is a regularization technique that performs variable selection and coefficient estimation by constraining the sum of the absolute values of the model coefficients. This constraint shrinks some coefficients to exactly zero, effectively excluding the corresponding variables from the final model. SVM-RFE repeatedly fits a support vector machine, removes the lowest-ranked features, and re-evaluates model performance until the optimal number of features is reached, yielding a second set of predictors. Finally, the intersection of the predictors selected by LASSO and SVM-RFE was used, so that only variables supported by both methods entered model development. This combined approach aims to improve the accuracy and generalizability of the model while reducing the risk of overfitting or including irrelevant predictors.
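The collinearity screen and the final intersection step can be illustrated with a short, dependency-free Python sketch. It does not reproduce the LASSO or SVM-RFE fits themselves; the two selected-feature lists are taken as given. The helper names, the use of the absolute value of r, and the toy columns (including "chloride", which is not a study variable) are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(columns, threshold=0.8):
    """Return pairs of column names whose |r| exceeds the threshold."""
    return [
        (u, v)
        for u, v in combinations(sorted(columns), 2)
        if abs(pearson_r(columns[u], columns[v])) > threshold
    ]


def consensus_features(lasso_selected, svm_rfe_selected):
    """Keep only the predictors chosen by both selection methods."""
    return sorted(set(lasso_selected) & set(svm_rfe_selected))


# Toy values only, not study data.
columns = {
    "sodium": [138, 140, 142, 144, 146],
    "chloride": [100, 102, 104, 106, 109],
    "hdl": [1.2, 1.5, 1.1, 1.4, 1.0],
}
pairs = collinear_pairs(columns)
kept = consensus_features(["sex", "hdl", "sodium"], ["sodium", "crp", "sex"])
```

Taking the intersection is the conservative choice: a variable enters the model only if both methods agree on it, at the cost of possibly discarding a predictor that only one method detects.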

Model development and validation

We employed four machine learning classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), logistic regression (LR), and gradient-boosted decision trees (GBDT) (18–21), to develop predictive models for the risk of 72-h brain edema growth. All models incorporated the same input variables. Grid and random hyperparameter searches were conducted on the training data to identify optimal hyperparameters for each model, with performance evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), F1 score, precision, recall (sensitivity), and specificity. To interpret the best-performing model, Shapley Additive exPlanations (SHAP) (22) and Local Interpretable Model-Agnostic Explanations (LIME) (23) were applied to provide consistent and locally accurate variable importance values, enhancing our understanding of the model’s predictive capabilities.
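For readers less familiar with these metrics, the sketch below shows how AUROC and the threshold-based measures can be computed from true labels and predicted probabilities in plain Python. It is an illustration only: the function names, the 0.5 decision threshold, and the toy scores are assumptions, and the study itself would have relied on library implementations.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Precision, recall (sensitivity), specificity, accuracy, and F1."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "f1": f1,
    }


# Toy labels and scores only, not study data.
y = [1, 1, 1, 0, 0, 0, 0, 1]
s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
auc = auroc(y, s)
metrics = confusion_metrics(y, s)
```

Note that AUROC is threshold-free, whereas precision, recall, specificity, accuracy, and F1 all depend on the chosen cut-off.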

Dataset selection

Given the limited sample size in this study, which can introduce bias, 5-fold cross-validation was employed to reduce sampling bias. To choose among the candidate datasets, a Wilcoxon rank-sum test was performed for each, and the dataset with the largest p-value (that is, the weakest evidence of a difference between the compared groups) was chosen, as illustrated in Figure 1.
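A stratified 5-fold split of the kind used in cross-validation can be sketched as below. This is a generic illustration rather than the study's code: the function name, the round-robin assignment, the seed, and the toy label vector (150 cases with 60 events, chosen only to mirror the size of the training cohort) are assumptions.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into folds,
    so every fold keeps roughly the overall event rate.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Toy labels only: 60 events and 90 non-events.
labels = [1] * 60 + [0] * 90
splits = list(stratified_kfold(labels, k=5, seed=0))
```

Each case appears in exactly one validation fold, so every observation is used for both training and validation across the five rounds.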

Figure 1. Dataset selection plots.

Tuning of hyperparameters

XGBoost

XGBoost is a widely used ensemble technique based on the gradient boosting framework. It combines the predictions of multiple weak learners, primarily decision trees, to create a more accurate and robust model. The optimal parameters were determined using the “xgboost” package and 5-fold cross-validation, as illustrated in Figure 2.

Figure 2. Hyperparameter selection plots.

Random forest

Random forest algorithms utilize tree-based models, combining multiple decision trees through bootstrapping to improve predictive accuracy (19). The optimal number of trees was determined using 5-fold cross-validation with the “randomForest” package, as illustrated in Figure 2.

Logistic regression

Logistic regression is a widely used statistical method for modeling binary outcomes. Its coefficients are typically estimated by maximum likelihood, which finds the parameter values under which the observed outcomes are most probable. To select optimal variables and construct the LR model, we employed backward stepwise regression based on the Akaike information criterion. The “MASS” package in R software was utilized to fit the model (24).
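The Akaike information criterion that drives backward stepwise selection can be computed as AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The sketch below assumes a model that outputs event probabilities for a binary outcome; the function names and the toy probabilities are hypothetical, and the code does not reproduce the R workflow used in the study.

```python
from math import log


def bernoulli_log_likelihood(y_true, p_hat):
    """Log-likelihood of binary outcomes given predicted probabilities.

    Each probability must lie strictly between 0 and 1.
    """
    return sum(log(p) if y == 1 else log(1.0 - p)
               for y, p in zip(y_true, p_hat))


def aic(y_true, p_hat, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * bernoulli_log_likelihood(y_true, p_hat)


# Toy example: a fuller model (3 parameters) versus a reduced one (2).
y = [1, 1, 0, 0]
aic_full = aic(y, [0.9, 0.8, 0.2, 0.1], n_params=3)
aic_reduced = aic(y, [0.6, 0.6, 0.4, 0.4], n_params=2)
drop_variable = aic_reduced < aic_full
```

In a backward step, a variable is dropped only if the reduced model has the lower AIC; in this toy case the fuller model fits enough better to justify its extra parameter, so the variable would be kept.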

Gradient-boosted decision tree (GBDT)

The GBDT model is built iteratively: at each step, a new decision tree is fitted to the negative gradient of the loss function with respect to the current predictions (for squared-error loss, the residuals) and is added to the ensemble. Its flexibility, efficiency, and predictive accuracy have made GBDT a popular machine learning algorithm for analyzing and processing complex data. Optimal parameters for the GBDT model were determined using the “GBDT” package and 5-fold cross-validation, as illustrated in Figure 2.
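The residual-fitting idea behind gradient boosting can be shown with a deliberately small Python sketch. It uses single-split trees (stumps), one feature, and squared-error loss, for which the negative gradient equals the residual; a classifier such as the one in this study would instead use a log-loss objective and deeper trees. All names and the toy data are assumptions.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree minimizing squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def fit_gbdt(x, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting with stumps under squared-error loss."""
    base = sum(y) / len(y)           # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - f)^2 is the residual y - f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Toy data only: a step function of a single feature.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(x, y, n_estimators=100, learning_rate=0.1)
```

The learning rate scales each tree's contribution, which is why n_estimators and learning_rate are usually tuned together: a smaller rate needs more trees to reach the same fit.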

Statistical analysis

Prior to formal analysis, the Kolmogorov–Smirnov test was used to assess data distribution. Continuous variables were analyzed using either the independent t-test (for normally distributed data) or the Mann–Whitney U-test (for non-normally distributed data) and were presented as mean ± standard deviation (SD) or median with interquartile range (IQR), respectively. Categorical variables were analyzed using the chi-square test for large samples or Fisher’s exact test for small samples and were expressed as frequencies (percentages). The DeLong test was used to compare the area under the curve (AUC) of the different models. All statistical tests were two-tailed, and p < 0.05 was considered statistically significant. In addition, the study adhered to the rule of thumb of having at least 10 events per variable for robust analysis. Statistical analyses were performed using R (version 4.2.2; R Foundation for Statistical Computing) and Python (version 3.9.0; Python Software Foundation).

Results

Patient characteristics

The dataset comprised information on 215 patients with SICH, including 949 CT scans. Scans were obtained at 6, 24, and 72 h and at subsequent time points until the absence of hematoma and edema was confirmed by two senior doctors. Of these patients, 86 (40%) exhibited cerebral edema expansion (edema volume increased by more than 6 mL or by >33% relative to the last measurement) within 72 h. The cohort included 143 male (66.5%) and 72 female (33.5%) patients, with 60 male (69.8%) and 26 female (30.2%) patients experiencing cerebral edema expansion. No significant differences were observed in baseline characteristics between the training and validation groups. Tables 1, 2 provide detailed baseline patient characteristics.

Table 1. Summary table of categorized data.

Table 2. Summary table of continuous variable data.

Feature selection

As shown in Figure 3, no pairwise Pearson’s correlation between continuous variables exceeded 0.8, indicating the absence of strong collinearity. Consequently, all variables were included in the subsequent feature selection process. SVM-RFE identified 31 important predictors (Supporting Material 1), while the LASSO regression algorithm selected 26 (Supporting Material 1). Ultimately, 20 factors emerged as significant predictors of the outcome (Figure 4), including sex, diabetes history, hypertension history, alcohol history, ventricular drainage, hemostatic treatment, decompressive craniectomy, antihypertensive treatment, antiemesis and antacid, HDL, cholesterol, alanine aminotransferase, serum magnesium, serum sodium, CRP, admission Barthel ADL Index, cerebral subarachnoid hemorrhage volume, subdural hemorrhage volume, and hemorrhage in the left cerebellum, left basal ganglia, or left parietal lobe. These selected features were integrated into four machine learning classifiers (GBDT, LR, RF, and XGBoost) to develop the predictive model.

Figure 3. Heat map of the Pearson’s correlation matrix.

Figure 4. Variable screening diagram.

Tuning of hyperparameters

As shown in Table 3 and Figure 2, the hyperparameters of the GBDT models were tuned over the following ranges: n_estimators (10–250), min_samples_split (2–25), max_features (0.1–0.999), max_depth (3–15), min_samples_leaf (1–25), and learning_rate (0.001–0.3). For the extreme gradient boosting (XGBoost) models, the ranges were as follows: n_estimators (10–250), min_samples_split (1–25), max_depth (3–15), subsample (0.001–1), colsample_bytree (0.01–1), and learning_rate (0.001–0.3). Finally, for the RF models, the ranges were as follows: n_estimators (10–250), max_depth (3–15), min_samples_split (2–25), min_samples_leaf (1–25), and max_features (0.1–0.999).

Table 3. Summary table of the model parameters.

Development and validation of prediction models

When evaluating model performance on the validation cohort, the GBDT model achieved an AUC of 0.654 (95% CI: 0.611–0.699), higher than the LR and RF models, which yielded AUCs of 0.578 (95% CI: 0.535–0.623; DeLong p = 0.197) and 0.624 (95% CI: 0.588–0.687; DeLong p = 0.236), respectively, although neither difference reached statistical significance. XGBoost performed similarly to GBDT, with an AUC of 0.660 (95% CI: 0.611–0.711; DeLong p = 0.963). However, in the training set, GBDT (AUC = 0.603 ± 0.100) outperformed XGBoost (AUC = 0.575 ± 0.096). To mitigate the effects of random sampling, we repeated this process 50 times. Repeated validation revealed that the LR model exhibited overfitting, performing poorly on the independent dataset despite good performance on the training set. In contrast, the GBDT model demonstrated greater stability than XGBoost across the training and validation sets. Based on these results, we selected the GBDT model for subsequent experiments, as summarized in Table 4. Receiver operating characteristic curves and precision–recall curves for the models are depicted in Figure 5.

Table 4. Model performance evaluation using training and validation cohorts.

Figure 5. Receiver operating characteristic curves and precision–recall curves of the models.

Model explainability

The SHAP summary plot (Figure 6) illustrates the relative importance of the 20 predictors in the GBDT model. According to the SHAP values, serum sodium, HDL cholesterol, subarachnoid hemorrhage volume, sex, and left basal ganglia hemorrhage volume were the five most important features for predicting cerebral edema changes in the GBDT model.
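To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings, with absent features set to a baseline value. This is an illustrative simplification: the function name, the baseline-replacement convention, and the toy model are assumptions, and the SHAP library uses more efficient, tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature orderings.

    model: function taking a list of feature values and returning a number.
    x: the instance to explain; baseline: reference feature values.
    Cost grows factorially with the number of features (toy sizes only).
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its real value
            new = model(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]


# Toy model with one additive term and one interaction term.
toy_model = lambda v: 2 * v[0] + v[1] * v[2]
phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
```

The values sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved, which is the consistency property that makes such attributions comparable across features.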

Figure 6. SHAP summary plot.

The LIME interpreter was applied to predictions generated by the GBDT model to examine classification outcomes. Each case’s feature weights are depicted in Figure 7, with green indicating factors favoring the outcome and red representing those opposing it. In case 1, the 100% predicted increase in edema was likely attributed to sex (male), cerebral subarachnoid hemorrhage volume within the range of 0.02–0.15 mL, alcohol use history, HDL levels between 1.2 and 1.44 mmol/L, serum sodium between 141.55 and 143.45 mmol/L, absence of subdural hemorrhage, alanine aminotransferase levels within 13.4–19.4 U/L, and left parietal lobe hemorrhage volume between 0.13 mL and 0.67 mL (favoring variables). However, no drinking history and no cerebral ventricular drainage were the opposing variables. Conversely, in case 2, the 99.8% prediction of no edema increase was likely due to serum sodium levels exceeding 143.45 mmol/L, left parietal bleeding volume within the 0.13–0.67 mL range, alanine aminotransferase levels between 19.04 U/L and 27.92 U/L, and CRP levels within 1.58–3.61 mg/L (favoring variables). Sex (male), HDL levels exceeding 1.44 mmol/L, and a subdural hemorrhage volume >0.1 mL were the opposing variables.

Figure 7. LIME explanations of the GBDT model for two cases. Features with green bars favor the predicted outcome, while those with red bars oppose it. The x-axis shows how much each feature adds to or subtracts from the patient’s final probability value (i.e., a feature with a weight of 0.3 corresponds to a change of 30 percentage points in the probability of the outcome). Class 1 represents increased edema, and class 0 represents no increased edema.

Discussion

Hemorrhagic stroke, also known as cerebral hemorrhage, occurs when blood vessels in the brain rupture without trauma, leading to blood accumulation in the brain parenchyma. This condition constitutes 10–15% of all stroke cases and is characterized by rapid progression, severe neurological dysfunction, and a high mortality rate, particularly during the acute phase. Increased intracranial pressure and cerebral herniation due to cerebral edema are major causes of death. Patients may also experience long-term neurological deficits, impacting their self-care ability and imposing substantial economic burdens on society and families. Early diagnosis and timely treatment are crucial for reducing mortality rates. To predict 72-h brain edema growth, we developed and validated machine learning models using four different algorithms (GBDT, LR, RF, and XGBoost). Twenty key predictors were identified, and internal validation supported the performance and clinical applicability of the GBDT model (25–28).

Our importance analysis identified serum sodium levels as the most significant predictor of 72-h brain edema growth risk, aligning with previous research (29–33). These findings support the role of serum sodium as a valuable prognostic indicator in brain edema. Previous studies suggest that edema around hematoma is predominantly vasogenic in the early stages, transitioning to cytotoxic edema later. Distal and contralateral edema is attributed to osmotic effects caused by the diffusion of edematous fluid and the accumulation of permeable substances within the bleeding area. Vasogenic cerebral edema results from blood–brain barrier impairment and increased permeability, leading to the leakage of plasma components, including sodium (Na⁺) and potassium (K⁺) ions. Cytotoxic edema arises from cytotoxic substances disrupting cell energy metabolism, leading to abnormal extracellular ion concentration gradients. Increased extracellular potassium ions are primarily removed through the blood–brain barrier via Na⁺-K⁺-ATPase-mediated Na⁺-K⁺ exchange, resulting in a net increase of cations. Our study demonstrated a correlation between lower serum sodium levels and increased edema volume. Potential explanations include the cytotoxic edema perspective: Despite constant serum sodium levels, cytotoxic substances may increase Na⁺ in edema fluid while decreasing it in plasma. Lower serum sodium levels may indicate more potent cytotoxic substances, leading to higher edema fluid osmotic pressure and increased edema volume.

Our importance analysis further identified high HDL values, hypertension history, alcohol history, and sex (male) as additional predictors of 72-h brain edema growth risk. Several studies and statistical analyses (34, 35) have reported that patients with cerebral hemorrhage accompanied by poorly controlled hypertension, alcohol consumption, or hyperlipidemia have a significantly higher likelihood of developing severe cerebral edema than patients without these risk factors. Histological studies have revealed that long-term hypertension can damage small-vessel wall structures. In addition, alcohol consumption, hyperlipidemia, and male sex can exacerbate this damage, contributing to morphological changes associated with cerebral hemorrhage and edema. Therefore, patients who are male or who have a history of hyperlipidemia, hypertension, or alcohol consumption should be closely monitored for edema growth and receive timely treatment.

Our importance analysis further revealed that the volume of cerebral hematoma was the third most important factor associated with increased edema. In addition, cerebral ventricular drainage, hemostatic treatment, decompressive craniectomy, and antihypertensive treatment were associated with reduced edema growth. The hematoma volume is a well-established marker influencing edema volume (36). We speculate that it ranked only third in our analysis because surgical interventions affect hematoma volume, potentially weakening the correlation between hematoma volume and edema growth and making hematoma volume a less consistent predictor. However, further research is needed to confirm this hypothesis. Moreover, the protective associations observed for ventricular drainage, hemostatic treatment, decompressive craniectomy, and antihypertensive treatment are consistent with clinical expectations and support the plausibility of our predictive model.

Our importance analysis identified CRP and ALT as factors promoting edema growth. Intracerebral hemorrhage is a common clinical condition characterized by rapid onset and progression, posing a significant threat to patient survival. Even if patients survive, they may experience adverse effects on multiple organ functions, leading to multiple organ failure syndrome and death. The primary cause is intracranial hypertension resulting from cerebral hemorrhage, leading to altered consciousness and systemic stress responses. This stimulates various humoral regulatory mechanisms, resulting in strong reactions. CRP and ALT can reflect the severity of the disease, and their elevation suggests a likely deterioration of the patient’s underlying condition, increasing the risk of worsening brain edema.

Based on these predictors, the GBDT model developed in this study demonstrated consistent discrimination and calibration across the training and internal validation cohorts. The selected results were interpretable and could be effectively applied in clinical practice. This model can potentially assist clinicians in identifying high-risk patients and informing clinical decision-making.

Conclusion

The GBDT model demonstrated the best overall performance in predicting 72-h changes in cerebral edema across the training and internal validation cohorts. The SHAP and LIME analysis revealed that the top three factors favoring a prediction of increased edema (100%) were sex (male), cerebral subarachnoid hemorrhage volume within the range of 0.02–0.15 mL, and a history of alcohol use. Conversely, the top three factors favoring a prediction of no increase in edema (99.8%) were serum sodium levels exceeding 143.45 mmol/L, left parietal bleeding volume within the 0.13–0.67 mL range, and alanine aminotransferase levels between 19.04 U/L and 27.92 U/L. These findings have the potential to assist clinicians in the early identification of patients at risk for severe cerebral edema, enabling the implementation of targeted preventive measures to reduce its prevalence.

Strengths

Our study has several strengths. The included variables closely reflect real-world physiological conditions. Hemorrhage volume was recorded until the hematoma had resolved, and no new hematoma occurred during the follow-up period. Model development and validation followed a rigorous process in which multiple machine learning algorithms were compared to identify the best-performing model. Finally, several evaluation measures and interpretability techniques, such as SHAP and LIME, were used to make the results transparent and interpretable.

Limitations

Despite its strengths, our study has several limitations. First, the model was developed at a single center and was not validated across multiple centers. Second, the validation AUC of the constructed models did not exceed 0.66, potentially due to the inclusion of many complex and variable factors. Third, the retrospective design of the study and the absence of some data may have led to the exclusion of potentially relevant predictors, such as hypoperfusion due to hypotension, high intracranial pressure, and ischemia or hypoxia due to blood pressure management based on arterial stenosis.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

JX: Writing – original draft, Writing – review & editing, Conceptualization, Data curation. CY: Conceptualization, Writing – original draft. GY: Data curation, Writing – original draft. HL: Investigation, Writing – review & editing. QD: Conceptualization, Writing – review & editing. DM: Conceptualization, Writing – original draft. CZ: Conceptualization, Writing – original draft. XY: Conceptualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was funded by the Science and Technology Program of Quzhou, China (grant no. 2021Y003).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2024.1419608/full#supplementary-material

References

1. Yilmaz, C, Kabatas, S, Gulsen, S, Cansever, T, Gurkanlar, D, Caner, H, et al. Spontaneous supratentorial intracerebral hemorrhage: does surgery benefit comatose patients? Ann Indian Acad Neurol. (2010) 13:184–7. doi: 10.4103/0972-2327.70881

2. Zeiser, V, Khalaveh, F, Cho, A, Reinprecht, A, Herta, J, Rössler, K, et al. Risk factors for unfavorable outcome after spontaneous intracerebral hemorrhage in elderly patients. Clin Neurol Neurosurg. (2024) 240:108253. doi: 10.1016/j.clineuro.2024.108253

3. Al-Khaled, M, Awwad, S, and Brüning, T. Nontraumatic spontaneous intracerebral hemorrhage: baseline characteristics and early outcomes. Brain Behav. (2020) 10:e01512. doi: 10.1002/brb3.1512

4. Zaryczańska, K, Pawlukowska, W, Nowacki, P, Zwarzany, Ł, Bagińska, E, Kot, M, et al. Statins and 90-day functional performance and survival in patients with spontaneous intracerebral hemorrhage. J Clin Med. (2023) 12:6608. doi: 10.3390/jcm12206608

5. Yao, Z, Ma, L, You, C, and He, M. Decompressive craniectomy for spontaneous intracerebral hemorrhage: a systematic review and meta-analysis. World Neurosurg. (2018) 110:121–8. doi: 10.1016/j.wneu.2017.10.167

6. Peng, W-J, Li, Q, Tang, J-H, Reis, C, Araujo, C, Feng, R, et al. The risk factors and prognosis of delayed perihematomal edema in patients with spontaneous intracerebral hemorrhage. CNS Neurosci Ther. (2019) 25:1189–94. doi: 10.1111/cns.13219

7. Gu, Y, Zhou, C, Piao, Z, Yuan, H, Jiang, H, Wei, H, et al. Cerebral edema after ischemic stroke: pathophysiology and underlying mechanisms. Front Neurosci. (2022) 16:988283. doi: 10.3389/fnins.2022.988283

8. Muscari, A, Faccioli, L, Lega, MV, Lorusso, A, Trossello, MP, Puddu, GM, et al. Predicting cerebral edema in ischemic stroke patients. Neurol Sci. (2019) 40:745–52. doi: 10.1007/s10072-019-3717-y

9. Foroushani, HM, Hamzehloo, A, Kumar, A, Chen, Y, Heitsch, L, Slowik, A, et al. Quantitative serial CT imaging-derived features improve prediction of malignant cerebral edema after ischemic stroke. Neurocrit Care. (2020) 33:785–92. doi: 10.1007/s12028-020-01056-5

10. He, W, Fu, X, and Chen, S. Advancing polytrauma care: developing and validating machine learning models for early mortality prediction. J Transl Med. (2023) 21:664. doi: 10.1186/s12967-023-04487-8

11. Jayatilake, S, and Ganegoda, GU. Involvement of machine learning tools in healthcare decision making. J Healthc Eng. (2021) 2021:1–20. doi: 10.1155/2021/6679512

12. Ley, C, Martin, RK, Pareek, A, Groll, A, Seil, R, and Tischer, T. Machine learning and conventional statistics: making sense of the differences. Knee Surg Sports Traumatol Arthrosc. (2022) 30:753–7. doi: 10.1007/s00167-022-06896-6

13. Giordano, C, Brennan, M, Mohamed, B, Rashidi, P, Modave, F, and Tighe, P. Accessing artificial intelligence for clinical decision-making. Front Digit Health. (2021) 3:645232. doi: 10.3389/fdgth.2021.645232

14. Ngiam, KY, and Khor, IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. (2019) 20:e262–73. doi: 10.1016/S1470-2045(19)30149-4

15. Ma, C, Zhang, Y, Niyazi, T, Wei, J, Guocai, G, Liu, J, et al. Radiomics for predicting hematoma expansion in patients with hypertensive intraparenchymal hematomas. Eur J Radiol. (2019) 115:10. doi: 10.1016/j.ejrad.2019.04.001

16. Wei, L, Lin, C, Zhou, Z, Zhang, J, Tan, Q, Zhang, Y, et al. Analysis of different hematoma expansion shapes caused by different risk factors in patients with hypertensive intracerebral hemorrhage. Clin Neurol Neurosurg. (2020) 194:105820. doi: 10.1016/j.clineuro.2020.105820

17. Guo, DC, Gu, J, He, J, Chu, HR, Dong, N, and Zheng, YF. External validation study on the value of deep learning algorithm for the prediction of hematoma expansion from noncontrast CT scans. BMC Med Imaging. (2022) 22:45. doi: 10.1186/s12880-022-00772-y

18. Cao, L, Liu, M, Wang, M, Ding, J, Mao, K, Liu, K, et al. 3D slicer-based calculation of hematoma irregularity index for predicting hematoma expansion in intracerebral hemorrhage. BMC Neurol. (2022) 22:452. doi: 10.1186/s12883-022-02983-w

19. Lv, CX, An, SY, Qiao, BJ, and Wu, W. Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model. BMC Infect Dis. (2021) 21:839. doi: 10.1186/s12879-021-06503-y

20. Wang, J, Xu, Y, Liu, L, Wu, W, Shen, C, Huang, H, et al. Comparison of LASSO and random forest models for predicting the risk of premature coronary artery disease. BMC Med Inform Decis Mak. (2023) 23:297. doi: 10.1186/s12911-023-02407-w

21. Wen, Y, Wang, X, Li, D, Zhang, Q, Deng, B, and Chen, Y. Rapid detection of phenytoin sodium by partial-least squares and linear regression models combined with surface-enhanced Raman spectroscopy. J Pharmaceut Biomed. (2022) 223:115160. doi: 10.1016/j.jpba.2022.115160

22. Zhou, Z, and Zare, RN. Personal information from latent fingerprints using desorption electrospray ionization mass spectrometry and machine learning. Anal Chem. (2017) 89:1369–72. doi: 10.1021/acs.analchem.6b04498

23. Van den Broeck, G, Lykov, A, Schleich, M, and Suciu, D. On the tractability of SHAP explanations. Proc Conf AAAI Artif Intell. (2021) 35:6505–13. doi: 10.1609/aaai.v35i7.16806

24. Zafar, M, and Khan, N. Deterministic local interpretable model-agnostic explanations for stable explainability. Mach Learn Knowl Extr. (2021) 3:525–41. doi: 10.3390/make3030027

25. Hu, P, Li, Y, Liu, Y, Guo, G, Gao, X, Su, Z, et al. Comparison of conventional logistic regression and machine learning methods for predicting delayed cerebral ischemia after aneurysmal subarachnoid hemorrhage: a multicentric observational cohort study. Front Aging Neurosci. (2022) 14:857521. doi: 10.3389/fnagi.2022.857521

26. Hines, K, Mouchtouris, N, al Saiegh, F, Hafazalla, K, Mazza, J, Phan, P, et al. Prediction of hematoma expansion in spontaneous intracerebral hemorrhage: our institutional experience. J Clin Neurosci. (2021) 86:271–5. doi: 10.1016/j.jocn.2021.01.046

27. Zhang, CY, Huang, SR, Wang, SY, Shen, L, and Deng, BQ. Clinical study of intracranial and extracranial atherosclerotic stenosis in spontaneous intracerebral hemorrhage patients. J Stroke Cereb Dis. (2018) 27:286–90. doi: 10.1016/j.jstrokecerebrovasdis.2017.08.011

28. Liu, Q, Li, X, Wang, N, Yang, J, Wang, K, Chen, S, et al. A novel coagulation classification and postoperative bleeding in severe spontaneous intracerebral hemorrhage patients on antiplatelet therapy. Front Aging Neurosci. (2022) 14:793129. doi: 10.3389/fnagi.2022.793129

29. Guo, R, Zhang, R, Liu, R, Liu, Y, Li, H, Ma, L, et al. Machine learning-based approaches for prediction of patients' functional outcome and mortality after spontaneous intracerebral hemorrhage. J Pers Med. (2022) 12:112. doi: 10.3390/jpm12010112

30. Wen, M, Ye, J, Han, Y, Huang, L, Yang, H, Jiang, W, et al. Hypertonic saline regulates microglial M2 polarization via miR-200b/KLF4 in cerebral edema treatment. Biochem Biophys Res Commun. (2018) 499:345–53. doi: 10.1016/j.bbrc.2018.03.161

31. Chen, S-Y, Thompson-Leduc, P, Sawyer, RN, Fakih, I, Cheung, HC, Macheca, M, et al. Outcomes and resource use of patients with large hemispheric infarction and cerebral edema: analysis of real-world data. Curr Med Res Opin. (2021) 37:781–8. doi: 10.1080/03007995.2021.1900090

32. Doron, O, Zadka, Y, Barnea, O, and Rosenthal, G. Interactions of brain, blood, and CSF: a novel mathematical model of cerebral edema. Fluids Barriers CNS. (2021) 18:42. doi: 10.1186/s12987-021-00274-z

33. Hsia, DS, Tarai, SG, Alimi, A, Coss-Bu, JA, and Haymond, MW. Fluid management in pediatric patients with DKA and rates of suspected clinical cerebral edema. Pediatr Diabetes. (2015) 16:338–44. doi: 10.1111/pedi.12268

34. Miao, J, Song, X, Sun, W, Qiu, X, Lan, Y, and Zhu, Z. Predictors of malignant cerebral edema in cerebral artery infarction: a meta-analysis. J Neurol Sci. (2019) 409:116607. doi: 10.1016/j.jns.2019.116607

35. Wang, Y-F, and Parpura, V. Astroglial modulation of hydromineral balance and cerebral edema. Front Mol Neurosci. (2018) 11:204. doi: 10.3389/fnmol.2018.00204

36. Kang, J, Huang, Q, and Liu, Y. Advance in research on the genetic etiology of spontaneous intracerebral hemorrhage. Chin J Med Genet. (2016) 33:702–7. doi: 10.3760/cma.j.issn.1003-9406.2016.05.028

Keywords: SICH, cerebral edema, random forest, GBDT, XGBoost

Citation: Xu J, Yuan C, Yu G, Li H, Dong Q, Mao D, Zhan C and Yan X (2024) Predicting cerebral edema in patients with spontaneous intracerebral hemorrhage using machine learning. Front. Neurol. 15:1419608. doi: 10.3389/fneur.2024.1419608

Received: 10 June 2024; Accepted: 18 September 2024;
Published: 03 October 2024.

Edited by:

Alejandro Rabinstein, Mayo Clinic, United States

Reviewed by:

Ping Hu, Second Affiliated Hospital of Nanchang University, China
Muhannad Seyam, University of Vermont, United States

Copyright © 2024 Xu, Yuan, Yu, Li, Dong, Mao, Zhan and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xinjiang Yan
