- Department of Neurology, Changhai Hospital, Second Military Medical University, Shanghai, China
Background and purpose: Corpus callosum (CC) infarction is an extremely rare subtype of cerebral ischemic stroke. Its symptoms of cognitive impairment often fail to attract early attention from patients, which seriously affects the long-term prognosis and contributes to high mortality, personality changes, mood disorders, psychotic reactions, and financial burden. This study seeks to develop and validate machine learning (ML) models for early prediction of the risk of subjective cognitive decline (SCD) after CC infarction.
Methods: This prospective study enrolled 213 patients with CC infarction from a nine-year cohort of 8,555 patients with acute ischemic stroke, in which CC infarction accounted for only 3.7% of cases. Telephone follow-up surveys were carried out one year after disease onset for patients with a definite diagnosis of CC infarction, and SCD was identified with the Behavioral Risk Factor Surveillance System (BRFSS) questionnaire. Based on the significant features selected by the least absolute shrinkage and selection operator (LASSO), seven ML models were established: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Complement Naïve Bayes (CNB), and Support Vector Machine (SVM). Their predictive performance was compared using several metrics. In addition, SHapley Additive exPlanations (SHAP) was used to examine the internal behavior of the best-performing ML classifier.
Results: The Logistic Regression (LR) model predicted SCD after CC infarction better than the other six ML models, with an area under the receiver operating characteristic curve (AUC) of 77.1% in the validation set. Using LASSO and SHAP analysis, we found that infarction subregion of the CC, female sex, 3-month modified Rankin Scale (mRS) score, age, homocysteine, location of angiostenosis, neutrophil-to-lymphocyte ratio, pure CC infarction, and number of angiostenoses were the top nine predictors, in order of importance, for the output of the LR model. Meanwhile, infarction subregion of the CC, female sex, 3-month mRS score, and pure CC infarction were independently associated with the cognitive outcome.
Conclusion: Our study is the first to demonstrate that an LR model with nine common variables performs best in predicting the risk of post-stroke SCD due to CC infarction. In particular, the combination of the LR model and the SHAP explainer could aid personalized risk prediction and serve as a decision-making tool for early intervention, given the poor long-term outcome of CC infarction.
1. Introduction
The corpus callosum (CC) is the largest commissural bridge of white-matter fibers between the two hemispheres (1). It is supplied by a unique double circulation system, anterior and posterior, with abundant collateral arteries (2). Because of this rich blood supply, CC infarction is extremely rare and accounts for only 2.3–8.0% of cerebral ischemic strokes (3–5). Because of the unique structure and function of the CC, the manifestations of CC infarction are variable and lack specificity. As a result, misdiagnosis and delayed treatment are not uncommon (6). Interestingly, we previously found that, compared with patients with ordinary basal ganglia infarction, patients with CC infarction had lower National Institutes of Health Stroke Scale scores and better recovery at discharge, yet higher one-year mortality and a poorer long-term prognosis (5). Cognitive impairment is one of the main causes of poor long-term prognosis in patients with CC infarction. Unfortunately, because it worsens insidiously, patients often pay too little attention to it in the early stage and miss the optimal intervention period, resulting in irreversible cognitive impairment.
Subjective cognitive decline (SCD) is an individual’s self-report of cognitive decline and is now thought to be a precursor to common clinical cognitive disorders such as mild cognitive impairment (MCI) (7) and Alzheimer’s disease (AD) (8). Recent studies have shown that, compared with age-matched healthy controls, patients with SCD have a 4.5–6 times higher risk of progressing to MCI or AD (9, 10). Compared with the widely known post-stroke cognitive impairment (PSCI), SCD places more emphasis on the patient’s subjective perceptions and on timely feedback from caregivers, which makes it easier to identify and to intervene early. Meanwhile, our previous studies and those of others have shown that white matter lesions (WMLs) are an important pathological mechanism of cognitive dysfunction (5, 11–14). As an extremely rare subtype of stroke with prominent WMLs, CC infarction is a likely driver of SCD and other symptomatic cognitive decline. With the aim of preserving brain health and cognitive abilities for as long as possible, this at-risk group is therefore recognized as an eligible target population for early intervention strategies (15, 16).
The role of physicians has always been to synthesize the data available to them and to identify prognostic patterns that guide early intervention. Machine learning (ML) is an emerging technical foundation of artificial intelligence that enables a computer to learn automatically the rules hidden in data (17). Several studies have shown that ML-based models are promising for predicting the diagnosis, prognosis, or recurrence of ischemic stroke, and such models are also widely used in fields such as psychology and biomechanics (18–23). Nevertheless, ML-based evidence on the prediction of SCD after cerebral infarction is still lacking. Moreover, the “black-box” character of ML techniques prevents clinicians from understanding how a prediction was reached, which is a failure of accountability (24). To address this, we propose an interpretable strategy that combines an ML algorithm with SHapley Additive exPlanations (SHAP) to provide consistent and locally accurate attribution values for each feature within each prediction model. A SHAP value is calculated by comparing the difference in prediction across all possible feature combinations that include or withhold the feature in question, and it yields a unique report for each individual (25).
Here, with the largest sample of CC infarction to date, we present an exploratory study that for the first time addresses the clinical feasibility of individually predicting the occurrence of one-year SCD after CC infarction with ML methods. We also apply SHAP values to explain the importance and influence of each predictor on the output of the optimal model. We expect that this ML-derived early warning system and SHAP-based interpretation framework could help clinicians to better counsel patients, conduct targeted follow-up, and determine personalized interventions.
2. Methods
2.1. Participants
The design of this study is presented in Figure 1. A total of 8,555 patients with ischemic stroke were identified at Shanghai Changhai Hospital between July 2012 and June 2021. Among them, 314 (3.7%) patients with acute CC infarction were enrolled. The exclusion criteria were as follows: (i) age under 30 or above 80 years, (ii) cognitive impairment preceding CC infarction, (iii) follow-up period of less than 1 year, or loss to follow-up, (iv) serious medical complications, (v) incomplete neuroimaging materials, (vi) receipt of thrombolytic or interventional therapy, and (vii) failure to sign written informed consent. Ultimately, 213 patients with acute CC infarction were included in the final analysis. This study was approved by the Changhai Hospital Ethics Committee (No. CHEC2021-1021).
2.2. Clinical and imaging assessment
Basic clinical and imaging information of the enrolled patients was obtained from the Electronic Medical Record (EMR) management system. These variables are listed in Supplementary Table S1 and include: demographic characteristics (age, sex, body mass index [BMI]), vascular risk factors (hypertension, diabetes mellitus, prior stroke or transient ischemic attack, heart diseases, smoking, alcoholism), stroke severity on admission (time from onset to hospital, NIH Stroke Scale [NIHSS] score), laboratory tests (alanine transaminase [ALT], low-density lipoprotein [LDL], high-density lipoprotein [HDL], cholestenone, triglyceride, creatine, urea, uric acid, glucose [Glu], thyroid-stimulating hormone [TSH], triiodothyronine [T3], thyroxine [T4], erythrocyte, leukocyte, neutrophil-to-lymphocyte ratio [NLR], hemoglobin, thrombocyte, erythrocyte sedimentation rate [ESR], C-reactive protein [CRP], homocysteine [Hcy], glycosylated hemoglobin [HbA1c], fibrinogen [FIB], D-dimer), imaging assessment (pure CC infarction, infarction subregion of CC, other infarction areas, location of angiostenosis, number of angiostenoses, extracranial carotid plaque, TOAST subtype (26)), functional status (modified Rankin Scale [mRS] score at 3 months), and secondary prevention and recurrence (rehabilitation treatment, regular secondary prevention, and recurrent stroke).
In detail, rehabilitation here refers to a course of standardized rehabilitation therapy received in rehabilitation hospitals, focused mainly on motor and language function. It also includes lifestyle modification and taking medication exactly as prescribed in the discharge notes, as well as, if necessary, additional carotid surgery or stenting, closure of a patent foramen ovale, and surgery for intracranial or vertebral stenosis (27).
Additionally, the corresponding neuroimaging evidence was collected from both (i) MRI (Magneton Impact 3.0 T, Siemens, Berlin, Germany), including T1-weighted imaging, T2-weighted imaging, and diffusion-weighted imaging (DWI), and (ii) MR angiography (MAGNETOM Skyra 3.0 T, Siemens) or CT angiography (Aquilion ONE, Toshiba, Tokyo, Japan). As shown in Figure 2, the patients were divided into two groups according to DWI patterns: pure callosal infarcts and complex callosal infarcts. The former group was further subdivided according to the subregion of the CC involved: (i) pure genu infarction of the corpus callosum, (ii) pure body infarction of the corpus callosum, and (iii) pure splenium infarction of the corpus callosum.
Figure 2. Representative images of pure and complex callosal infarction. (A) Complex callosal infarction. (B) Pure genu infarction of the corpus callosum. (C) Pure body infarction of the corpus callosum. (D) Pure splenium infarction of the corpus callosum.
2.3. Cognitive dysfunction definition
Telephone follow-up surveys were carried out one year after onset for the patients with a definite diagnosis of CC infarction. SCD was identified according to the cognitive decline module of the Behavioral Risk Factor Surveillance System (BRFSS), the largest ongoing health survey system in the world (28), using the BRFSS question, “During the past 12 months, have you experienced confusion or memory loss that is happening more often or is getting worse?” (29–31). Respondents who reported a clear cognitive complaint compared with their self-perception before the stroke were classified as having post-stroke SCD; otherwise they were classified as non-SCD. The BRFSS questionnaire also contains five detailed questions about worsening confusion or memory decline: (1) the frequency of giving up daily household activities or common chores, (2) the frequency of needing assistance with these daily activities, (3) the frequency of getting the help that was wanted, (4) the frequency with which work, volunteer, or social activities were disturbed by the confusion or memory disorder, and (5) whether medical attention had been sought for it (29, 31). These SCD-related outcomes, rated on a five-point scale (always, usually, sometimes, rarely, never), were dichotomized as a challenge (assigned 1) vs. rarely or never happening (assigned 0) (28). Through the telephone survey, patients could thus better recognize whether they had problems with post-stroke SCD and SCD-related functional impairment.
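The dichotomization of the five-point responses can be sketched as follows. This is a minimal illustration rather than the study's own code: the helper name `dichotomize` is hypothetical, and treating "sometimes" as a challenge (1) is an assumption inferred from the rule that only "rarely" and "never" are assigned 0.

```python
# Hypothetical helper illustrating the dichotomization rule described above.
# Assumption: "always", "usually", and "sometimes" count as a challenge (1);
# only "rarely" and "never" are coded 0.
CHALLENGE = {"always", "usually", "sometimes"}
ABSENT = {"rarely", "never"}


def dichotomize(response):
    """Map a five-point response to 1 (challenge) or 0 (rarely/never)."""
    key = response.strip().lower()
    if key in CHALLENGE:
        return 1
    if key in ABSENT:
        return 0
    raise ValueError(f"unexpected response: {response!r}")


# Made-up answers to the five follow-up questions for one respondent.
answers = ["Always", "rarely", "Sometimes", "never", "usually"]
coded = [dichotomize(a) for a in answers]
print(coded)
```

Raising an error on unrecognized answers, rather than silently coding them as 0, keeps missing or malformed responses visible for later handling.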
2.4. Machine learning
2.4.1. Features selection
The least absolute shrinkage and selection operator (LASSO) was used to select variables from high-dimensional data by means of a penalty. Under this penalty, coefficients that were originally small are shrunk to exactly 0 (32), and the corresponding variables are regarded as non-significant and discarded (33). LASSO regression thus combines variable selection with control of model complexity, which helps to avoid overfitting when ML models are constructed. Because most ML methods cannot process data with missing values, we imputed the dataset by KNN before LASSO regression. In our study, LASSO-penalized binary logistic regression was used to screen out significant predictors of SCD after acute CC infarction.
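The way an L1 penalty drives small coefficients to exactly zero can be illustrated with the soft-thresholding step used inside common LASSO solvers. The sketch below is an assumption-laden toy, not the fitting procedure used in this study: the feature names `x1` to `x5`, the coefficient values, and the penalty are all made up.

```python
def soft_threshold(value, penalty):
    """Shrink a coefficient toward 0 by `penalty`; values within the
    penalty band collapse to exactly 0 (the L1 proximal operator)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def select_features(coefficients, penalty):
    """Apply soft-thresholding and keep only features with non-zero weight."""
    shrunk = {name: soft_threshold(c, penalty) for name, c in coefficients.items()}
    return {name: c for name, c in shrunk.items() if c != 0.0}


# Made-up unpenalized coefficients for five hypothetical features.
raw = {"x1": 0.75, "x2": 1.5, "x3": 0.125, "x4": -0.25, "x5": -1.0}
kept = select_features(raw, penalty=0.25)
print(kept)
```

Here the two features with the smallest absolute coefficients are dropped, while the remaining coefficients are shrunk but retained, which is the behavior that lets LASSO act as a feature selector.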
2.4.2. Machine learning models
The dataset was then randomly divided into a training set (70%) and a validation set (30%), a commonly used split (34). Seven ML algorithms were used to develop the predictive models: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Complement Naïve Bayes (CNB), and Support Vector Machine (SVM). For each ML-based model, five-fold cross-validation was performed to evaluate the generalization ability (35), and the optimal hyperparameters were then selected. The following indicators were calculated to evaluate the performance of the models comprehensively: area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score.
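For reference, the evaluation metrics listed above can be computed from a confusion matrix and a set of predicted scores as in the following sketch. The function names and the toy numbers are illustrative assumptions and do not reproduce the study's data or pipeline.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.
    No guard against empty classes: all denominators are assumed non-zero."""
    sensitivity = tp / (tp + fn)          # recall for the SCD class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts and scores, for illustration only.
metrics = classification_metrics(tp=30, fp=10, tn=20, fn=10)
auc = auc_from_scores([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(metrics, auc)
```

The pairwise definition of AUC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; rank-based formulas give the same value more efficiently on larger datasets.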
2.4.3. Personalized interpretation
We further used SHapley Additive exPlanations (SHAP), an approach rooted in the Shapley value, to explain the output of the best-performing ML model. The marginal contribution of a feature is calculated when it is added to the “black-box” model, and the SHAP value is the average of these marginal contributions over all possible orderings of the features (36). A feature with a positive SHAP value raises the output value, and larger absolute values indicate greater contributions (37). In our study, the SHAP summary plot, the importance ranking, and the SHAP dependence plot of the relevant covariates were used to improve interpretability. The SHAP explainer is suited to visualizing black-box ML algorithms on the basis of cooperative game theory (36). Its advantage is that it explains how much, and in which direction, each predictor influences the output of the optimal ML model. In short, the core idea of the SHAP explainer is to calculate the marginal contribution of each feature to the model output and then to explain the “black-box” model at both the global and the local level (38).
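The averaging of marginal contributions can be made concrete with an exact, brute-force Shapley computation on a toy model. This is a sketch under stated assumptions, not the SHAP library's implementation: features left out of a coalition are replaced by fixed baseline values (one common convention), and the linear score, its weights, and the example instance are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.
    Assumption: a feature absent from a coalition takes its baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


# Hypothetical linear score with made-up weights and intercept.
weights = [0.5, -1.0, 2.0]


def linear_score(x):
    return 0.25 + sum(w * v for w, v in zip(weights, x))


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 1.0, 1.0]
phi = shapley_values(linear_score, instance, baseline)
print(phi)
```

For a linear score, each value reduces to the weight times the feature's distance from its baseline, and the values sum to the difference between the prediction for the instance and the prediction for the baseline. Enumeration grows exponentially with the number of features, which is why practical explainers rely on approximations or model-specific shortcuts.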
2.5. Data preprocessing
First, the indicators ESR, CRP, TSH, T3, and T4 were excluded because of their high missing ratios (each over 30%) (39). Second, categorical variables were encoded numerically as follows: (i) TOAST subtypes were converted into range 1–5 (LAA = 1, CE = 2, SAO = 3, ODC = 4, UND = 5), (ii) infarction subregion of CC was coded into range 1–5 (rostrum = 1, genu = 2, body = 3, splenium = 4, at least two of rostrum, genu, body and splenium = 5), (iii) other infarction areas were coded into range 0–5 (none = 0, frontal lobe = 1, parietal lobe = 2, temporal lobe = 3, occipital lobe = 4, others = 5), and (iv) location of angiostenosis was encoded into range 0–4 (none = 0, ICA = 1, VBA = 2, both of ICA and VBA = 3), etc. After that, missing values in the remaining indicators were imputed by K-nearest-neighbor (KNN) analysis (40). Finally, the Borderline-1 SMOTE (BLSMOTE) algorithm was applied to balance the SCD and non-SCD groups equally (50% each), which was intended to improve the reliability and classification performance of the ML models (41).
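The first two preprocessing steps can be sketched in plain Python as follows. The record layout, the function names, and the four example patients are hypothetical; only the 30% missing-ratio threshold and the TOAST coding are taken from the text. KNN imputation and BLSMOTE balancing would follow these steps and are not shown.

```python
# TOAST coding as described in the text.
TOAST_CODES = {"LAA": 1, "CE": 2, "SAO": 3, "ODC": 4, "UND": 5}
MISSING_THRESHOLD = 0.30  # features missing in more than 30% of patients are dropped


def drop_sparse_features(records, threshold=MISSING_THRESHOLD):
    """Remove features whose share of missing values (None) exceeds the threshold."""
    n = len(records)
    kept = [name for name in records[0]
            if sum(r[name] is None for r in records) / n <= threshold]
    return [{name: r[name] for name in kept} for r in records]


def encode_toast(records):
    """Replace the TOAST subtype label with its integer code."""
    return [{**r, "toast": TOAST_CODES[r["toast"]]} for r in records]


# Made-up records for four hypothetical patients.
records = [
    {"age": 63, "toast": "LAA", "esr": None, "hcy": 12.3},
    {"age": 58, "toast": "CE", "esr": None, "hcy": None},
    {"age": 71, "toast": "SAO", "esr": 15.0, "hcy": 14.1},
    {"age": 66, "toast": "UND", "esr": None, "hcy": 10.8},
]
cleaned = encode_toast(drop_sparse_features(records))
print(cleaned)
```

In this toy data the `esr` column is missing in three of four records and is dropped, while `hcy` is missing in one of four and is kept for later imputation.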
2.6. Statistical analysis
Continuous data were described as mean (SD) or median (IQR), and categorical data were presented as n (%). Baseline characteristics were compared between the SCD group and the non-SCD group after CC infarction by the Chi-square test (categorical variables), two-sample t-test (continuous variables with a symmetric distribution), Mann–Whitney U test (continuous variables with an asymmetric distribution), or Welch’s t-test (continuous variables with unequal variances), as appropriate. Variables associated with the cognitive outcome at p < 0.1 in univariable analysis were then entered into multivariable analysis with traditional forward stepwise selection. All statistical analyses were performed in R (version 3.6.3, https://cran.r-project.org/bin/windows/base/), and all ML-related workflows were performed in Python (version 3.7, https://www.python.org/getit/); p < 0.05 was considered statistically significant.
3. Results
3.1. Demographics
The baseline demographic, clinical, biochemical, and neuroimaging characteristics of the 213 patients (75 female) with acute CC infarction are summarized in Supplementary Table S1. The median age at baseline was 63 [55, 69] years. After the 1-year follow-up period, 110 subjects had developed post-stroke SCD, while the remaining subjects were no-complaint (NC) patients. Compared with NC participants, SCD patients tended to be slightly older (63 [58, 71] vs. 61 [51, 68] years, p = 0.012), included a higher percentage of females (45.4% vs. 24.2%, p = 0.001), and had higher mRS scores at 3 months (1 [1–3] vs. 1 [0–2], p < 0.001). Pure CC infarction appeared more likely to lead to post-stroke SCD (p = 0.030). Meanwhile, patients with infarction involving two or more subregions of the CC were especially vulnerable to post-stroke SCD (p = 0.001). Furthermore, patients with post-stroke SCD were prone to have multiple angiostenoses involving both the internal carotid artery (ICA) and the vertebrobasilar artery (VBA) (p = 0.009).
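As a worked example of the group comparison for sex, the sketch below runs a Pearson Chi-square test on a 2 × 2 table. The cell counts are a reconstruction from the reported percentages and are an assumption: 45.4% of 110 SCD patients is about 50 women and 24.2% of 103 NC patients is about 25 women, which together match the 75 female patients reported. Whether a continuity correction was applied in the original analysis is not stated, so none is used here.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; with 1 degree of freedom, p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Reconstructed counts (assumption): rows are SCD and NC, columns are female and male.
chi2, p_value = chi_square_2x2(50, 60, 25, 78)
print(round(chi2, 2), round(p_value, 4))
```

Under these reconstructed counts the statistic is about 10.5 and the p-value is close to 0.001, in line with the value reported above.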
3.2. Multivariable analysis of risk factors
With traditional forward selection, we found that female sex (OR: 3.344; 95% CI: 1.656–6.998; p = 0.001), 3-month mRS score (OR: 1.380; 95% CI: 1.109–1.736; p = 0.005), and pure CC infarction (OR: 4.823; 95% CI: 1.531–17.919; p = 0.011) were independent risk factors for SCD after acute CC infarction (Table 1). Compared with patients with acute infarction of the genu, patients with infarction of the splenium (but not those with infarction of the rostrum or body) were more likely to have cognitive deterioration during follow-up (OR: 3.058; 95% CI: 1.221–8.183; p = 0.020). Furthermore, patients with infarction in at least two subregions of the CC were more susceptible to post-stroke SCD than those with lesions only in the genu (OR: 7.370; 95% CI: 2.649–22.124; p < 0.001).
Table 1. Multivariate logistic regression for the risk factors of post-stroke SCD after callosal infarction.
3.3. Performance of machine learning models
Based on the predictors selected by LASSO (Supplementary Figure S1), different artificial intelligence (AI)-derived models were constructed (Table 2). According to the metrics, the AUC and accuracy of the LR model were clearly better than those of the other six models. The LR model was therefore selected as the best model for predicting SCD after acute CC infarction, achieving an AUC of 0.771 (±0.042) and an accuracy of 0.703 (±0.050) in the validation set. The ROC curves and the forest plot of AUC values for the LR and the other models are shown in Figure 3.
Figure 3. The ROC curves and forest plot of AUC values for the seven models. (A) The ROC curves for the different machine learning algorithms; the LR model yielded the greatest AUC among all the models. (B) The forest plot of AUC values of the seven models. The dots indicate the AUC value of each model, and the confidence intervals are depicted by the vertical lines.
Moreover, we calculated the contribution of each predictor to the LR model with the SHAP algorithm, which reveals both the strength and the direction of each factor's effect. Features were then ranked on the basis of their absolute SHAP values over all samples (Figure 4A). As depicted in Figure 4B, high values of infarction subregion of CC, female sex, 3-month mRS score, and pure CC infarction have a positive impact on the output of the LR model, indicating a higher risk of cognitive deficit after acute CC infarction. Age, HCY, NLR, and the location and number of angiostenoses made up the rest of the top nine predictors of post-stroke SCD based on Shapley values.
Figure 4. Matrix plots of the top nine important features and the SHAP plots for two selected patients. (A) The SHAP summary plot of the LR model. Each dot represents a SHAP value for a feature. Red indicates a high feature value, and blue indicates a low one. A positive SHAP value represents an increased risk of post-stroke SCD in the output of the LR model, and vice versa. (B) The histogram of mean absolute SHAP values of the top nine important features of the LR model. The longer the bar, the larger the impact of the feature on the output. (C,D) SHAP force plots for two selected patients. Feature values colored red push the predicted outcome towards cognitive impairment, while feature values colored blue do the opposite. The Shapley value of each feature is visualized by the length of an arrow: the longer the arrow, the greater the contribution of the feature value.
SHAP is a versatile method for interpreting ML models, and it can also be used for personalized interpretation. That is, the prediction for an individual patient can be extracted to visualize which features played a role in the predicted cognitive decline and what their values were. For instance, Figure 4C shows a subject for whom the LR model predicted a 74% probability of SCD after CC infarction. The plot shows that location of angiostenosis = 3.0 (both ICA and VBA), infarction subregion of CC = 5.0 (at least two of rostrum, genu, body, and splenium), and female = 1 (female) are the values contributing most to the increased chance of cognitive disorder, while 3-month mRS score = 0 acts in the opposite direction. The model thus indicated a high risk of post-stroke SCD for this subject, and follow-up confirmed a cognitive impairment outcome (true positive). Similarly, Figure 4D shows a case with a predicted probability of 37% for post-stroke SCD, in other words a 63% probability of non-SCD after CC infarction. The main contributors towards an adverse cognitive outcome are NLR = 1.9 and HCY = 12.3, whereas location of angiostenosis = 0.0 (none) and age = 62.0 contribute in the opposite direction. The LR-based algorithm therefore classified this subject as being at low risk of SCD after CC infarction, and the actual outcome was no cognitive impairment (true negative).
4. Discussion
The presence of SCD is known to be associated with a high risk of objective cognitive decline and even clinical progression to symptomatic disease stages (42, 43). Effective intervention to delay or prevent pathologic cognitive decline may best be targeted at the earliest symptomatic disease stage, such as SCD, in which cognitive function is still relatively preserved (44). This is an exploratory study that for the first time focuses on post-stroke SCD in the rare setting of CC infarction, using an interpretable machine learning-derived early warning strategy.
After multivariate adjustment for potential confounders, we found that female sex, 3-month mRS score, pure CC infarction, and infarction subregion of CC were independently associated with the incidence of SCD. Interestingly, our previous study reported that males had a higher incidence of CC infarction (5), while in the current cohort females were more susceptible to SCD after this specific infarction. Possible reasons for this phenomenon include: (1) females in the present study were older than males at the onset of CC infarction (64 [58,71] vs. 62 [55,69]), (2) women are usually considered to have a larger corpus callosum volume (45–47), suggesting that the callosum may play a more important role in maintaining brain function in females, and (3) women tend to have higher cortisol but lower estradiol levels in the menopausal period (48). Indeed, higher serum cortisol has been shown to correlate with more severe microstructural WMLs, particularly in the CC, while estrogen is thought to promote remyelination, which in turn is strongly associated with general cognitive capacity (49–51). A strong interaction between serum cortisol and cerebral atrophy has also been identified among females but not males (52). Richa et al. (53) reported that MoCA scores (at 4–8 weeks post-infarct) were clearly correlated with mRS scores (at the same follow-up points) among stroke patients. Extending this, our results showed that 3-month mRS scores were related to the longer-term cognitive outcome after CC infarction.
The CC can be divided into four classical parts, from front to back: rostrum, genu, body, and splenium (54). Consistent with previous reports (5), we found that ‘pure’ CC infarction was rare, while mental disturbance and cognitive dysfunction were more prominent than in ‘complex’ CC infarction. The mechanisms behind this discrepancy are still unclear; perhaps the atypical symptoms and the difficulty of distinguishing the lesion on MRI mean that ‘pure’ CC infarction does not receive sufficient attention and appropriate prevention. Meanwhile, we report for the first time that acute infarction in the splenium carried a higher tendency towards cognitive decline than infarction in the genu. The splenium is the area of the CC most vulnerable to insufficient blood supply, and splenium lesions are known to be related to cognitive disorder, aphasia, homonymous hemianopsia, alien hand syndrome, and other deficits (54). We therefore believe that the splenium plays an important role, to some extent, in the high incidence of SCD caused by CC infarction. Moreover, patients with infarction in at least two subregions of the CC were more susceptible to SCD than those with lesions only in the genu. This result is readily understood: the more extensive the structural damage to the CC, the more the fiber connections and information transmission between the two hemispheres are disrupted. In addition, there is evidence that infarction in the body or splenium of the CC can disturb executive capacity, attention, and calculation (55), which lends further support to our view.
Besides multivariable analysis, LASSO analysis was also used to select potential risk predictors by eliminating irrelevant features. Age is widely accepted as a risk factor for cognitive impairment after various types of ischemic stroke (56). Besides age, there is abundant evidence linking high HCY (HHCY) to cognitive decline (57). HHCY is associated not only with the presence of WMLs but also with their progression (58). Extensive intracranial vascular stenosis is another promoter of SCD after CC infarction. Cerebral angiostenosis or occlusion has been shown to induce hypoperfusion and to impair executive functions such as working memory, attention, cognitive flexibility, planning, thought organization, and implementation (59). This suggests that an appropriate increase in cerebral blood flow may help prevent post-stroke SCD. Interestingly, NLR is generally regarded as a risk factor for PSCI (60). However, we found that NLR was negatively associated with self-reported cognitive decline, suggesting that NLR may reflect a compensatory neuroprotective response in the early stage of CC infarction. The biological mechanisms linking NLR and the risk of post-stroke SCD have not been explored before and warrant further clarification, especially in the setting of CC infarction.
In our study, the combination of LASSO regression and ML-based models helped to identify the optimal configuration for predicting whether a patient is vulnerable to SCD after CC infarction. The seven ML algorithms were assessed by several metrics, comprising AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Apart from global explanation, the well-accepted local explanation method SHAP was also implemented to interpret how a complex black-box ML model makes a prediction (61). By incorporating the individualized patient profile, the level of contribution and the directionality of specific input features were visualized (62). As shown in Table 2, the LR model appeared to be the best-performing classifier, with the highest AUC (77.1%), accuracy (70.3%), sensitivity (76.3%), and NPV (76.5%). In addition, acceptable values of specificity, PPV, and F1 score (all above 65.0%) were achieved in the validation set. Taken together, we selected the LR model as the optimal algorithm with the best generalization ability. At the same time, we suggest that model choice should be weighed case by case, with the predictive classifier chosen according to the clinical need.
The strength of our research is that the cohort is, to our knowledge, the largest sample of CC infarction in the world, and the datasets are non-synthetic, which makes the model more likely to be objective and effective as a screening tool. Unlike studies that focused on individual risk factors or their pathophysiological interpretation (63, 64), we aimed to encompass a large combination of variables from real-world clinical situations at once. The variables we used, including demographics and laboratory and radiological findings, are all easily accessible to clinicians, which could assist with the early prediction and prevention of suspected post-stroke SCD. Additionally, an interpretable and explainable ML model was created with the help of the SHAP explainer, supporting individualized clinical decisions.
Our study has some limitations that remain to be addressed. First, although this investigation had the largest population of patients with acute CC infarction, the sample size still needs to be enlarged. Second, we did not assess the different cognitive abilities separately, such as orientation, calculation, executive abilities, long-term and short-term memory, and attention. Third, the follow-up period is not long enough to verify the proportion of patients with SCD who eventually converted to PSCI. Therefore, multi-center prospective cohorts with detailed assessment of impairment across cognitive domains are needed in the future.
5. Conclusion
In conclusion, the present study screened out nine key features associated with post-stroke SCD and developed an LR model that can improve the prediction accuracy of one-year SCD after CC infarction. Moreover, the individual reports generated by SHAP facilitate the early implementation of primary prevention measures. Based on these techniques, we expect to go on to predict, for individual patients, the long-term effects of different clinical drugs on cognitive impairment, in order to shape a brighter future for patients with CC infarction.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.
Ethics statement
The studies involving human participants were reviewed and approved by Changhai Hospital Ethics Committee (No. CHEC2021-1021). The patients/participants provided their written informed consent to participate in this study.
Author contributions
YX, XS, and XB: conceived and designed the study. YX, XS, YL, YH, ML, RS, GY, CS, QD, BD, and XB: performed the study. YX, XS, YL, BD, and XB: revised the article for intellectual content. YX, XS, and BD: wrote the article. All authors contributed to the article and approved the submitted version.
Funding
The work was financially supported by the National Natural Science Foundation of China (81871040 and 82101563), the Clinical Research Plan of SHDC (no. SHDC2020CR1038B), the Scientific Research Project of Shanghai Health Commission (20214Y0500), and the Youth Program of Naval Medical University (2021JCQN10).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2023.1123607/full#supplementary-material
References
1. De León Reyes, NS, Bragg-Gonzalo, L, and Nieto, M. Development and plasticity of the corpus callosum. Development. (2020) 147:dev189738. doi: 10.1242/dev.189738
2. Türe, U, Yaşargil, MG, and Krisht, AF. The arteries of the corpus callosum: a microsurgical anatomic study. Neurosurgery. (1996) 39:1075–84. doi: 10.1097/00006123-199612000-00001
3. Chrysikopoulos, H, Andreou, J, Roussakis, A, and Pappas, J. Infarction of the corpus callosum: computed tomography and magnetic resonance imaging. Eur J Radiol. (1997) 25:2–8. doi: 10.1016/s0720-048x(96)01155-2
4. Giroud, M, and Dumas, R. Clinical and topographical range of callosal infarction: a clinical and radiological correlation study. J Neurol Neurosurg Psychiatry. (1995) 59:238–42. doi: 10.1136/jnnp.59.3.238
5. Sun, X, Li, J, Fan, C, Zhang, H, Si, Y, and Fang, X. Clinical, neuroimaging and prognostic study of 127 cases with infarction of the corpus callosum. Eur J Neurol. (2019) 26:1075–81. doi: 10.1111/ene.13942
6. Gelibter, S, Genchi, A, Callea, M, Anzalone, N, Galantucci, S, and Volonté, MA. Corpus callosum infarction: radiological and histological findings. J Neurol. (2020) 267:3418–20. doi: 10.1007/s00415-020-10224-8
7. Jessen, F, Spottke, A, Boecker, H, Brosseron, F, Buerger, K, and Catak, C. Design and first baseline data of the DZNE multicenter observational study on predementia Alzheimer's disease (DELCODE). Alzheimers Res Ther. (2018) 10:15. doi: 10.1186/s13195-017-0314-2
8. Rabin, LA, Smart, CM, Crane, PK, Amariglio, RE, Berman, LM, and Boada, M. Subjective cognitive decline in older adults: an overview of self-report measures used across 19 international research studies. J Alzheimers Dis. (2015) 48:S63–86. doi: 10.3233/JAD-150154
9. Abner, EL, Kryscio, RJ, Caban-Holt, AM, and Schmitt, FA. Baseline subjective memory complaints associate with increased risk of incident dementia: the PREADVISE trial. J Prev Alzheimers Dis. (2015) 2:11–6. doi: 10.14283/jpad.2015.37
10. Reisberg, B, Shulman, MB, Torossian, C, Leng, L, and Zhu, W. Outcome over seven years of healthy adults with and without subjective cognitive impairment. Alzheimers Dement. (2010) 6:11–24. doi: 10.1016/j.jalz.2009.10.002
11. Des Portes, V, Rolland, A, Velazquez-Dominguez, J, Peyric, E, Cordier, MP, and Gaucherand, P. Outcome of isolated agenesis of the corpus callosum: a population-based prospective study. Eur J Paediatr Neurol. (2018) 22:82–92. doi: 10.1016/j.ejpn.2017.08.003
12. Huynh-Le, MP, Tibbs, MD, Karunamuni, R, Salans, M, Tringale, KR, and Yip, A. Microstructural injury to corpus callosum and intrahemispheric white matter tracts correlate with attention and processing speed decline after brain radiation. Int J Radiat Oncol Biol Phys. (2021) 110:337–47. doi: 10.1016/j.ijrobp.2020.12.046
13. Platten, M, Martola, J, Fink, K, Ouellette, R, Piehl, F, and Granberg, T. MRI-based manual versus automated corpus callosum volumetric measurements in multiple sclerosis. J Neuroimaging. (2020) 30:198–204. doi: 10.1111/jon.12676
14. Sidtis, JJ, Volpe, BT, Holtzman, JD, Wilson, DH, and Gazzaniga, MS. Cognitive interaction after staged callosal section: evidence for transfer of semantic activation. Science (New York, NY). (1981) 212:344–6. doi: 10.1126/science.6782673
15. Jessen, F, Amariglio, RE, van Boxtel, M, Breteler, M, Ceccaldi, M, and Chételat, G. A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer's disease. Alzheimers Dement. (2014) 10:844–52. doi: 10.1016/j.jalz.2014.01.001
16. Smart, CM, Karr, JE, Areshenkoff, CN, Rabin, LA, Hudon, C, and Gates, N. Non-pharmacologic interventions for older adults with subjective cognitive decline: systematic review, meta-analysis, and preliminary recommendations. Neuropsychol Rev. (2017) 27:245–57. doi: 10.1007/s11065-017-9342-8
17. Deo, RC. Machine learning in medicine. Circulation. (2015) 132:1920–30. doi: 10.1161/CIRCULATIONAHA.115.001593
18. Heo, J, Yoon, JG, Park, H, Kim, YD, Nam, HS, and Heo, JH. Machine learning-based model for prediction of outcomes in acute stroke. Stroke. (2019) 50:1263–5. doi: 10.1161/STROKEAHA.118.024293
19. Lee, H, Lee, EJ, Ham, S, Lee, HB, Lee, JS, and Kwon, SU. Machine learning approach to identify stroke within 4.5 hours. Stroke. (2020) 51:860–6. doi: 10.1161/STROKEAHA.119.027611
20. Vodencarevic, A, Weingärtner, M, Caro, JJ, Ukalovic, D, Zimmermann-Rittereiser, M, and Schwab, S. Prediction of recurrent ischemic stroke using registry data and machine learning methods: the Erlangen stroke registry. Stroke. (2022) 53:2299–306. doi: 10.1161/STROKEAHA.121.036557
21. Xu, D, Song, Y, Meng, Y, István, B, and Gu, Y. Relationship between firefighter physical fitness and special ability performance: predictive research based on machine learning algorithms. Int J Environ Res Public Health. (2020) 17:7689. doi: 10.3390/ijerph17207689
22. Vélez, JI. Machine learning based psychology: advocating for a data-driven approach. Int J Psychol Res (Medellin). (2021) 14:6–11. doi: 10.21500/20112084.5365
23. Phellan, R, Hachem, B, Clin, J, Mac-Thiong, JM, and Duong, L. Real-time biomechanics using the finite element method and machine learning: review and perspective. Med Phys. (2021) 48:7–18. doi: 10.1002/mp.14602
24. Watson, DS, Krutzinna, J, Bruce, IN, Griffiths, CE, McInnes, IB, and Barnes, MR. Clinical applications of machine learning algorithms: beyond the black box. BMJ. (2019) 364:l886. doi: 10.1136/bmj.l886
25. Liu, S, Schlesinger, JJ, McCoy, AB, Reese, TJ, Steitz, B, Russo, E, et al. New onset delirium prediction using machine learning and long short-term memory (LSTM) in electronic health record. J Am Med Inform Assoc. (2022) 30:120–31. doi: 10.1093/jamia/ocac210
26. Adams, HP Jr, Bendixen, BH, Kappelle, LJ, Biller, J, Love, BB, and Gordon, DL. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke. (1993) 24:35–41. doi: 10.1161/01.str.24.1.35
27. Diener, HC, and Hankey, GJ. Primary and secondary prevention of ischemic stroke and cerebral hemorrhage: JACC focus seminar. J Am Coll Cardiol. (2020) 75:1804–18. doi: 10.1016/j.jacc.2019.12.072
28. Xu, P, Zhang, F, Cheng, J, Huang, Y, Ren, Z, Ye, R, et al. The relationship between physical activity and subjective cognitive decline: evidence from the behavioral risk factor surveillance system (BRFSS). J Affect Disord. (2023) 328:108–15. doi: 10.1016/j.jad.2023.02.045
29. Gupta, S. Racial and ethnic disparities in subjective cognitive decline: a closer look, United States, 2015-2018. BMC Public Health. (2021) 21:1173. doi: 10.1186/s12889-021-11068-1
30. Burns, SP, Mueller, M, Magwood, G, White, BM, Lackland, D, and Ellis, C. Racial and ethnic differences in post-stroke subjective cognitive decline exist. Disabil Health J. (2019) 12:87–92. doi: 10.1016/j.dhjo.2018.08.005
31. Taylor, CA, Bouldin, ED, Greenlund, KJ, and McGuire, LC. Comorbid chronic conditions among older adults with subjective cognitive decline, United States, 2015-2017. Innov Aging. (2020) 4:igz045. doi: 10.1093/geroni/igz045
32. Martínez-Laperche, C, Buces, E, Aguilera-Morillo, MC, Picornell, A, González-Rivera, M, and Lillo, R. A novel predictive approach for GVHD after allogeneic SCT based on clinical variables and cytokine gene polymorphisms. Blood Adv. (2018) 2:1719–37. doi: 10.1182/bloodadvances.2017011502
33. Laukhtina, E, Schuettfort, VM, D'Andrea, D, Pradere, B, Quhal, F, and Mori, K. Selection and evaluation of preoperative systemic inflammatory response biomarkers model prior to cytoreductive nephrectomy using a machine-learning approach. World J Urol. (2022) 40:747–54. doi: 10.1007/s00345-021-03844-w
34. Tseng, PY, Chen, YT, Wang, CH, Chiu, KM, Peng, YS, and Hsu, SP. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit Care. (2020) 24:478. doi: 10.1186/s13054-020-03179-9
35. Ye, WL, Shen, C, Xiong, GL, Ding, JJ, Lu, AP, and Hou, TJ. Improving docking-based virtual screening ability by integrating multiple energy auxiliary terms from molecular docking scoring. J Chem Inf Model. (2020) 60:4216–30. doi: 10.1021/acs.jcim.9b00977
36. Czub, N, Pacławski, A, Szlęk, J, and Mendyk, A. Curated database and preliminary AutoML QSAR model for 5-HT1A receptor. Pharmaceutics. (2021) 13:1711. doi: 10.3390/pharmaceutics13101711
37. Lundberg, SM, Nair, B, Vavilala, MS, Horibe, M, Eisses, MJ, Adams, T, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. (2018) 2:749–60. doi: 10.1038/s41551-018-0304-0
38. Duckworth, C, Chmiel, FP, Burns, DK, Zlatev, ZD, White, NM, and Daniels, TWV. Using explainable machine learning to characterise data drift and detect emergent health risks for emergency department admissions during COVID-19. Sci Rep. (2021) 11:23017. doi: 10.1038/s41598-021-02481-y
39. Zhao, Y, Chen, Q, Liu, T, Luo, P, Zhou, Y, and Liu, M. Development and validation of predictors for the survival of patients with COVID-19 based on machine learning. Front Med. (2021) 8:683431. doi: 10.3389/fmed.2021.683431
40. Bania, RK, and Halder, A. R-HEFS: rough set based heterogeneous ensemble feature selection method for medical data classification. Artif Intell Med. (2021) 114:102049. doi: 10.1016/j.artmed.2021.102049
41. Fotouhi, S, Asadi, S, and Kattan, MW. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform. (2019) 90:103089. doi: 10.1016/j.jbi.2018.12.003
42. Jessen, F, Wolfsgruber, S, Wiese, B, Bickel, H, Mösch, E, and Kaduszkiewicz, H. AD dementia risk in late MCI, in early MCI, and in subjective memory impairment. Alzheimers Dement. (2014) 10:76–83. doi: 10.1016/j.jalz.2012.09.017
43. Snitz, BE, Wang, T, Cloonan, YK, Jacobsen, E, Chang, CH, and Hughes, TF. Risk of progression from subjective cognitive decline to mild cognitive impairment: the role of study setting. Alzheimers Dement. (2018) 14:734–42. doi: 10.1016/j.jalz.2017.12.003
44. Rabin, LA, Smart, CM, and Amariglio, RE. Subjective cognitive decline in preclinical Alzheimer's disease. Annu Rev Clin Psychol. (2017) 13:369–96. doi: 10.1146/annurev-clinpsy-032816-045136
45. Ardekani, BA, Figarsky, K, and Sidtis, JJ. Sexual dimorphism in the human corpus callosum: an MRI study using the OASIS brain database. Cereb Cortex. (2013) 23:2514–20. doi: 10.1093/cercor/bhs253
46. Eliot, L, Ahmed, A, Khan, H, and Patel, J. Dump the "dimorphism": comprehensive synthesis of human brain studies reveals few male-female differences beyond size. Neurosci Biobehav Rev. (2021) 125:667–97. doi: 10.1016/j.neubiorev.2021.02.026
47. Potvin, O, Mouiha, A, Dieumegarde, L, and Duchesne, S. Corrigendum to 'Normative data for subcortical regional volumes over the lifetime of the adult human brain' [NeuroImage 137 (2016) 9-20]. NeuroImage. (2018) 183:994–5. doi: 10.1016/j.neuroimage.2018.09.020
48. Mezzullo, M, Gambineri, A, Di Dalmazi, G, Fazzini, A, Magagnoli, M, and Baccini, M. Steroid reference intervals in women: influence of menopause, age and metabolism. Eur J Endocrinol. (2021) 184:395–407. doi: 10.1530/EJE-20-1147
49. He, Q, Luo, Y, Lv, F, Xiao, Q, Chao, F, and Qiu, X. Effects of estrogen replacement therapy on the myelin sheath ultrastructure of myelinated fibers in the white matter of middle-aged ovariectomized rats. J Comp Neurol. (2018) 526:790–802. doi: 10.1002/cne.24366
50. Luo, Y, Xiao, Q, Chao, F, He, Q, Lv, F, and Zhang, L. 17β-estradiol replacement therapy protects myelin sheaths in the white matter of middle-aged female ovariectomized rats: a stereological study. Neurobiol Aging. (2016) 47:139–48. doi: 10.1016/j.neurobiolaging.2016.07.023
51. Penke, L, Maniega, SM, Bastin, ME, Valdés Hernández, MC, Murray, C, and Royle, NA. Brain white matter tract integrity as a neural foundation for general intelligence. Mol Psychiatry. (2012) 17:1026–30. doi: 10.1038/mp.2012.66
52. Echouffo-Tcheugui, JB, Conner, SC, Himali, JJ, Maillard, P, DeCarli, CS, and Beiser, AS. Circulating cortisol and cognitive and structural brain measures: the Framingham heart study. Neurology. (2018) 91:e1961–70. doi: 10.1212/WNL.0000000000006549
53. Sharma, R, Mallick, D, Llinas, RH, and Marsh, EB. Early post-stroke cognition: in-hospital predictors and the association with functional outcome. Front Neurol. (2020) 11:613607. doi: 10.3389/fneur.2020.613607
54. Katsuki, M, Kato, H, Niizuma, H, Nakagawa, Y, and Tsunoda, M. Homonymous hemianopsia due to the infarction in the splenium of the corpus callosum. Cureus. (2021) 13:e19574. doi: 10.7759/cureus.19574
55. Huang, X, Du, X, Song, H, Zhang, Q, Jia, J, and Xiao, T. Cognitive impairments associated with corpus callosum infarction: a ten cases study. Int J Clin Exp Med. (2015) 8:21991–8.
56. Pendlebury, ST, and Rothwell, PM, Oxford Vascular Study. Incidence and prevalence of dementia associated with transient ischaemic attack and stroke: analysis of the population-based Oxford vascular study. Lancet Neurol. (2019) 18:248–58. doi: 10.1016/S1474-4422(18)30442-3
57. Hannibal, L, and Blom, HJ. Homocysteine and disease: causal associations or epiphenomenons? Mol Asp Med. (2017) 53:36–42. doi: 10.1016/j.mam.2016.11.003
58. Vermeer, SE, van Dijk, EJ, Koudstaal, PJ, Oudkerk, M, Hofman, A, and Clarke, R. Homocysteine, silent brain infarcts, and white matter lesions: the Rotterdam scan study. Ann Neurol. (2002) 51:285–9. doi: 10.1002/ana.10111
59. Zhao, JH, Tian, XJ, Liu, YX, Yuan, B, Zhai, KH, and Wang, CW. Executive dysfunction in patients with cerebral hypoperfusion after cerebral angiostenosis/occlusion. Neurol Med Chir. (2013) 53:141–7. doi: 10.2176/nmc.53.141
60. Lee, M, Lim, JS, Kim, CH, Lee, SH, Kim, Y, and Hun Lee, J. High neutrophil-lymphocyte ratio predicts post-stroke cognitive impairment in acute ischemic stroke patients. Front Neurol. (2021) 12:693318. doi: 10.3389/fneur.2021.693318
61. Zheng, Y, Guo, Z, Zhang, Y, Shang, J, Yu, L, and Fu, P. Rapid triage for ischemic stroke: a machine learning-driven approach in the context of predictive, preventive and personalised medicine. EPMA J. (2022) 13:285–98. doi: 10.1007/s13167-022-00283-4
62. Kim, S, Jeon, E, Yu, S, Oh, K, Kim, C, and Song, T. Interpretable machine learning for early neurological deterioration prediction in atrial fibrillation-related stroke. Sci Rep. (2021) 11:20610. doi: 10.1038/s41598-021-99920-7
63. Chander, RJ, Lim, L, Handa, S, Hiu, S, Choong, A, and Lin, X. Atrial fibrillation is independently associated with cognitive impairment after ischemic stroke. J Alzheimers Dis. (2017) 60:867–75. doi: 10.3233/JAD-170313
Keywords: corpus callosum infarction, cognitive impairment, machine learning, subjective cognitive decline, Shapley additive explanations
Citation: Xu Y, Sun X, Liu Y, Huang Y, Liang M, Sun R, Yin G, Song C, Ding Q, Du B and Bi X (2023) Prediction of subjective cognitive decline after corpus callosum infarction by an interpretable machine learning-derived early warning strategy. Front. Neurol. 14:1123607. doi: 10.3389/fneur.2023.1123607
Edited by:
Jean-Claude Baron, University of Cambridge, United Kingdom
Copyright © 2023 Xu, Sun, Liu, Huang, Liang, Sun, Yin, Song, Ding, Du and Bi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bingying Du, 15800614142@163.com; Xiaoying Bi, bixiaoying2013@163.com
†These authors have contributed equally to this work and share first authorship