Predicting Global Cognitive Decline in the General Population Using the Disease State Index

Cremers, Lotte G. M.; Huizinga, Wyke; Niessen, Wiro J.; Krestin, Gabriel P.; Poot, Dirk H. J.; Ikram, M. Arfan; Lötjönen, Jyrki; Klein, Stefan; Vernooij, Meike W.

doi:10.3389/fnagi.2019.00379

ORIGINAL RESEARCH article

Front. Aging Neurosci., 23 January 2020

Sec. Alzheimer's Disease and Related Dementias

Volume 11 - 2019 | https://doi.org/10.3389/fnagi.2019.00379


Lotte G. M. Cremers^1,2†

Wyke Huizinga^1,3†

Wiro J. Niessen^1,3,4

Gabriel P. Krestin^1

Dirk H. J. Poot^1,3

M. Arfan Ikram^1,2,5

Jyrki Lötjönen^6,7

Stefan Klein^1,3‡

Meike W. Vernooij^1,2*‡

¹Department of Radiology and Nuclear Medicine, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
²Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
³Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, Netherlands
⁴Department of Imaging Science and Technology, Faculty of Applied Sciences, Delft University of Technology, Delft, Netherlands
⁵Department of Neurology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
⁶VTT Technical Research Centre of Finland, Tampere, Finland
⁷Combinostics, Tampere, Finland

Background: Identifying persons at risk for cognitive decline may aid in early detection of persons at risk of dementia and in selecting those who would benefit most from therapeutic or preventive measures for dementia.

Objective: In this study we aimed to validate whether cognitive decline in the general population can be predicted with multivariate data using a previously proposed supervised classification method: Disease State Index (DSI).

Methods: We included 2,542 participants, non-demented and without mild cognitive impairment at baseline, from the population-based Rotterdam Study (mean age 60.9 ± 9.1 years). Participants with significant global cognitive decline were defined as the 5% of participants with the largest cognitive decline per year. We trained DSI to predict occurrence of significant global cognitive decline using a large variety of baseline features, including magnetic resonance imaging (MRI) features, cardiovascular risk factors, APOE-ε4 allele carriership, gait features, education, and baseline cognitive function as predictors. The prediction performance was assessed as area under the receiver operating characteristic curve (AUC), using 500 repetitions of 2-fold cross-validation experiments, in which (a randomly selected) half of the data was used for training and the other half for testing.

Results: The mean AUC (95% confidence interval) for DSI prediction was 0.78 (0.77–0.79) using only age as input feature. When using all available features, a mean AUC of 0.77 (0.75–0.78) was obtained. Without age, and with age-corrected features and feature selection on MRI features, a mean AUC of 0.70 (0.63–0.76) was obtained, showing the potential of other features besides age.

Conclusion: The best performance in the prediction of global cognitive decline in the general population by DSI was obtained using only age as input feature. Other features showed potential, but did not improve prediction. Future studies should evaluate whether the performance could be improved by new features, e.g., longitudinal features, and other prediction methods.

Introduction

It is well established that neuropathological brain changes related to dementia accumulate over decades, and that the disease has a long preclinical phase. This may facilitate early disease detection and prediction (Jack et al., 2013). A large body of literature on potential features and risk factors for dementia exists. However, clinicians often struggle to integrate all the data obtained from a single patient for diagnostic or prognostic purposes. Therefore, there is a need for information technologies and computer-based methods that support clinical decision making (Kloppel et al., 2008). Disease State Index (DSI) is a supervised machine learning method intended to aid clinical decision making (Mattila et al., 2011). This method compares a variety of patient variables with those variables from previously diagnosed cases, and computes an index that measures the similarity of the patient to the diagnostic group studied. The DSI method has previously been tested in specific patient populations, where it has been shown to perform reasonably well in the early prediction of progression from mild cognitive impairment (MCI) to Alzheimer's disease and has been successful in the classification of different dementia subtypes (Mattila et al., 2011, 2012; Munoz-Ruiz et al., 2014; Hall et al., 2015). In a recent study, DSI was validated in a population-based setting to predict late-life dementia (Pekkala et al., 2017). Identification of persons at risk for global cognitive decline may aid in early detection of persons at risk of dementia and may help to develop therapeutic or preventive measures to postpone or even prevent further cognitive decline and dementia (Blumenthal et al., 2017). This is especially important since previous research has shown that preventive interventions for dementia were more effective in persons at risk than in unselected populations (Moll van Charante et al., 2016).
We therefore used DSI to predict global cognitive decline in the general population to select the persons at risk. The main aim of this study was to investigate whether multivariate data can predict global cognitive decline in the general population. If a high-risk group can be selected from the general population, a population screening program for this group might facilitate early detection of dementia. We evaluated the prediction performance using several sets of clinical features and brain magnetic resonance imaging (MRI) features, to assess whether the prediction is dependent on the combination of the input features. As brain MRI features we used all measures we could acquire: volumetric measures of gray matter, white matter, cerebrospinal fluid and white matter lesions, and of a large variety of brain regions, both cortical and subcortical; diffusion measures, both global and local for a variety of tracts; cerebral blood flow measures; and the presence of microbleeds and infarcts. We used all these measures as we hypothesized that they could improve the prediction performance of cognitive decline. DSI was chosen as a classification method because this method is able to handle datasets with missing data, which is often the case in population study datasets. Also, this method has been successfully applied in previous studies and performed comparably to other state-of-the-art classifiers (Mattila et al., 2012; Pekkala et al., 2017).

Materials and Methods

Study Population

We included participants from three independent cohorts within the Rotterdam Study (RS), a prospective population-based cohort study in a suburb of Rotterdam that investigates the determinants and occurrence of diseases in the middle-aged and elderly population (Ikram et al., 2017). Brain MRI scanning has been part of the study protocol since 2005 (Ikram et al., 2015). The Rotterdam Study has been approved by the medical ethics committee according to the Population Study Act Rotterdam Study, executed by the Ministry of Health, Welfare and Sports of the Netherlands. Written informed consent was obtained from all participants (Ikram et al., 2017). We used data from RS cohorts I, II, and III, each of which consists of multiple subcohorts. In this study one subcohort of each of RS cohorts I, II, and III was used, to which we refer as sI, sII, and sIII, respectively. Baseline features of sI were collected during 2009–2011 and those of sII during 2004–2006. The participants of both cohorts were 55 years or older. For RS cohort III, participants were 45 years or older at time of inclusion. Baseline features of sIII were collected during 2006–2008. Participants with prevalent dementia, MCI, or MRI-defined cortical infarcts at baseline were excluded from all analyses. In total, 4,328 participants with baseline information on cognition, MRI and other features were included. Baseline MRI was acquired on average 0.3 ± 0.45 years after collecting the non-imaging features. Furthermore, diffusion-MRI was acquired. However, for a subset of 680 participants in RS cohort II, diffusion-MRI data were obtained on average 3.5 ± 0.2 years later than the other baseline MRI features. Longitudinal data on global cognitive decline were available for 2,542 of the 4,328 participants. The follow-up cognitive assessment took place on average 5.7 ± 0.6 years after the baseline visit.

Disease State Index

Prediction was performed with DSI (Mattila et al., 2011). This classifier derives an index indicating the disease state of the participant under investigation based on the available features of that participant. DSI has two major advantages: (1) it can cope with missing data and (2) it gives an interpretable result, because DSI also provides a decision tree that is straightforward to explain.

The Disease State Index classifier is composed of two components: fitness and relevance (Mattila et al., 2011). Let N be the total number of negatives, P the total number of positives, FN(x) the number of false negatives, and FP(x) the number of false positives, when x is used as classification cut-off. Then the fitness function is estimated for each feature i as:

f_{i} (x) = \frac{{FNR}_{i} (x)}{{FNR}_{i} (x) + {FPR}_{i} (x)} = \frac{{FN}_{i} (x)}{{FN}_{i} (x) + \frac{P}{N} {FP}_{i} (x)}

where FNR(x)=FN(x)/P is the false negative rate and FPR(x)=FP(x)/N is the false positive rate in the training data when the feature value x is used as the classification cut-off. The fitness automatically accounts for the imbalance in class size, implicitly making both classes equal in size, as the fraction P/N in the denominator scales the negative class [related to FP(x)] to correspond to the size of the positive class. The fitness function is a classifier where values <0.5 imply the negative class and values >0.5 the positive class. The relevance of each feature is estimated by:

R = max {sensitivity + specificity - 1, 0},

which measures how good the feature is in differentiating the two classes. The lower the overlap between the distributions of positives and negatives, the higher R. Finally, DSI is computed from the equation:

DSI = \frac{\sum_{i} R_{i} f_{i}}{\sum_{i} R_{i}}

Disease State Index is a value between zero and one; somebody is classified as positive if DSI >0.5 and as negative if DSI <0.5. DSI is an ensemble classifier, meaning that it combines multiple independent classifiers (fitness functions) defined for each feature separately. Because of that, DSI can tolerate missing data. Features can be grouped in a hierarchical manner. The final DSI is a combination of the levels in the hierarchy. The fitness, relevance and their combination as a composite DSI are repeated recursively by grouping the data until a single DSI value is obtained. Therefore, the final DSI, which is used for the classification, depends on the hierarchy structure, as a different structure leads to a different averaging of the feature combinations. The top-level part of the hierarchy defined for this study is shown in Figure 1.
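The fitness, relevance, and DSI equations above can be illustrated with a minimal, flat (non-hierarchical) sketch. It is not the authors' implementation. The function names are hypothetical, and it rests on three assumptions that the text does not state: larger feature values point toward the positive class, relevance is taken at the best cut-off among the observed training values, and an undefined fitness ratio is treated as 0.5.

```python
def fitness(x, positives, negatives):
    """Fitness f(x) = FNR(x) / (FNR(x) + FPR(x)) for one feature.

    Assumption: larger values point toward the positive class, so with
    cut-off x a training value v is called positive when v > x.
    """
    fnr = sum(v <= x for v in positives) / len(positives)  # FN(x) / P
    fpr = sum(v > x for v in negatives) / len(negatives)   # FP(x) / N
    if fnr + fpr == 0:
        return 0.5  # assumption: undefined ratio treated as undecided
    return fnr / (fnr + fpr)


def relevance(positives, negatives):
    """R = max(sensitivity + specificity - 1, 0).

    Assumption: sensitivity and specificity are evaluated at the best
    cut-off among the observed training values.
    """
    best = 0.0
    for cut in set(positives) | set(negatives):
        sensitivity = sum(v > cut for v in positives) / len(positives)
        specificity = sum(v <= cut for v in negatives) / len(negatives)
        best = max(best, sensitivity + specificity - 1)
    return best


def dsi(subject, training):
    """Relevance-weighted mean of fitness over the features that are present.

    subject:  dict of feature name -> value (None marks a missing value)
    training: dict of feature name -> (positive values, negative values)
    """
    num = den = 0.0
    for name, (pos, neg) in training.items():
        x = subject.get(name)
        if x is None:
            continue  # a missing feature simply does not contribute
        r = relevance(pos, neg)
        num += r * fitness(x, pos, neg)
        den += r
    return num / den if den > 0 else 0.5


# Toy training data (made up for illustration only).
training = {
    "age": ([70, 75, 80, 85], [50, 55, 60, 72]),
    "test_score": ([3, 4, 5, 6], [1, 2, 3, 4]),
}
index = dsi({"age": 78, "test_score": None}, training)
```

Because each feature contributes its own fitness value and missing features are skipped, the index remains defined when only part of the data is available, which is the property the text highlights.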

FIGURE 1

Figure 1. Feature categories shown in a hierarchy as used by the Disease State Index. Please note that not all individual features are included in this graph.

Baseline Features

Figure 1 shows the used categories of features in hierarchical manner. Please note that not all individual features are shown in this figure. The sections below describe all the used features (indicated in bold font) in detail.

MRI Features

Multi-sequence MR imaging was performed on a 1.5 Tesla MRI scanner (GE Signa Excite). The imaging protocol and sequence details were described extensively elsewhere (Ikram et al., 2015). Morphological imaging was performed with T1-weighted, proton density-weighted and fluid-attenuated inversion recovery (FLAIR) sequences. These sequences were used for an automated tissue segmentation approach to segment scans into gray matter, white matter, cerebrospinal fluid (CSF) and background tissue (Vrooman et al., 2007). Intracranial volume (ICV) (excluding the cerebellum and its surrounding CSF) was estimated by summing total gray and white matter and CSF. Brain tissue segmentation was complemented with a white matter lesion segmentation based on the tissue segmentation and the FLAIR image, with extraction of white matter lesion voxels by intensity thresholding (de Boer et al., 2009). We obtained (sub)cortical structure volumes, cortical thickness, curvature of the cortex, and hippocampal volume using the publicly available FreeSurfer 5.1 software (Dale et al., 1999; Fischl et al., 2004; Desikan et al., 2006). For cerebral blood flow measurements, we performed 2D phase-contrast imaging as previously described (Vernooij et al., 2008b). In short, blood flow velocity (mm/sec) was calculated based on regions of interest (ROI) drawn on the phase-contrast images in the carotid arteries and basilar artery at a level just under the skull base. The mean signal intensity in each ROI reflected the flow velocity. Flow was calculated by multiplying the average velocity by the cross-sectional area of the vessel (Vernooij et al., 2008b). A 3D T2*-weighted gradient-recalled echo sequence was used to image cerebral microbleeds.
Microbleeds were defined as focal areas of very low signal intensity, smaller than 10 mm in size, and were rated by one of five trained raters who were blinded to other MRI sequences and to clinical data (Roob et al., 1999; Vernooij et al., 2008a). Lacunar infarcts were defined as focal parenchymal lesions >3 mm and <15 mm in size with the same signal characteristics as CSF on all sequences and with a hyperintense rim on the FLAIR image (supratentorially). Probabilistic tractography was used to segment 15 different white matter tracts in diffusion-weighted MR brain images, and we obtained mean fractional anisotropy (FA), mean diffusivity (MD), and axial and radial diffusivity inside each white matter tract (de Groot et al., 2015).

Cardiovascular Risk Factors

Cardiovascular risk factors were based on information derived from home interviews and physical examinations during the center visit. Blood pressure was measured twice at the right brachial artery in sitting position using a random-zero sphygmomanometer. We used the mean of the two measurements in the analyses. Information on the use of antihypertensive medication was obtained by using questionnaires and by checking the medication cabinets of the participants. Hypertension was defined as a systolic blood pressure >140 mmHg or a diastolic blood pressure >90 mmHg or the use of antihypertensive medication at baseline. Serum total cholesterol and high-density lipoprotein (HDL) cholesterol were measured in fasting serum, taking lipid-lowering medication into account. Smoking was assessed by interview and coded as never, former and current. Body-mass index (BMI) was defined as weight in kilograms divided by height in meters squared. Diabetes mellitus status was defined as a fasting serum glucose level >7.0 mmol/l or, if unavailable, a non-fasting serum glucose level >11.1 mmol/l, or the use of anti-diabetic medication (Hofman et al., 2015). Alcohol consumption was assessed by questionnaire. Prevalent stroke was ascertained as previously described (Akoudad et al., 2016). Educational level was assessed during a home interview and was categorized into seven categories, ranging from primary education only to university level (Hofman et al., 2015).

APOE-ε4 Allele Carriership

APOE-ε4 allele carriership was assessed on coded genomic DNA samples. APOE genotype was in Hardy-Weinberg equilibrium. APOE-ε4 allele carriership was coded positive in case of one or two APOE-ε4 alleles (Wenham et al., 1991).

Gait Features

Gait was assessed by three walking tasks over a walkway: “normal walk,” “turn,” and “tandem walk” (heel to toe) (Lahousse et al., 2015). Using a principal component analysis we obtained the following gait factors, which we used as features: Rhythm, Variability, Phases, Pace, Base of Support, Tandem, and Turning (Verlinden et al., 2013).

Baseline Cognitive Function

We included the following objective memory and non-memory cognitive tests: 15-word Learning test immediate and delayed recall (Bleecker et al., 1988), Stroop tests (reading, color-naming and interference) (Golden, 1976; Goethals et al., 2004), the Letter-digit Substitution task (Lezak, 1984), word fluency test (Welsh et al., 1994) and the Purdue Pegboard test (Desrosiers et al., 1995). Subjective cognitive complaints were evaluated by interview. This interview included three questions on memory (difficulty remembering, forgetting what one had planned to do, and difficulty finding words), and three questions on everyday functioning (difficulty managing finances, problems using a telephone, and difficulty getting dressed) (Hoogendam et al., 2014).

Outcome: Definition of Cognitive Decline

A principal component analysis incorporating different cognitive tests was used to calculate a general cognitive factor (g-factor). For cognitive tests with multiple subtasks we chose only one subtask in order to prevent highly correlated tasks distorting the factor loadings. The following cognitive tests were included: color-word interference subtask of the Stroop test (which taps into information processing speed and executive functioning), Letter-Digit Substitution Task (LDST, testing executive function), verbal fluency test (tapping into executive functioning), delayed recall score of the 15-Word Learning Test (15-WLT, testing memory), and Purdue Pegboard test (testing fine motor speed). The g-factor was identified as the first unrotated component of the principal component analysis and explained 49.2% of all variance in the cognitive tests. This is a typical amount of variance accounted for by the g-factor (Deary, 2012; Hoogendam et al., 2014). Cognitive decline was defined as the g-factor from the follow-up visit minus the g-factor from the baseline visit, resulting in a delta g-factor. Since the follow-up time was not the same for each participant, the delta g-factor was divided by the follow-up time to obtain global cognitive decline per year. Significant global cognitive decline (yes/no) was defined as belonging to the 5% of participants with the largest cognitive decline (delta g-factor) per year. In the dataset used, consisting of 2,542 participants, this resulted in 127 participants with a positive class label.
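The labeling step described above can be sketched as follows, assuming the baseline and follow-up g-factors have already been computed. The function name is hypothetical, and rounding the 5% group size down is an assumption; it happens to reproduce the 127 positives reported for 2,542 participants.

```python
def label_decliners(g_baseline, g_followup, years, fraction=0.05):
    """Flag the `fraction` of participants with the steepest decline per year.

    Returns the per-year delta g-factor and a boolean label per participant.
    """
    slopes = [(f - b) / t for b, f, t in zip(g_baseline, g_followup, years)]
    # Assumption: the group size is rounded down (5% of 2,542 gives 127).
    n_pos = int(fraction * len(slopes))
    # Most negative change per year first, i.e. steepest decline first.
    order = sorted(range(len(slopes)), key=lambda i: slopes[i])
    positives = set(order[:n_pos])
    labels = [i in positives for i in range(len(slopes))]
    return slopes, labels


# Toy example with made-up values: 20 participants, 2 years of follow-up.
baseline = [0.0] * 20
followup = [-0.1 * i for i in range(20)]
slopes, labels = label_decliners(baseline, followup, [2.0] * 20)
```

Dividing by the individual follow-up time before ranking is what makes participants with different follow-up durations comparable.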

Evaluation Experiments

Prediction Performance Evaluation

The performance of DSI in predicting occurrence of global cognitive decline was evaluated using cross-validation. The area under the receiver operating characteristic curve (AUC) was determined using 500 repetitions of 2-fold cross-validation (CV) experiments. This means that with each repetition 50% of the study dataset was used for training and the other 50% was used for testing, and vice versa, keeping the class ratio in the training and test set equal. We report the mean AUC, and the uncertainty of the mean expressed by its 95% confidence interval, derived from the 1,000 resulting AUC values. The confidence interval was determined with the corrected resampled t-test for CV estimators of the generalization error (Nadeau and Bengio, 2003). AUCs were considered significantly different if the 95% confidence interval of their difference did not contain zero.
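The evaluation scheme above can be sketched with three small helpers: a stratified split into two halves, a rank-based AUC, and a confidence interval based on the corrected resampled variance of Nadeau and Bengio (2003), in which the sample variance of the J run-level estimates is multiplied by (1/J + n_test/n_train). The helper names are hypothetical, and the Student t quantile is approximated by a normal quantile, which is an assumption that is reasonable only when the number of runs is large (1,000 here).

```python
import math
import random
from statistics import NormalDist, mean, variance


def stratified_halves(labels, rng):
    """Split indices into two halves with (nearly) equal class ratios."""
    halves = ([], [])
    for cls in (True, False):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        mid = len(idx) // 2
        halves[0].extend(idx[:mid])
        halves[1].extend(idx[mid:])
    return halves


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def corrected_ci(values, n_train, n_test, level=0.95):
    """Mean and confidence interval with the corrected resampled variance.

    Assumption: a normal quantile stands in for the Student t quantile.
    """
    j = len(values)
    m = mean(values)
    corrected_var = (1.0 / j + n_test / n_train) * variance(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(corrected_var)
    return m, m - half, m + half


# Toy usage with made-up AUC values from 1,000 runs of 2-fold CV.
toy_aucs = [0.7, 0.8] * 500
m, lo, hi = corrected_ci(toy_aucs, n_train=1271, n_test=1271)
```

With 2-fold CV the ratio n_test/n_train equals 1, so the corrected variance is close to the raw variance of the run-level AUCs rather than that variance divided by the number of runs. The interval therefore does not shrink as more repetitions are added.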

Please note that the sample size was the same for baseline and follow-up, since we constructed a delta g-factor based on two time points. Only the people with cognitive assessment at both baseline and follow up were included in the analysis.

Since global cognitive decline per year is age dependent, we expected age to be an important feature for the prediction. We therefore included age as a feature in the model. However, since other features might depend on age, correcting these features might improve the prediction performance (Falahati et al., 2016). We therefore also assessed the prediction performance using age-corrected features. We corrected the non-binary features for age using a linear regression model (Koikkalainen et al., 2012). We evaluated four different models:

(1) age was included and no age-correction was performed on the non-binary features

(2) age was excluded and no age-correction was performed on the non-binary features

(3) age was included and non-binary features, except age, were corrected for age

(4) age was excluded and non-binary features, except age, were corrected for age.
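The age correction used in models (3) and (4) can be sketched as a simple linear regression of a feature on age, after which the age-related trend is subtracted. This is a minimal illustration with a hypothetical function name. Which subjects the regression is fitted on is not specified in the text, so fitting on all supplied subjects is an assumption.

```python
from statistics import mean


def age_correct(values, ages):
    """Remove the linear age trend from one non-binary feature.

    Fits values ~ intercept + slope * age by ordinary least squares and
    subtracts slope * (age - mean age), so the feature mean is preserved.
    Assumption: the fit uses all subjects passed in.
    """
    a_bar, v_bar = mean(ages), mean(values)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (v - v_bar) for a, v in zip(ages, values))
    slope = sxy / sxx
    return [v - slope * (a - a_bar) for v, a in zip(values, ages)]


# Toy feature that depends perfectly on age (made-up values).
ages = [50, 60, 70, 80]
feature = [2 * a + 1 for a in ages]
corrected = age_correct(feature, ages)
```

In a cross-validation setting one would probably estimate the slope on the training half only and apply it to the test half, to avoid leaking test information into the correction.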

To assess whether the performance of DSI was dependent on the combination of input features, we evaluated various feature combinations. In each cross-validation experiment the feature set was expanded with a feature or category of features. We analyzed four such cumulative feature sets, differing in the order in which the feature set was expanded. Additionally, we analyzed MRI features separately and a set including all features but age.

Relevance Analysis

To gain insight into the relevance weight that DSI assigns to each feature, we calculated the feature relevance distribution over the 500 repetitions of 2-fold CV, for the top-level feature categories of the hierarchy: age, sex, cognitive tests, cardiovascular risk factors, gait, education, genetics, and MRI features.

Feature Selection on MRI Features

In this study, hundreds of MRI features were extracted from images. It is likely that many of those features are not very effective in detecting cognitive decline. Typically, feature selection is applied to exclude poor features, which may introduce noise into the classifier. In DSI, weighting with relevance suppresses the effect of such features. If the number of features is high, their cumulative effect may, however, be substantial. Previous results have shown that when including many features with a low relevance, the performance of DSI may decrease (Pekkala et al., 2017). We therefore included an experiment evaluating the effect of feature selection on MRI features using their relevance. Due to averaging, feature noise is reduced at higher levels of the feature hierarchy. The relevance of top-level feature categories may therefore be higher than that of lower-level, individual features. As a result, selection on the individual features may cause top-level feature categories to drop out, despite their high relevance. To prevent entire top-level feature categories from dropping out of the model, we chose to only apply feature selection on the MRI features, which made up 80% of all input features before selection. The relevance of the MRI features was determined on the entire dataset, before training. MRI features were selected by thresholding the relevance. Subsequently, an AUC distribution was determined in 10 repetitions of 2-fold CV. The following relevance thresholds were chosen: t∈{0.0, 0.01,..., 0.09, 0.1}. For each threshold we assessed three feature sets in which the relevance-based feature selection on the MRI features was applied: (1) all features, (2) all features but age, and (3) MRI features only.
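The thresholding step can be sketched as follows. The function and feature names are hypothetical. Keeping features whose relevance is greater than or equal to the threshold is an assumption, chosen so that t = 0.0 retains every MRI feature.

```python
def select_by_relevance(relevances, mri_names, threshold):
    """Keep every non-MRI feature; keep MRI features with relevance >= threshold.

    relevances: dict of feature name -> relevance in [0, 1]
    mri_names:  set of feature names that belong to the MRI category
    """
    return [
        name
        for name, r in relevances.items()
        if name not in mri_names or r >= threshold
    ]


# Thresholds 0.0, 0.01, ..., 0.1 as listed in the text.
thresholds = [i / 100 for i in range(11)]

# Made-up relevances for illustration only.
relevances = {"age": 0.40, "mri_a": 0.005, "mri_b": 0.05, "mri_c": 0.20}
mri_names = {"mri_a", "mri_b", "mri_c"}
kept_per_threshold = {
    t: select_by_relevance(relevances, mri_names, t) for t in thresholds
}
```

Each retained feature list would then be passed to the cross-validation experiment to obtain the AUC distribution for that threshold.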

Sub-Group Analyses

As subjects close to the decision boundary (DSI ∼0.5) are more likely to be misclassified, we evaluated classification performance when only accepting/providing the classification for test subjects with low (<0.2) or high (>0.8) DSI. In this way, the subjects with 0.2 < DSI < 0.8 are disregarded, which, in a clinical case, would mean that there is no diagnosis possible for these cases. We computed the AUC of this sub-group for DSI using all available features, both with age-correction and without age-correction. Furthermore, we performed a sensitivity analysis in which the diffusion-MRI data of 680 participants in RS cohort II were ignored, because these data were obtained on average 3.47 ± 0.15 years later than the other baseline MRI features.
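The restriction to confidently classified subjects can be sketched as a filter on the DSI values of the test subjects. The function name is hypothetical. Reporting the retained fraction alongside the subset is an addition for illustration, not something the text says was computed.

```python
def confident_subset(dsi_values, labels, low=0.2, high=0.8):
    """Keep only test subjects with DSI < low or DSI > high.

    Returns the retained DSI values, their labels, and the fraction of
    subjects that were retained.
    """
    kept = [(d, y) for d, y in zip(dsi_values, labels) if d < low or d > high]
    coverage = len(kept) / len(dsi_values)
    return [d for d, _ in kept], [y for _, y in kept], coverage


# Made-up DSI values and outcome labels for five test subjects.
dsi_values = [0.10, 0.50, 0.90, 0.30, 0.85]
labels = [False, False, True, True, True]
sub_dsi, sub_labels, coverage = confident_subset(dsi_values, labels)
```

The AUC would then be computed on the retained subjects only, while the retained fraction indicates how many subjects would be left without a classification.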

Results

Table 1 presents the characteristics of the study population. The mean age of the participants was 60.9 ± 9.1 years and 55.6% were female. The absolute decline, given as the average difference in g-factor per year (standard deviation), was −0.25 (0.08) for the positive group (N = 127) and −0.02 (0.07) for the negative group (N = 2,415). The threshold separating the 5% with the steepest decline was −0.18.

TABLE 1

Table 1. Baseline features of the study population and their relevances.

Prediction Performance

Figure 2A shows the mean AUC (95% confidence interval) for several combinations of features in predicting global cognitive decline, without correcting the non-binary features for age. Other feature combinations are not shown, since they were not significantly different from the feature combinations shown in Figure 2A. When using only MRI features, the AUC was 0.75 (0.70–0.80). When using only age as baseline feature, the AUC was 0.78 (0.74–0.83). Using additional features on top of age resulted in an equal or slightly lower AUC (differences not statistically significant). When using all available features with DSI, the AUC was 0.77 (0.72–0.82). The mean AUC of DSI without age as baseline predictor was 0.75 (0.70–0.80).

FIGURE 2

Figure 2. (A) Graphs showing the AUC and 95% confidence interval for classification of cognitive decline for different combinations of features in the DSI model. Other feature combinations are not shown, since they were not significantly different from the feature combinations shown in this figure. The features have not been corrected for age. AUC performance for the different combinations of features is shown on the y-axis. Abbreviations are used for the various features: cognitive tests (ct), cardiovascular risk factors (cvr), MRI features (mri), genetics (APOE-ε4 carriership) (gen), and educational level (edu). Note that the y-axis scale ranges from 0.65–0.85. (B) Graphs showing the AUC and 95% confidence interval for classification of cognitive decline for different combinations of features in the DSI model. Other feature combinations are not shown, since they were not significantly different from the feature combinations shown in this figure. The non-binary features have been corrected for age. AUC performance for the different combinations of features is shown on the y-axis. Abbreviations are used for the various features: cognitive tests (ct), cardiovascular risk factors (cvr), MRI features (mri), genetics (APOE-ε4 carriership) (gen), and educational level (edu). Note that the y-axis scale ranges from 0.45–0.85.

Figure 2B shows the mean AUC (95% confidence interval) for the same combinations of features as in Figure 2A, but here the non-binary features were corrected for age. The AUC for MRI features only was significantly lower with age correction than without age correction, with an AUC of 0.55 (0.50–0.61). For the other feature sets, the AUC of the models where age correction was applied was not statistically significantly different from that without age correction. When the effect of age was totally removed from the model, i.e., model (4), the AUC was 0.65 (0.58–0.73).