- 1 Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States
- 2 Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- 3 Wisconsin Alzheimer’s Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- 4 School of Psychology, Liverpool John Moores University, Liverpool, United Kingdom
- 5 Waisman Laboratory for Brain Imaging and Behavior, University of Wisconsin-Madison, Madison, WI, United States
- 6 Department of Medical Physics, University of Wisconsin-Madison, Madison, WI, United States
- 7 Geriatric Research Education and Clinical Center, William S. Middleton Veterans Hospital, Madison, WI, United States
- 8 Department of Neurology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
Background: Story recall (SR) tests have shown variable sensitivity to rate of cognitive decline in individuals with Alzheimer’s disease (AD) biomarkers. Although SR tasks are typically scored by obtaining a sum of items recalled, item-level analyses may provide additional sensitivity to change and AD processes. Here, we examined the difficulty and discrimination indices of each item from the Logical Memory (LM) SR task, and determined if these metrics differed by recall conditions, story version (A vs. B), lexical categories, serial position, and amyloid status.
Methods: n = 1,141 participants from the Wisconsin Registry for Alzheimer’s Prevention longitudinal study who had item-level data were included in these analyses, as well as a subset of n = 338 who also had amyloid positron emission tomography (PET) imaging. LM data were categorized into four lexical categories (proper names, verbs, numbers, and “other”), and by serial position (primacy, middle, and recency). We calculated difficulty and discriminability/memorability by item, category, and serial position and ran separate repeated measures ANOVAs for each recall condition, lexical category, and serial position. For the subset with amyloid imaging, we used a two-sample t-test to examine whether amyloid positive (Aβ+) and amyloid negative (Aβ−) groups differed in difficulty or discrimination for the same summary metrics.
Results: In the larger sample, items were more difficult (less memorable) in the delayed recall condition across both story A and story B. Item discrimination was higher at delayed than immediate recall, and proper names had better discrimination than any of the other lexical categories or serial position groups. In the subsample with amyloid PET imaging, proper names were more difficult for Aβ+ than Aβ−; items in the verb and “other” lexical categories and all serial positions from delayed recall were more discriminating for the Aβ+ group compared to the Aβ− group.
Conclusion: This study provides empirical evidence that both LM stories are effective at discriminating ability levels and amyloid status, and that individual items vary in difficulty and discrimination by amyloid status, while total scores do not. These results can be informative for the future development of sensitive tasks or composite scores for early detection of cognitive decline.
Introduction
Alzheimer’s disease research studies are increasingly focused on identifying those participants who are at the earliest stages on the continuum of Alzheimer’s disease (AD), when AD pathology is present but cognitive decline is subtle or absent (Arenaza-Urquijo and Vemuri, 2018). It is during this timeframe that treatments are likely to show the most benefit in slowing or preventing AD clinical signs and symptoms (Food and Drug Administration, 2018). To this end, it is important to identify cognitive measures that are highly sensitive to cognitive decline at the preclinical phase. Most long-standing neuropsychological tests used in AD studies were originally designed to detect decline associated with Mild Cognitive Impairment (MCI, often the precursor to dementia) or dementia, but are often insensitive to the subtle changes associated with AD pathology that occur when overt symptoms may not be present and performance still falls within the normative range (i.e., “preclinical AD”; Mortamais et al., 2017; Jutten et al., 2021). The National Institute on Aging-Alzheimer’s Association (NIA-AA) research framework for Alzheimer’s disease defines this as Stage 2, when cognitive decline may be documented by evidence of subtle decline on longitudinal testing, subjective cognitive complaints, or both (Jessen et al., 2014, 2020; Jack et al., 2018).
Performance on commonly utilized neuropsychological tests is typically described and analyzed by calculating an aggregate of correctly recalled or answered items into a total score. This is true for tests of episodic memory, such as word list learning and memory [e.g., Rey Auditory Verbal Learning Test (R-AVLT); Schmidt, 1996] and non-verbal figure learning and memory [e.g., Brief Visuospatial Memory Test (BVMT); Benedict et al., 1996], as well as for tests of semantic memory such as category fluency tests (e.g., “name as many animals as you can think of in 60 s”) or confrontation naming tasks (e.g., Boston Naming Test; Goodglass and Kaplan, 1983). However, multiple studies have shown that detailed, item-level analyses of these data can provide additional information that is either more sensitive than the total score alone, informative about the underlying mechanisms of task performance in both disease and typical aging, or both. For example, while impairment in category fluency tasks (as measured by total score) is a well-known distinguishing factor between dementia, MCI, and typical aging (Putcha et al., 2020), the mechanisms of this impairment and whether or not the difficulty stems from degradation of the semantic store (i.e., temporal lobe memory functions), or from search and selection retrieval processes (i.e., frontal lobe executive control processes), is under investigation through item-level analyses (Weakley and Schmitter-Edgecombe, 2014; Papp et al., 2016, 2017). Specifically, in category fluency tasks, the kinds of words recalled are analyzed according to subcategories (“clusters”), and the temporal processes of moving from one cluster to the next are referred to as “switches,” with the latter representing the executive control portion of the task and cluster size representing the semantic storage component (Troyer et al., 1998). 
Other item-level approaches to memory and language testing include measuring the serial position effect in list learning tasks (Bruno et al., 2016, 2018), or analyzing the types of cues needed for naming tasks (phonemic vs. semantic cues; Balthazar et al., 2008; Lin et al., 2014), all with the goal of understanding the basis of dysfunction. A potential primary endpoint for these item-level approaches is the development of more sensitive measures for early detection of cognitive decline based on the patterns of neuropathology and their associated functions.
Recently, our group deconstructed another commonly utilized episodic memory test for early detection of decline due to AD: the story recall task, “Logical Memory” from the Wechsler Memory Scale-Revised, stories A and B (WMS-R; Wechsler, 1987). In this task, the participant listens to a story read aloud and is instructed to “tell me everything I read to you, using as close to the same words as you can, begin at the beginning,” immediately after hearing the story, and again after a 30-min delay. In our first paper (Mueller et al., 2020), we examined whether recall of items from stories A and B that belonged to a particular lexical category (proper names, verbs, or numerical expressions) was more likely to be associated with cognitively unimpaired participants at substantially higher risk of AD dementia due to positivity for amyloid-beta (Aβ+) vs. those who were amyloid negative (Aβ−). We found a compelling association between Aβ+ and proper names, such that participants who were Aβ+ were less likely to recall proper names (across stories A and B) at the 30-min delay than those who were Aβ−. We did not find this association with the total score. Interestingly, the two groups did not differ on proper name recall in the immediate recall condition, suggesting a deficit with retrieval and/or storage, but not learning.
Another prior study using data from this cohort examined item-level data from Logical Memory to determine if the serial position of the items’ presentation was associated with progression to clinical MCI or with Aβ+/−. In typical aging, items at the beginning of the list (i.e., primacy items) and items at the end of the list (i.e., recency items) are recalled more easily than items in the middle, but in persons with MCI and dementia, recall of the primacy items tends to be poorer (La Rue et al., 2008; Bruno et al., 2013; Talamonti et al., 2020), and there is a prominent loss of recency recall between immediate and delayed testing (Bruno et al., 2016, 2018). In this second study, we calculated serial position (primacy, middle, and recency; i.e., the end of the story) effects in the Logical Memory story and found a loss of recall for the primacy items from immediate to delayed recall in individuals who progressed to Aβ+ status (Bruno et al., 2020).
Although evidence shows that there is similar sensitivity and specificity in both immediate and delayed recall conditions in discriminating between dementia, MCI, and healthy controls, this prior research evaluated total scores (Weissberger et al., 2017). Similarly, even in nonverbal tasks, participants with AD dementia performed worse on immediate, delayed and recognition tasks than healthy controls or participants with depression (Contador et al., 2010). Furthermore, there is controversy regarding whether rates of encoding (learning) vs. disrupted storage of learned material are the primary deficit in AD dementia (Christensen et al., 1998). This and other previous research have involved patients with clinical impairment (i.e., dementia), and many of these studies have evaluated aggregated scores as opposed to item-level or process scores. It is largely unknown how these memory processes are affected very early in the disease continuum (i.e., at the stage when AD neuropathology is developing but cognition is not clinically impaired, or “preclinical AD”). It is possible that item-level analyses allow for more fine-grained understanding of early cognitive changes.
Neural correlates and neural network theories offer compelling explanations as to why we saw a proper name effect in persons who were Aβ+: first, proper name recall has been localized to the inferior anterior temporal lobe (Ross et al., 2010; Semenza, 2011; Fresnoza et al., 2022), adjacent to regions such as the perirhinal and entorhinal cortices, which are sites of early AD neuropathology accumulation (Braak et al., 2011). Second, the neural networks (attributes and similarities that aid in recall) are sparse for names of people and places compared to regular nouns. However, a potential confound exists, in that the Logical Memory task has a high concentration of proper names at the beginning of the two stories (story A and story B). Thus, disambiguating proper name effects from their position in the story is important for understanding the mechanistic principles underlying deficits in story recall due to AD and related dementias (ADRD). One method for understanding contributing factors to disparate performance on proper name recall between Aβ groups is by examining the item-level difficulty, as was done by Salthouse (2017). In that study, item recall patterns were compared across differing age groups, differing baseline memory ability groups, and groups showing longitudinal decline. The study found uniform differences in item difficulty across age, ability, and longitudinal decline groups. The study also included memorability analyses across different serial positions, in which item accuracy in the poorer-performing group was plotted as a function of item accuracy in the better-performing group.
Results showed lower memorability of items in the primacy and recency positions for delayed recall than for immediate recall (Salthouse, 2017). Whether item-level difficulty patterns from story recall differ between groups at increased/decreased risk for Alzheimer’s disease is unknown and has the potential to provide information about sensitive measures for AD-related cognitive decline. By identifying specific items or groups of items that are most sensitive to AD-related decline, shortened versions of tests or automated scoring algorithms can be developed for screening, early detection, and disease monitoring.
The present study had two aims: first, using a large sample of late-middle-aged adults from the Wisconsin Registry for Alzheimer’s Prevention (WRAP; n = 1,141, cognitively unimpaired at baseline), we calculated difficulty and discrimination indices of each item by study visit and recall condition (immediate and delayed) from the Logical Memory story recall task. We then examined whether these metrics differed between recall conditions, story versions (stories A vs. B), lexical categories, or serial position groups. For the second aim, we used the subset that had completed positron emission tomography (PET) amyloid imaging (n = 338) and calculated difficulty and discrimination indices separately for the Aβ+ (n = 79) and Aβ− (n = 259) groups. We then examined whether these metrics differed between Aβ+ and Aβ− groups by recall condition, story version, lexical categories, and serial position groups.
Materials and Methods
Participants
Participants were drawn from WRAP, a longitudinal cohort study enriched for parental history of late-onset sporadic AD (Sager et al., 2005; Johnson et al., 2018). WRAP visits began in 2001; participants are excluded from enrollment if they have a prior diagnosis of dementia or evidence of dementia at baseline testing. The baseline mean age is 54 years, 73% have a parent with AD dementia, and 40% of the total sample are APOE ε4 carriers. Participants complete detailed neuropsychological testing, medical examinations, and health and lifestyle questionnaires at each biennial visit (n = 1778, range of visits = 1–7). To track subtle, preclinical and/or clinically significant decline, WRAP researchers developed a “robust” norms approach in which internal normative distributions for cognitive test scores are generated adjusting for age, sex, and literacy, where the normative group is non-declining over time. An algorithm was created according to the robust norms to “flag” participants who are declining outside the range of the internal norms (1.5 SDs below the robust normative means). The flagged participants’ cognitive test performance, medical history, subjective and informant appraisals of memory, and medical examinations are reviewed, and one of five determinations of cognitive status is made, based on NIA-AA criteria (Albert et al., 2011; McKhann et al., 2011; Jack et al., 2018): “cognitively unimpaired-stable,” “cognitively unimpaired-declining,” “MCI,” “Impaired not MCI,” or “dementia.” Further details regarding these approaches are provided elsewhere (Koscik et al., 2014, 2019; Clark et al., 2016; Jonaitis et al., 2019; Langhough Koscik et al., 2021).
Participants were included in the present study if they were native English speakers, had complete item-level data from the Logical Memory test for at least one visit, were clinically unimpaired (no diagnosis of MCI or dementia) at their baseline Logical Memory visit (median = visit 2), and were free from neurological disorders at any visit, including Parkinson’s disease, multiple sclerosis, stroke, or epilepsy/seizures (Figure 1; n = 1,141). A subset of participants who had completed amyloid PET scans (completed near WRAP visit median = 3) and met the above-described inclusion criteria (n = 338) were used for the second aim. All activities for this study were approved by the University of Wisconsin-Madison Institutional Review Board and completed in accordance with the Declaration of Helsinki.
Figure 1. Flowchart indicating the study analysis inclusion/exclusion criteria applied to the Wisconsin Registry for Alzheimer’s Prevention longitudinal cohort.
Items and Variables From Logical Memory Story Recall
Logical Memory is a story recall subtest from the WMS-R (Wechsler, 1987), a standardized, norm-referenced assessment of learning and episodic memory. Logical Memory was introduced to the WRAP battery at the median visit 2; thus, “baseline” in the present study refers to each participant’s first Logical Memory assessment. Standardized test administration procedures for both stories A and B were followed in accordance with the WMS-R manual. Participants were read the following instructions prior to reading each story verbatim: “I am going to read you a story of just a few lines, and when I am through, tell the story back to me, using as close to the same words as you can remember; you should tell me all you can remember, even if you are not sure.” Participants immediately recalled each story following presentation (immediate recall) and again after a 25–35-min delay (delayed recall). The traditional scoring procedure includes 25 items or “idea units,” which comprise the item-level data used for these analyses. For the lexical categories which are described in detail elsewhere (Mueller et al., 2020), we assigned idea units into one of three lexical categories and summed across the two stories: proper names (n = 9), verbs (n = 14), and numerical expressions (n = 4; from here on, referred to as “numbers”). All other items were characterized as “other” (n = 23). Finally, following Bruno et al. (2020), we defined serial position in the following manner: “primacy” consisted of the first eight items in each story, “middle” included the next nine items, and the last eight items were defined as “recency.”
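The serial position grouping described above can be illustrated with a minimal Python sketch. This is not the authors' code (the study analyses were run in R); the function name `serial_position` is hypothetical, and the only inputs taken from the text are the 25 idea units per story and the 8/9/8 split.

```python
def serial_position(item_number: int) -> str:
    """Map a 1-based idea-unit number (1..25) to its serial position group.

    Follows the split stated in the text: first eight items = primacy,
    next nine = middle, last eight = recency.
    """
    if not 1 <= item_number <= 25:
        raise ValueError("item_number must be between 1 and 25")
    if item_number <= 8:
        return "primacy"
    if item_number <= 17:
        return "middle"
    return "recency"


# Label every idea unit in one story and count the group sizes.
positions = [serial_position(i) for i in range(1, 26)]
counts = {g: positions.count(g) for g in ("primacy", "middle", "recency")}
```

The same mapping applies to story A and story B, since each story is scored on 25 idea units.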
Difficulty and Discrimination Indices
Item “difficulty” is defined as the proportion of participants who answer an item correctly (Hambleton et al., 1991); higher values therefore indicate easier items. The difficulty of each item from Stories A (n = 25) and B (n = 25) from Logical Memory was calculated by dividing the number of correct responses by the total number of responses (n = 50; Crocker and Algina, 1986). A difficulty index between 0.2 and 0.8 is usually considered acceptable (Golden et al., 1984). Item “discrimination” is the extent to which items distinguish between high vs. low performers on the test; item discrimination was calculated by corrected item-total correlations for each item with the remaining items. The acceptable values are 0.2 or higher; the closer to 1, the better the discrimination (Golden et al., 1984). Because items with very high or very low difficulty values vary little across participants, they will often have low discrimination values. For Aim 1, we calculated difficulty and discrimination indices for each item, lexical category, and serial position group for each visit with at least one Logical Memory assessment and used these in analyses described in section “Statistical Analyses.” For Aim 2, we selected the Logical Memory assessment closest to the most recent PET assessment for each person with at least one PET amyloid scan, and we used these values to calculate difficulty and discrimination indices for Aim 2 analyses.
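As an illustration of the two indices defined above, the following Python sketch computes item difficulty (proportion correct) and a corrected item-total correlation (each item correlated with the sum of the remaining items). It is a sketch under stated assumptions, not the study's implementation, which used the R package “sjPlot”; the function names and the toy 0/1 response matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_difficulty(responses):
    """Proportion of participants recalling each item (rows = participants)."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n
            for j in range(len(responses[0]))]


def item_discrimination(responses):
    """Corrected item-total correlation: item vs. total of the other items."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - s for t, s in zip(totals, item)]  # total excluding item j
        result.append(pearson(item, rest))
    return result


# Hypothetical toy data: 6 participants x 4 items, 1 = recalled, 0 = not.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty = item_difficulty(responses)
discrimination = item_discrimination(responses)
# Flag items inside the 0.2-0.8 difficulty range cited in the text.
acceptable = [0.2 <= p <= 0.8 for p in difficulty]
```

Subtracting the item from the total before correlating is what makes the correlation “corrected”: it prevents an item from being correlated with itself.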
Molecular Neuroimaging
All participants in the Aim 2 analyses underwent a [11C] Pittsburgh compound B (PiB) PET scan on a Siemens EXACT HR+ scanner; PiB processing and quantification methods are described in detail elsewhere (Johnson et al., 2014). A 70-min dynamic acquisition using reference Logan graphical analysis (cerebellum gray matter reference region) was used to estimate the PiB distribution volume ratio (DVR). A previously defined global DVR threshold of >1.19 (Sprecher et al., 2015) was used to dichotomize individuals as amyloid positive or negative (Aβ+/−).
Statistical Analyses
Participant demographics and clinical characteristics are presented overall, as well as by those with vs. without a PET amyloid scan. In the subset with PET amyloid data, the Aβ+ vs. Aβ− groups are described using tests appropriate for the distribution of the variables (e.g., t-tests, chi-square tests, or ANCOVA).
Difficulty and discrimination indices were calculated for each visit as described in “Difficulty and Discrimination Indices” section using “sjPlot.” For Aim 1 analyses testing whether item difficulty or discrimination indices differ by recall condition, we conducted repeated measures ANOVAs of the paired item-level differences (immediate minus delayed recall; separate models for differences in difficulty and discrimination), adjusting for repeated measures across visits. We included a story version group variable to test whether paired differences in immediate-to-delayed difficulty or discrimination indices were the same across story versions A and B. We plotted the item difficulty and discrimination differences (mean across visits and by visits) and qualitatively described which items differ most from the immediate to the delayed condition.
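The paired item-level differences that enter these models can be sketched as follows. This is a hedged illustration only: the index values are invented, the function name `paired_differences` is hypothetical, and the mixed-model step that adjusts for repeated visits is not reproduced here.

```python
from statistics import mean, stdev


def paired_differences(immediate, delayed):
    """Item-by-item difference in an index: immediate minus delayed recall."""
    if len(immediate) != len(delayed):
        raise ValueError("both conditions must cover the same items")
    return [i - d for i, d in zip(immediate, delayed)]


# Hypothetical difficulty indices (proportion correct) for four items.
immediate = [0.80, 0.60, 0.50, 0.70]
delayed = [0.70, 0.55, 0.45, 0.66]

diffs = paired_differences(immediate, delayed)
mean_diff = mean(diffs)  # positive: items were harder at delayed recall
sd_diff = stdev(diffs)
```

With difficulty expressed as proportion correct, a positive immediate-minus-delayed difference means fewer participants recalled the item after the delay.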
For analyses examining whether each of the two psychometric indices (difficulty and discrimination) differed by story version, lexical category, or serial position within a recall condition, we ran separate repeated measures ANOVAs for immediate recall and delayed recall difficulty and discrimination. After observing that the residuals of the models failed the normality assumption, we reran the analyses using generalized linear mixed-effects models (GLMMs; R package “glmmTMB”; we used R package “DHARMa” to run residual diagnostics for these models). Post hoc analyses (e.g., pairwise comparisons following a significant omnibus test for a group variable with more than two groups) and effect sizes were calculated using the R package “emmeans.”
For Aim 2 analyses testing whether item difficulty or discrimination indices differed by amyloid status, we calculated the item-level difficulty and discrimination indices separately for the Aβ+ and Aβ− groups using the item-level data for the Logical Memory visit closest to the PET PiB scan. To examine whether Aβ+ and Aβ− groups differed in difficulty or discrimination, we used a two-sample t-test if the normality and homogeneity of variances assumptions were satisfied; otherwise, a Mann–Whitney U test was used. We followed this procedure for each recall condition, and within recall condition, for each story version, lexical category, and serial position group. For qualitative inspection of differences, we calculated the paired item-level differences in difficulty and discrimination indices between the Aβ+ and Aβ− groups for each item, story version, and recall condition and then used paired t-tests or Wilcoxon signed rank tests to test whether items within a subset of items differed in difficulty or discrimination between Aβ+ and Aβ− (item subsets for each recall condition included story version, lexical categories, and serial position groups).
For all models, magnitudes of between-group differences were characterized using Cliff’s delta, which was calculated using the “effsize” package in R (Torchiano and Torchiano, 2020). Cliff’s delta is a non-parametric effect size measure that quantifies the amount of difference between two groups of observations beyond what p-values alone convey; it is less susceptible to outliers and skewness than Hedges’ g or Cohen’s d and performs better in circumstances where the homogeneity of variance assumption does not hold (Cliff, 1993). The magnitude is assessed using the thresholds provided in Romano et al. (2006), i.e., |d| < 0.147 “negligible,” |d| < 0.33 “small,” |d| < 0.474 “medium,” otherwise “large.” Analyses were performed in R 4.0.2. Significance level was set at p < 0.05.
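For readers unfamiliar with Cliff’s delta, a minimal Python sketch of the statistic and the thresholds quoted above follows. It is an illustration rather than the “effsize” implementation used in the study, and the function names are hypothetical. The helper `delta_from_u` uses the standard identity between the Mann–Whitney U statistic and Cliff’s delta, delta = 2U/(mn) − 1, where m and n are the two group sizes.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta):
    """Label |delta| with the Romano et al. (2006) thresholds from the text."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def delta_from_u(u, n_x, n_y):
    """Convert a Mann-Whitney U (ties counted as 0.5) to Cliff's delta."""
    return 2 * u / (n_x * n_y) - 1


# Completely separated groups give the extreme value.
example = cliffs_delta([1, 2, 3], [4, 5, 6])
```

Assuming each amyloid group contributes 50 item-level indices (25 items in each of two stories), a rank-sum statistic of 1425.5 corresponds to a delta of about 0.14, which falls in the “negligible” band.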
Results
Participant demographics and clinical characteristics are presented overall for the Aim 1 sample (n = 1,141) and overall and by amyloid status for the Aim 2 subsample (n = 338) in Table 1. The overall sample had a mean age of 58.6 years (SD = 6.6) at the first Logical Memory visit; 6% identified as Black or African American, 92% identified as non-Hispanic White, and 2% identified as Hispanic, Asian, Native American/Indian, or other. The sample overall had a mean of 16 years of education (SD = 2.3).
Table 1. Demographic and clinical characteristics by total sample and subsample with amyloid imaging.
Aim 1: Difficulty and Discrimination Indices in the Full Sample
Difficulty Indices and Differences Between Recall Condition
Item-level mean difficulty indices across visits for Stories A and B are presented in Figure 2 by immediate (left) and delayed recall (right); colored circles indicate lexical categories, and vertical dotted lines delineate serial position subgroups (Supplementary Figure S1 shows the same, by visit). The triangles in the right-hand panel represent the difference in percent correct between immediate and delayed recall for each item; negative values indicate increased difficulty for delayed relative to immediate recall condition. Qualitatively, items 1 and 2 show the largest drops in proportion correct within each story (i.e., showed the largest increase in item difficulty from immediate to delayed recall). Mean(SD) change in difficulty between immediate and delayed recall was 0.056(0.08), indicating a significant increase in difficulty at delayed recall (generalized linear mixed model adjusting for multiple visits, intercept beta = 0.56; p < 0.001). The change in difficulty between recall conditions did not differ between stories A and B (story version beta = −0.01; p = 0.39).
Figure 2. Item difficulty plots (averaged across visits) according to the serial position (primacy, mid, and recency) as well as the lexical category of the items, by story A and story B. Across the primacy, mid, and recency positions, proper name recall shows a drop in percent correct (increase in difficulty) for both story A and story B. The triangles in the right-hand panels are the mean delayed condition percent correct minus mean immediate percent correct for story A and story B. The horizontal dashed lines are desirable difficulty values (between 0.2 and 0.8). Supplementary Figure S1 shows item difficulties by visit, revealing a consistent pattern across all study visits.
Difficulty Indices: Differences Within Recall Condition Between Story, Serial Position, and Lexical Category
Boxplots of item difficulties, depicting across-visit mean difficulties, are shown separately for immediate and delayed recall conditions in Figure 3 by story (left), lexical category (middle), and serial position group (right). GLMMs showed that lexical category was a significant predictor of difficulty for both immediate and delayed recall conditions (p < 0.0001; Table 2); serial position group and story version were not significant predictors in either recall condition. Post hoc pairwise differences between lexical categories showed significantly lower proportions correct in the “other” category compared to each of the other lexical categories at both immediate and delayed recall. At delayed recall, proper names were significantly more difficult than numerical expressions (Table 2; Figure 3).
Figure 3. Item difficulty plots at all visits according to the story (A and B), serial position (primacy, mid, and recency) as well as the lexical category (proper names, verbs, numbers, and others) of the items, by immediate recall and delayed recall. The corresponding model information is in Table 2. The Y-axis values represent proportion correct (and thus, lower values indicate more difficult items). Post hoc pairwise group differences at unadjusted p < 0.05 noted as *< 0.05, **< 0.01, ***<0.001, and ****<0.0001.
Table 2. GLMM with the difficulty indices for immediate recall and delayed recall predicted by story, lexical category, and serial position.
Item Level Discrimination Indices and Differences Between Recall Condition
Item-level mean discrimination indices across visits for Stories A and B are presented in Figure 4 by immediate (left) and delayed recall (right); colored circles indicate lexical categories and vertical dotted lines delineate serial position subgroups (Supplementary Figure S2 shows the same, by visit). The triangles in the right-hand panel represent the difference in discrimination indices between immediate and delayed recall for each item; positive values indicate increased discrimination for delayed relative to immediate recall condition. Qualitatively, all story A items, and most story B items show an increase in discrimination for the delayed recall condition. Mean(SD) change in discrimination indices between immediate and delayed recall was 0.043(0.05), indicating a significant increase in discrimination at delayed recall (generalized linear mixed model adjusting for multiple visits, intercept beta = 0.22; p < 0.001). The change in discrimination between recall conditions did differ between stories A and B (story version beta = 0.01; p = 0.04), indicating a significant increase in discrimination at story B delayed recall.
Figure 4. Item discrimination plots (averaged across visits) according to the serial position (primacy, mid, and recency) as well as the lexical category of the items, by story A and story B. Higher discrimination values = better discrimination. Across the primacy, mid and recency positions, proper name recall shows an increase in discrimination for both story A and story B. The triangles are the mean difference between recall condition for story A and story B. The horizontal dashed lines are desirable discrimination values (>0.2). Supplementary Figure S2 shows item discrimination by visit, revealing a consistent pattern across all study visits.
Discrimination Indices: Differences Within Recall Condition Between Story, Serial Position, and Lexical Category
Boxplots of item discrimination indices, depicting across-visit mean discriminations, are shown separately for immediate and delayed recall conditions in Figure 5 by story (left), lexical category (middle), and serial position group (right). GLMMs showed that lexical category was a significant predictor of discrimination for both immediate and delayed recall conditions (p = 0.012 and p < 0.0001, respectively; Table 3); serial position group was also a significant predictor in the immediate (p = 0.006) and delayed recall conditions (p = 0.027); story version was a significant predictor in the immediate recall condition only (p < 0.001). Post hoc pairwise differences between story versions showed significantly higher discriminations in story B at immediate recall. The differences between lexical categories showed higher discriminations in proper names (PNs) at delayed recall compared to each of the other categories. At immediate recall, PNs also had higher discriminations than the “other” category. Verbs had higher discriminations compared to the “other” category, and the recency serial position had higher discriminations compared to the primacy and mid positions at both immediate and delayed recall (Table 3; Figure 5).
Figure 5. Item discrimination plots at all visits according to the story (A and B), serial position (primacy, mid, and recency) as well as the lexical category (proper names, verbs, numbers, and others) of the items, by immediate recall and delayed recall. The corresponding model information is in Table 3. Post hoc pairwise group differences at unadjusted p < 0.05 noted as *< 0.05, **< 0.01, ***<0.001, and ****<0.0001.
Table 3. GLMM with the discrimination indices for immediate recall and delayed recall predicted by story, lexical category and serial position.
Aim 2: Difficulty and Discrimination Indices in PET Subsample
Table 1 shows demographic and clinical characteristics stratified by those individuals who completed PET amyloid scans (n = 338) vs. those who did not (n = 803), as well as by Aβ+ (n = 79, 23%) and Aβ− (n = 259, 77%). Those participants who completed a PET scan had significantly higher WRAT-3 reading standard scores (109 vs. 107), reported more education, and had higher baseline Logical Memory total scores (immediate and delayed) than those who did not complete PET scans. Relative to the Aβ− group, the Aβ+ group was significantly older at Logical Memory baseline (61 vs. 58), had a higher percentage of parental history of AD (85% vs. 71%), and had more APOE-ε4 carriers (69% vs. 30%). Aβ+ did not differ from Aβ− on any of the cognitive measures at baseline.
Difficulty Indices
Figure 6 depicts the difficulty indices by Aβ+ vs. Aβ− for the Logical Memory closest to each person’s last PET scan by story (top = story A and bottom = story B) and recall condition (left = immediate and right = delayed). Boxplots of item difficulty indices are shown separately for immediate (left) and delayed recall (right) conditions in Figure 7 by story (top), lexical category (middle), and serial position group (bottom). Descriptive statistics for paired t tests or Wilcoxon signed rank tests are summarized in Table 4; briefly, the difficulty indices of the Aβ+ and Aβ− groups differed significantly for proper names at delayed recall (large Cliff’s delta effect sizes), but not for story versions, other lexical categories, or serial positions at either immediate or delayed recall (negligible or small effect sizes).
Figure 6. Item difficulty plots by amyloid status according to the serial position (primacy, mid, and recency) as well as the lexical category of the items, by story A and story B. The colored circles indicate lexical categories, vertical dotted lines delineate serial position subgroups, and line types are Aβ+ and Aβ− groups. The horizontal dashed lines are desirable difficulty values (between 0.2 and 0.8). Overall, the mean(SD) immediate recall difficulty was 0.540(0.22) for the Aβ+ group compared with 0.594(0.23) in the Aβ− group (w = 1425.5; p = 0.24; Cliff’s delta = 0.14). The mean(SD) delayed recall difficulty was 0.485(0.21) for the Aβ+ group compared with 0.545(0.24) in the Aβ− group (w = 1466.5; p = 0.14; Cliff’s delta = 0.17).
Figure 7. Item difficulty plots by amyloid status according to the story (A and B), serial position (primacy, mid, and recency) as well as the lexical category of the items, by immediate recall and delayed recall. *< 0.05, **< 0.01, ***<0.001, and ****<0.0001.
Table 4. The difficulty indices difference between Aβ+ and Aβ− group for immediate recall and delayed recall by story, lexical category, and serial position.
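For readers less familiar with Cliff’s delta (Cliff, 1993), the effect size reported alongside the group comparisons above, the following is a minimal Python sketch of how it can be computed from two sets of item-level indices. The function name and the example values are hypothetical illustrations, not the study data or the authors’ analysis code, and the sign of the result depends on which group is passed first.

```python
from itertools import product


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    # Count pairs where the first group's value exceeds the second's,
    # and pairs where it is smaller; ties contribute to neither count.
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


# Hypothetical item difficulty indices for two groups (NOT study data)
group_pos = [0.40, 0.55, 0.60]
group_neg = [0.50, 0.65, 0.70]
delta = cliffs_delta(group_pos, group_neg)
```

Values near 0 indicate heavily overlapping distributions, and values near −1 or 1 indicate that nearly all values in one group fall below or above those in the other.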
Discrimination Indices
Figure 8 depicts the discrimination indices for the Logical Memory closest to each person’s last PET scan by story (top = story A and bottom = story B) and recall condition (left = immediate; right = delayed). Boxplots of item discrimination indices are shown separately for immediate (left) and delayed recall (right) conditions in Figure 9 by story (top), lexical category (middle), and serial position group (bottom). Descriptive statistics for paired t tests or Wilcoxon signed rank tests are summarized in Table 5; briefly, the discrimination indices differed between Aβ+ and Aβ− by story versions, proper names, “other” lexical categories, and all serial positions, with large or medium Cliff’s delta effect sizes.
Figure 8. Item discrimination plots according to the serial position (primacy, mid, and recency) as well as the lexical category of the items, by story A and story B. The colored circles indicate lexical categories, vertical dotted lines delineate serial position subgroups and line types are Aβ+ and Aβ− group. The horizontal dashed lines are desirable discrimination values (>0.2). For immediate recall, the mean(SD) discrimination index was 0.540(0.22) for the Aβ+ group compared with 0.594(0.23) in the Aβ− group (w = 850.5; p = 0.0059; Cliff’s delta = −0.32). For delayed recall, discrimination was 0.485(0.21) for the Aβ+ group compared with 0.545(0.24) in the Aβ− group (w = 530.5; p < 0.0001; Cliff’s delta = −0.58).
Figure 9. Item discrimination plots by amyloid status according to the story (A and B), serial position (primacy, mid, and recency) as well as the lexical category of the items, by immediate recall and delayed recall. *< 0.05, **< 0.01, ***<0.001, and ****<0.0001.
Table 5. The discrimination indices difference between Aβ+ and Aβ− group for immediate recall and delayed recall by story, lexical category, and serial position.
Discussion
The current study investigated the item-level difficulty and discrimination indices of a classic, widely used neuropsychological measure of episodic memory function, the Logical Memory story recall task from the Wechsler Memory Scale-Revised (Wechsler, 1987). This test was first published in 1945, with revisions in 1987, 1997, and 2009; we draw attention to its longevity and long-standing usage in the fields of neuropsychology, aging, and cognitive disorders. The indices were calculated for two story versions, A and B, and for the immediate and delayed recall conditions. We further examined items by other process scores, including the lexical categories to which the items belonged (proper names, verbs, and numerical expressions) and the serial position in which the items were presented. Finally, we evaluated the degree to which the process score groupings differed in their difficulty and discrimination between amyloid positive and negative groups. It was anticipated that item difficulty and discrimination would vary by position in the story (serial position) and/or the lexical category to which the item belonged (e.g., proper names and verbs), as well as by amyloid status.
In a large sample with longitudinal Logical Memory data, the item difficulty index dropped (i.e., items became more difficult) by an average of 10% from immediate to delayed recall across both story A and story B. This drop did not differ between the two story versions. Poorer delayed recall vs. immediate recall is an unsurprising finding, given that the delayed recall of Logical Memory and other learning tasks such as the Auditory Verbal Learning Test (AVLT) have been shown to be sensitive to MCI and dementia, and are included in widely utilized composite scores (Donohue et al., 2014; Knopman et al., 2019). Although several studies have demonstrated that list learning tasks such as AVLT are more sensitive to decline than story recall (Weissberger et al., 2017), the item-level approach we show here may spur renewed interest in evaluating existing measures or implementing new story recall tasks in future AD studies. Because AD treatments are most likely to be beneficial at the earliest stage of disease, it is important to develop more sensitive measures of cognitive decline for clinical trials (Snyder et al., 2014). The Food and Drug Administration has indicated the need for improved outcomes for AD clinical trials, not only for those that are more sensitive to change, but also for those that measure functional abilities (U.S. Department of Health and Human Services, 2018). Story recall tasks have an element of ecological validity that learning a list of 10 unrelated items does not. By developing new story recall scoring metrics or tasks that weight semantic/lexical properties, serial position, and item difficulty and discrimination, we may be able to increase sensitivity to AD-related cognitive decline, while maximizing an ecologically valid task.
Our findings also highlight that there was no difference in delayed recall item difficulty between story A and story B. Previous studies examining alternate forms of story recall have shown similar diagnostic sensitivity to one another (Cunje et al., 2007). To our knowledge, our study is the first to empirically confirm the similarity in difficulty of items for story A and story B of Logical Memory delayed recall. This finding is important, because many worldwide AD studies are utilizing Logical Memory, administering only Story A, only story B, or both (Toga et al., 2016). Therefore, this empirically derived information may be useful for other studies utilizing (or planning to implement) various forms of Logical Memory in longitudinal, aging cohorts. Moreover, the results presented here offer support for the prospect of using Story A and Story B as alternate versions of one another in a test–retest scenario.
Item difficulty on immediate recall differed between lexical categories, with the “other” category being more difficult than the other three lexical categories (proper names, verbs, and numerical expressions) on both recall conditions. This may relate to the fact that many of the items in the “other” category are less concrete (i.e., less imageable) than proper names, nouns, and verbs; for example, the idea unit “the night before” presents as more difficult than the idea unit/verb “robbed.” Furthermore, some of the items with the highest emotional valence tended to be verbs (e.g., “had not eaten”); abundant evidence indicates that individuals tend to encode items with emotional valence over those without (Kensinger and Corkin, 2004; Thomas and Hasher, 2006; Satler et al., 2007; Petrican et al., 2008).
We did not see overall differences in item difficulty by position in the stories, in either immediate or delayed recall. However, there was higher discrimination for items in the recency position as compared to the middle and primacy positions in both the immediate and delayed recall conditions. In other words, items in the recency position discriminated better among ability levels than items in the primacy or middle positions. The typical pattern in list learning tasks is that performance is better for stimuli learned at the beginning (primacy) or at the end (recency), as compared with items in the middle (Murdock, 1962), while individuals with mild cognitive impairment or dementia tend to show a pronounced deficit at the recency position when comparing immediate to delayed recall conditions (Carlesimo et al., 1995; Bruno et al., 2016, 2018). The fact that our analyses showed that items in the recency position were best at discriminating between ability levels may reflect differences in underlying cognitive abilities (or decline in abilities) in this at-risk cohort.
Item discrimination was higher in the delayed than in the immediate recall condition, with Story B having significantly higher discrimination than Story A. On immediate recall, average item discrimination was higher for Story B compared to Story A, and for the “other” category compared to proper names. On delayed recall, proper names had better discrimination than each of the other lexical categories. Difficulty recalling proper names in conversation is a common complaint of older individuals (Burke et al., 1991; Gollan et al., 2005; van Harten et al., 2018), and proper name recall has been shown to decline with age (Maylor and Valentine, 1992; Burke et al., 2004). However, whether there is an age differential in the actual difficulty of learning and recalling proper names vs. other lexical categories is up for debate (Cohen and Faulkner, 1986; Cohen and Burke, 1993; James, 2006). The results of the present study indicate that proper names are better able to discriminate among ability levels than other lexical categories and may provide further evidence for utilizing semantic memory tasks that target proper names for early detection of subtle cognitive decline (Fine et al., 2011; Papp et al., 2014; Rubiño and Andrés, 2018; Alegret et al., 2020).
In the subset with PET amyloid imaging, item-level analyses suggest that all items in the delayed recall condition of Logical Memory (both stories A and B) discriminate well between Aβ+ and Aβ−, which is consistent with reports of the story recall tasks’ sensitivity to stages of cognitive decline and AD pathology, and helps explain why the task is featured in popular AD memory composite scores (Donohue et al., 2014; Knopman et al., 2019). With respect to item difficulty, proper names at delayed recall were significantly more difficult for Aβ+ than Aβ−. This finding is consistent with our previous study showing an association between delayed recall of proper names and amyloid positivity (Mueller et al., 2020). Although most items of both stories in both conditions appear to be more difficult in the Aβ+ group, none of the other lexical categories or any of the serial position difficulty indices were significantly different between the two groups.
Analyses also revealed that the items in the verb and “other” lexical categories and all serial positions from delayed recall were more discriminating for the Aβ+ group compared to the Aβ− group. That proper names were not significantly more discriminating than the other lexical categories (but were more difficult) may indicate an earlier “loss” of these items in the Aβ+ group. When applying item response theory to items of the Mini-Mental State Examination (MMSE; Folstein et al., 1975), Ashford et al. described difficulty as a continuum of ability, and discrimination as how well an item can differentiate between examinees with a range of ability levels. Applying these concepts to the MMSE, difficulty indicates a loss of ability underlying performance, while discrimination is an indicator of how quickly that function is lost, such that high difficulty and low discrimination indicate early loss across a longer range of progression. Items on the MMSE with the highest difficulty and lowest discrimination in that study were the three words at delayed recall (ball, flag, and tree), indicating that delayed memory was the earliest ability lost on the continuum of dementia severity (Ashford et al., 1989). Another item-level analysis of the MMSE-37 in a Spanish-speaking population found that language items were among the best at discriminating between groups with dementia and healthy controls (Prieto et al., 2012). Although we did not examine people with dementia, dementia severity, or progression of AD, it is possible that proper name recall is an ability that is particularly vulnerable to early amyloid pathology; future studies can evaluate item sensitivity to estimated age of onset or projected rate of amyloid accumulation using methods developed by our group (Koscik et al., 2020; Betthauser et al., 2021).
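To make the distinction between difficulty and discrimination concrete, the following minimal Python sketch computes both indices from binary item scores. It assumes common classical test theory definitions, which may differ from the exact formulas used in this study: difficulty is taken as the proportion of participants who recalled an item, and discrimination as the difference in that proportion between the highest- and lowest-scoring participants. The 27% tail cutoff, the function name, and the data are illustrative assumptions only.

```python
def item_indices(responses, tail=0.27):
    """Classical difficulty and discrimination indices per item.

    responses: list of per-participant lists of 0/1 item scores.
    tail: assumed fraction of participants in each extreme group.
    """
    n = len(responses)
    n_items = len(responses[0])
    # Difficulty: proportion of participants who recalled each item
    # (lower values mean a harder item).
    difficulty = [sum(r[j] for r in responses) / n for j in range(n_items)]
    # Rank participants by total score, then take the lower and upper tails.
    ranked = sorted(responses, key=sum)
    k = max(1, round(n * tail))
    lower, upper = ranked[:k], ranked[-k:]
    # Discrimination: upper-group proportion minus lower-group proportion.
    discrimination = [
        sum(r[j] for r in upper) / k - sum(r[j] for r in lower) / k
        for j in range(n_items)
    ]
    return difficulty, discrimination


# Hypothetical scores: 4 participants x 3 items (NOT study data)
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
difficulty, discrimination = item_indices(scores)
```

Under these definitions, an item can be difficult (low proportion recalled) without separating high and low performers well, which is the pattern discussed above for proper names in the Aβ+ group.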
Items significantly discriminated between Aβ+ and Aβ− groups, but when comparing amyloid groups using the typical total score from Logical Memory, there were no significant differences [Table 1; mean(SD) Aβ+ = 27(7), Aβ− = 27(6)]. Here, we show that by computing item difficulty and discrimination indices, sensitivity of specific items to Aβ+ may be higher than using the total score alone. By understanding an item’s characteristics and properties, a more sensitive test, or a more sensitive scoring algorithm than the total score, can be developed. This approach of utilizing item response theory has been applied to groups of items from the Mini-Mental State Examination, where sets of four items were able to discriminate among controls, participants with MCI, and those with dementia with high sensitivity and specificity (Fillenbaum et al., 1994). Additionally, item response theory has been used to create new global cognitive function measures from an array of existing measures (Mungas and Reed, 2000; Mungas et al., 2003; Gershon et al., 2010). Because story recall tasks have an ecologically valid component (the task simulates conversations that often need to be recalled later), the development of a more sensitive story that includes types of items that best discriminate among individuals with evidence of AD pathology would make a needed metric for evaluating response to treatment or disease monitoring in clinical trials (Posner et al., 2017).
Strengths of this study include the large sample size, the longitudinal cohort, the subsample with neuroimaging data, and the detailed analysis of item difficulty and discrimination for two different stories of Logical Memory. Further, this is the first study to characterize these indices by amyloid status in a group of cognitively unimpaired individuals.
A limitation of this study is that the lexical categories of the stories are not balanced or equal in scores, which may bias the results. Additionally, the sample is a highly educated (~16 years of education), predominantly white (91%), self-selected cohort of individuals at risk for AD; therefore, the results of this work need to be replicated in diverse cohorts to be able to generalize the findings. The number of individuals who are amyloid positive is relatively small compared to those who are amyloid negative (23% positive vs. 77% negative). Although these percentages are representative of the general population at this early stage of AD neuropathological development, i.e., 25%–30% of individuals in this age group are purported to be amyloid positive (Jack et al., 2018), this likely reduces power to detect significant effects. Furthermore, for the amyloid analyses, we selected the Logical Memory test closest to the PET scan for each participant. The mean difference in time between Logical Memory and PET scan was 1.07 years for the amyloid positive group and 0.55 years for the amyloid negative group. Although it is unlikely that many participants were on the cusp of amyloid positivity, it is possible that a small number of participants may be very close to the amyloid positivity cutoff. Future analyses that potentially include longitudinal modeling of AD biomarkers may help address this potential confound. Finally, we did not address practice effects in our amyloid models, which may either skew results for some participants, or may miss important differences in others (Jutten et al., 2020). Future analyses will examine whether practice effects vary by amyloid status.
In sum, we provide empirical evidence that both stories of the Logical Memory task are effective at discriminating ability levels, as well as amyloid status, and that individual items vary in difficulty and discrimination by amyloid status, while total scores do not. These results can be informative for the future development of sensitive tasks or composite scores for early detection, disease monitoring, and response to treatment for clinical trials.
Data Availability Statement
The datasets presented in this article are not readily available because data are available through a data request process. Requests to access the datasets should be directed to https://wrap.wisc.edu/data-requests/.
Ethics Statement
The studies involving human participants were reviewed and approved by the University of Wisconsin-Madison Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.
Author Contributions
RK, LD, KM, and BH designed the analyses. LD, RK, and KM analyzed the data. SJ, BC, and TB oversaw data collection and data processing. KM, LD, DB, and RK wrote the manuscript. All authors contributed to the article and approved the submitted version.
Funding
This work was funded by the following grants from the National Institutes of Health: NIH 1R01AG070940, R01 AG021155, R01AG027161, R01-AG054059, NIH P50 AG033514, and NIH U54 HD090256 and the following grant from the Alzheimer’s Association: AARF-19-614533.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.908651/full#supplementary-material
References
Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., et al. (2011). The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 270–279. doi: 10.1016/j.jalz.2011.03.008
Alegret, M., Muñoz, N., Roberto, N., Rentz, D. M., Valero, S., Gil, S., et al. (2020). A computerized version of the short form of the face-name associative memory exam (FACEmemory®) for the early detection of Alzheimer’s disease. Alzheimers Res. Ther. 12, 1–11. doi: 10.1186/s13195-020-00594-6
Arenaza-Urquijo, E. M., and Vemuri, P. (2018). Resistance vs resilience to Alzheimer disease: clarifying terminology for preclinical studies. Neurology 90, 695–703. doi: 10.1212/WNL.0000000000005303
Ashford, J. W., Kolm, P., Colliver, J. A., Bekian, C., and Hsu, L.-N. (1989). Alzheimer patient evaluation and the mini-mental state: item characteristic curve analysis. J. Gerontol. 44, P139–P146. doi: 10.1093/geronj/44.5.P139
Balthazar, M. L., Cendes, F., and Damasceno, B. P. (2008). Semantic error patterns on the Boston naming test in normal aging, amnestic mild cognitive impairment, and mild Alzheimer’s disease: is there semantic disruption? Neuropsychology 22, 703–709. doi: 10.1037/a0012919
Benedict, R. H., Schretlen, D., Groninger, L., Dobraski, M., and Shpritz, B. (1996). Revision of the brief visuospatial memory test: studies of normal performance, reliability, and validity. Psychol. Assess. 8, 145–153. doi: 10.1037/1040-3590.8.2.145
Betthauser, T. J., Bilgel, M., Koscik, R. L., Jedynak, B. M., An, Y., Kellett, K. A., et al. (2021). Multi-method investigation of factors influencing amyloid onset and impairment in three cohorts. medRxiv [Preprint].
Braak, H., Thal, D. R., Ghebremedhin, E., and Del Tredici, K. (2011). Stages of the pathologic process in Alzheimer disease: age categories from 1 to 100 years. J. Neuropathol. Exp. Neurol. 70, 960–969. doi: 10.1097/NEN.0b013e318232a379
Bruno, D., Koscik, R. L., Woodard, J. L., Pomara, N., and Johnson, S. C. (2018). The recency ratio as predictor of early MCI. Int. Psychogeriatr. 30, 1883–1888. doi: 10.1017/s1041610218000467
Bruno, D., Mueller, K. D., Betthauser, T., Chin, N., Engelman, C. D., Christian, B., et al. (2020). Serial position effects in the logical memory test: loss of primacy predicts amyloid positivity. J. Neuropsychol. 15, 448–461. doi: 10.1111/jnp.12235
Bruno, D., Reichert, C., and Pomara, N. (2016). The recency ratio as an index of cognitive performance and decline in elderly individuals. J. Clin. Exp. Neuropsychol. 38, 967–973. doi: 10.1080/13803395.2016.1179721
Bruno, D., Reiss, P. T., Petkova, E., Sidtis, J. J., and Pomara, N. (2013). Decreased recall of primacy words predicts cognitive decline. Arch. Clin. Neuropsychol. 28, 95–103. doi: 10.1093/arclin/acs116
Burke, D. M., Locantore, J. K., Austin, A. A., and Chae, B. (2004). Cherry pit primes Brad Pitt: homophone priming effects on young and older adults’ production of proper names. Psychol. Sci. 15, 164–170. doi: 10.1111/j.0956-7976.2004.01503004.x
Burke, D. M., MacKay, D. G., Worthley, J. S., and Wade, E. (1991). On the tip of the tongue: what causes word finding failures in young and older adults? J. Mem. Lang. 30, 542–579. doi: 10.1016/0749-596X(91)90026-G
Carlesimo, G. A., Sabbadini, M., Fadda, L., and Caltagirone, C. (1995). Different components in word-list forgetting of pure amnesics, degenerative demented and healthy subjects. Cortex 31, 735–745. doi: 10.1016/s0010-9452(13)80024-x
Christensen, H., Kopelman, M. D., Stanhope, N., Lorentz, L., and Owen, P. (1998). Rates of forgetting in Alzheimer dementia. Neuropsychologia 36, 547–557. doi: 10.1016/S0028-3932(97)00116-4
Clark, L. R., Koscik, R. L., Nicholas, C. R., Okonkwo, O. C., Engelman, C. D., Bratzke, L. C., et al. (2016). Mild cognitive impairment in late middle age in the Wisconsin registry for Alzheimer’s prevention study: prevalence and characteristics using robust and standard neuropsychological normative data. Arch. Clin. Neuropsychol. 31, 675–688. doi: 10.1093/arclin/acw024
Cliff, N. (1993). Dominance statistics: ordinal analyses to answer ordinal questions. Psychol. Bull. 114:494.
Cohen, G., and Burke, D. M. (1993). Memory for proper names: a review. Memory 1, 249–263. doi: 10.1080/09658219308258237
Cohen, G., and Faulkner, D. (1986). Memory for proper names: age differences in retrieval. Br. J. Dev. Psychol. 4, 187–197. doi: 10.1111/j.2044-835X.1986.tb01010.x
Contador, I., Fernández-Calvo, B., Cacho, J., Ramos, F., and Lopez-Rolon, A. (2010). Nonverbal memory tasks in early differential diagnosis of Alzheimer’s disease and unipolar depression. Appl. Neuropsychol. 17, 251–261. doi: 10.1080/09084282.2010.525098
Crocker, L., and Algina, J. (1986). Introduction to Classical and Modern Test Theory. Mason, Ohio: Cengage Learning.
Cunje, A., Molloy, D. W., Standish, T. I., and Lewis, D. L. (2007). Alternate forms of logical memory and verbal fluency tasks for repeated testing in early cognitive changes. Int. Psychogeriatr. 19, 65–75. doi: 10.1017/s1041610206003425
Donohue, M. C., Sperling, R. A., Salmon, D. P., Rentz, D. M., Raman, R., Thomas, R. G., et al. (2014). The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 71, 961–970. doi: 10.1001/jamaneurol.2014.803
Fillenbaum, G. G., Wilkinson, W. E., Welsh, K. A., and Mohs, R. C. (1994). Discrimination between stages of Alzheimer’s disease with subsets of mini-mental state examination items: an analysis of consortium to establish a registry for Alzheimer’s disease data. Arch. Neurol. 51, 916–921. doi: 10.1001/archneur.1994.00540210088017
Fine, E. M., Delis, D. C., Paul, B. M., and Filoteo, J. V. (2011). Reduced verbal fluency for proper names in nondemented patients with Parkinson’s disease: a quantitative and qualitative analysis. J. Clin. Exp. Neuropsychol. 33, 226–233. doi: 10.1080/13803395.2010.507185
Folstein, M. F., Folstein, S. E., and McHugh, P. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198. doi: 10.1016/0022-3956(75)90026-6
Food and Drug Administration (2018). Early Alzheimer’s Disease: Developing Drugs for Treatment: Guidance for Industry. Food and Drug Administration.
Fresnoza, S., Mayer, R.-M., Schneider, K. S., Christova, M., Gallasch, E., and Ischebeck, A. (2022). Modulation of proper name recall by transcranial direct current stimulation of the anterior temporal lobes. Sci. Rep. 12, 1–13. doi: 10.1038/s41598-022-09781-x
Galvin, J. E. (2015). The quick dementia rating system (QDRS): a rapid dementia staging tool. Alzheimers Dement. 1, 249–259. doi: 10.1016/j.dadm.2015.03.003
Gershon, R. C., Cella, D., Fox, N. A., Havlik, R. J., Hendrie, H. C., and Wagster, M. V. (2010). Assessment of neurological and behavioural function: the NIH toolbox. Lancet Neurol. 9, 138–139. doi: 10.1016/S1474-4422(09)70335-7
Golden, C. J., Sawicki, R. F., and Franzen, M. D. (1984). Assessment and Test Construction. Research Methods in Clinical Psychology. eds. A. S. Bellack and M. Hersen (New York: Pergamon Press).
Gollan, T. H., Montoya, R. I., and Bonanni, M. P. (2005). Proper names get stuck on bilingual and monolingual speakers’ tip of the tongue equally often. Neuropsychology 19, 278–287. doi: 10.1037/0894-4105.19.3.278
Goodglass, H., and Kaplan, E. (1983). Boston Diagnostic Aphasia Examination Booklet. Philadelphia, PA: Lea & Febiger.
Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of Item Response Theory. New York, NY: SAGE Publication.
Jack, C. R. Jr., Bennett, D. A., Blennow, K., Carrillo, M. C., Dunn, B., Haeberlein, S. B., et al. (2018). NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 14, 535–562. doi: 10.1016/j.jalz.2018.02.018
James, L. E. (2006). Specific effects of aging on proper name retrieval: now you see them, now you don’t. J. Gerontol. B Psychol. Sci. Soc. Sci. 61, P180–P183. doi: 10.1093/geronb/61.3.P180
Jessen, F., Amariglio, R. E., Buckley, R. F., van der Flier, W. M., Han, Y., Molinuevo, J. L., et al. (2020). The characterisation of subjective cognitive decline. Lancet Neurol. 19, 271–278. doi: 10.1016/s1474-4422(19)30368-0
Jessen, F., Amariglio, R. E., van Boxtel, M., Breteler, M., Ceccaldi, M., Chetelat, G., et al. (2014). A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease. Alzheimers Dement. 10, 844–852. doi: 10.1016/j.jalz.2014.01.001
Johnson, S. C., Christian, B. T., Okonkwo, O. C., Oh, J. M., Harding, S., Xu, G., et al. (2014). Amyloid burden and neural function in people at risk for Alzheimer’s disease. Neurobiol. Aging 35, 576–584. doi: 10.1016/j.neurobiolaging.2013.09.028
Johnson, S. C., Koscik, R. L., Jonaitis, E. M., Clark, L. R., Mueller, K. D., Berman, S. E., et al. (2018). The Wisconsin registry for Alzheimer’s prevention: A review of findings and current directions. Alzheimers Dement. 10, 130–142. doi: 10.1016/j.dadm.2017.11.007
Jonaitis, E. M., Koscik, R. L., Clark, L. R., Ma, Y., Betthauser, T. J., Berman, S. E., et al. (2019). Measuring longitudinal cognition: individual tests versus composites. Alzheimers Dement. 11, 74–84. doi: 10.1016/j.dadm.2018.11.006
Jutten, R. J., Grandoit, E., Foldi, N. S., Sikkes, S. A. M., Jones, R. N., Choi, S. E., et al. (2020). Lower practice effects as a marker of cognitive performance and dementia risk: A literature review. Alzheimers Dement. 12:e12055. doi: 10.1002/dad2.12055
Jutten, R. J., Sikkes, S. A. M., Amariglio, R. E., Buckley, R. F., Properzi, M. J., Marshall, G. A., et al. (2021). Identifying sensitive measures of cognitive decline at different clinical stages of Alzheimer’s disease. J. Int. Neuropsychol. Soc. 27, 426–438. doi: 10.1017/S1355617720000934
Kensinger, E. A., and Corkin, S. (2004). Two routes to emotional memory: distinct neural processes for valence and arousal. Proc. Natl. Acad. Sci. 101, 3310–3315. doi: 10.1073/pnas.0306408101
Knopman, D. S., Lundt, E. S., Therneau, T. M., Vemuri, P., Lowe, V. J., Kantarci, K., et al. (2019). Entorhinal cortex tau, amyloid-β, cortical thickness and memory performance in non-demented subjects. Brain 142, 1148–1160. doi: 10.1093/brain/awz025
Koscik, R. L., Betthauser, T. J., Jonaitis, E. M., Allison, S. L., Clark, L. R., Hermann, B. P., et al. (2020). Amyloid duration is associated with preclinical cognitive decline and tau PET. Alzheimers Dement. 12:e12007. doi: 10.1002/dad2.12007
Koscik, R. L., Jonaitis, E. M., Clark, L. R., Mueller, K. D., Allison, S. L., Gleason, C. E., et al. (2019). Longitudinal standards for mid-life cognitive performance: identifying abnormal within-person changes in the Wisconsin registry for Alzheimer’s prevention. J. Int. Neuropsychol. Soc. 25, 1–14. doi: 10.1017/S1355617718000929
Koscik, R. L., La Rue, A., Jonaitis, E. M., Okonkwo, O. C., Johnson, S. C., Bendlin, B. B., et al. (2014). Emergence of mild cognitive impairment in late middle-aged adults in the Wisconsin registry for Alzheimer’s prevention. Dement. Geriatr. Cogn. Disord. 38, 16–30. doi: 10.1159/000355682
Langhough Koscik, R., Hermann, B. P., Allison, S., Clark, L. R., Jonaitis, E. M., Mueller, K. D., et al. (2021). Validity evidence for the research category, “cognitively unimpaired—declining,” as a risk marker for mild cognitive impairment and Alzheimer’s disease. Front. Aging Neurosci. 13:688478. doi: 10.3389/fnagi.2021.688478
La Rue, A., Hermann, B., Jones, J. E., Johnson, S., Asthana, S., and Sager, M. A. (2008). Effect of parental family history of Alzheimer’s disease on serial position profiles. Alzheimers Dement. 4, 285–290. doi: 10.1016/j.jalz.2008.03.009
Lin, C. Y., Chen, T. B., Lin, K. N., Yeh, Y. C., Chen, W. T., Wang, K. S., et al. (2014). Confrontation naming errors in Alzheimer’s disease. Dement. Geriatr. Cogn. Disord. 37, 86–94. doi: 10.1159/000354359
Maylor, E. A., and Valentine, T. (1992). Linear and nonlinear effects of aging on categorizing and naming faces. Psychol. Aging 7, 317–323. doi: 10.1037/0882-7974.7.2.317
McKhann, G. M., Knopman, D. S., Chertkow, H., Hyman, B. T., Jack, C. R. Jr., Kawas, C. H., et al. (2011). The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 263–269. doi: 10.1016/j.jalz.2011.03.005
Morris, J. C. (1997). Clinical dementia rating: a reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int. Psychogeriatr. 9, 173–176. doi: 10.1017/S1041610297004870
Mortamais, M., Ash, J. A., Harrison, J., Kaye, J., Kramer, J., Randolph, C., et al. (2017). Detecting cognitive changes in preclinical Alzheimer’s disease: a review of its feasibility. Alzheimers Dement. 13, 468–492. doi: 10.1016/j.jalz.2016.06.2365
Mueller, K. D., Koscik, R. L., Du, L., Bruno, D., Jonaitis, E. M., Koscik, A. Z., et al. (2020). Proper names from story recall are associated with beta-amyloid in cognitively unimpaired adults at risk for Alzheimer’s disease. Cortex 131, 137–150. doi: 10.1016/j.cortex.2020.07.008
Mungas, D., and Reed, B. R. (2000). Application of item response theory for development of a global functioning measure of dementia with linear measurement properties. Stat. Med. 19, 1631–1644. doi: 10.1002/(SICI)1097-0258(20000615/30)19:11/12<1631::AID-SIM451>3.0.CO;2-P
Mungas, D., Reed, B. R., and Kramer, J. H. (2003). Psychometrically matched measures of global cognition, memory, and executive function for assessment of cognitive decline in older persons. Neuropsychology 17, 380–392. doi: 10.1037/0894-4105.17.3.380
Murdock, B. B. Jr. (1962). The serial position effect of free recall. J. Exp. Psychol. 64, 482–488. doi: 10.1037/h0045106
Papp, K. V., Amariglio, R. E., Dekhtyar, M., Roy, K., Wigman, S., Bamfo, R., et al. (2014). Development of a psychometrically equivalent short form of the face–name associative memory exam for use along the early Alzheimer’s disease trajectory. Clin. Neuropsychol. 28, 771–785. doi: 10.1080/13854046.2014.911351
Papp, K. V., Mormino, E. C., Amariglio, R. E., Munro, C., Dagley, A., Schultz, A. P., et al. (2016). Biomarker validation of a decline in semantic processing in preclinical Alzheimer’s disease. Neuropsychology 30, 624–630. doi: 10.1037/neu0000246
Papp, K. V., Rentz, D. M., Orlovsky, I., Sperling, R. A., and Mormino, E. C. (2017). Optimizing the preclinical Alzheimer’s cognitive composite with semantic processing: The PACC5. Alzheimers Dement. 3, 668–677. doi: 10.1016/j.trci.2017.10.004
Petrican, R., Moscovitch, M., and Schimmack, U. (2008). Cognitive resources, valence, and memory retrieval of emotional events in older adults. Psychol. Aging 23, 585–594. doi: 10.1037/a0013176
Posner, H., Curiel, R., Edgar, C., Hendrix, S., Liu, E., Loewenstein, D. A., et al. (2017). Outcomes assessment in clinical trials of Alzheimer’s disease and its precursors: readying for short-term and long-term clinical trial needs. Innov. Clin. Neurosci. 14, 22–29.
Prieto, G., Contador, I., Tapias-Merino, E., Mitchell, A. J., and Bermejo-Pareja, F. (2012). The Mini-Mental-37 test for dementia screening in the Spanish population: an analysis using the Rasch model. Clin. Neuropsychol. 26, 1003–1018. doi: 10.1080/13854046.2012.704945
Putcha, D., Dickerson, B. C., Brickhouse, M., Johnson, K. A., Sperling, R. A., and Papp, K. V. (2020). Word retrieval across the biomarker-confirmed Alzheimer’s disease syndromic spectrum. Neuropsychologia 140:107391. doi: 10.1016/j.neuropsychologia.2020.107391
Romano, J., Kromrey, J. D., Coraggio, J., Skowronek, J., and Devine, L. (2006). “Exploring methods for evaluating group differences on the NSSE and other surveys: are the t-test and Cohen’s d indices the most appropriate choices.” In Annual Meeting of the Southern Association for Institutional Research. Citeseer, 1–51.
Ross, L. A., McCoy, D., Wolk, D. A., Coslett, H. B., and Olson, I. R. (2010). Improved proper name recall by electrical stimulation of the anterior temporal lobes. Neuropsychologia 48, 3671–3674. doi: 10.1016/j.neuropsychologia.2010.07.024
Rubiño, J., and Andrés, P. (2018). The face-name associative memory test as a tool for early diagnosis of Alzheimer’s disease. Front. Psychol. 9:1464. doi: 10.3389/fpsyg.2018.01464
Sager, M. A., Hermann, B., and La Rue, A. (2005). Middle-aged children of persons with Alzheimer’s disease: APOE genotypes and cognitive function in the Wisconsin registry for Alzheimer’s prevention. J. Geriatr. Psychiatry Neurol. 18, 245–249. doi: 10.1177/0891988705281882
Salthouse, T. A. (2017). Item analyses of memory differences. J. Clin. Exp. Neuropsychol. 39, 326–335. doi: 10.1080/13803395.2016.1226267
Satler, C., Garrido, L., Sarmiento, E., Leme, S., Conde, C., and Tomaz, C. (2007). Emotional arousal enhances declarative memory in patients with Alzheimer’s disease. Acta Neurol. Scand. 116, 355–360. doi: 10.1111/j.1600-0404.2007.00897.x
Schmidt, M. (1996). Rey Auditory Verbal Learning Test: A Handbook. Los Angeles, CA: Western Psychological Services.
Semenza, C. (2011). Naming with proper names: the left temporal pole theory. Behav. Neurol. 24, 277–284. doi: 10.1155/2011/650103
Snyder, P. J., Kahle-Wrobleski, K., Brannan, S., Miller, D. S., Schindler, R. J., DeSanti, S., et al. (2014). Assessing cognition and function in Alzheimer’s disease clinical trials: do we have the right tools? Alzheimers Dement. 10, 853–860. doi: 10.1016/j.jalz.2014.07.158
Sprecher, K. E., Bendlin, B. B., Racine, A. M., Okonkwo, O. C., Christian, B. T., Koscik, R. L., et al. (2015). Amyloid burden is associated with self-reported sleep in nondemented late middle-aged adults. Neurobiol. Aging 36, 2568–2576. doi: 10.1016/j.neurobiolaging.2015.05.004
Talamonti, D., Koscik, R., Johnson, S., and Bruno, D. (2020). Predicting early mild cognitive impairment with free recall: the primacy of primacy. Arch. Clin. Neuropsychol. 35, 133–142. doi: 10.1093/arclin/acz013
Thomas, R. C., and Hasher, L. (2006). The influence of emotional valence on age differences in early processing and memory. Psychol. Aging 21, 821–825. doi: 10.1037/0882-7974.21.4.821
Toga, A. W., Neu, S. C., Bhatt, P., Crawford, K. L., and Ashish, N. (2016). The global Alzheimer’s Association interactive network. Alzheimers Dement. 12, 49–54. doi: 10.1016/j.jalz.2015.06.1896
Troyer, A. K., Moscovitch, M., Winocur, G., Alexander, M. P., and Stuss, D. (1998). Clustering and switching on verbal fluency: The effects of focal frontal- and temporal-lobe lesions. Neuropsychologia 36, 499–504. doi: 10.1016/S0028-3932(97)00152-8
U.S. Department of Health and Human Services (2018). U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation. Early Alzheimer’s Disease: Developing Drugs For Treatment, Guidelines for Industry. Available at: https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM596728.pdf
van Harten, A. C., Mielke, M. M., Swenson-Dravis, D. M., Hagen, C. E., Edwards, K. K., Roberts, R. O., et al. (2018). Subjective cognitive decline and risk of MCI: the Mayo Clinic study of aging. Neurology 91, e300–e312. doi: 10.1212/WNL.0000000000005863
Weakley, A., and Schmitter-Edgecombe, M. (2014). Analysis of verbal fluency ability in Alzheimer’s disease: the role of clustering, switching and semantic proximities. Arch. Clin. Neuropsychol. 29, 256–268. doi: 10.1093/arclin/acu010
Weissberger, G. H., Strong, J. V., Stefanidis, K. B., Summers, M. J., Bondi, M. W., and Stricker, N. H. (2017). Diagnostic accuracy of memory measures in Alzheimer’s dementia and mild cognitive impairment: a systematic review and meta-analysis. Neuropsychol. Rev. 27, 354–388. doi: 10.1007/s11065-017-9360-6
Keywords: Alzheimer’s disease, mild cognitive impairment, language, dementia, positron emission tomography, amyloid beta, cognitive decline and dementia
Citation: Mueller KD, Du L, Bruno D, Betthauser T, Christian B, Johnson S, Hermann B and Koscik RL (2022) Item-Level Story Recall Predictors of Amyloid-Beta in Late Middle-Aged Adults at Increased Risk for Alzheimer’s Disease. Front. Psychol. 13:908651. doi: 10.3389/fpsyg.2022.908651
Edited by:
Matteo De Marco, Brunel University London, United Kingdom
Reviewed by:
Israel Contador, University of Salamanca, Spain
Manuel Fuentes Casañ, Caritas-Klinik Dominikus, Germany
Copyright © 2022 Mueller, Du, Bruno, Betthauser, Christian, Johnson, Hermann and Koscik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kimberly D. Mueller, kdmueller@wisc.edu
†These authors share first authorship