Older adults recover more marginal knowledge and use feedback more effectively than younger adults: evidence using “I don’t know” vs. “I don’t remember” for general knowledge questions

Umanath, Sharda; Barrett, Talia E.; Kim, Stacy; Walsh, Cole A.; Coane, Jennifer H.

doi:10.3389/fpsyg.2023.1145278

ORIGINAL RESEARCH article

Front. Psychol., 31 May 2023

Sec. Psychology of Aging

Volume 14 - 2023 | https://doi.org/10.3389/fpsyg.2023.1145278


Sharda Umanath¹*

Talia E. Barrett²

Stacy Kim²

Cole A. Walsh²

Jennifer H. Coane²

¹Department of Psychological Science, Claremont McKenna College, Claremont, CA, United States
²Department of Psychology, Colby College, Waterville, ME, United States

Through three experiments, we examined older and younger adults’ metacognitive ability to distinguish between what is not stored in the knowledge base versus merely inaccessible. Difficult materials were selected to test this ability when retrieval failures were very frequent. Of particular interest was the influence of feedback (and lack thereof) in potential new learning and recovery of marginal knowledge across age groups. Participants answered short-answer general knowledge questions, responding “I do not know” (DK) or “I do not remember” (DR) when retrieval failed. After DKs, performance on a subsequent multiple-choice test (Exp. 1) and on a short-answer test following correct-answer feedback (Exp. 2) was lower than after DRs, supporting the view that self-reported not remembering reflects failures of accessibility, whereas not knowing captures a lack of availability. Yet, older adults showed a tendency to answer more DK questions correctly on the final tests than younger adults. Experiment 3 was a replication and extension of Experiment 2 including two groups of online participants in which one group was not provided correct answer feedback during the initial short-answer test. This allowed us to examine the degree to which any new learning and recovery of access to marginal knowledge was occurring across the age groups. Together, the findings indicate that (1) metacognitive awareness regarding underlying causes of retrieval failures is maintained across different distributions of knowledge accessibility, (2) older adults use correct answer feedback more effectively than younger adults, and (3) in the absence of feedback, older adults spontaneously recover marginal knowledge.

Introduction

Older adults (OAs) typically report a decline in their ability to learn and remember information (e.g., Hertzog and Dixon, 1994). Behavioral data bear out this subjective experience (Balota et al., 2000). Though many aspects of memory decline even in healthy aging, such as explicit memory for specific events (episodic memory), knowledge remains intact and even expands into very old age (Park, 2000). Knowledge includes vocabulary, schemas, facts, and general knowledge about the world. This knowledge influences OAs’ remembering in a variety of ways, sometimes bolstering their accurate remembering and sometimes leading them astray (for a review, see Umanath and Marsh, 2014). OAs also experience more retrieval-related difficulties, reporting more word-finding failures and tip-of-the-tongue states (TOTs; Burke et al., 1991) than do younger adults (YAs). Similar to memory overall, OAs generally perform as well as YAs on metamemory judgments concerning semantic memory or general knowledge (Morson et al., 2015). In contrast, on some episodic tasks, OAs often do not calibrate as accurately as YAs (Souchay et al., 2007), though the literature is mixed (Hertzog and Dunlosky, 2011).

Metacognitive functioning is essential for guiding behavior: Knowing what one knows or does not know enables an individual to determine behaviors. It is also fundamental for learning, in that different efforts, resources, and strategies may be needed based on the basic understanding of whether information is stored in memory or not. That is, what is the cause of this retrieval failure? Is it that I have never learned this and need to now allocate resources to do so, or is it stored but temporarily inaccessible to me? Such metacognition related to the experience of retrieval failures is of particular interest here. Most of the commonly used measures of metamemory do not clearly discriminate between causes of retrieval failure. Specifically, the answer to a question might simply not be stored in memory (i.e., it is unavailable), or it could be stored, but not retrievable at the moment (i.e., it is inaccessible; Tulving and Pearlstone, 1966). Most measures of metamemory tend to ask participants to rate, on a numerical scale, the extent to which they believe they know the answer to a question they cannot answer in the moment. One major example is the “feeling of knowing” measure (Hart, 1965; Koriat, 1993): Participants report how likely they would be to recognize the answer to a question they cannot recall. Such measures have been used extensively and have shown consistent evidence for the strengths and weaknesses in OAs’ memory and metamemory. However, these measures of metacognition do not explicitly determine the “cause” of the failure—the numerical value assigned is assumed to reflect a probability of retrieval, but does not inform the researcher of the underlying phenomenological experience of the participant or of their evaluation as to whether the information is unavailable or inaccessible.

In Coane and Umanath (2019), a novel metamemory tool was developed and tested, capitalizing on natural language use. In that work, OAs’ and YAs’ metacognitive distinction between what is not stored in memory versus merely inaccessible was examined. Knowledge that is merely inaccessible and can be recovered has been termed “marginal knowledge” (Berger et al., 1999). In an initial task, participants answered general knowledge questions in a short-answer format, responding “I don’t know” (DK) or “I don’t remember” (DR) when retrieval failed. When given an opportunity to answer the questions again in a final multiple-choice test, items that had been identified as DR were recognized better than those that had been identified as DK. Critically, OAs and YAs performed quite similarly. These results suggested that DR is associated with a failure in accessibility, whereas DK with a failure in availability. Qualitative analyses of participants’ definitions of what they meant when they used these terms confirmed the underlying phenomenological distinction between not remembering and not knowing. Indeed, both these empirical and qualitative findings have been replicated with other materials (Umanath et al., 2023; see also Lukasik et al., 2020), providing support for the reliability and validity of participant usage of DR and DK to capture the phenomenological experiences of a lack of accessibility versus availability.

However, one anomalous finding in Coane and Umanath (2019) emerged in their Experiment 3, when the final test was a short-answer task and correct answer feedback had been presented during the initial task. Specifically, OAs ostensibly underestimated their knowledge, as reflected by recovery of a high number of items that they originally claimed they did not know (DK; i.e., not available in memory), far exceeding chance performance. That is, after initially identifying these items as not known, OAs answered several of these questions correctly on the final test, more so than did YAs. In fact, final test accuracy for DR and DK items was similar in OAs. There are several possible explanations for such a finding, including but not limited to OAs making a metacognitive error and underestimating the content of their knowledge bases. For example, OAs could be using a potential “face saving” mechanism, whereby admitting lack of knowledge might be less threatening than a retrieval failure (Smith and Clark, 1993). They may have more sophisticated guessing strategies than YAs (particularly when faced with multiple-choice questions; Cyr and Anderson, 2015). They may have more related knowledge with which to integrate new learning from the correct answer feedback (Sitzman et al., 2020). Alternatively or additionally, they could experience fluctuations in knowledge accessibility, with information coming in and out of accessible range, so to speak. Indeed, with TOTs, OAs are likely to spontaneously recover the answer later, showing “pop-ups” and generally recovering more answers if given more time (Cohen and Faulkner, 1986; Burke et al., 1991). Even across an hour, OAs gained access to more previously non-retrievable general knowledge than did YAs (Umanath, 2016).

Here, through three experiments, we aimed to explore the underlying causes of OAs’ recovery of knowledge that was previously inaccessible—their marginal knowledge. We address two related questions: First, does the overall range of difficulty of the questions matter? Relatively easy questions, like those used in Coane and Umanath (2019), might have led participants, particularly OAs, to rely more on the phenomenological experience of ease of retrieval in making their judgments. Such shifts in comparative evaluation based on phenomenology would be consistent with variability in performance observed in the Remember/Know paradigm typically used in episodic recognition tasks. In that case, performance varies based on what labels (and definitions) are provided to participants (Geraci and McCabe, 2006; Geraci et al., 2009; Williams and Lindsay, 2019). In our previous work, the majority of items were well-known to OAs, potentially resulting in a shift in participants’ phenomenological experiences and skewing their responses. That is, OAs answered over 60% of the questions correctly when they were first presented, leaving few responses to fall in the DR and DK categories. It is possible that OAs in Coane and Umanath (2019) misjudged the contents of their knowledge because of this systematic bias in the selected materials. Specifically, when the majority of items are successfully retrieved and/or perceived as easy, a slightly less familiar item (i.e., more difficult) might be judged as not remembered because it cannot be accessed immediately. An item that is even less familiar might be judged as not known, not because it is unavailable, but because its accessibility is judged relative to the easy retrieval of other items. Alternatively, the relatively low number of DR and DK responses OAs provided overall (ranging across experiments from 0.11 to 0.16) means that they simply had fewer answers to remember or learn compared to YAs. 
Therefore, in Experiments 1 and 2, we replicated Coane and Umanath (2019) with normatively difficult questions that were selected to test usability of DR and DK to distinguish accessibility- versus availability-based retrieval failures when retrieval failures are very frequent. Specifically, what happens when all items require more extensive searches through the knowledge base? On the one hand, the difference between what is not accessible (DR) and what is not available (DK) could disappear if the resulting phenomenological experiences are essentially “compressed” due to difficulty. On the other hand, the distinction could also become more salient as the internal comparisons of phenomenology shift away from anchoring on the mental experience of successful retrieval. Simply put, the general knowledge questions used in all studies were more difficult, to extend the search and retrieval space. Thus, this is a strong test of the reliability and generalizability of the DR/DK method.

The second question focused specifically on the role of correct answer feedback in potential new learning and recovery of access to stored general knowledge. For new learning, both basic empirical research and applied research in educational settings highlight the importance of corrective feedback, especially when initial errors are made (Kornell et al., 2009; see Metcalfe, 2017, for a review). The benefits of corrective feedback are especially powerful when the error is semantically related to the correct answer (Kang et al., 2011; Huelser and Metcalfe, 2012). One possibility is that participants attend more to the feedback when it contradicts a response participants thought was correct and therefore encode it more effectively (Potts et al., 2019). This suggests there is a strong episodic component to learning from feedback. According to episodic context accounts of the testing effect (Karpicke et al., 2014), retrieval promotes integration of the specific episodic details, thereby creating a richer memory trace. This richer trace is more resistant to interference (Jacoby and Wahlheim, 2013). In fact, not only do participants retrieve answers better following feedback; they also are more likely to recall specific contextual details of the feedback itself, consistent with an episodic account (Overman et al., 2021). This suggests that one important role of feedback is in updating missing or incorrect knowledge (Metcalfe and Huelser, 2020) and that the ability to encode and retrieve the feedback is critical.

In the context of knowledge-based retrieval failures, feedback can act as an opportunity for new learning or as a cue or reminder of the correct answer, facilitating the recovery of marginal knowledge. Given the importance of episodic contributions to the benefits of feedback and evidence regarding new learning in general (Balota et al., 2000), one might expect that OAs would show a reduced benefit: Deficits in episodic memory should undermine OAs’ ability to acquire and integrate the feedback, especially when the errors are potentially integrated into their knowledge base. For YAs, prior work has shown that feedback can be powerful for stabilizing access to marginal knowledge (Berger et al., 1999). Even a multiple-choice-based retrieval attempt can help recover access for YAs (Cantor et al., 2015). Could the same be true for OAs?

As proposed above, one explanation for OAs’ recovery of marginal knowledge is that they initially underestimated their knowledge bases, making a metacognitive error by labeling items that were actually DR (and inaccessible) but perhaps felt especially difficult to retrieve as DK (and therefore, unavailable). However, it is also possible that after retrieval failures, OAs might have attended more to the feedback provided during the initial phase (Metcalfe et al., 2015). Examination of the response latencies from the final test in Coane and Umanath’s Experiment 3 is consistent with the latter hypothesis: Whereas YAs retrieved correct answers on the final test at the same speed regardless of whether they had responded DR or DK originally, OAs were significantly slower at producing the correct answer for those items given a DK response compared to a DR response, suggesting they were engaging in an effortful search through memory. An alternative is that even if the OAs truly did not have the answers stored in memory, they may have had more related or relevant knowledge in memory, thereby facilitating the acquisition of the new information (Umanath and Marsh, 2014). To address these questions, in a final experiment, we replicated the short-answer test version of the study with online samples and manipulated the presence of correct answer feedback. Importantly, systematically manipulating the presence of feedback allows us to examine the extent to which OAs are misjudging the availability of information in memory. Items that are truly not known would not be expected to be answered correctly, unless feedback is provided and participants learn from that feedback. If OAs demonstrate such a distinction in correct responses on the final test for DR versus DK items when no feedback is provided with more DR items being answered correctly than DK ones, it would provide evidence against the hypothesis that OAs are simply making metacognitive errors.

In sum, in the experiments reported here, we examined whether participants are accurate in determining whether an item is inaccessible (not remembered) or unavailable (not known) when retrieval failures are quite frequent. As in previous work (Coane and Umanath, 2019; Umanath et al., 2023), we expected final test accuracy to be higher for items initially judged as not remembered than those judged as not known for both age groups. When the final test is a multiple-choice format, a temporarily inaccessible item should be correctly recognized more often than an item that is not part of the knowledge base. When the final test requires effortful retrieval (i.e., a short-answer test) and feedback is given, the feedback should serve as a “reminder” and be subsequently retrieved at a greater rate for marginal knowledge (not remembered) than when the feedback acts as a new learning opportunity (when the item was deemed not known), perhaps especially for OAs who routinely show deficits in episodic learning. But particularly, in the absence of feedback, items not remembered should be more likely to be correctly answered, due to spontaneous retrieval or continued search in memory, compared to items not known. This is the key question we addressed in the final experiment.

Experiment 1

Experiments 1 and 2 served as replications and extensions of prior work (Coane and Umanath, 2019) to test participants’ abilities to distinguish failures in accessibility from failures in availability (Tulving and Pearlstone, 1966) when items are unfamiliar or obscure (i.e., when retrieval is less likely to succeed and when the difference between these causes of failures could be less apparent). In Experiment 1, participants were not given correct answer feedback after initial exposure to short-answer general knowledge questions and were administered a final multiple-choice recognition test.

Method

Participants

In Exp. 1, 56 YAs (35 women) participated via participant pools at Claremont McKenna and Colby Colleges, earning course credit or $10 for their participation, and 33 community-dwelling OAs (27 women) from both surrounding communities participated for $10/h. Sample size was determined based on the effect size for OAs (who had a smaller effect size) for the difference in accuracy between initial DR and DK responses on the final test in Experiment 2 in Coane and Umanath (2019) and estimated power of 0.9. The minimum sample was 27 in each age group; YAs were over-sampled because both labs were recruiting and testing simultaneously. All but one of the OAs also completed the Mini Mental State Exam (MMSE; Folstein et al., 1975). OAs scored higher than YAs in vocabulary (Shipley, 1940), t(74) = 6.75, p < 0.001, d = 1.48, and had more years of education, t(32.07) = 11.95, p < 0.001, d = 2.62. See Table 1 for full demographic information.

TABLE 1

Table 1. Participant demographic information for Experiments 1, 2, and 3.

Materials

To elicit a high rate of “don’t remember” (DR) and “don’t know” (DK) responses from both YAs and OAs, 70 difficult general knowledge questions (GKQs) from Tauber et al. (2013) were selected on the basis of particularly low reported retrieval rates in YAs (M = 0.26, SD = 0.14, range = 0.00–0.58). Based on norming data from OAs (Coane and Umanath, 2021), multiple-choice (MC) accuracy for these questions ranged from 0.10 to 0.98 (M = 0.53). Thirty additional GKQs from Tauber et al. with high retrieval rates were used as filler items (M = 0.76, SD = 0.08, range = 0.51–0.93). The questions had simple one- or two-word answers (e.g., What is the last name of the author who wrote “Our Town”? Answer: Wilder). As a filler task, participants were provided a packet that contained simple arithmetic problems and Sudoku puzzles.

Procedure

The experiment was conducted using E-Prime software (Schneider et al., 2012), with YAs tested in the lab and OAs tested both in the lab and at a local senior college, all tested individually. Participants answered 100 GKQs in a short-answer format after being provided an example and told that some may be difficult. They were asked to respond “don’t know” (DK) or “don’t remember” (DR) if they were unsure of an answer. Critically, participants were not provided with explicit instructions on the difference between these two responses but simply told to use their best judgment (see Coane and Umanath, 2019). Questions were presented individually in random order. The task was self-paced, and participants typed their response directly into the computer. After a 5-min filled delay (participants could freely choose to do arithmetic or Sudoku), participants completed a multiple-choice (MC) test of the same GKQs in a different random order with five possible responses: the correct response and four plausible alternatives. The position of the correct answer varied across all five options an equal number of times across all questions. Participants then answered two open-ended questions about their use of DK and DR in a randomized order. Specifically, they were asked, “What did you mean when you used ‘I don’t know/I don’t remember’ in the first part of the study?” Results from these questions are not reported here; they were examined to ensure that participants discriminated between the two options, which most participants did. Finally, all participants completed the Shipley (1940) vocabulary task, and OAs also completed the MMSE (Folstein et al., 1975).
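The counterbalancing of the correct answer's position on the MC test can be illustrated with a minimal sketch. It is not the E-Prime implementation used in the study; the function name, the seed, and the 0-based position coding are assumptions made for illustration. With 100 questions and five options, each position holds the correct answer 20 times, and blind guessing yields an expected accuracy of 1/5 = 0.20.

```python
import random
from collections import Counter

N_OPTIONS = 5                    # one correct response plus four alternatives (from the text)
CHANCE_ACCURACY = 1 / N_OPTIONS  # expected accuracy under blind guessing


def balanced_positions(n_questions, n_options=N_OPTIONS, seed=0):
    """Return shuffled 0-based correct-answer positions, each used equally often.

    Hypothetical helper: the seed and the position coding are illustrative only.
    """
    if n_questions % n_options != 0:
        raise ValueError("n_questions must be a multiple of n_options")
    # Build an exactly balanced list, then randomize which question gets which slot.
    positions = list(range(n_options)) * (n_questions // n_options)
    random.Random(seed).shuffle(positions)
    return positions


positions = balanced_positions(100)  # 100 GKQs on the final MC test
counts = Counter(positions)          # how often each position holds the correct answer
```

Balancing positions in this way keeps a simple response bias (e.g., always picking the first option) from inflating or deflating accuracy for any subset of items.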

Results and discussion

The analyses below, and in subsequent experiments, only include responses to the difficult questions; analyses on fillers are not reported. Where relevant, we applied a Bonferroni correction for multiple comparisons, and in cases of violations of assumptions of sphericity or normality, corrected degrees of freedom are reported (Greenhouse–Geisser for ANOVAs). For effect sizes, we report partial η² for ANOVAs and Cohen’s d for t-tests.
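As a brief illustration of the Bonferroni correction mentioned above, the sketch below multiplies each unadjusted p value by the number of comparisons and caps the result at 1. The p values are hypothetical and are not taken from the present data. The count of six comparisons assumes all pairwise contrasts among the four initial response types (correct, incorrect, DR, DK).

```python
from math import comb


def bonferroni(p_values):
    """Bonferroni-adjust p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


n_pairwise = comb(4, 2)  # 6 pairwise comparisons among four response types

# Hypothetical unadjusted p values, for illustration only.
raw_p = [0.0001, 0.0004, 0.002, 0.008, 0.03, 0.40]
adjusted = bonferroni(raw_p)
```

Because adjusted values are capped at 1, a clearly nonsignificant comparison can appear as p > 0.999 after correction.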

Initial short-answer performance

All responses were coded as incorrect (including both errors of omission and commission), correct (including minor spelling errors or morphological variations), DR, or DK. Errors of omission occurred on only nine trials (accounting for 0.001 of all trials). Omission errors were made by three YAs and three OAs; all these participants provided similar rates of correct, incorrect, DR, and DK responses to the participants who did not make any omission errors, all ts ≤ 1.66, ps ≥ 0.101. Given the lack of independence in the responses (i.e., a higher rate of correct responses would necessarily result in fewer DR or DK responses), we report a series of independent samples t-tests comparing response proportions as a function of age (cf. Umanath et al., 2023). See Table 2 for means.

TABLE 2

Table 2. Participant performance on exposure phase of Experiments 1, 2, and 3 (standard errors in parentheses).

The overall low accuracy rate does confirm that the items were quite difficult. OAs answered more questions correctly than YAs, t(38.39) = 5.89, p < 0.001, d = 1.56. They were also more likely than YAs to respond incorrectly, t(87) = 5.64, p < 0.001, d = 1.24. In contrast, YAs responded DK more often than OAs, t(87) = −8.89, p < 0.001, d = −1.95. The use of DR did not differ across age groups, t(87) = 1.10, p = 0.273, d = 0.24.
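The relation between the t statistics and effect sizes above can be checked with a short sketch. It assumes the standard conversion for a pooled-variance independent samples t-test, d = t · sqrt(1/n₁ + 1/n₂), with n = 33 OAs and n = 56 YAs. This conversion is an assumption about how d could be derived, not a description of the authors' software. It reproduces the three effect sizes reported with 87 degrees of freedom. The test with corrected degrees of freedom, t(38.39), is omitted because the conversion does not apply to it directly.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Convert a pooled-variance independent samples t statistic to Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


N_OA, N_YA = 33, 56  # sample sizes reported in the Participants section

# (t statistic, reported d) for the three tests with df = 87
reported = {
    "incorrect": (5.64, 1.24),
    "DK": (-8.89, -1.95),
    "DR": (1.10, 0.24),
}

recomputed = {
    label: round(cohens_d_from_t(t, N_OA, N_YA), 2)
    for label, (t, _) in reported.items()
}
```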

Final MC test performance

A 4 (Response) × 2 (Age) ANOVA was conducted to examine the proportion of correct selections on the final MC test as a function of the initial test response. Data from 30 OAs and 44 YAs were included due to empty cells (i.e., some participants did not have data for the final test because they never provided one or more of the response options during the exposure phase). As seen in Figure 1, OAs answered more questions correctly (M = 0.56, SE = 0.01) than YAs (M = 0.47, SE = 0.01), F(1, 72) = 26.63, p < 0.001, η_p² = 0.27. Importantly, MC accuracy significantly varied as a function of participants’ initial test responses, F(2.47, 178.38) = 271.38, p < 0.001, η_p² = 0.79. Participants were able to maintain correct responses, recognizing the vast majority of final questions correctly if they were able to generate the correct answer initially (M = 0.90, SE = 0.02). Initial DR response items had the next highest mean selection accuracy rate (M = 0.48, SE = 0.02), followed by initially incorrect responses (M = 0.35, SE = 0.02) and initial DK responses (M = 0.33, SE = 0.01). All pairwise comparisons were significant (ps < 0.001, all ds ≥ 0.71), other than the difference between initial DK and incorrect responses (p > 0.999, d = 0.11). Thus, our results provide evidence that both age groups successfully discriminated between DK and DR responses, replicating Coane and Umanath (2019: Exp. 2) when material was much more difficult and the distribution of phenomenology shifted away from mainly retrieval success.
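The empty-cell exclusion described above can be sketched as follows. The data layout (one tuple per trial), the function name, and the toy trials are hypothetical and serve only to show the logic. Final MC accuracy is computed separately for each initial response type within each participant, and a participant is dropped from the 4 × 2 analysis if any of the four cells has no trials.

```python
from collections import defaultdict

RESPONSES = ("correct", "incorrect", "DR", "DK")


def conditional_accuracy(trials):
    """Compute final-test accuracy per participant and initial response type.

    trials: iterable of (participant_id, initial_response, final_correct).
    Returns (complete, excluded): accuracy per cell for participants with data
    in all four cells, and the ids of participants with an empty cell.
    """
    tallies = defaultdict(lambda: defaultdict(list))
    for pid, response, final_correct in trials:
        tallies[pid][response].append(1 if final_correct else 0)

    complete, excluded = {}, []
    for pid, cells in tallies.items():
        if all(cells.get(r) for r in RESPONSES):
            complete[pid] = {r: sum(cells[r]) / len(cells[r]) for r in RESPONSES}
        else:
            excluded.append(pid)  # never used one or more response options
    return complete, excluded


# Toy data, for illustration only: p2 never responded "incorrect" or "DR".
toy_trials = [
    ("p1", "correct", True), ("p1", "incorrect", False),
    ("p1", "DR", True), ("p1", "DR", False), ("p1", "DK", False),
    ("p2", "correct", True), ("p2", "DK", False), ("p2", "DK", True),
]
complete, excluded = conditional_accuracy(toy_trials)
```

In this toy example, p1 contributes a DR accuracy of 0.50 and a DK accuracy of 0.00, whereas p2 is excluded for having empty cells.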

FIGURE 1

Figure 1. Older and younger participants’ Experiment 1 accuracy on the final multiple-choice test as a function of initial response. Error bars represent standard error of the mean.

The analyses further revealed a significant Age by Response interaction, F(2.47, 178.38) = 11.81, p < 0.001, η_p² = 0.14 (see Figure 1). Tests of simple effects indicated that, in both age groups, the main effect of Response was significant: For OAs, F(3, 70) = 98.78, p < 0.001, η_p² = 0.81, and for YAs, F(3, 70) = 327.09, p < 0.001, η_p² = 0.93. The interaction was driven by the fact that, for YAs, all pairwise comparisons were significant (all ps ≤ 0.043, all ds > 0.30), whereas OAs’ final test accuracy for initial DK and Incorrect items did not differ (p > 0.999, d = 0.20), although all other pairwise comparisons for OAs were significant (all ps ≤ 0.010, all ds > 0.69). Importantly, for both age groups, items initially not remembered were correctly recognized more often than those originally not known (both ps ≤ 0.005, d_YA = 0.69 and d_OA = 0.76). The lack of a difference for OAs between DK and Incorrect items might reflect a true lack of knowledge (i.e., an initial guess led to an incorrect response, and without prior knowledge, participants could not recognize the correct answer among foils). These results confirm those of Coane and Umanath (2019) and extend the differences in self-reported inaccessibility and unavailability to a new set of very difficult questions.
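The partial η² values reported for the final MC test can be recovered from the F statistics and their degrees of freedom with the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies that identity to the five effects reported in this section. The function name and effect labels are illustrative, and the code is a consistency check rather than a reanalysis of the data.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported_effects = [
    ("Age main effect", 26.63, 1, 72, 0.27),
    ("Response main effect", 271.38, 2.47, 178.38, 0.79),
    ("Age x Response interaction", 11.81, 2.47, 178.38, 0.14),
    ("Response within OAs", 98.78, 3, 70, 0.81),
    ("Response within YAs", 327.09, 3, 70, 0.93),
]

recomputed = [
    (label, round(partial_eta_squared(f, df1, df2), 2), reported_value)
    for label, f, df1, df2, reported_value in reported_effects
]

for label, value, reported_value in recomputed:
    print(f"{label}: recomputed {value:.2f}, reported {reported_value:.2f}")
```

All five recomputed values match the reported ones to two decimal places.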

Experiment 2

In Experiment 2, after attempting to answer the short-answer GKQs initially, participants were given correct answer feedback regardless of their response. In the case of temporary retrieval failures or errors, the feedback should serve as a reminder and facilitate subsequent retrieval. In contrast, if the information is truly not known (is unavailable), the feedback should act as a new learning episode. As discussed above, using difficult GKQs may facilitate separation of initial DR versus DK items, which was not observed in Coane and Umanath (2019: Exp. 3) using easier questions. By modifying the materials in this way, we might increase the salience of information that is not available vs. not accessible, reducing the number of DK responses that refer to very difficult to access but ultimately available knowledge. Furthermore, for very obscure knowledge, the feedback would be more likely to be the first time individuals are exposed to this information. It is also possible that obscure knowledge has fewer connections to prior knowledge, which could make the feedback less effective because OAs would not be able to capitalize on their ability to integrate new learning (Metcalfe, 2017).