Response Format, Not Semantic Activation, Influences the Failed Retrieval Effect

Tanaka, Saeko; Miyatani, Makoto; Iwaki, Nobuyoshi

doi:10.3389/fpsyg.2019.00599

ORIGINAL RESEARCH article

Front. Psychol., 04 April 2019

Sec. Cognition

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.00599

Saeko Tanaka¹*

Makoto Miyatani²

Nobuyoshi Iwaki³

¹Department of Childhood Education, Tokushima Bunri University, Tokushima, Japan
²Department of Psychology, Hiroshima University, Hiroshima, Japan
³Faculty of Education, Iwate University, Iwate, Japan

In educational settings, tests are mainly used to measure the extent to which learners’ knowledge and skill have been acquired. However, the act of taking a test also promotes learning itself. In particular, making errors on tests (i.e., searching for erroneous information) promotes learning. This is called the “failed retrieval effect” (FRE) and has been the subject of considerable study. Previous research shows that enhanced learning does not occur if feedback correcting an error is delayed. This is attributed to the relative absence of activated information. In this study, we manipulated both the amount of information to be retrieved prior to learning and the delay time until feedback is given to investigate their effects on learning. As a result, even when multiple incorrect answers were given to increase the degree of semantic activation, learning was not promoted beyond that found with traditional procedures that rely on only one incorrect response. The timing of feedback (immediate, short-delay, long-delay) also did not impact FRE. However, the manipulation of response format for erroneous information resulted in degraded performance when responses were typed and feedback was delayed. Based on this result, we suggested that the failed retrieval effect was not affected by semantic activation at the time of retrieval but was affected by response format. Moreover, the processing necessary for typing may affect FRE under the delayed feedback condition.

Introduction

In educational settings, tests are implemented to evaluate the extent to which learners’ knowledge and skills have been acquired; in the context of memory research, recall and recognition tests are used to evaluate performance. However, taking tests itself also promotes learning. When initial testing is conducted on the material to be memorized, performance on subsequent tests improves, beyond what is learned through re-reading this material. This phenomenon, variously called the “testing effect” or “retrieval-based learning,” has generated an extensive literature (e.g., Dempster, 1996; Roediger and Karpicke, 2006a,b; Putnam and Roediger, 2013; Smith et al., 2013; Carpenter and Yeung, 2017; Sundqvist et al., 2017). Notably, enhancing memory via testing, contrary to both learners’ ordinary sense and behavioral learning theory, is not limited to when students answer correctly; it also occurs when they answer incorrectly. This is known as the “failed (unsuccessful) retrieval effect (FRE)” or the “pretesting effect.”

Kornell et al. (2009) conducted an experiment on FRE using paired association tasks involving weakly associated words. For example, the cue word, “tide,” is weakly related to the target word, “beach” in that the cue evokes the target in less than 5% of people sampled. In Kornell et al. (2009), participants learned word pairs under either a retrieval condition (test) or a control condition (read-only), after which a cued-recall final test was given. In the retrieval condition, an initial test was given in which a target word was answered based on a cue word, and then the word pair to be memorized was presented (correct feedback). In the control condition, the word pair was presented without an initial test. Because participants did not know which word was paired with the cue word before the initial test, they were unable to answer with the correct target word (i.e., failed retrieval) for almost 95% of words. However, in the final test, cued-recall performance for word pairs was significantly higher in the retrieval condition as compared to the control condition.

FRE occurs for a variety of stimuli and participants, including word pairs (e.g., Kornell et al., 2009; Grimaldi and Karpicke, 2012; Hays et al., 2013; Yan et al., 2014); short phrases or sentences (Richland et al., 2009; Kornell, 2014); word pairs of unfamiliar English words and their definitions or foreign vocabulary and their translations (Potts and Shanks, 2014); and trivia questions with multiple choice answers in children (Marsh et al., 2012). However, learning promotion does not occur equally for all stimuli; the type of stimulus and the timing of the correct answer feedback interact. In other words, although no effect occurs when word pairs are used as a stimulus without immediate feedback (e.g., Experiment 3 in Kornell et al., 2009; Grimaldi and Karpicke, 2012; Experiment 1 in Kornell, 2014) and when sentences such as trivia questions are used, learning is promoted even when the correct answer feedback is delayed (by about 1 day) (e.g., Experiment 3 in Richland et al., 2009; Kornell, 2014).

Why do such interactions occur? The answer lies in the mechanism underlying FRE. Grimaldi and Karpicke (2012) conducted three experiments using word pairs as stimuli. In Experiment 1, they compared learning outcomes for weakly related word pairs (as in Kornell et al., 2009) and completely unrelated word pairs (e.g., “pillow-leaf”). In Experiment 2, they compared an initial test in which participants were free to generate the target word with one in which the experimenter presented a similar word called a “lure.” In Experiment 3, they compared a condition in which feedback was given immediately after each initial test with a condition in which the correct answer feedback was presented only after the initial test had been completed for all word pairs (i.e., delayed feedback). In Experiment 1, FRE occurred only when there was an association between cue and target. In Experiment 2, performance was poorer when the experimenter presented the lure than in the control condition. In Experiment 3, learning was not promoted in the delayed feedback condition relative to the control condition, unlike in the immediate feedback condition. Based on these results, Grimaldi and Karpicke (2012) explained the FRE mechanism as follows. First, participants freely retrieve target words from the cues in the initial test, thereby activating a semantic trace network (search set). At the time of the initial test, “candidates” (Grimaldi and Karpicke’s term for words that are likely to be associated with the cue) are given as a response, but the activation spreads to other candidates, including the correct target word. The activation eventually promotes learning of the correct answer presented as feedback. Additionally, the results of Experiment 3 showed that this activation lasted only for a short time, as in priming.

Conversely, Kornell (2014, Experiment 3a) found that incorrect answers promoted learning even when corrective feedback was delayed. In Kornell’s experiment, participants were asked to respond to trivia questions such as “Who was Time Magazine’s ‘Man of the Year’ in 1938?” and were given the correct answer as feedback 24 h later. Participants were unable to correctly answer 86% of the questions, and these questions were analyzed. When a final test was conducted 24 h after the provision of feedback (i.e., 48 h after answering the questions), a learning promotion effect attributable to the initial test occurred despite the delayed feedback. Based on these results, Kornell stated that FRE using word pairs in the immediate feedback condition and FRE using sentences in the delayed feedback condition do not differ: in both, enhanced learning occurs because of retrieval. However, in the case of delayed feedback, it was likely that long-term memory had been activated concerning questions and incorrect answers. From the research of Grimaldi and Karpicke (2012), it appears that word-level information activated by failed retrieval is not maintained until delayed feedback; however, for sentences, the learning benefits of FRE are obtained with delayed feedback (Kornell, 2014). This discrepancy between studies has been attributed to the amount of information activated: more information is activated when sentences are presented as stimuli than when word pairs are given.

Thus, if we activate more information, will enhanced learning occur even with delayed feedback procedures using word pairs? Several studies have investigated how the amount of information activated at the time of retrieval affects the testing effect and FRE. For example, in Experiments 3a–5 of Lehman and Karpicke (2016), in the initial test conducted after participants had learned word pairs (i.e., in a testing effect procedure), participants were asked to generate several words semantically associated with the cue words that differed from the target word they were supposed to remember (i.e., “mediator” words). On a final test in which participants were required to recall target words from cue words, their performance declined as the number of mediators increased. Regarding FRE, in Experiment 2 of Grimaldi and Karpicke (2012), as mentioned above, when the lure word presented by the experimenter was learned in addition to the to-be-remembered word pair, performance was lower than in the control condition. Moreover, in a recent study, Vaughn et al. (2017) used trivia questions and set the interval between the question presentation and the correct feedback presentation to either 0 s (i.e., questions and answers were presented together, without allowing for retrieval) or between 5 and 30 s. Although the correct answer rates on the initial test increased with longer retrieval times (Vaughn and colleagues suggested that participants continued to think of correct answers and that more semantic activation therefore occurred), final test performance was affected by whether or not retrieval occurred, but not by the length of the retrieval interval.

From these studies, it appears that differences in the amount of activated information may not have a significant impact on learning promotion. However, Lehman and Karpicke (2016) asked participants to generate and learn words that were different from previously remembered target words. Furthermore, in Grimaldi and Karpicke (2012), lure words were presented by the experimenters and not generated by the participants themselves. Therefore, these procedures may have produced interference. Moreover, in Vaughn et al. (2017), sentences were used as stimuli; in light of Kornell’s (2014) results, it is also conceivable that sufficiently rich information to promote retrieval was activated, even if only for a short period. According to Grimaldi and Karpicke (2012), the search set theory explaining FRE is based on search-related theories, such as the SAM model. In the SAM model (Raaijmakers and Shiffrin, 1981; Gillund and Shiffrin, 1984), difficulty in recalling the learned item is dependent on the amount of related information. It is believed that as the amount of information related to the item to be learned increases, the probability of recalling that item in the final test increases. Furthermore, although word pair and sentence stimuli were compared by Kornell (2014), the underlying characteristics of these stimuli may also differ greatly (e.g., in how easy they are to memorize), which may affect results.

In this study, we use word pairs to consider the relationship between delayed feedback and the richness of the semantic network activated at the time of initial testing. We investigate the impact of the amount of information retrieved in the initial test on FRE, in particular whether learning fails to be enhanced under delayed feedback, as shown in previous studies, or whether it depends on the amount of erroneous information retrieved in the initial test. With reference to the SAM model and Vaughn et al. (2017), the richness of the activated semantic network was manipulated via the number of words retrieved in the initial test.

In this study, we compared the impact on learning of different retrieval conditions, created by manipulating feedback delay and the degree of semantic richness in the initial test. The retrieval conditions were defined by the number of words answered in the initial test (one or two words) and the time to feedback (immediate or delayed). The four experimental conditions were (1) single word immediate feedback, (2) multiple word immediate feedback, (3) single word delayed feedback, and (4) multiple word delayed feedback. Participants learned word pairs under five conditions: these four and a control condition in which they simply read and memorized the word pairs. Furthermore, unlike in previous studies (e.g., Grimaldi and Karpicke, 2012), participants said their responses aloud during the initial test rather than typing them. This was done to reduce individual differences affecting learning outcomes, as typing accuracy and speed vary widely among Japanese college students.

Previous studies found no differences between oral responses and typed responses on testing effects (Putnam and Roediger, 2013). However, there has been no research on the effect of response format on FRE itself. Regarding the relationship between the response format and memory, for example in the production effect (MacLeod et al., 2010), whispering, typing, or handwriting items promote learning, but learning is promoted most by reading aloud (Forrin et al., 2012). While reading aloud, auditory processing occurs in addition to motor processing. Additionally, compared to whispering, reading aloud results in a stronger auditory signal, which requires more active encoding. Furthermore, Pinet and Nozari (2018) demonstrated that, unlike verbal responses, typed responses require post-lexical processing. Considering this information, if this study fails to produce the same results as the previous ones, it may possibly be due to the influence of response format (verbal vs. typed response) on FRE.

This study considers the following five hypotheses:

1. Final test performance is better under the condition in which participants retrieve from a cue a word they think might be the target word (hereinafter “retrieved word”) immediately before being presented with the correct target word than under the control condition without retrieval.

2. Learning will be promoted if feedback on the retrieved answer is given immediately, but not if it is delayed. This hypothesis is based on the findings of Grimaldi and Karpicke (2012) as well as Kornell (2014), particularly under the condition in which participants were asked to answer with only a single retrieved word (i.e., in which the activation of semantic networks is not rich).

3. If learning is promoted when considerable information is recalled from a cue even when feedback is delayed (Kornell, 2014), then learning promotion will occur when participants answer using multiple retrieved words even when feedback with the correct target words is delayed.

4. Performance on a final test with immediate feedback, in which participants are asked to give multiple retrieved words, will be better than performance in a typical procedure (i.e., in which participants are asked to answer using a single retrieved word with immediate feedback).

5. Performance on a final test will be affected by response format. A spoken response spontaneously engages auditory processing and, unlike a typed response, does not require post-lexical processing, so performance may improve more when responses are spoken than when they are typed.

Experiment 1

All experiments in this study were approved by the Ethical Review Committee of Tokushima Bunri University.

Materials and Methods

Participants

A mix of 27 university and junior college students (10 men and 17 women with an average age of 19.9 years, SD = 1.2) participated in the experiment and were paid an honorarium of 1,000 yen. Participants were treated in accordance with the APA Ethical Guidelines. Participants’ native language was Japanese, and each participant learned word pairs under all four experimental and one control conditions described below.

Materials

Stimuli were 100 Japanese word pairs, each consisting of a cue word and a target word (e.g., cheese-pizza, motorcycle-tire; the second word in each pair is the target). All stimuli are provided in the Supplementary Material and were selected based on Mizuno (2011). Word pairs were selected so that their association strength (i.e., the rate at which the target word would be guessed as the first response to the cue word) was between 0.041 and 0.054, that is, fairly weak (Kornell et al., 2009). Cue words were three morae in Japanese notation; target words were between two and four morae, and word notations included Chinese-derived characters (kanji) as well as both Japanese syllabaries (hiragana and katakana).

Conditions

There were four experimental and one control conditions. Participants were exposed to one-answer and two-answer conditions (respectively, the single condition and multiple condition) in which they answered one or two words that might be target words (“retrieved words”). Additionally, conditions were established in which the correct answer (the target word to be recalled) was provided as feedback immediately after the initial test and in which feedback was delayed (respectively, the immediate condition and the delayed condition). The five overall conditions, in combination, were as follows: the single-immediate condition, multiple-immediate condition, single-delayed condition, and multiple-delayed condition, as well as an additional control condition that did not involve answering a retrieved word (Figure 1).

Figure 1. Procedures for each condition at the word learning stage.

In the single-immediate condition, a plus symbol (+) was shown for 1 s on a computer screen as a fixation point. Participants were taught that the number of plus symbols indicated the number of words to be answered (retrieved words). Then, a cue word and a frame were shown on the screen for 7 s, during which participants were asked, based on the cue word, to say as quickly as possible a single word that might be the target word. Once a participant answered, the experimenter hit the ENTER key to proceed to the next screen, on which the correct target word was displayed for 5 s. If a participant was unable to answer within the 7-s time window, the display automatically moved to the next screen (which displayed the correct target word in the single-immediate condition). Participants were instructed to memorize the cue and target word pairs.

In the multiple-immediate condition, two plus symbols were shown for 1 s, after which, just as in the single-immediate condition, a cue word and a frame were shown on the screen for 7 s. Then, participants were asked to say a retrieved word as quickly as possible. When the participant answered, the experimenter hit the ENTER key to proceed to the next screen; when 7 s elapsed without the participant answering, the display automatically moved to the next screen. In this multiple-immediate condition, the procedure was repeated twice to make participants say two words (i.e., participants were requested to say another retrieved word after again being shown a cue word and frame for 7 s, proceeding to the next screen once 7 s had elapsed or they said a word). After answering, as in the single-immediate condition, the correct target word for the cue word was shown for 5 s, and participants were instructed to memorize the cue and target word pairs.

In the single-delayed condition, the initial test was the same as in the single-immediate condition. As a point of contrast, after a participant had answered a word or 7 s had elapsed, the screen shifted to the next trial immediately, without providing the correct target word from the previous cue as feedback. Feedback was presented in the same way as under the control condition (described below) once a certain number of trials were completed.

In the multiple-delayed condition, the initial test was the same as in the multiple-immediate condition, and feedback was presented in the same way as under the control condition once a certain number of trials were completed.

For the control condition, an asterisk (*) was displayed for 1 s as a fixation point, after which both the cue word and target word were displayed for 5 s, without implementing the initial test. Participants were asked to memorize the word pairs. These conditions were each allocated 20 word pairs, and the allocation of word pairs was counterbalanced.
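The trial structure of the five conditions can be summarized in a short sketch. The code below is illustrative only and is not the software used in the experiment: the function and event names are hypothetical, and it assumes that a delayed-feedback trial consists of exactly the same events as a control trial, as the description above suggests. Durations are upper bounds, because the cue screen ended as soon as the participant answered.

```python
# Durations in seconds, taken from the condition descriptions in the text.
FIXATION_S = 1
CUE_WINDOW_S = 7   # upper bound; the screen advanced once an answer was given
FEEDBACK_S = 5


def study_trial(condition):
    """Return the (event, max_duration_s) sequence for one study-phase trial.

    Event names are hypothetical labels chosen for this illustration.
    """
    if condition == "control":
        # Asterisk fixation, then cue and target shown together (no initial test).
        return [("fixation_asterisk", FIXATION_S), ("cue_and_target", FEEDBACK_S)]

    n_words, timing = condition.split("-")
    n_cues = {"single": 1, "multiple": 2}[n_words]

    # The number of plus symbols signals how many words must be answered.
    events = [("fixation_plus_x%d" % n_cues, FIXATION_S)]
    events += [("cue_and_frame", CUE_WINDOW_S)] * n_cues
    if timing == "immediate":
        events.append(("target_feedback", FEEDBACK_S))
    return events


def delayed_feedback_trial():
    """Feedback for delayed conditions, assumed identical to a control trial."""
    return study_trial("control")


def max_trial_duration(condition):
    """Longest possible duration of one trial in the given condition."""
    return sum(duration for _, duration in study_trial(condition))


if __name__ == "__main__":
    for cond in ("single-immediate", "multiple-immediate",
                 "single-delayed", "multiple-delayed", "control"):
        print(cond, max_trial_duration(cond))
```

Under these assumptions, an initial-test trial lasts at most 13 s (single-immediate), 20 s (multiple-immediate), 8 s (single-delayed), or 15 s (multiple-delayed), and a control or delayed-feedback trial lasts 6 s.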

Procedure

This experiment consisted of three phases: a study phase (initial tests and feedback of word pairs), a distractor task, and a final test (cf. Kornell et al., 2009). However, to delay feedback, the study phase was divided into four blocks (e.g., Hays et al., 2013) (Figures 2 and 3). In other words, in the single- and multiple-delayed conditions, the correct target word feedback for word pairs presented in one block was given in the subsequent block. No break was provided between blocks, and participants were not told that the study phase was divided into blocks. In the delayed conditions, the time from cue word presentation to target word presentation ranged between 0.4 and 11.6 min (M = 5.6 min, SD = 2.4 min).

Figure 2. Overall flow of Experiment 1.

Figure 3. Initial test and feedback implementation order for each block in each learning condition.

In each block, five pairs of words were assigned to each condition, with a total of 25 word pairs presented in random order. In addition, in the second and subsequent blocks, 35 pairs [25 trials and 10 feedback trials under the two delayed conditions presented in the previous block (five trials respectively)] were presented in random order. Ten word pairs assigned to the delayed conditions in the fourth block were presented using the same procedure as in the immediate conditions; these were excluded from the analysis. In addition, a total of 15 word pairs (10 trials assigned to the immediate conditions and 5 trials assigned to the control condition) learned in the first block were also excluded from the analysis in order to equate the number of trials in the delayed conditions and the lag up to the final test.
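The trial counts and exclusions described above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the constant and function names are hypothetical, and the counts are derived solely from the numbers given in the text (five conditions, four blocks, five word pairs per condition per block).

```python
CONDITIONS = ["single-immediate", "multiple-immediate",
              "single-delayed", "multiple-delayed", "control"]
DELAYED = {"single-delayed", "multiple-delayed"}
N_BLOCKS = 4
PAIRS_PER_CONDITION_PER_BLOCK = 5


def block_trial_count(block):
    """Number of trials presented in a block (blocks are numbered from 1)."""
    new_trials = len(CONDITIONS) * PAIRS_PER_CONDITION_PER_BLOCK
    # From block 2 onward, feedback for the previous block's delayed pairs is added.
    feedback_trials = 0 if block == 1 else len(DELAYED) * PAIRS_PER_CONDITION_PER_BLOCK
    return new_trials + feedback_trials


def analyzed_pairs_per_condition():
    """Word pairs per condition that remain after the exclusions in the text."""
    counts = {}
    for cond in CONDITIONS:
        # Delayed conditions lose block 4; immediate and control conditions lose block 1.
        excluded_block = 4 if cond in DELAYED else 1
        counts[cond] = sum(PAIRS_PER_CONDITION_PER_BLOCK
                           for block in range(1, N_BLOCKS + 1)
                           if block != excluded_block)
    return counts


if __name__ == "__main__":
    print([block_trial_count(b) for b in range(1, N_BLOCKS + 1)])
    print(analyzed_pairs_per_condition())
```

On this reading, the blocks contain 25, 35, 35, and 35 trials, and each of the five conditions contributes 15 word pairs to the analysis (75 of the 100 pairs in total).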

After obtaining participants’ written consent, instructions were given about the learning strategies for each condition in the study phase. Afterwards, a practice session was conducted using 10 word pairs not used in the experimental session to check whether participants understood the instructions. Thereafter, participants were asked to learn word pairs using the five conditions described earlier: the single-immediate condition, multiple-immediate condition, single-delayed condition, multiple-delayed condition, and control condition. The order in which word pairs were presented in the blocks was randomized for each participant.

In the distractor phase, participants spent 5 min on a mental arithmetic task (performing arithmetic calculations involving two- and three-digit numbers). Mathematical formulae and frames were presented on a computer screen, whereupon participants were instructed to enter answers into the frame using a numeric keypad.

The final test was a cued-recall task. After a plus symbol was displayed as a fixation point on the screen for 1 s, a cue word and frame were displayed for 7 s, during which time participants were instructed to recall and say the correct target word (not the retrieved word) paired with the cue word. The correct answer was not displayed even when participants could not recall the target word in time or answered incorrectly. The cued-recall task in the final test followed the same procedure for all learning conditions, and the order in which cue words were presented was randomized for each participant.

Results and Discussion

The trials were analyzed after excluding a total of 25 word pairs assigned to the single-immediate, multiple-immediate, and control conditions in the first block and the single- and multiple-delayed conditions in the fourth block. In addition, any word pairs answered correctly in the initial test (4%) were excluded. Cohen’s d and partial η² ($η_{p}^{2}$) were used as the effect sizes for the t-test and analysis of variance, respectively.

When comparing final test performance in the control condition to that in the four retrieval conditions (pooled performance of the four retrieval conditions), the correct answer rate was higher for word pairs assigned to the retrieval condition. Consistent with previous research, we found that prior retrieval of erroneous information promoted learning correct information (retrieval condition M = 0.75, SD = 0.14, control condition M = 0.62, SD = 0.21, t(26) = 4.11, p < 0.01, d = 0.73).
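The reported effect size can be approximately reproduced from the summary statistics alone. The sketch below assumes that Cohen’s d was computed by dividing the mean difference by the root mean square of the two condition standard deviations; the text does not state which formula was used, so this is an assumption, and the function name is hypothetical. Because the inputs are rounded means and SDs, the result is approximate.

```python
import math


def cohens_d_from_summary(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of the two SDs as the standardizer.

    This choice of standardizer is an assumption, not stated in the text.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Retrieval (pooled) vs. control condition, values as reported in the text.
d = cohens_d_from_summary(0.75, 0.14, 0.62, 0.21)
print(round(d, 2))  # 0.73
```

With these inputs the standardizer is about 0.178, so the mean difference of 0.13 corresponds to d ≈ 0.73, in line with the reported value.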

A two (presence or absence of delay) by two (number of retrieved words answered) analysis of variance on the four retrieval conditions showed no main effect of delay [F(1, 26) = 0.39, p = 0.54, $η_{p}^{2}$ = 0.02] or of number of answers [F(1, 26) = 0.09, p = 0.77, $η_{p}^{2}$ < 0.01]. The interaction of delay and number of retrieved words answered was not significant [F(1, 26) = 0.04, p = 0.83, $η_{p}^{2}$ < 0.01] (Figure 4). Detailed results for each condition are described in the Supplementary Material.

Figure 4. Correct answer rates on the final tests for each learning condition in Experiment 1 (error bars represent SD).

Thus, learning was promoted by retrieval in the initial test, consistent with previous studies. Although the presentation time of the target word was the same in the retrieval and control conditions, the presentation time of the cue word differed, which may have influenced the FRE. However, in previous studies (Kornell et al., 2009; Grimaldi and Karpicke, 2012), the presentation time of the cue word in the initial test was 7–8 s, and the presentation time of the feedback (i.e., a pair of cue and correct target) was 5 s, as in this experiment, and similar results were obtained. Moreover, in the study by Kornell et al. (2009), the FRE did not change even when the stimulus presentation times were equated between conditions, that is, even when the presentation time in the control condition was increased. Thus, our results may have been influenced by the specifics of processing in the retrieval condition and not by the difference in the presentation time of cue words.

In contrast to previous studies, we found improved learning even when correct answer feedback was delayed. Grimaldi and Karpicke (2012) suggested that failed retrieval does not promote learning when feedback is delayed, and they therefore proposed that the mechanism behind failed retrieval is semantic network activation that lasts only a short time, as in priming. However, the results of this experiment did not support this conclusion. Notably, encouraging the activation of semantic networks by increasing the number of words to be answered had no impact on the FRE. Thus, the difference between sentence and word pair stimuli cannot be fully explained by “the richness of semantic networks” (Kornell, 2014).

However, in this experiment, the time interval to feedback varied greatly, from just under a minute to just under 12 min. Although the procedure in this experiment followed that of previous studies in which the effect of failed retrieval disappeared due to delayed feedback, it is possible that the variation in time until feedback may have influenced learning, and perhaps the delay time was not long enough. Accordingly, in Experiment 2, we minimized the variation of delay time to examine the impact of the relative length of the interval until feedback in promoting learning.

Experiment 2

In Experiment 2, we manipulated the interval of time until feedback to be either 4–5 min (short-delay condition) or 10–11 min (long-delay condition). This allowed us to investigate the effect of the length of the feedback delay on learning.