Impact of modality and mode of questioning and testing on memory reports

Riggenbach, Mackenzie R.; Gronlund, Scott D.; Zoladz, Phillip R.

doi:10.3389/fcogn.2024.1349511

ORIGINAL RESEARCH article

Front. Cognit., 23 October 2024

Sec. Memory

Volume 3 - 2024 | https://doi.org/10.3389/fcogn.2024.1349511


Mackenzie R. Riggenbach¹*

Scott D. Gronlund¹

Phillip R. Zoladz²

¹Department of Psychology, University of Oklahoma, Norman, OK, United States
²Psychology Program, The School of Health and Behavioral Sciences, Ohio Northern University, Ada, OH, United States

Introduction: Individuals' memories are assessed in multiple contexts; however, depending on the context, how an individual is questioned may impact the quantity and quality of the details reported. One goal of this study is to investigate how the modality of questioning (individuals talk or write about an event) impacts memory reports. Additionally, being tested on previously learned information improves memory for that information compared to re-studying it. Consequently, another goal is to examine how questioning impacts memory reports compared to a second exposure. We utilized open-ended and pointed questions (true and false).

Method: Participants watched a short video and were questioned (Experiment 1: In-Person; Experiment 2: Virtual) about its contents immediately, 1 week, and 1 month later.

Results: The current study found that writing leads to better quality memory reports than speaking, and the benefit is present 1 week later. Additionally, we found that writing mitigates an anticipated testing benefit, although this depended on whether a pointed or open-ended question was asked. Restudying (vs. immediate testing) led to better performance for the false pointed questions. However, the better performance operated differently depending on whether participants wrote or spoke following restudying, perhaps due to a differential criterion shift between the Restudy-Written and the Restudy-Spoken conditions.

Discussion: We conclude that the impact of the modality of questioning is influenced in several ways by the types of questions asked, which bears significance for many domains because one modality (or a combination) may be more suitable for producing more accurate memory reports as a function of different domains.

Introduction

Individuals' memories are assessed in multiple contexts. However, depending on the context, individuals may be questioned differently. For example, in the classroom, a student may complete a multiple-choice or essay exam to assess course comprehension; at a crime scene, a police officer might ask a witness to complete an incident report or question a witness about a robbery; in a medical setting, a patient may be asked to list their symptoms on a check-in sheet or report them to a medical professional. Therefore, it is important to obtain reports in the manner that promotes the most accurate responses. Thus, one focus of the current study is to examine the impact of a spoken or written report on the number and types of details reported.

Furthermore, in some contexts, individuals' memories may be assessed initially, or they may be provided with an opportunity to reengage with the information beforehand. For example, in the classroom, a student may restudy material before taking an exam or could be given a quiz; at a crime scene, an eyewitness may be questioned once or by several detectives about an incident; in a medical setting, a patient may be re-experiencing symptoms of an illness or experiencing and recounting them for the first time. Therefore, another focus of this study is the impact of reengaging with information compared to initial questioning on the quantity and quality of details reported.

It is also important to consider the dynamic of the memory assessments. For example, in an eyewitness setting, a witness may not predict the interview questions, the structure of the interview, or the kinds of questions they will be asked. However, in a classroom setting, this is not always the case, as students usually know what information they need to study and the type of test they will take. Therefore, the unanticipated nature of the questioning in an eyewitness setting compared to a classroom setting could impact individuals' expectations and, subsequently, the demands placed on their memories. Additionally, a consequence of reengaging with information or having one's memory assessed is that it alters the timeframe over which memories are evaluated and compared. Thus, of additional interest are changes in memory reports across time.

Overall, the goal of the study is to provide support for practical advice on how to improve memory in various real-world scenarios while also assessing potential interactions between variables, as memory can be impacted by more than one at a time. For example, the present study will allow us to gain evidence regarding how to question a witness or how students should prepare for their exams while being sure to consider how re-engaging with information over time and the subsequent passage of time can impact one's ability to do either task successfully.

Prior research reveals two competing ideas regarding the impact of the modality of questioning. Research that supports a written superiority effect suggests that writing is better because it allows for self-pacing and the ability to monitor what information has previously been produced. Sauerland et al. (2014) found that, in general, written free recalls led to better memory performance compared to speaking. Kraus et al. (2017) conducted several types of interviews after participants watched a video of a criminal event. Self-administered interviews (SAI), police officer questioning (POQ), and written free recall (FR) techniques were used for questioning. The SAI, a structured questionnaire that witnesses fill out, led to reports of more correct victim and setting details compared to the participants in the POQ or FR conditions. The SAI group also reported more correct offender and action details compared to the FR group. However, the POQ group did report more offender details compared to both the SAI and FR groups. This study suggests that writing (SAI) leads to better memory performance compared to speaking (POQ), although it is possible that how different written interviews are structured impacts the quality and quantity of eyewitness reports. In contrast, other work suggests that writing places higher demands on working memory because writing is slower than talking, less practiced, and requires activation of grapheme representations for spelling words (Kellogg, 2007).

On the other hand, research that supports a spoken superiority effect suggests that speaking about an event leads to better memory performance because it demands fewer cognitive resources (Kellogg, 2007). In a non-forensic setting, Kellogg (2007) presented a narrative story and found that participants in the spoken condition reported more correct propositions. Sauerland and Sporer (2011) found that having participants talk about a video event led to more detailed and accurate crime descriptions and more accurate central perpetrator details, although writing was better for reporting peripheral perpetrator details. However, it is important to note that although speaking is considered more productive, it is not necessarily more efficient. For example, when speaking, individuals may repeat what they have previously stated. Mechanisms thought to induce a spoken superiority effect include that speaking requires less muscular energy, is acquired earlier in life, and therefore is easier and more practiced (Sauerland and Sporer, 2011). As a result, speaking is thought to lead to a lower level of cognitive demand. Consequently, if cognitive load is low when speaking, individuals may have more working memory capacity available to report and describe details that require more effortful retrieval.

In addition to assessing memory with either a written or spoken report, evaluation can induce a testing effect because initial questioning can function as a test. Roediger and Karpicke (2006) showed that immediately testing individuals after reading a passage led to better long-term retention rates compared to merely restudying the passage. This effect remained even after a retention interval of a week. Additionally, research suggests that rates of forgetting can differ as a function of restudying vs. repeated testing and timing (Wheeler et al., 2003). More specifically, the forgetting of 40 items over an interval of 7 days occurred much faster in the study-only condition compared to the repeatedly tested condition. The ability to learn more from being tested provides an avenue to examine whether the modality of questioning mitigates the testing effect. This is particularly relevant because it is possible that testing and modality could interact, and under some circumstances, modality effects may no longer be present or may no longer be as strong. For example, Rohrer et al. (2005) conducted a study where participants who learned a set of stimulus material perfectly but further engaged in studying (overlearners) recalled more than the low learners on a 1-week test. However, these immediate benefits greatly diminished on the long-term retention tests. These findings suggest that overlearning is an ineffective strategy for learning material for longer periods of time. For our purposes, these findings support the notion that one variable (testing or modality) could dominate the other and subsequently impact memory reports. Therefore, elucidating these differences is crucial to understanding memory mechanisms, especially in different contexts, because it is possible that modality effects may change under optimal (or not) testing scenarios. For practical purposes, it is important to understand how testing and modality may interact.

Parsing out potential differences between a written test compared to a spoken test is important, especially as it relates to the types of details reported. For example, some research suggests that testing can increase the rate of semantically related false memories when there is a theme within a set of stimuli (McDermott, 2006). Thus, investigating the impact of testing and modality of questioning on true and false information is important. In addition, research suggests that different types of information are recalled based on how the information was previously encoded. Fuzzy-Trace Theory (FTT) posits that individuals encode details of an event as a function of gist and verbatim information (Reyna and Brainerd, 1995; Brainerd and Reyna, 2005). According to FTT (Reyna and Brainerd, 1995), studying promotes verbatim processing, whereas testing promotes gist processing (Bouwmeester and Verkoeijen, 2011). Consequently, when a theme is present within a set of stimuli, the enhancement of gist processing associated with testing serves as a helpful retrieval cue. However, restudying may be more effective at promoting retention through the enhancement of verbatim processing (Delaney et al., 2010). Additionally, research investigating memory for repeated events has found that participants in the repeated-event condition were more likely than their counterparts to report general details (Powell and Thomson, 1996) because they can recognize commonalities across exposures of the event (Brainerd et al., 2008). However, Theunissen et al. (2017) found that participants in the repeated-event condition were less accurate than the single-event condition. Therefore, it is important to investigate how repeated exposure impacts memory reports.

Studies that have investigated the modality of questioning have minimally investigated how the passage of time can impact these types of memory reports, and research that does so typically uses a retention interval of a week or shorter. An individual's memory may be assessed at various times following initial encoding, and it is likely that the memory report changes over time as a function of subsequent questioning. Kraus et al. (2017) found that those who completed the SAI immediately after observing the crime reported more correct details without a loss of accuracy 1 week later and had higher accuracy in the Cognitive Interview (CI) (Fisher and Geiselman, 1992) compared to participants in the FR and no-initial interview group. Additionally, Warren and Lane (1995) manipulated the type of initial test (no test, neutral, or misleading) and the type of second test that occurred 1 week later (no test, neutral, or misleading). They found that immediate neutral testing helped to inoculate participants against forgetting and suggestibility. Pansky and Tenenboim (2011) found results consistent with Warren and Lane (1995) at a 48-h delay. It is important to investigate how memory reports change over longer periods in conjunction with a testing effect because Butler and Roediger (2007) found that being immediately tested can improve final recall even 1 month after the initial encoding.

Given the limited and contradictory evidence of the effect of the modality of questioning on memory reports, in conjunction with possible testing effects and question timing, the goal of the present study is to gain a greater understanding of how modality, testing effects, and timing interact to impact memory reports for correct and incorrect information. Based on the findings from Sauerland et al. (2014), we expect the participants in the written condition to report more correct details compared to participants in the spoken condition. We also expect a written superiority effect when questioned immediately, as evidenced by the findings from Kraus et al. (2017). Additionally, given the robust findings of the testing effect, we anticipate that participants who are tested (either by writing or speaking) during Phase 1 of our experiments will show a testing benefit compared to those that restudy the video. Lastly, we expect participants who restudy the information, instead of being immediately questioned, will endorse false questions to a lesser extent because FTT posits that restudying promotes verbatim processing, which should help participants to identify a false question that asks about an event that did not happen.

Experiment 1

Method

Participants

A statistical power analysis was conducted using G*Power 3.0.10 (Faul et al., 2007) to determine the sample size, which was based on a Cohen's f effect size estimate of 0.35. The analysis recommended that 28 participants be recruited per condition for our design (power = 0.95). A medium-large effect size was chosen to remain consistent with findings from previous memory reporting research. A total of 125 introductory psychology students (29 males, 96 females; M_Age = 19.04 years, SD_Age = 2.21) from the University of Oklahoma (N = 95) and Ohio Northern University (N = 30) participated in this study in exchange for partial course credit, resulting in a post hoc power of 0.91. All students were recruited via a university recruitment portal (SONA study flier). The flier informed potential participants that they would watch a video and then be asked questions about the video at three different time points. Participants received a maximum of 2.5 research credits for their psychology course. They received credit following the completion of two laboratory sessions and one email response. To participate, students were at least 18 years of age and able to provide consent. In addition, participants indicated that they were proficient in English.

The present study is a 2 (Modality of Questioning: Written vs. Spoken) × 2 (Test vs. Restudy) × 3 (Timing: Immediate, 1-week delay, and 1-month delay, hereafter denoted Phase 1, Phase 2, and Phase 3) incomplete factorial design. The modality of questioning and testing vs. restudying are between-subjects factors, and participants were randomly assigned to one of the following conditions: Restudy-Written (n = 30), Restudy-Spoken (n = 33), Written-Written (n = 31), and Spoken-Spoken (n = 31). Timing is a within-subject factor.

All participants' data were kept anonymous and separate from identifying information. No significant risks were encountered by the participants, and they were treated in accordance with APA (American Psychological Association) ethical standards. The study was approved by both the University of Oklahoma and Ohio Northern University Institutional Review Boards (IRBs - #11236).

Materials

Participants completed a demographic survey that involved them self-reporting their gender and age. They then viewed an 8-min excerpt from the Disney movie Looking for Miracles (Grant and Sullivan, 1989), which depicts the adventures of two brothers at summer camp. This video was chosen because it is an older film and is unlikely to have been previously seen by the participants. After watching the video, all participants were asked whether they had seen the video (n = 0). This video also was used because the dynamics of each scene allowed participants to have opportunities to report a multitude of details. Following the video, participants were asked questions about the video at different time points. Both the video and question materials are like those used in previous studies (Zaragoza et al., 2001; Zoladz et al., 2017). Depending on the condition, participants either wrote about what they saw in the video or spoke about it to a research assistant while an audio recording device (iPad) recorded the interaction.

A norming study (n = 11) was conducted to determine which reported details would be classified as central or peripheral to the video. These participants watched the video and were asked to report everything that they could remember. Details reported by more than six of the participants were classified as central details and details reported by five or fewer of the participants were classified as peripheral details. On average, a central detail was reported by 7.5 participants and a peripheral detail was reported by 3.7 participants.

Participants were queried with open-ended questions, pointed questions, and a combination of the two, by trained undergraduate research assistants. The open-ended questions were general questions asking about each of the three main scenes in the video. For example, “The first scene took place in the dining hall. Please talk about what events occurred, who was in the scene, describe the people who were there, and any other details that you can remember, such as, did any important conversations happen?” The open-ended questions were asked in storyline order. These questions were framed in a way to serve as guideposts to help participants organize their thoughts and report subsequent details, though they were free and encouraged to recall everything they could remember. If questioning occurred during the first phase of the experiment, only open-ended questions were used.

When questioned during the second and third phases of the experiment, the exact same open-ended questions were used, plus the addition of the pointed questions. The pointed questions used in Phase 2 were identical to those asked in Phase 3. Participants were asked all 13 pointed questions in storyline order, with five false questions interspersed amongst the eight true questions (but still following the order of events in the video). The true pointed questions asked about an event or detail that appeared in the video, whereas the false pointed questions asked about an event or detail that was plausible but did not occur in the video. The false pointed questions allowed us to determine how the susceptibility to false information might change depending on the modality of questioning. Participants were not forced to answer; they could indicate that an event did not happen or that they could not remember an answer. An example of a true question is, “The cook brought out a cake because it was one of the boy's birthdays. What did the cake say?” An example of a false question is, “After Delaney fell, where did he say that he injured himself?” Responses to a false question that included anything other than “that did not happen” were coded as endorsing the false question and, therefore, as incorrect, unless the participant indicated that they did not know the correct answer. The number of questions asked and their ordering are similar to those used in the studies conducted by Zaragoza et al. (2001) and Zoladz et al. (2017).

Procedure

After obtaining informed consent from all participants, participants were asked if they would provide their cell phone numbers to the researcher to receive session reminders throughout their 1-month sequence of sessions. Participants were not required to provide their cell phone numbers. Next, all participants completed the demographic survey. Following the completion of the survey, all participants watched the video. The previously mentioned procedural steps were identical for all participants. It is at this point that the procedure changes depending on the condition. Figure 1A displays the tasks and timeline followed by the participants who were not questioned about the video in Phase 1 and instead re-watched the video (Restudy Conditions). Figure 1B depicts the timeline followed and the tasks completed by the participants who were questioned about the video in Phase 1 (Immediately Questioned Conditions).

Figure 1

Figure 1. (A) Restudy conditions. Restudy-Written (RW) and Restudy-Spoken (RS). (B) Immediately questioned conditions. Written-Written (WW) and Spoken-Spoken (SS).

Restudy conditions

After the video ended, participants in the two restudy conditions (Restudy-Written and Restudy-Spoken) watched it again. These participants were not questioned during Phase 1. This phase concluded with a reminder of their next session and then dismissal. The testing effect will be assessed by comparing these participants to those in the Immediately Questioned conditions (described below).

One week later, participants were randomly assigned to write (Restudy-Written) or speak (Restudy-Spoken) during the Phase 2 questioning. Before beginning, participants were asked to indicate the two main characters' names (Delaney and Sullivan). If participants answered incorrectly, the researcher informed them of the correct answers. The purpose of having the participants identify the characters, and be corrected, if necessary, was to make sure that they could correctly reference the two main characters. Participants were then queried with open-ended questions and then with pointed questions. However, depending on the condition, participants either wrote out their responses on lined sheets of paper (Restudy-Written) or spoke to a research assistant while an audio recording device recorded the interaction (Restudy-Spoken). Following the questioning, participants were debriefed. They also were reminded that they would receive an email in 3 weeks to complete Phase 3. One month following Phase 1, participants received an email containing the open-ended and pointed questions along with instructions for how to complete Phase 3. Participants were given 1 week to return their responses. Phase 3 should have taken participants about 15 min to complete.

Immediately questioned conditions

Participants randomly assigned to the immediately questioned conditions (Written-Written or Spoken-Spoken) watched the video only once and then were immediately questioned. Participants were first asked the warm-up questions about the main characters' names before the open-ended questions (i.e., no pointed questions were asked). Those in the Written-Written condition wrote their responses to the questions on lined sheets of paper, whereas those in the Spoken-Spoken condition spoke their responses into the audio recorder. Following the questioning, participants were reminded of their next session and dismissed.

One week later, participants were queried using open-ended questions and then pointed questions. The modality of questioning was not mixed. That is, participants in the written conditions (Written-Written) wrote their responses on lined sheets of paper during Phase 2. Participants in the spoken conditions (Spoken-Spoken) for Phase 2 spoke to a research assistant while their responses were recorded with an audio recorder. Following the questioning, participants were debriefed and then reminded that they would receive an email to complete Phase 3 in 3 weeks. The email questioning proceeded as described above. Regardless of condition, all participants typed their responses to the Phase 3 questions.

Note that for these in-person experimental sessions, a researcher was present throughout Phases 1 and 2, regardless of condition type. Multiple research assistants helped with data collection; however, participants had the same research assistant for their Phase 1 and Phase 2 sessions. The researcher set up the video and then sat across from the participant during the entire session to ask all questions and record the spoken responses. All researchers were trained on the proper protocol for questioning participants and were blind to the experimental hypotheses. They responded to each answer with a transitory comment such as, “Okay, the next question is....” This was meant to reduce chances of confirmatory feedback or other cues that might indicate to the participants the verity of their responses (Zaragoza et al., 2001). Researchers also recorded the time it took to complete the questioning, though participants were free to take as much time as they needed. All lab sessions had to be completed sequentially. That is, to participate in Phase 3, Phases 1 and 2 must have been completed within the allotted timeframe.

Results

A total of 125 participants completed Phase 1. Of those, 118 (94.4% return rate) participants completed Phase 2, and 96 (76.8% response rate) participants completed Phase 3. Only participants who completed at least Phases 1 and 2 were included in the subsequent analyses. However, before conducting any analyses, the data were cleaned, which resulted in an additional 10 participants being removed because of technical errors or not completing Phases 1 and 2. Additionally, the data were examined for normality, and participants' scores for each of the dependent variables were assessed for outliers. Outlying data points were removed; however, only the single data point was removed, not the participant. Therefore, 115 (out of 118) participants' data were used for the Phase 1 and Phase 2 data analyses, and 91 (of 96) participants' data were used for the Phase 3 analyses.

All audio recordings were transcribed before coding. Recordings were coded by three individuals. Interrater reliability scores ranged between a Kappa value of 0.77 and 1.00. All coding disagreements were discussed amongst the coders until a mutual decision could be reached. Open-ended responses were coded for central and peripheral details, intrusions, and any other detail reported correctly but not deemed central or peripheral to the video (based on the norming study). For all analyses, we use partial eta squared as the effect size estimate. The benchmarks used when interpreting partial eta squared are $η_{p}^{2}$ = 0.01, $η_{p}^{2}$ = 0.06, and $η_{p}^{2}$ = 0.14 for small, medium, and large effects, respectively.
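As a brief illustration of the effect size used here, partial eta squared can be recovered from a reported F-test through the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below is not the authors' analysis code: the function names, the "negligible" label for values below 0.01, and the example F value and degrees of freedom are all hypothetical.

```python
# Illustrative sketch only; function names and the example values are hypothetical.

def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def label_effect(eta_p2: float) -> str:
    """Map partial eta squared onto the small/medium/large benchmarks given in the text."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # assumed label for values below the smallest benchmark


# Hypothetical test result: F(1, 60) = 4.0
eta = partial_eta_squared(4.0, 1, 60)
print(round(eta, 4), label_effect(eta))
```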

Phase 1

To reiterate, participants in the two restudy conditions (Restudy-Written and Restudy-Spoken) were not questioned during Phase 1. In the Written-Written and Spoken-Spoken conditions, the total number of central, peripheral, and correct other details reported during Phase 1 were combined to reflect the overall correct number of details reported for each participant. A one-way ANOVA was conducted to compare the total correct number of details reported in the Spoken-Spoken and Written-Written conditions. Participants in the Written-Written condition (M = 35.89, SD = 7.40) reported significantly more correct details overall compared to participants in the Spoken-Spoken condition (M = 29.25, SD = 11.73). We also separately analyzed the number of central and peripheral details reported. Participants in the Written-Written (M = 14.37, SD = 2.87) condition reported significantly more central details than those in the Spoken-Spoken (M = 12.18, SD = 3.87) condition. Participants in the Written-Written (M = 7.48, SD = 2.53) condition also reported more peripheral details than participants in the Spoken-Spoken (M = 6.11, SD = 2.59) condition. Therefore, the written benefit extends to both the most important details (central) as well as to peripheral details. All F-tests and relevant descriptive statistics are reported in Table 1.

Table 1

Table 1. Experiment 1: Phase 1 F-tests and relevant descriptive statistics.

In addition to the number of details correctly reported during Phase 1, the accuracy of those details was assessed. Thus, input- and output-bound scores (Koriat and Goldsmith, 1994) were computed for each participant and then compared across conditions. Input-bound accuracy (IBA) is the proportion of central and peripheral details correctly reported. It was computed by dividing the number of correctly recalled central and peripheral details by the total number of predetermined central and peripheral details. Other details correctly reported are not considered in this metric. Output-bound accuracy (OBA) is the proportion of all details that are reported correctly. It was computed by dividing the number of correctly recalled details by the total number of details recalled. This metric includes all correctly recalled details and any intrusions. It is important to note that with movie scenes, it is possible to report an infinite number of details (Koriat and Goldsmith, 1996), which might suggest that an IBA score is not appropriate to report. However, by relying on our norming study, we established that (one estimate of) the maximum numbers of central and peripheral details reported are 21 and 18, respectively. These are the details we included in our calculation of IBA.
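The two accuracy scores defined above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not the authors' scoring script: the function names and the example counts are hypothetical, and the IBA denominator of 39 assumes that the 21 central and 18 peripheral details from the norming study are simply summed.

```python
# Illustrative sketch of the IBA and OBA definitions; names and example counts are hypothetical.

MAX_CENTRAL = 21      # normed central details (from the text)
MAX_PERIPHERAL = 18   # normed peripheral details (from the text)


def input_bound_accuracy(central_correct: int, peripheral_correct: int) -> float:
    """Proportion of the normed central and peripheral details that were reported."""
    return (central_correct + peripheral_correct) / (MAX_CENTRAL + MAX_PERIPHERAL)


def output_bound_accuracy(central_correct: int, peripheral_correct: int,
                          other_correct: int, intrusions: int) -> float:
    """Proportion of everything reported (correct details plus intrusions) that was correct."""
    correct = central_correct + peripheral_correct + other_correct
    total_reported = correct + intrusions
    if total_reported == 0:
        raise ValueError("no details reported; OBA is undefined")
    return correct / total_reported


# Hypothetical participant: 14 central, 7 peripheral, 12 other correct details, 3 intrusions.
iba = input_bound_accuracy(14, 7)
oba = output_bound_accuracy(14, 7, 12, 3)
print(round(iba, 3), round(oba, 3))
```

Note that other correct details raise OBA but leave IBA unchanged, which is why the two scores can diverge for the same participant.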

Multiple one-way ANOVAs were conducted to compare IBA and OBA, completion times, and word counts for the Written-Written and Spoken-Spoken conditions. There was a significant difference in IBA and completion times, but not for OBA or word counts. More specifically, participants in the Written-Written condition (M = 0.56, SD = 0.12) had a greater IBA proportion than those in the Spoken-Spoken condition (M = 0.48, SD = 0.16). These findings suggest that, in general, writing immediately following encoding improves memory reports compared to speaking about what transpired. This effect is illustrated in Figure 2. Additionally, participants in the Written-Written condition (M = 1,069.91, SD = 347.71) took significantly more time to answer the questions than those in the Spoken-Spoken condition (M = 303.00, SD = 118.00). This finding is not surprising, given that writing is slower than speaking. However, combined with no difference in the number of words produced during Phase 1, these findings suggest that participants in the Written-Written condition are not performing better merely because they took more time to answer the questions. Instead, it suggests that participants are working equally hard in each condition, but participants in the Written-Written condition are working more effectively and accurately. All F-tests and relevant descriptive statistics are reported in Table 1.

Figure 2

Figure 2. Experiment 1: Phase 1. IBA is portrayed as a function of the modality of questioning. Along the x-axis, participants in the Written-Written condition were those who were immediately questioned following the video and then wrote their answers during the interview. Participants in the Spoken-Spoken condition were those who were immediately questioned following the video and then spoke their answers during the interview. This figure illustrates the presence of the written superiority effect for Phase 1. Error bars represent standard errors.

Phase 2

Multiple two-way ANOVAs and subsequent pairwise comparisons with Bonferroni corrections were conducted on the Phase 2 data to understand better the effects that the modality of questioning and testing vs. restudying had on memory report changes over time. These ANOVAs were conducted to compare (1) IBA, (2) OBA, (3) the proportion of true pointed questions answered correctly, (4) the proportion of false questions endorsed, (5) the proportion of false questions rejected, (6) the time it took to complete the interview, and (7) word counts.

There was no significant main effect of the modality of questioning or testing vs. restudying, as well as no significant interaction for the dependent variables of IBA, OBA, or the proportion of true pointed questions answered correctly. All F-tests and relevant descriptive statistics are reported in Table 2. Though there was not a significant difference in OBA, it is important to note that participants were highly accurate in what they reported (see Table 2).

Table 2

Table 2. Experiment 1: Phase 2 F-tests and relevant descriptive statistics.

For the proportion of false questions endorsed, there was a significant main effect of the modality of questioning. More specifically, participants who wrote (M = 0.35, SD = 0.27) endorsed false questions to a greater extent than participants who spoke (M = 0.22, SD = 0.25). Here, endorsing a false question is an incorrect response (participants should indicate that the question asked about something that did not happen). Therefore, participants who wrote did worse than participants who spoke when answering the false questions. There was also a significant main effect of testing vs. restudying, such that participants who were interviewed immediately (Test) (M = 0.34, SD = 0.26) endorsed false questions to a greater extent than participants who had the opportunity to rewatch the video (Restudy) (M = 0.23, SD = 0.27). There was not a significant interaction between the modality of questioning and testing vs. restudying. All F-tests and relevant descriptive statistics are reported in Table 2.

There was also a significant main effect of the modality of questioning on the proportion of false questions correctly rejected. Participants who wrote (M = 0.27, SD = 0.25) correctly rejected the false questions as asking about a false detail more than participants who spoke (M = 0.16, SD = 0.20). Additionally, there was a significant main effect of testing vs. restudying such that participants who had the opportunity to rewatch the video (M = 0.27, SD = 0.25) correctly rejected the false questions more than participants who were immediately questioned (M = 0.17, SD = 0.21) about the video. This suggests that, in general, restudying (vs. immediate testing) led to better handling of the false questions, consistent with research supporting that restudying promotes verbatim recall. There was no significant interaction between the modality of questioning and testing vs. restudying. All F-tests and relevant descriptive statistics are reported in Table 2.

Unsurprisingly, there was a significant main effect of the modality of questioning on interview completion times, such that participants who wrote (M = 1,213.35, SD = 271.54) took significantly more time to complete the interview than participants who spoke (M = 446.19, SD = 103.77), replicating the results from Phase 1 (p's < 0.05). There was not a significant main effect of testing vs. restudying or an interaction. Of greater interest are the significant differences in the number of words produced during the interviews. Participants who wrote (M = 408.81, SD = 114.40) produced fewer words than those who spoke (M = 572.74, SD = 234.62). There was no significant main effect of testing vs. restudying or a significant interaction. All F-tests and relevant descriptive statistics are reported in Table 2.

Phase 3

The Phase 3 data were checked for outliers, normality tests were conducted, and the relevant data were transformed, when necessary, via log transformations. The time to answer the questions was not recorded for Phase 3 because the questioning occurred via email. Multiple two-way ANOVAs and subsequent pairwise comparisons with Bonferroni corrections were conducted on the Phase 3 data to understand better the effects that the modality of questioning and testing vs. restudying had on memory report changes over time. These ANOVAs were conducted to compare (1) IBA, (2) OBA, (3) the proportion of true pointed questions answered correctly, (4) the proportion of false questions endorsed, (5) the proportion of false questions rejected, (6) open-ended question word counts, and (7) pointed-question word counts. These analyses all produced non-significant findings (p's > 0.05) except for a significant main effect of testing vs. restudying for the proportion of false questions correctly rejected. Those who had the opportunity to restudy (M = 0.26, SD = 0.26) correctly rejected the false questions as asking about a false detail more than participants who were immediately questioned (M = 0.16, SD = 0.21). These findings suggest that the written superiority effect only extended to 1 week after encoding. All F-tests and relevant descriptive statistics are reported in Table 3.

Table 3

Table 3. Experiment 1: Phase 3 F-tests and relevant descriptive statistics.

Performance of participants who completed all sessions

It is possible that the number of participants who self-selected out at each phase could have impacted the previous findings. Therefore, two 2 × 2 × 2 mixed ANOVAs and subsequent pairwise comparisons with Bonferroni corrections were conducted for the IBAs and OBAs as a function of the modality of questioning and testing vs. restudying to assess potential differences across phases that may have been obscured by only examining participants who completed at least Phase 1 and Phase 2. Thus, the following analyses involve only participants who completed every phase (N = 91). There was a significant difference in IBA across phases, such that IBA declined significantly from Phase 2 (M = 0.44, SD = 0.14) to Phase 3 (M = 0.39, SD = 0.15). There were no other significant within-subject main effects or interactions, or any significant between-subject main effects or interactions (p's > 0.05). When examining OBAs, there were no significant between- or within-subject main effects or interactions. All F-tests and relevant descriptive statistics are reported in Table 4.

Table 4

Table 4. Experiment 1: Performance of participants who completed all sessions.

Rate of information loss

Multiple two-way ANOVAs and subsequent pairwise comparisons with Bonferroni corrections were conducted on the rate of information loss from Phase 1 to Phase 2 to Phase 3 for the immediately questioned groups and the rate of information loss from Phase 2 to Phase 3 for the restudy groups. There was a significant difference in the rate of information loss for IBA for participants in the immediately questioned groups [$F_{(5, 153)}$ = 6.429, p < 0.001, $η_{p}^{2}$ = 0.174] (power = 1.00), such that participants had a significantly lower IBA in Phase 3 (M = 0.40, SD = 0.15) compared to Phase 1 (M = 0.52, SD = 0.15) [$F_{(2, 153)}$ = 9.199, p < 0.001, $η_{p}^{2}$ = 0.107] (power = 1.00) (Figure 3). There was also a significant difference in IBA for those who wrote (M = 0.51, SD = 0.14) vs. spoke (M = 0.42, SD = 0.15) [$F_{(1, 153)}$ = 14.159, p < 0.001, $η_{p}^{2}$ = 0.085] (power = 1.00). There was not a significant interaction between Phase and the immediately questioned groups (Written or Spoken) [$F_{(2, 153)}$ = 0.009, p = 0.991, $η_{p}^{2}$ = 0.000]. There was not a significant difference in the rate of information loss for OBA for participants immediately questioned across the three phases [$F_{(5, 139)}$ = 1.254, p = 0.288, $η_{p}^{2}$ = 0.043]. For the restudy groups, there was not a significant difference in the rate of information loss for IBA [$F_{(3, 100)}$ = 1.126, p = 0.342, $η_{p}^{2}$ = 0.033]. Additionally, there was no significant difference in the rate of information loss for OBA [$F_{(3, 94)}$ = 0.659, p = 0.580, $η_{p}^{2}$ = 0.021].
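As a reading aid, the reported partial eta-squared values can be recovered from the F statistics and their degrees of freedom through the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below is a consistency check on the values reported above, not the analysis pipeline used in the study; the function name and the labels are illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (label, F, df_effect, df_error, reported partial eta squared), taken from the text.
reported = [
    ("IBA, immediately questioned (overall)", 6.429, 5, 153, 0.174),
    ("IBA, phase", 9.199, 2, 153, 0.107),
    ("IBA, written vs. spoken", 14.159, 1, 153, 0.085),
    ("IBA, phase x modality", 0.009, 2, 153, 0.000),
    ("OBA, immediately questioned", 1.254, 5, 139, 0.043),
    ("IBA, restudy groups", 1.126, 3, 100, 0.033),
    ("OBA, restudy groups", 0.659, 3, 94, 0.021),
]

for label, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed = {eta:.3f}, reported = {eta_reported:.3f}")
```

Each computed value agrees with the reported effect size to three decimal places.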

Figure 3

Figure 3. Experiment 1: Rate of information loss for IBA as a function of phase and test vs. restudy. IBA is presented along the y-axis for each phase, separately for participants who were immediately questioned and participants who rewatched the video. Over time, IBA decreases. Phase 2 occurs 1 week after Phase 1, and Phase 3 occurs 1 month after Phase 1.

Discussion

Results from the present experiment are consistent with research supporting a written superiority effect in Phases 1 and 2 (Kraus et al., 2017; Sauerland et al., 2014). Participants who write provide more correct information than participants who speak, although the amount of information they produce is similar. Sauerland and Sporer (2011) posited that speaking may be more productive but not necessarily as efficient; the present study provides support for this idea based on the longer completion times for those participants who wrote but equivalent word counts compared to those who spoke.

In general, restudying led to better performance regarding the management of false questions, consistent with research supporting that restudying promotes verbatim recall (Reyna and Brainerd, 1995). However, the better performance operated differently depending on whether participants wrote or spoke following restudying. There may be a differential criterion shift such that participants' inclination to provide an answer increased in the written conditions, possibly due to not having to directly provide answers to the research assistant, whereas those who spoke may have been reticent to provide an incorrect answer directly to the research assistant. These findings provide another indication that the benefit due to writing can override an anticipated testing effect.

Experiment 2

Experiment 2 was conducted to replicate and extend the findings of Experiment 1. The COVID-19 pandemic created a natural experiment examining the impact of removing some of the social factors that may impact reporting performance. Experiment 2 was identical to Experiment 1 with the exception that it occurred virtually using the platform Zoom rather than in-person. Prior research suggests that allowing individuals to be questioned remotely can lead to an increase in the accuracy of overall memory reports and a reduction in error reporting (Nash et al., 2014; Taylor and Dando, 2018). Therefore, it is possible that the written superiority effect seen in Experiment 1 could be enhanced in Experiment 2 due to removing the presence of the interviewer.

In addition to the cognitive factors that may impact memory retrieval, there are social factors that impact performance, like the presence of an interviewer. Bergmann et al. (2004) had patients complete a written questionnaire and personal interview related to their medical history. They found that when the interviewer was absent, the reporting of serious diseases was less likely. In a review, Rosenthal (2002) suggested that the presence of an interviewer may unintentionally introduce cues to the witnesses who spoke to report more central rather than peripheral details, though the interviewer was not instructed to do so. This might occur because the interviewer's presence may increase an individual's overall motivation to perform, thereby focusing on more relevant information. Sauerland et al. (2014) found that conditions in which the interviewer was absent while writing led to better recall performance. We do not expect the move to virtual testing to influence the testing benefit. Lastly, we still expect participants who restudy the information, instead of being immediately questioned, to be less likely to endorse the false questions.

In Experiment 2, both the researcher and participant kept their cameras turned off. It is important to note that because the experimental sessions occurred virtually, the participants typed their responses instead of writing on lined sheets of paper. However, previous research has shown that typing and writing lead to similar performance levels on essay exams. Though typing is faster, the quality of the essays is not significantly different (Mogey et al., 2010).

Method

Participants

A statistical power analysis was conducted using G*Power to determine the sample size. Based on Experiment 1's effect size estimate of 0.35 and a power of 0.95, the analysis indicated an ideal sample size of 109 participants. However, a total of 84 introductory psychology students (32 males, 52 females; M_Age = 20.04 years, SD_Age = 4.54) from the University of Oklahoma participated in this study in exchange for partial course credit, resulting in an overall post hoc power estimate of 0.89. All students were recruited via a university recruitment portal (SONA study flier). The flier informed potential participants that they would watch a video and be asked various questions about the video at three different time points. Participants received a maximum of 2.5 research credits for their psychology course. They received credit following the completion of two virtual sessions and one email response. To participate, students must have been at least 18 years of age and able to provide consent. In addition, participants must have considered themselves proficient in English.

As in Experiment 1, participants were randomly assigned to: Restudy-Written (n = 19), Restudy-Spoken (n = 20), Written-Written (n = 25), and Spoken-Spoken (n = 20). All participants' data were kept anonymous and separate from all possible identifying information. No significant risks were encountered by the participants, and they were treated in accordance with APA ethical standards. This study was approved by the University of Oklahoma IRB.

Materials

The materials used in this experiment were identical to those used in Experiment 1.

Design and procedure

The design of Experiment 2 is the same as Experiment 1. A 2 (Modality of Questioning: Written vs. Spoken) × 2 (Test vs. Restudy) × 3 (Timing: Immediate, 1-week delay, and 1-month delay, hereafter denoted Phase 1, Phase 2, and Phase 3) incomplete factorial design was used. The dependent variables are the same as in Experiment 1.

Experiment 2's procedure is identical to Experiment 1, apart from it occurring virtually. Before beginning the sessions, participants were instructed to turn their computer cameras off. The researchers also kept their computer camera off. The researchers shared their screen to show the participant the video. Also, instead of recording participants' spoken answers with an audio recorder as in Experiment 1, the virtual meeting was recorded and uploaded to MyMedia (OU MyMedia, 2020) for transcription, after which a researcher edited and fixed any transcription errors. Only participants in the Spoken conditions (Restudy-Spoken and Spoken-Spoken) had their answers recorded and uploaded to MyMedia. Participants were only recorded when they answered the questions; the parts of the session that occurred before the questions were asked were not recorded. Participants in the written conditions (Restudy-Written and Written-Written) typed their responses in a Word document instead of writing on lined sheets of paper. These participants emailed their responses immediately to the researcher when the session ended. The responses were de-identified and saved.

Results

Eighty-four participants completed Phase 1. Of those, 75 (89.3% return rate) participants completed Phase 2, and 44 (52.4% return rate) participants completed Phase 3. Only participants who completed Phases 1 and 2 (n = 75) were included in the subsequent analyses. An additional participant was removed (failed to return their responses by the due date) for the Phase 3 analyses, so 43 (out of 44) participants' email responses were used for the Phase 3 analyses. All data were checked for outliers and transformed when necessary.
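The return rates quoted above are each phase's completion count as a percentage of the 84 participants who completed Phase 1. A minimal sketch of that arithmetic follows; the variable and function names are illustrative.

```python
# Completion counts reported in the text.
completed = {"Phase 1": 84, "Phase 2": 75, "Phase 3": 44}


def return_rate(n_returned: int, n_initial: int) -> float:
    """Percentage of the initial sample that returned, rounded to one decimal."""
    return round(100 * n_returned / n_initial, 1)


n_initial = completed["Phase 1"]
for phase in ("Phase 2", "Phase 3"):
    rate = return_rate(completed[phase], n_initial)
    print(f"{phase}: {completed[phase]} of {n_initial} returned ({rate}%)")
```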