Psychometric Properties of the Verbal Affective Memory Test-26 and Evaluation of Affective Biases in Major Depressive Disorder

Hjordt, Liv V.; Ozenne, Brice; Armand, Sophia; Dam, Vibeke H.; Jensen, Christian G.; Köhler-Forsberg, Kristin; Knudsen, Gitte M.; Stenbæk, Dea S.

doi:10.3389/fpsyg.2020.00961

ORIGINAL RESEARCH article

Front. Psychol., 05 June 2020

Sec. Quantitative Psychology and Measurement

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.00961

Liv V. Hjordt^1*

Brice Ozenne^1,2

Sophia Armand¹

Vibeke H. Dam^1,3

Christian G. Jensen⁴

Kristin Köhler-Forsberg^1,3,5

Gitte M. Knudsen^1,3

Dea S. Stenbæk¹

¹Neurobiology Research Unit and Center for Integrated Molecular Brain Imaging, Rigshospitalet, Copenhagen, Denmark
²Department of Public Health, Section of Biostatistics, University of Copenhagen, Copenhagen, Denmark
³Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
⁴Centre for Mental Health Promotion, Department of Psychology, University of Copenhagen, Copenhagen, Denmark
⁵Psychiatric Center Copenhagen, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark

We developed the Verbal Affective Memory Test-26 (VAMT-26), a computerized test to assess verbal memory, as an improvement of the Verbal Affective Memory Test-24 (VAMT-24). Here, we psychometrically evaluate the VAMT-26 in 182 healthy controls, examine 1-month test–retest stability in 48 healthy controls, and examine whether 87 antidepressant-free patients diagnosed with Major Depressive Disorder (MDD) tested with VAMT-26 differed in affective memory biases from 335 healthy controls tested with VAMT-24/26. We also examine whether affective memory biases are associated with depressive symptoms across the patients and healthy controls. VAMT-26 showed good psychometric properties. Age, sex, and IQ, but not education, influenced VAMT-26 scores. VAMT-26 scores converged satisfactorily with scores on a test associated with non-affective verbal memory. Test–retest analyses showed a learning effect and an r ≥ 0.8, corresponding to a typical variation of 10% in recalled words from first to second test. Patients tended to remember more negative words relative to positive words compared to healthy controls at borderline significance (p = 0.06), and affective memory biases were negatively associated with depressive symptoms across the two groups at borderline significance (p = 0.07); however, the effect sizes were small. Future studies are needed to address whether VAMT-26 can be used to distinguish between depression subtypes in patients with MDD. As a verbal memory test, VAMT-26 is a well-validated neuropsychological test, and we recommend its use in Danish and international studies on affective memory.

Introduction

While verbal memory is a broad concept referring to memory for verbally presented information, verbal affective memory refers to memory for verbally presented information with an emotional content. Examination of non-affective and affective verbal memory typically involves the presentation of word lists or stories, which are subsequently recalled or recognized within a set time interval. Verbal memory is one of the most examined cognitive domains and considered fundamental to intelligence testing and to disease assessment and diagnosing (e.g., Alzheimer’s disease), as well as in the study of affective biases in cognition following psychological or pharmacological interventions (Lezak et al., 2004). However, currently, our knowledge about the interaction between affectivity and verbal memory is more limited (Groeger, 1997; Terry, 2003; Joormann and Stanton, 2016). The notion of a mood-congruent memory bias was first suggested by Bower (1981), who theorized that individuals show superior memory for material that is consistent with the individual’s mood state compared to material that is mood incongruent. Such mood-congruent memory bias may contribute to difficulties using adaptive emotion regulation strategies, and may also affect individuals’ perception of a certain situation and change their appraisals [as discussed in Joormann and Stanton (2016)]. Empirical evidence of depression-related memory biases, where clinically depressed individuals show a preferential recall of negative compared to positive information (Matt et al., 1992; Mathews and MacLeod, 2005), is especially supported by studies examining implicit (Gaddy and Ingram, 2014) and autobiographical (Kohler et al., 2015) mood-congruent memory. 
However, empirical evidence of a mood-congruent memory bias in explicit non-self-referential memory is less consistent: some studies support a negative bias in patients with Major Depressive Disorder (MDD) (Watkins et al., 1992; Bradley et al., 1995; Neshat-Doost et al., 1998), while other studies support a positive bias (Danion et al., 1995; Calev, 1996; Zupan et al., 2017). Further, studying the association between verbal memory biases and depression may contribute to existing knowledge and enhance our understanding of affective cognition in affective disorders, as well as in general.

Unfortunately, most available verbal memory tests suffer from several methodological shortcomings, which especially holds true for tests of verbal affective memory (Lezak and Lezak, 2004). For example, the Affective Auditory Verbal Learning Test (AAVLT) (Snyder and Harrison, 1997) administered separate positive and negative word lists to separate individuals, despite it being more efficient to examine affective memory by administering positive, negative, and neutral information to the same individuals (as recommended by, e.g., Elliott et al., 2011). The Emotional Verbal Learning Test (EVLT) (Strauss and Allen, 2013) consists of only 16 words, which increases the risk of ceiling effects (i.e., that all or nearly all words are recalled), and uses an unequal distribution of positive and negative words (i.e., 4 positive and 12 negative words), lowering the sensitivity for assessing memory for positive information. The Cognitive-Affective Verbal Learning Test (C-AVLT) (Considine et al., 2017) was developed to overcome several of the shortcomings of the AAVLT and the EVLT. However, like the EVLT, the C-AVLT consists of only 16 words (i.e., 4 positive words, 4 negative words, 4 neutral-abstract words, and 4 neutral-concrete words), increasing the risk of ceiling effects. In general, most existing verbal memory tests include a mix of common, highly unusual, or taboo words, though the latter are known to affect memory performance differentially (Lezak and Lezak, 2004).

Taken together, new tests of recall of commonly encountered non-affective and affective words are needed (Elliott et al., 2011; Bayer and Schacht, 2014). To address some of the methodological shortcomings of existing verbal affective memory tests, we recently developed the Verbal Affective Memory Test-24 (VAMT-24), a computerized test to assess affective verbal memory (Jensen et al., 2015). Subsequently, we developed the Verbal Affective Memory Test-26 (VAMT-26) as a logical and theoretical improvement of VAMT-24, specifically addressing the potential effects of word class on recall and a suboptimal proportion of affective words in VAMT-24. Compared to VAMT-24, VAMT-26 includes two more words to increase test difficulty and only nouns to control for potential memory-enhancing effects of word class. Finally, VAMT-26 comprises a larger proportion of affective words to increase sensitivity in detecting affective biases (10 positive, 10 negative, and 6 neutral words).

In Part 1 of the study, we psychometrically evaluate VAMT in an extended 26-word version. In Part 2, we examine the impact of the adjustments from VAMT-24 to VAMT-26 on recall outcomes and propose a conversion algorithm to render VAMT scores comparable across different versions of VAMT. In Part 3, we examine biases in verbal affective memory in patients diagnosed with MDD and compare these biases to those of healthy controls.

Part 1. Psychometric Properties of VAMT-26

In this part of the study, we investigate the psychometric properties of VAMT-26 in a large sample of healthy controls and assess the test–retest stability after approximately 1 month.

Based on our previous VAMT-24 validation (Jensen et al., 2015), we hypothesize that (1) the distribution of VAMT-26 outcomes can be approximated by a normal distribution, (2) learning effects occur over five consecutive immediate recall (IMR) trials, (3) mean recall rates of words presented in the beginning (primacy section) and at the end (recency section) of the test will be higher compared to words presented in the middle section, and (4) VAMT-26 outcomes are positively associated with an established neuropsychological instrument assessing verbal memory.

Materials and Methods

Procedures and Participants

VAMT-26 data were acquired as part of other ongoing studies and stored in the Center for Integrated Molecular Brain Imaging (CIMBI) database. For descriptions of the CIMBI database, please see Knudsen et al. (2016). We extracted data from the CIMBI database, including healthy individuals between 18 and 65 years of age with VAMT-26 data from a first VAMT-26 measurement, and who did not undergo any experimental interventions. All individuals completed VAMT-26 in accordance with standardized VAMT-26 test administration procedures (Jensen et al., 2015). A total of 182 healthy individuals were eligible for the current study.

Across studies, exclusion criteria were a family history of neurological or primary psychiatric disorders (DSM IV Axis I or WHO ICD-10 diagnostic classifications), severe neurological or somatic illness, use of medication which could influence cognitive performance, learning disabilities, sight or hearing impairment, pregnancy, and substance and drug abuse (lifetime use of cannabis >50 times or lifetime use of any other drug >10 times). None of the healthy individuals presented with clinical levels of depression according to the established Danish clinical cut-off score of > 21 on the Major Depression Inventory (Olsen et al., 2004). All individuals were recruited by advertisement for different research protocols approved by the Ethics Committee of Copenhagen and Frederiksberg, Denmark [protocol numbers: H-15013578 (n = 97), H-3-2013-100 (n = 39), H-2-2014-070 (n = 12), H-15001910 (n = 8), H-16026898 (n = 17), H-15017713 (n = 8), and H-1-2014-002 (n = 1)]. After receiving verbal and written information about the respective studies, all individuals gave written informed consent prior to participation. The included data were collected in the period from 2013 to 2018.

Verbal Affective Memory Test-26 (VAMT-26)

VAMT-26 consists of 26 nouns: 10 positive, 10 negative, and 6 neutral. The basic task design, test administration, and instructions of VAMT-26 are identical to those of VAMT-24. Participants are initially informed that a series of words (list A-26) will be presented several times on the computer screen and are asked to remember as many words as possible. The procedure is repeated five times (yielding the IMR score = IMR1 + IMR2 + IMR3 + IMR4 + IMR5), yet the participants are blinded to the number of recall repetition trials. The recalled words and mistakes (i.e., words that were not presented and thus are incorrectly “recalled”) are noted on a preformed VAMT-26 scoring sheet. Following the IMR trials, the interference list (I-26) is displayed, after which participants are asked to recall list A-26 without seeing it, yielding short-term memory (STM) scores. After a wait period of 30 min in which other cognitive tests are administered, participants are asked to do a surprise recall of list A-26, providing long-term memory (LTM) scores. The duration of a VAMT-26 test without the wait period is approximately 25 min. The valence of all words included in lists A-26 and I-26 has previously been validated (Jensen et al., 2015). Based on a count of each word’s occurrence in a linguistic research database¹, we ensured that the overall frequency of use did not differ between A-26 and I-26 or between valences in A-26. The words are displayed in a fixed order with regard to valence (1 = Positive, 2 = Negative, 3 = Neutral): 3-2-1-2-2-1-1-2-3-3-1-1-2-2-3-2-2-3-2-2-1-1-2-2-1-3. Words with similar first letters are separated by at least four other words in A-26 and I-26. Figure 1 shows an illustration of word presentation in VAMT-26. Each word is displayed for 750 ms on a computer screen, at a distance of approximately 60 cm, followed by an interstimulus interval (ISI) of 750 ms, during which a fixation cross is displayed. VAMT-26 is programmed in E-Prime 2.0 Professional (Psychology Software Tools, United States).

FIGURE 1

Figure 1. Illustration of the Verbal Affective Memory Test-26. Each word trial displayed a fixation cross (750 ms) and a word (750 ms) in black (font = Times, size = 40) on a gray background. The Danish [English] words presented in the figure are: Kam [comb], Hævn [revenge], Sol [sun], Løgn [lie].

VAMT-26 outcomes

We defined nine VAMT-26 outcomes: the total number of words recalled across IMR trials 1–5 (i.e., IMR1 + IMR2 + IMR3 + IMR4 + IMR5), within the STM trial, and within the LTM trial (e.g., LTM Total), as well as the number of positive words and the number of negative words recalled across IMR trials 1–5, within the STM trial, and within the LTM trial (e.g., LTM Positive).
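
As a rough illustration of how these nine outcomes could be tallied from a scoring sheet, the Python sketch below assumes that each trial is recorded as a list of recalled words and that a word-to-valence lookup is available. The lookup shown is a hypothetical four-word excerpt (the English glosses from Figure 1, with valences assumed from the fixed presentation order), not the actual list A-26, and the function names are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical excerpt for illustration; the real list A-26 holds 26 nouns.
VALENCE: Dict[str, str] = {
    "comb": "neutral",
    "revenge": "negative",
    "sun": "positive",
    "lie": "negative",
}


def score_trial(recalled: List[str]) -> Dict[str, int]:
    """Count correctly recalled words in one trial, overall and by valence.

    Words not on the list (mistakes) and repeated words are ignored here.
    """
    correct = {word for word in recalled if word in VALENCE}
    return {
        "Total": len(correct),
        "Positive": sum(VALENCE[w] == "positive" for w in correct),
        "Negative": sum(VALENCE[w] == "negative" for w in correct),
    }


def score_session(imr_trials: Sequence[List[str]],
                  stm: List[str],
                  ltm: List[str]) -> Dict[str, int]:
    """Return nine outcomes: Total/Positive/Negative for IMR, STM, and LTM."""
    outcomes: Dict[str, int] = {}
    imr_scores = [score_trial(trial) for trial in imr_trials]
    # IMR outcomes are sums over the five immediate recall trials.
    for key in ("Total", "Positive", "Negative"):
        outcomes[f"IMR {key}"] = sum(score[key] for score in imr_scores)
    # STM and LTM outcomes come from a single trial each.
    for name, trial in (("STM", stm), ("LTM", ltm)):
        for key, value in score_trial(trial).items():
            outcomes[f"{name} {key}"] = value
    return outcomes
```

Neutral words contribute to the Total outcomes only, which matches the nine outcomes defined above.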

Neuropsychological Tests

To measure Intelligence Quotient (IQ), we used the Reynolds Intellectual Screening Test (RIST) (Reynolds and Kamphaus, 2003). To examine the convergent validity of VAMT-26, we used a non-affective neuropsychological test known to be related to verbal memory: Letter-Number Sequencing (LNS) from the Wechsler Adult Intelligence Scale-III (WAIS-III) (Wechsler, 1997). From these tests, we extracted the following main outcomes: the RIST index, expressed as an age-adjusted standard IQ score, and the LNS score, the total number of correctly repeated series (scores range from 0–21). A more detailed description of these neuropsychological tests can be found in Jensen et al. (2015).

Data Analyses

Descriptive statistics

We visually inspected histograms and P–P plots of the VAMT-26 data and tested normality with the Shapiro–Wilk test. For outcomes with non-normal distributions, the median and interquartile range are reported instead of the mean and standard deviation (SD).

Psychometric properties

Learning and recall effects: To evaluate learning effects, we examined changes in mean word recall between consecutive IMR list presentations (i.e., comparing IMR1 to IMR2, IMR2 to IMR3, IMR3 to IMR4, and IMR4 to IMR5) with four Wilcoxon signed-rank tests. In addition, with three further Wilcoxon signed-rank tests (seven in total), we examined whether presentation of the I-26 list significantly decreased STM Total compared to IMR5, and whether the 30 min interval between the STM and LTM trials significantly decreased LTM Total compared to IMR5 and STM Total.

Primacy and recency effects: We divided the A-26 list into three sections: primacy section = words number 1–3; middle section = words number 4–23; recency section = words number 24–26. To test primacy and recency effects, we examined differences in the mean percentage of words recalled across the five IMR trials between the primacy section and the middle section and between the middle section and the recency section, with two paired t-tests.

Internal consistency: We examined internal consistency with nine Pearson product-moment correlation coefficients between each valence for IMR, STM, and LTM performances.

Test-inherent affective biases: We tested whether VAMT-26 exhibits test-inherent affective biases by comparing recall for positive and negative words within IMR, STM, and LTM, respectively, using three Wilcoxon signed-rank tests.

Ceiling effects: We evaluated ceiling effects of VAMT-26 outcomes as a recall mean less than 1.5 SD from the maximum observed score (e.g., the maximum observed score for IMR Positive), yielding a standardized distance score.
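
To make the section split and the standardized distance score concrete, the following Python sketch computes the percentage of words recalled in the primacy, middle, and recency sections and the distance of a sample mean from the maximum observed score in SD units. The data layout (one list of 26 booleans per trial) and all function names are assumptions made for illustration, and the sign convention for the distance score (negative when the mean lies below the maximum) is our reading of the definition above rather than a documented formula.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence

# 0-based list positions: words 1-3, 4-23, and 24-26 of list A-26.
SECTIONS = {
    "primacy": range(0, 3),
    "middle": range(3, 23),
    "recency": range(23, 26),
}


def section_recall_pct(recalled_by_trial: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Percentage of words recalled per section, pooled over the given trials.

    recalled_by_trial[t][i] is True if the word at position i was recalled
    on trial t.
    """
    n_trials = len(recalled_by_trial)
    pct = {}
    for name, positions in SECTIONS.items():
        hits = sum(trial[i] for trial in recalled_by_trial for i in positions)
        pct[name] = 100.0 * hits / (len(positions) * n_trials)
    return pct


def standardized_distance(scores: List[float]) -> float:
    """Sample mean minus maximum observed score, expressed in SD units."""
    return (mean(scores) - max(scores)) / stdev(scores)


def has_ceiling(scores: List[float], threshold_sd: float = 1.5) -> bool:
    """Flag a possible ceiling effect: mean within threshold_sd of the maximum."""
    return abs(standardized_distance(scores)) < threshold_sd
```

Per-participant section percentages from `section_recall_pct` would then be the paired inputs to the two t-tests.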

Established covariates for verbal recall

We examined whether age, sex, IQ, and educational level are associated with each of the nine VAMT-26 outcomes in nine multiple regression models.

Convergent validity

To evaluate convergent validity, we examined the relations between VAMT-26 Total outcomes and LNS with nine Pearson product-moment correlation coefficients.

Test–retest analyses

To examine test–retest stability, a sub-group of the full sample (n = 48) was administered VAMT-26 on two occasions, with the two test sessions separated by approximately 1 month (Mean = 27.7 days, range: 21–43). Retest data were not included in the analyses evaluating the psychometric properties of VAMT-26. Stability was assessed for the IMR Total, STM Total, and LTM Total scores using the Bland–Altman method (Bland and Altman, 1986; see Giavarina, 2015, for a more recent introduction). This method considers two components to assess stability: unbiasedness (referred to as learning effect hereafter) and precision (small variance or degree of scatter). We expressed the learning effect as the mean difference in total number of words recalled between the first and second test. The precision was defined as the half width of the 90% limits of agreement (LOA) interval; i.e., ignoring a possible learning effect, the interval [−precision; +precision] contains the test–retest difference in words recalled for 90% of the observations. We chose 90% instead of the traditional 95% to better reflect the typical sampling error. To be consistent with the existing literature, we also report Pearson’s correlation coefficient as another measure of precision.
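
The two Bland–Altman components can be sketched in a few lines of Python. The sketch assumes the common normal-theory form of the limits of agreement, in which the half width equals a standard normal quantile (about 1.645 for a 90% interval) times the SD of the test–retest differences; the exact computation used for the reported values is not stated in the text, and the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, Sequence


def bland_altman(test: Sequence[float],
                 retest: Sequence[float],
                 level: float = 0.90) -> Dict[str, float]:
    """Bias (learning effect) and limits of agreement for paired scores."""
    # Retest minus test, so a positive bias means better recall at retest.
    diffs = [second - first for first, second in zip(test, retest)]
    bias = mean(diffs)
    # Two-sided normal quantile: about 1.645 for 90%, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(diffs)
    return {
        "bias": bias,
        "half_width": half_width,
        "loa_lower": bias - half_width,
        "loa_upper": bias + half_width,
    }
```

Passing `level=0.95` would give the traditional 95% limits instead.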

Correction for multiple comparisons

Unless otherwise stated, p-values were adjusted for the number of statistical tests carried out using the Bonferroni–Holm multiple comparison procedure (Holm, 1979). An alpha level of 0.05 was adopted throughout all analyses. Statistical analyses were conducted using the Statistical Package for the Social Sciences (SPSS) version 25.0.
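
For readers unfamiliar with the Bonferroni–Holm procedure, the following Python sketch shows the standard step-down adjustment: the smallest p-value is multiplied by the number of tests m, the next smallest by m − 1, and so on, with adjusted values capped at 1 and forced to be non-decreasing. This is a generic illustration, not the SPSS routine used for the reported analyses, and the function name is hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Holm-adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier shrinks from m down to 1 as the p-values grow.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

An adjusted p-value below 0.05 is then interpreted against the alpha level stated above.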

Results

Descriptive Statistics

Descriptive information about the 182 healthy individuals included in Part 1 of the study is presented in Table 1. Descriptive information on VAMT-26 outcomes at the first test is displayed in Table 2. The mean IQ score was in the high end of the normal range. The normal distribution provided a reasonable approximation to the distribution of IMR Total, IMR Positive, and IMR Negative (Shapiro–Wilk ps > 0.09). Other VAMT-26 outcome distributions were left-skewed (Shapiro–Wilk ps < 0.05). Missing values were: Education, n = 3; LNS, n = 2; and BMI, n = 44. BMI was not acquired in all of the studies from which the CIMBI data originate, which explains the high number of missing values.

TABLE 1

Table 1. Descriptive information for the healthy sample in Part 1.

TABLE 2

Table 2. Descriptive information on VAMT-26 outcomes.

Psychometric Properties

Figure 2 shows recall curves for VAMT-26, and Figure 3 shows Pearson product-moment correlation coefficients between IMR, STM, and LTM valences. Descriptive information for IMR 1–5 recall trials and for primacy and recency effects is listed in Supplementary Table S1. Learning effects: Absolute recall of words increased significantly between each IMR list presentation (median difference range: 1–5, Z range = −5.8 to −11.3, ps < 0.001) (Figure 2 and Supplementary Table S1). Recall of words within the STM trial was significantly lower compared to recall of words within the IMR5 trial (median difference = −2.5, Z = −9.5, p < 0.001). The 30 min interval significantly decreased recall of words within the LTM trial compared to the IMR5 trial (median difference = −1.0, Z = −8.6, p < 0.001) but increased recall of words within the LTM trial compared to the STM trial (median difference = 1.5, Z = −3.8, p < 0.001). In post hoc analyses, we examined differences between STM and LTM recall within positive and negative words using Wilcoxon signed-rank tests. We found a significant increase in recall for positive words (median difference = 1.0, Z = −3.6, p < 0.001, unadjusted), but not for negative words (difference = 1.0, Z = −0.88, p = 0.38, unadjusted). Primacy and recency effects: Recall of the first three presented words (primacy section) was significantly higher than recall of the middle section of 20 words (median % difference = 28.2, t = 21.7, p < 0.001). The recall of the last three presented words (recency section) was also significantly higher than recall of the middle section (median % difference = 8.2, t = −6.2, p < 0.001).

FIGURE 2

Figure 2. Recall curves for the Verbal Affective Memory Test-26 (VAMT-26). ***p < 0.0001. Recall means and confidence intervals (CI) for each of the seven trials in VAMT-26. Parametric tests were used to calculate the CI displayed in the figure. IMR1–5, immediate recall trials 1–5; STM, short-term memory; LTM, long-term memory. P-values in analyses on learning effects (i.e., change in total recall of words between each IMR list presentation) were obtained using Wilcoxon signed-rank tests and adjusted for four comparisons using the Bonferroni–Holm adjustment procedure (Holm, 1979). P-values in analyses on differences in recall between the IMR5 vs. STM, IMR5 vs. LTM, and STM vs. LTM trials were obtained using three Wilcoxon signed-rank tests and adjusted for three comparisons using the Bonferroni–Holm adjustment procedure (Holm, 1979).

FIGURE 3

Figure 3. Correlations between each valence for IMR, STM, and LTM VAMT-26 performances. Correlation matrix plot showing Pearson product-moment correlation coefficients between all valences (i.e., positive, negative, and neutral) within IMR (i.e., words recalled across IMR trials 1–5, i.e., IMR1 + IMR2 + IMR3 + IMR4 + IMR5), STM, and LTM. IMR, immediate recall; STM, short-term memory; LTM, long-term memory.

Internal consistency

IMR, STM, and LTM scores were significantly associated within all valences: positive words (r range = 0.75–0.83, 95% CI range = 0.68–0.88), negative words (r range = 0.79–0.84, 95% CI range = 0.72–0.88) and neutral words (r range = 0.63–0.71, 95% CI range = 0.52–0.79) (Figure 3).
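
Correlation coefficients with confidence intervals of this kind are often obtained via the Fisher z-transformation. Whether the intervals reported above were computed this way is not stated, so the Python sketch below should be read as one plausible approach under that assumption; the function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for r via the Fisher z-transformation.

    Requires n > 3 and |r| < 1.
    """
    z = atanh(r)                      # transform r to the z scale
    se = 1.0 / sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)
```

The interval is symmetric on the z scale but not on the r scale, which is why reported bounds around large correlations sit closer to r on the upper side.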

Test inherent affective biases

Recall of positive and negative words was not significantly different within IMR (Z = −0.2, p = 0.87), STM (Z = −0.9, p = 0.75), or LTM (Z = −1.77, p = 0.23).

Ceiling effects

All standardized distance scores for VAMT-26 outcomes were above −1.6 SD from the maximum score. In addition, 7% of the participants recalled all positive and negative words within STM, 9% recalled all positive words within LTM, and 8% recalled all negative words within LTM (Table 2).

Established Covariates for Verbal Recall

Associations between VAMT-26 outcomes and established covariates are listed in Supplementary Table S2. Younger adults showed superior recall across all VAMT-26 outcomes [beta coefficients (β) range: −0.64 to −0.05, ps ≤ 0.01]. Women showed superior LTM Total (β = −1.86, p = 0.03) compared to men, which was driven by LTM Positive (β = −0.96, p = 0.01). Sex was not associated with performance on other VAMT-26 outcomes (ps ≥ 0.15). Higher IQ was associated with better recall on all VAMT-26 outcomes (β range: 0.05–0.64, ps ≤ 0.04), except for STM Negative, which was of borderline significance (p = 0.08). Educational level was not associated with any VAMT-26 outcomes (ps ≥ 0.34).

Convergent Validity

Total numbers of words recalled within IMR, STM, and LTM, respectively, were positively associated with scores on LNS (r range = 0.21–0.23, mean r = 0.22).

Test–Retest Analyses

Bland and Altman plots comparing the scores between the first and second test for IMR Total, STM Total, and LTM Total are presented in Supplementary Figure S3.

Results on learning effects showed that the mean difference (the bias) for total number of words recalled between the first and second test was: IMR Total = 16.5 (95% CI = 13.9; 19.1), STM Total = 3.9 (95% CI = 3.1; 4.6), and LTM Total = 3.1 (95% CI = 2.4; 3.8). The bias indicated a significant increase in recall at the second test session and supported a learning effect. The half width of the 90% LOA interval was 11.6 words for IMR Total, 3.2 words for STM Total, and 3.1 words for LTM Total. Relative to the achievable IMR Total score (range 0–130) and the achievable STM Total and LTM Total scores (range 0–26), this corresponds to a difference in remembered words between the first and second test of 8.9% for IMR Total, 12.2% for STM Total, and 11.9% for LTM Total. The corresponding Pearson product-moment correlation coefficients were all large: r = 0.79 for IMR Total, r = 0.81 for STM Total, and r = 0.80 for LTM Total.
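
The percentages above follow from dividing each half width by the maximum achievable score. The short Python check below uses only the rounded half widths reported in the text, so the last digit can differ from a reported percentage that was presumably computed from unrounded values; the variable and function names are illustrative.

```python
# Maximum achievable scores: five IMR trials of 26 words, one trial otherwise.
MAX_SCORE = {"IMR Total": 5 * 26, "STM Total": 26, "LTM Total": 26}

# Half widths of the 90% LOA interval in words, as reported (rounded).
HALF_WIDTH = {"IMR Total": 11.6, "STM Total": 3.2, "LTM Total": 3.1}


def relative_precision(half_width: float, max_score: int) -> float:
    """Half width expressed as a percentage of the achievable score range."""
    return 100.0 * half_width / max_score


for name in MAX_SCORE:
    pct = relative_precision(HALF_WIDTH[name], MAX_SCORE[name])
    print(f"{name}: {pct:.1f}%")
```

From the rounded inputs this gives 8.9%, 12.3%, and 11.9%; the value for STM Total differs from the reported 12.2% only in the last digit.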

Part 2. Comparison of VAMT-24 and VAMT-26

In this part of the study, we examine the impact of change in VAMT versions on VAMT recall outcomes and propose an adjustment procedure to make recall performance independent of VAMT version.