The Musical Emotion Discrimination Task: A New Measure for Assessing the Ability to Discriminate Emotions in Music

MacGregor, Chloe; Müllensiefen, Daniel

doi:10.3389/fpsyg.2019.01955

ORIGINAL RESEARCH article

Front. Psychol. , 27 August 2019

Sec. Emotion Science

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.01955

The Musical Emotion Discrimination Task: A New Measure for Assessing the Ability to Discriminate Emotions in Music

$\r\nChloe MacGregor$ Chloe MacGregor

Daniel Müllensiefen^*

Department of Psychology, Goldsmiths, University of London, London, United Kingdom

Previous research has shown that levels of musical training and emotional engagement with music are associated with an individual’s ability to decode the intended emotional expression from a music performance. The present study aimed to assess traits and abilities that might influence emotion recognition, and to create a new test of emotion discrimination ability. The first experiment investigated musical features that influenced the difficulty of the stimulus items (length, type of melody, instrument, target-/comparison emotion) to inform the creation of a short test of emotion discrimination. The second experiment assessed the contribution of individual differences measures of emotional and musical abilities as well as psychoacoustic abilities. Finally, the third experiment established the validity of the new test against other measures currently used to assess similar abilities. Performance on the Musical Emotion Discrimination Task (MEDT) was significantly associated with high levels of self-reported emotional engagement with music as well as with performance on a facial emotion recognition task. Results are discussed in the context of a process model for emotion discrimination in music and psychometric properties of the MEDT are provided. The MEDT is freely available for research use.

Introduction

The affective experience associated with music is commonly considered a key motive for engagement in musical activities (Juslin and Laukka, 2004). Music is often used in a constructive manner, to express emotion through composition and performance, or to evoke or regulate an emotional state through listening. The amount of research contributing to an understanding of emotional processes associated with music has increased considerably over the last few decades, most of which has focused especially on the expression and induction of musical emotions (Thompson, 2009). It has been suggested that the ability to perceive musical emotions may vary across individuals, just as recognition of emotional facial and vocal expressions has been found to vary according to individual differences (Palermo et al., 2013; Taruffi et al., 2017). Though many tests have been developed to detect such differences in facial and vocal recognition (Mayer et al., 2008), equivalent tests for musical emotion recognition are considerably less common. The current study therefore aims to establish a new measure of emotion discrimination using music in order to investigate whether differences in emotional, musical, and perceptual abilities may account for variation in the perception of musical emotions.

One factor of potential influence within musical emotion decoding is emotional intelligence (EI): the ability to categorize, express and regulate one’s emotions, as well as those of others (Salovey and Mayer, 1990). EI is typically separated into two constructs for the purpose of measurement; ability EI, measured using cognitive ability tests, and trait EI, assessed via self-report methods (Petrides et al., 2004). In keeping with a recent study of emotion decoding in music (Akkermans et al., 2018), a self-report measure of trait EI was used within the current research. Differences in recognition of emotion within speech prosody have been linked to EI (Trimmer and Cuddy, 2008), signifying the potential influence of EI on musical emotion decoding, especially when considering the strong evidence for a link between the communication of emotions in speech and music (Juslin and Laukka, 2003). This supposition is further endorsed by Resnicow et al. (2004), who found a positive correlation between EI and a test of emotion recognition in which participants rated basic emotions conveyed through piano pieces. Evidence therefore indicates that differences in EI may explain variation in music-perceived emotion.

One relevant component of EI of is emotional contagion (EC) (Salovey and Mayer, 1990), which refers to ones’ tendency to be influenced by, or unconsciously mimic, others’ emotional states (Doherty, 1997). EC has mostly been investigated in relation to facial expression (Juslin and Västfjäll, 2008); for example, one study examined facial muscle responses to videos of emotional singing and found that participants tended to unconsciously imitate the emotional facial expression of the singer (Chan et al., 2013). Though less prevalent in the literature, there are also reports of contagion from vocal expression (e.g., Neumann and Strack, 2000). On account of such evidence, as well as the aforementioned notion that music’s emotional quality may be derived from its similarities to vocal expression (Juslin and Laukka, 2003), it has been speculated that EC may occur in music listening through the internal mimicking of a perceived expression (Juslin et al., 2009). This is backed up by neuroimaging research conducted by Koelsch et al. (2006); when participants were exposed to music, activation was found within areas of the mirror-neuron system which have been linked with vocal production, thought to represent the mimicking of emotions expressed by music (Juslin and Västfjäll, 2008). Such evidence indicates that EC may contribute to the categorization of emotions in music.

Though a high level of emotional ability is likely to result in a consistent level of emotion processing throughout different modalities, it is arguable that emotional ability may vary specifically in relation to music. It is thus necessary to consider an individuals’ typical level of emotional engagement with music, alongside more general measures of emotional ability, when investigating factors influencing emotion recognition. Emotional music skills can be measured using the Goldsmiths Musical Sophistication Index (GOLD-MSI) (Müllensiefen et al., 2013), a self-report tool that allows for the assessment of a wide range of musical skills and behaviors. The Emotion subscale of the Gold-MSI subscale was used in a recent study, which found that self-reported level of emotional engagement with music predicted accuracy on a musical emotion decoding task (Akkermans et al., 2018). A high level of emotional engagement with music, as measured using the Gold-MSI, may therefore be a good indicator of ability to discriminate musical emotional expression.

Musical ability has also been explored with regard to its relationship with emotional capacity (Hallam, 2010). The idea that musical expertise may enhance emotional skills seems plausible when taking into account other cognitive advantages found to result from training (Schellenberg, 2005). Accordingly, it has been suggested that enhanced musical and acoustic processing acquired through training (Kraus and Chandrasekaran, 2010) may contribute to an enhanced sensitivity to non-verbal emotions (Taruffi et al., 2017; Akkermans et al., 2018). Empirical evidence for this claim has been provided by studies conducted by Thompson et al. (2004) and Lima and Castro (2011), both of which demonstrated that a group of musicians were better able to decode emotions in speech prosody than untrained controls. However, one study carried out by Trimmer and Cuddy (2008) found little variation among individuals’ recognition of emotions in speech prosody based on their level of musical expertise (Trimmer and Cuddy, 2008). Lima and Castro (2011) point out that Trimmer and Cuddy’s (2008) findings could be accounted for by distinctions between the participants recruited for each study. The participants in both theirs and Thompson et al. (2004) study had, on average, between 8 and 14 years of musical training, whereas participants in Trimmer and Cuddy’s (2008) study had an average of 6.5 years. As it is possible that the effects of training may only manifest at a measurable level as a result of extensive training, Lima and Castro (2011) argue that this could have played a role in the lack of a discernable effect. More recent studies investigating musical emotion decoding have uncovered a positive association between decoding performance and self-reported musical expertise, providing additional support for the influence of musical training (Taruffi et al., 2017; Akkermans et al., 2018). In spite of this, further investigation is required to delineate the relationship between musical training and recognition of non-verbal emotional expression.

Given that superior emotion recognition ability could result from enhanced acoustical processing, it follows that fundamental differences in auditory perception may also influence recognition ability. The pitch and duration of musical events are important cues for interpreting emotional expression in both speech and music (Juslin and Laukka, 2003; Lima et al., 2016), meaning that differences in perceptual sensitivity may be predictive of differences in emotion perceived in music. This hypothesis is reinforced by studies of individuals with hearing impairments, who show deficits in processing of emotion in both music and speech which align with difficulties processing pitch (Wang et al., 2013) and timbral variations such as roughness (Paquette et al., 2018).

The current research was inspired by a recent replication and extension (Akkermans et al., 2018) of a study carried out by Gabrielsson and Juslin (1996). The original study investigated communication of emotion in music using a production-recognition paradigm. Firstly, professional musicians (including flautist, guitarists, violinists, and vocalists) were asked to perform three melodies several times; for each performance they were instructed to adjust their expressive intentions to convey a specific emotion (happy, sad, angry, fearful, tender, solemn, or without expression). Performance recordings were analyzed in terms of their musical and acoustic properties to identify the expressive cues characteristic of each emotion. These recordings were then used for listening experiments, in which participants were asked to identify performer-intended expressions. Results indicated that the performers’ intentions were mostly identified correctly, with a higher decoding accuracy for basic emotions, in accordance with Juslin’s (1995) hypothesis regarding the comparative ease of communicating basic versus complex emotions. In the replication study, emotional and musical individual differences were assessed with regard to their influence on emotion-decoding ability (Akkermans et al., 2018). Participants’ ability to accurately decode musical emotions was found to be associated with their level of musical training.

The main objectives for the current study were: firstly, to develop a short and effective Musical Emotion Discrimination Task (MEDT), which tests an individuals’ ability to decode emotions in music using a simple response format. Secondly, to examine individual differences in EI, EC, musical training, and emotional engagement in relation to their influence on perceived emotion in music, and finally, to extend previous research by investigating the contribution of low-level auditory ability to emotion decoding performance. Three experiments were carried out. Experiment 1 consisted of a preliminary MEDT, in which two excerpts of the same melody were presented per trial, which differed only in terms of emotional expression conveyed through the performance. Excerpts differed between trials in terms of musical features such as length, instrument, melody, target emotion and comparison emotion. Item difficulty was assessed with regard to the contribution of these features, with the expectation that they would affect task performance as found previously (Akkermans et al., 2018). Furthermore, this analysis informed a shorter test of emotion discrimination by allowing for the calibration of overall test difficulty. The short MEDT was then tested and further refined in experiment 2. The test was employed alongside other measures of relevant abilities to allow for a preliminary assessment of test validity. It was hypothesized that participants’ superior emotional, musical and perceptual abilities, would coincide with a superior ability to decode performer-intended emotions. Experiment 3 was conducted in order to further establish the usefulness of the test as a measure of musical emotion decoding by investigating the overlap between the MEDT and measures of general emotion abilities such as emotion recognition from facial and vocal stimuli, and of emotion deficits such as alexithymia. Accordingly, it was expected that performance on general emotion recognition tasks would be positively linked with MEDT performance, while self-reported levels of alexithymia would be negatively related to performance.

To better understand how such individual differences might impact on emotion decoding, it is useful to view them as part of a cognitive process model. The following therefore describes a simple model that can be used to understand the processes underlying the decoding of music-expressed emotions (see Figure 1), which can be used to account for the influence of other relevant cognitive abilities. At the first stage, a listener must perceive an external musical stimulus and extract expressive auditory cues such as tempo, articulation, or dynamics. Next, the listener must meaningfully identify these cues by matching them to stereotypical expressions of musical emotion. This process is thought to rely on general emotion processing mechanisms responsible for the understanding of emotional sounds, such as those engaged within the processing of speech prosody (Juslin and Laukka, 2003), as well as on schemas built through previous music listening or music performance experience. Finally, the listener can use the information gained from these cues through the matching process to facilitate an emotional understanding of the stimulus. For example, an individual may listen to a musical piece with a slow tempo and (subconsciously) extract this as an expressive cue. Due to previous associations with sad music and the potential overlap with characteristics of sad vocal expression, this feature may be linked with a stereotypical expression of sadness and could therefore cause them to identify the piece as sad.

FIGURE 1

Figure 1. A diagram to illustrate the cognitive model proposed to underlie emotion recognition in music as relevant to the testing paradigm of the MEDT. The rectangles reflect covert processes that cannot easily be directly measured or controlled, while the parallelograms represent processes that can be manipulated and studied.

This model is informed by current literature exploring the extent to which processes involved in the perception and interpretation of acoustic cues of emotion in speech and music are shared. There are numerous examples of overlaps between emotional cues used in speech and music (Juslin and Laukka, 2003). For instance, Curtis and Bharucha (2010) provide an analysis of vocal portrayals of emotion performed by American actors, which reveals the prominence of minor third intervals in portrayals of sadness theorized to occur as a result of physiological effects of emotion. Interestingly, minor third intervals are commonly interpreted to represent sad expression in music. Cross-cultural research has also demonstrated significant cross overs between emotional cues in both eastern and western music and vocal expressions, displaying that positive emotions are conveyed using large melodic or prosodic intervals compared to smaller intervals used to convey negative emotions (Bowling et al., 2012). Though these findings relate exclusively to melodic intervals, which are controlled in the current study through use of the same melody between emotions, there is also strong evidence for the impact of emotional properties such as rate and intensity on the processing of speech and music, where fast paced, loud speech is interpreted to be similar in valence to fast paced, loud music (for example, Ilie and Thompson, 2006). Such evidence reinforces earlier ideas put forward by Juslin (1997) within the functionalist perspective which stipulate that similarities between vocal and musical expression rely on shared communicative systems within the brain which are present from birth and are strengthened through social interaction. The present model thus endorses the possibility of shared processing as hypothesized by Juslin (1997), and the following investigation aims to determine whether this is mirrored by positive correlations between performance on musical and vocal emotion recognition tasks.

It is speculated that specific cognitive abilities may only play a role at certain processing stages within the proposed model. For example, perceptual ability can only influence early auditory processing, whereas EI is likely to have more impact at later stages involving more general emotion mechanisms responsible for the processing of vocal and facial emotions. Emotional contagion and alexithymia are also higher-level processes, involved in later cognition, although their effect may be more restricted to individual processing stages. For example, alexithymia, a condition associated with impairments in verbal formulation of emotion (Taruffi et al., 2017) is expected to only impact upon the final phase involving labeling of music perceived emotions. Musical training, on the other hand, has the potential to impact all stages of processing. Previous research has demonstrated the effect of training on both the perception of music (Musacchia et al., 2007; Kraus and Chandrasekaran, 2010) as well as on higher-level processes such as emotion decoding (Akkermans et al., 2018). The model below illustrates how individual differences are hypothesized to impact upon different stages of processing and therefore provides a starting point for the following investigation (see Figure 2).

FIGURE 2

Figure 2. A diagram displaying the contribution of individual differences (in circles) at different stages of a cognitive model proposed to underlie emotion recognition in music. The diamond shapes highlighted in purple represent cognitive mechanisms thought to underlie the operation of particular processes.

Experiment 1

Method

Participants

Seventy seven participants were recruited online via social network platforms and the Goldsmith’s research participation scheme; only those recruited through the research scheme received compensation, which was administered in the form of course credit. Participants ranged from 18 to 80 years of age (M = 37.06, SD = 22.65), and included 26 females, 18 males, and 2 individuals who preferred to withhold gender information and 31 who did not provide demographic information. The current study was granted ethical approval by Goldsmith’s Research Ethics Committee.

Materials

Stimuli recording

For the MEDT, melodies B and C from Gabrielsson and Juslin’s (1996) study were employed. Melody B is a Swedish folk melody in F major which spans a two-octave range and is entirely diatonic (see Figure 3), while Melody C was composed specifically for use within their research. Melody C is in G harmonic minor, spans two octaves and contains a few chromatic notes (see Figure 4).

FIGURE 3

Figure 3. Notation of melody B (1).

FIGURE 4

Figure 4. Notation of melody C (2).

Hereafter, melody B will be referred to as melody 1, and melody C as melody 2. The musical extracts utilized in the current study were re-recordings of the stimuli first validated by Gabrielsson and Juslin (1996). The replication study carried out by Akkermans et al. (2018) validated the re-recorded versions of the stimuli. In the present study, only recordings that conveyed angry, happy, sad, and tender expressions on piano, violin, or voice were used, as findings indicated these tended to be identified most accurately by listeners (Akkermans et al., 2018). On average, angry excerpts were 15s long, while happy excerpts were 16s, sad excerpts were 35s, and tender excerpts were 31s. Duration and tempo for each of the melodies selected for the current experiment is provided in Table 1. Tempo was estimated manually by extracting the average beats per minute (bpm) across the entire clip to account for the performers’ use of rubato.

TABLE 1

Table 1. Stimulus properties of the melodies from Akkermans et al. (2018) employed in the current study.

Acoustic analyses of these clips were carried out by Akkermans et al. (2018), who found that emotions tended to possess distinct acoustic features; the key features of angry excerpts were high amplitude, fast tempo and greater roughness, happy excerpts were most similar to angry excerpts though did not display such high roughness, sad excerpts exhibited slow tempos and low amplitude, and tender excerpts displayed similar acoustic properties as those conveying sadness. For further detail on the acoustic properties of the stimuli readers are referred to the paper by Akkermans et al. (2018) which provides a comprehensive analysis.

Stimuli editing

Recordings were edited in order to establish a greater variation in terms of difficulty. This was achieved by splitting audio files into musically meaningful phrases using Adobe audition CC. Melody 1 was split into four 4-bar phrases, while melody 2 was split into six 2-bar phrases. Subsequently, audio files were produced from all possible combinations of consecutive phrase sequences. For example, one clip of melody 1 was edited to produce 10 separate clips: four one-phrase clips, three two-phrase clips (e.g., 1 and 2, 2 and 3, and 3 and 4), and three three-phrase clips (e.g., 1, 2 and 3, and 2, 3, and 4). Excerpts were then paired, each pair featuring the same combination of musical phrases played on the same instrument by the same performer but two contrasting performances aiming to convey distinct emotions. These pairs were combined into a single mp3 file using SoX (sound exchange) software¹, with a buzzer sound inserted in-between. Thus, 1116 items were produced that featured two clips with the same melody, instrument, and phrases, but differing emotional expressions.

Musical emotion discrimination task

The MEDT initially consisted of 112 items, selected to represent the larger corpus of 1116 items. These were selected at random, under the condition that each musical feature under assessment must be equally represented, for example half of the extracts were melody 1 and half melody 2. Correspondingly, a third were played on the piano, a third on the violin and a third sung. From the pool of 112 selected test items, 21 were randomly presented to each participant. Responses were collected using a two-alternative forced choice (2-AFC) format.

Depression screening

The Patient Health Questionnaire (PHQ-9), a short, self-administered survey, was used to assess depression severity (Kroenke and Spitzer, 2002). This measure consists of nine items related directly to the DSM-IV diagnostic criteria.

Procedure

This experiment was conducted online which allowed for automatic administration of the information sheet, consent form, depression screening, MEDT, demographics form, and debrief. For the MEDT, participants were told that for each trial, they would hear two versions of a melody that would differ in terms of emotional expression. They were instructed to indicate which version they felt was most representative of a given emotion. Each participant was exposed to 21 audio clips and instructed as follows: “Please listen to the following clips and select which one sounds happier to you. Select 1 for the clip heard before the buzzer, or 2 for the clip heard after the buzzer.” The attribute happier would be exchanged for the target emotion of the item. The task took around 15–20 min to complete.

Results

From the initial sample of 77 participants, 34 participants were excluded from analysis, as they had not fully completed the experiment. Additionally, 10 participants were excluded as their scores were above the typical cut off point (≥10) in the depression screening (Manea et al., 2012), on account of the previous finding that depressed individuals display difficulties processing emotions in music (Punkanen et al., 2011).

Musical Features

Mixed-effects logistic regression analyses were performed to determine whether the stimulus features length and target emotion influence the correctness of participant responses and to account for random effects resulting from differences between participant abilities. Binary item responses (0 = incorrect and 1 = correct) served as dependent variable, target emotion and length were treated as predictors and participant ID was used as random factor. The binomial mixed effects model (i.e., logistic regression models) including only the random factor was compared against two models that contained length or target emotion as predictors. Likelihood ratio tests indicated that each predictor contributed to the accuracy of the model beyond the random factor (for model including length p = 0.011, for model including target emotion p = 0.002). The significant contribution of both length and target emotion to participant accuracy was confirmed by type II Wald Chi-square tests on the predictors of the final model including target emotion (χ²(3) = 13.6, p = 0.003) and length (χ²(1) = 5.6, p = 0.018). As expected, length showed a significant positive effect on item easiness. This means that longer items generated more correct responses. Target emotion also affected item difficulty/easiness with happy and tender items being significantly more difficult than angry items (p = 0.012 and p = 0.011) which served as the reference category. In contrast sad items did not affect item difficulty/easiness compared to angry items (p = 0.764).

Discussion

The focus of experiment 1 was to determine whether variation of musical features affected item difficulty. Two features of significance were: extract length (i.e., number of phrases featured in the audio clip), and target emotion (i.e., the emotion expressed within the extract that participants were asked to identify). Excerpts featuring happy or tender as a target emotion were more difficult than angry items, and shorter items were more difficult than longer items, according to logistic regression models.

These findings were used to inform a shorter test of emotion discrimination ability for use in experiment 2. It is likely that shorter excerpts were harder to discriminate owing to the fact they contain fewer expressive cues, thus only items featuring only one phrase of the melody were retained in order to adjust the overall level of correct responses from 83.4 to 75%, halfway between optimal and chance performance. Items containing two or three melodic phrases were eliminated. As a result, there were few items remaining that featured melody 2, owing to the initial stimuli selection process. Results indicated that melody 1 had a lower overall percentage of correct responses than melody 2; hence, all items featuring melody 2 were removed. Despite finding that item difficulty was influenced by target emotion, this variable was not used as criteria for item elimination in order to maintain the range of possible target and comparison emotions across the test. Therefore, the shortened test was comprised of 28 items, which differed in terms of target emotion, comparison emotion and instrument.