A novel task to evaluate irony comprehension and its essential elements in Spanish speakers

Valles-Capetillo, Elizabeth; Ibarra, Cristian; Martinez, Domingo; Giordano, Magda

doi:10.3389/fpsyg.2022.963666

ORIGINAL RESEARCH article

Front. Psychol., 22 November 2022

Sec. Psychology of Language

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.963666

Elizabeth Valles-Capetillo¹

Cristian Ibarra¹

Domingo Martinez²

Magda Giordano¹,²*

¹Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Mexico
²Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México, Querétaro, Mexico

An ironic statement transmits the opposite meaning to its literal counterpart and is one of the most complex communicative acts. Thus, it has been proposed to be a good indicator of social communication ability. Prosody and facial expression are two crucial paralinguistic cues that can facilitate the understanding of ironic statements. The primary aim of this study was to create and evaluate a task of irony identification that could be used in neuroimaging studies. We independently evaluated three cues, contextual discrepancy, prosody and facial expression, and selected the best cue that would lead participants in fMRI studies to identify a stimulus as ironic in a reliable way. This process included the design, selection, and comparison of the three cues, all of which have been previously associated with irony detection. The secondary aim was to correlate irony comprehension with specific cognitive functions. Results showed that psycholinguistic properties could differentiate irony from other communicative acts. The contextual discrepancy, prosody, and facial expression were relevant cues that helped detect ironic statements, with contextual discrepancy being the cue that produced the highest classification accuracy and classification time. This task can be used successfully to test irony comprehension in Spanish speakers using the cue of interest. The correlation of irony comprehension with cognitive functions did not yield consistent results. A more heterogeneous sample of participants and a broader battery of tests may be needed to find reliable cognitive correlates of irony comprehension.

Introduction

Pragmatics studies the role that language plays in social communication and how contextual elements can facilitate this process. Pragmatic abilities have been described as the proficiency to communicate, express, and recognize intentions (Scott-Phillips, 2017). They represent a key process in human communication, allowing people to distinguish between the possible alternative interpretations of the linguistic information they receive (Bosco et al., 2017). Alterations in social communication have been reported in several disorders, for example, Social Communication Disorder and Autism Spectrum Disorder (American Psychiatric Association, 2013). Irony is one of the most difficult forms of communication to understand (Wilson and Sperber, 1981); it has therefore been proposed as a useful indicator of pragmatic abilities (Caillies et al., 2014). Irony plays different roles during communication; it serves to indirectly convey feelings (Shamay-Tsoory et al., 2005), express courtesy, emotion, or humor, and enhance criticism (Milanowicz, 2013). It has been reported that ironic statements are used in approximately 7% of the conversational turns in everyday conversation (Tannen, 2005), and 8% during conversations with friends (Gibbs, 2000).

One of the most widely used theories of irony comprehension is the standard pragmatic view (Grice, 1975), which proposes that when an ironic statement is comprehended, the receiver or listener of the message first constructs the literal interpretation, and when it becomes apparent that the literal interpretation is not compatible with the context, the ironic interpretation is established. From this view, ironic interpretation requires more effort, resources, and time from the listener. In contrast to Grice, Gibbs (1994) proposed the direct-access view. This theory assumes that the contextual and lexical information is processed interactively in early stages, and if context supports an ironic interpretation, this can be activated directly, without the need for the literal interpretation to be computed first (Gibbs, 1994). Compared with the standard pragmatic view, the direct-access view suggests that irony does not require more time from the listener. Likewise, the graded salience hypothesis states that salient meanings are activated initially, giving a limited role to context. Giora defined salience as “the accessibility of meanings of words or collocations out of context.” If there are salient cues that support the ironic interpretation, it would be computed first (Giora, 1997).

In addition to the above theories, Attardo (2000) proposed that certain psycholinguistic properties are important for the identification of ironic statements. One of them is the relevance that a statement has to its context. Another is the appropriateness of a statement to its context, which indicates whether the linguistic information of the statement is compatible with the information available in the context. A third property is the speaker’s intention. In the case of ironic statements, the intention is that the listener detects the true message (i.e., ironic). According to this view, ironic statements are relevant, inappropriate to the context, and are used by the speaker to convey the true meaning to the listener (Attardo, 2000).

Pexman (2008) proposed the constraint satisfaction model for the processing of ironic statements. According to this model, cues activated by a statement “are processed rapidly and in parallel and an ironic interpretation is considered as soon as there is sufficient evidence that it might be supported” (Pexman, 2008, p. 287). The correct selection of the intended meaning depends on the adequate functioning of the speech recognition system, and on the cues that are activated by the statement including event comprehension (outcome and history), statement valence, the frequency of irony usage in a situation, the speaker’s attitude (e.g., facial expression and prosody), and the listener’s expectations. These elements are supported by the Theory of Mind (ToM), executive functions, and the listener’s experience with irony (Pexman, 2008).

ToM is the ability to represent mental states of oneself and others, such as desires, beliefs, emotions, and intentions (Premack and Woodruff, 1978). Because the linguistic code may not be enough to represent the full meaning of language during social communication, ToM plays an important role in filling this gap (Bohrn et al., 2012; Spotorno et al., 2012). Executive functions include the abilities to inhibit unwanted behaviors and to update information or strategies to solve problems (Miyake et al., 2000). Measures of executive function can predict pragmatic performance in patients with brain injury (Bosco et al., 2017); thus, it has been proposed that these functions are relevant for pragmatic comprehension. In older adults, it has been reported that the identification of irony is associated with inhibitory control, mental flexibility and working memory (Gaudreau et al., 2015).

With regard to the cues for the identification of irony, the discrepancy between the context and the statement is considered a relevant cue (Kreuz and Link, 2002). Other cues that can facilitate the identification of irony are prosody (Wang et al., 2006) and facial expression (Akimoto et al., 2014). The acoustic parameters associated with prosody in irony are lower fundamental frequency (F0; Rockwell, 2001; Peters et al., 2016), changes in F0 (Milosky and Ford, 1997; Cheang and Pell, 2009; Bryant, 2010; Li et al., 2013; Deliens et al., 2017; Rivière et al., 2018), greater intensity (Rockwell, 2001; Li et al., 2013; Peters et al., 2016; Deliens et al., 2017), and slower speech rate (Rockwell, 2001; Cheang and Pell, 2009; Bryant, 2010; Li et al., 2013; Peters et al., 2016; Voyer and Vu, 2016; Deliens et al., 2017). The facial information that supports ironic comprehension includes smiling, raised eyebrows, eye-rolling, winking, and squinting eyes (Rockwell, 2001; Attardo et al., 2003; Caucci and Kreuz, 2012).

The neural correlates of irony comprehension have been studied using different psychophysiological tools such as electrophysiological (EEG) recordings, eye-tracking and functional magnetic resonance imaging (fMRI) (see Fabry, 2021). The neuroimaging (fMRI) literature on irony comprehension is relatively modest: only 12 studies have been published since 2004, and none have used Spanish as the natural language (see review by Reyes-Aguilar et al., 2018). The tasks that have been used involve mostly written scenarios followed by an ironic or non-ironic utterance which the participants are asked to judge. The results of a meta-analysis of these studies showed that understanding irony requires the left language network and areas that participate in ToM (Reyes-Aguilar et al., 2018). Furthermore, the results of this meta-analysis suggested that the natural language employed may be relevant for pragmatic language processing (Reyes-Aguilar et al., 2018).

With these antecedents in mind, we aimed to create a task that evaluated the identification of ironic statements in Mexican adults that could be used for subsequent fMRI experiments. We used three cues: contextual discrepancy, prosody and facial expression. To select the cue that would lead participants in fMRI studies to identify a stimulus as ironic in a reliable way, the three cues were evaluated independently. First, we created the statements, i.e., ironic, literal, unrelated and white lies, and their accompanying contexts. Second, we assessed the psycholinguistic properties of the statements which included comprehensibility, relevance, appropriateness, sincerity, and emotional valence; all according to the context in which they were used. We also evaluated if the contexts were comprehensible. Third, we selected acoustic parameters and facial expressions indicative of irony, and evaluated if they were correctly identified. Fourth, we compared contextual discrepancy, prosody and facial expression in terms of the classification accuracy and classification time of ironic statements. Finally, to assess the relationship between ironic statement identification and cognition we applied a battery of psychometric tests that evaluate cognitive processes that have been associated with irony identification.

Materials and methods

Construction of linguistic stimuli

Contextual discrepancy

For the irony identification task, 56 social contexts and 14 statements were created. Each statement was associated with four different categories of social contexts (14 statements × 4 categories = 56 contexts). Each category of context creates an environment that modifies the interpretation of the statements (e.g., ironic). In each context, two adults of the opposite sex and the same social standing (e.g., colleagues, classmates) interact, and one of them utters the statement. The stimuli were created in Spanish; contexts were 30 to 40 words long and statements were 3 to 6 words long. The operational definitions for each category of statement are as follows:

• Ironic: a statement that is relevant, meaning it has relation to the context. The information presented in the context differs from the message conveyed in the statement. The speaker intends the statement to be interpreted as ironic, i.e., to convey irony.

• Literal: a statement that is relevant, and appropriate, meaning the information presented in the context is compatible with the message conveyed in the statement. The speaker intends the statement to be interpreted literally.

• Unrelated: a statement that has no relation to the context. The information in the context disagrees with the message conveyed in the statement. There is no intention on the part of the speaker.

• White lies: a statement that is relevant, meaning it has relation to the context. The information presented in the context differs from the message conveyed in the statement. The speaker has the intention to hide the truth.

The following is an example of the target statement: “You are playing very well.” The context used to turn it into an ironic statement was: “Paco is playing soccer and Karla is watching him. Paco is playing terribly and scores an own goal. They both believe that Paco is obviously playing badly. At halftime, Paco approaches Karla. Karla tells him: You are playing very well.” The context used to turn it into a literal statement was: “Omar is playing cards with Lluvia. Lluvia has won almost every game. Lluvia is very cheerful because she is winning. Omar thinks that Lluvia is playing very well. Omar tells Lluvia: You are playing very well.” The context used to turn it into an unrelated statement was: “Verónica and Saúl are at a piano recital. The presentation is flawless and moving. Both are satisfied with the presentation. Saúl asks Verónica what she thinks of the recital. Verónica responds: You are playing very well.” And for white lies, the context was: “Paulina is teaching Marcos chess. Marcos makes bad moves and is losing. Paulina sees Marcos excited and does not want to discourage him. Marcos asks her how he’s playing. Paulina answers: You are playing very well.”

Because we used the same statements for all four context categories, only the word length of the contexts was analyzed; a one-way ANOVA showed no significant effect of context category on word length.

Materials and procedure

The next step was to validate if the stimuli were consistent with the psycholinguistic properties that were expected and to assess if the stimuli were accurately detected. The psycholinguistic properties evaluated were the comprehensibility of the context (without considering the statement); relevance, if the statement had relation to the context; sincerity, if the speaker wanted the listener to know the truth; appropriateness, if the statement was congruent with the contextual information; and emotional valence, if the statement, when read in a particular context, evoked a positive or negative feeling. Also, participants were asked to classify the intention of the statement according to the context (i.e., ironic, literal, unrelated, or white lies; see Figure 1).

Figure 1. Graphical depiction of the experimental procedure. Columns show the phases of each experiment and rows depict the cues that were evaluated (i.e., contextual discrepancy, prosody and facial expression). Details about each step are available in their respective sections. AU = Action Units.

The stimuli were organized into three booklets, each one evaluated by a separate sample of 30 participants. The psycholinguistic properties were ranked on a Likert scale of 1 to 4 points. To encourage scores to be assigned carefully, some properties ranged from higher to lower (i.e., 1 = higher comprehensibility, appropriateness, and emotional valence) and others from lower to higher (i.e., 1 = lower relevance and sincerity). The intention was classified by selecting among the four categories of statements (i.e., ironic, literal, unrelated, or white lie). Participants were asked to read the definitions of the statements that were on the first page of the booklets (for definitions of statements categories, see previous section). According to the results, 14 contexts were not understandable and had to be modified to improve their comprehensibility. The 14 modified contexts were evaluated by a different sample of 30 participants using a fourth booklet. Then, an independent sample of participants ranked how ironic they considered the ironic statements using a Likert scale of 7 points (1 = less ironic, 7 = more ironic).

Participants

Participants were asked to sign an informed consent form to participate in the study and to fill in a general data form with information about their level of education, sex, and age. Participants were undergraduate or graduate Spanish-speaking students that reported no psychiatric or neurological disorders. Considering the four booklets, the stimuli were evaluated by 120 participants, with a mean age of 22.91 ± 3.82 (booklet 1 = 22 F, 8 M, mean age 22.06 ± 3.34; booklet 2 = 20 F, 10 M, mean age 21.57 ± 2.57; booklet 3 = 21 F, 9 M, mean age 23.03 ± 4.00; and booklet 4 = 20 F, 10 M, mean age 23.86 ± 4.63). An additional sample of 45 participants (22 Female, mean age of 26.69 ± 5.83) ranked the 14 ironic statements, in terms of how ironic they found them.

Statistical analysis

For aesthetic reasons and ease of reading, results for all psycholinguistic properties were homogenized. Thus, the scores for comprehensibility, appropriateness, and emotional valence were inverted (i.e., from 1 = higher to 1 = lower). The statistical analyses were performed using R software (version 3.6.3; R Core Team, 2020) through the graphical interface of RStudio (version 1.1.447; RStudio Team, 2019). First, the descriptive statistics of classification accuracy and psycholinguistic properties were computed. The percentage and standard deviation are presented for the classification accuracy; the median (Mdn) and the interquartile range (IQR) are reported for the psycholinguistic properties.
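The recoding and descriptive statistics described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in R); the function names and the example ratings are hypothetical, and the inclusive quantile method is an assumption chosen because it should match the default interpolation of R's quantile function.

```python
from statistics import median, quantiles

SCALE_MIN, SCALE_MAX = 1, 4  # 4-point Likert scale described in the text


def invert(score: int) -> int:
    """Reverse-code a rating so that 1 = lower and 4 = higher."""
    return SCALE_MIN + SCALE_MAX - score


def median_iqr(scores):
    """Return (median, interquartile range) for a list of ratings."""
    # method="inclusive" is an assumption about the quantile definition.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


# Hypothetical raw comprehensibility ratings (1 = higher), for illustration only.
raw = [1, 1, 2, 1, 1, 3, 1, 2]
recoded = [invert(s) for s in raw]
mdn, iqr = median_iqr(recoded)
print(recoded, mdn, iqr)
```

Other quantile definitions can give a slightly different IQR for small samples, so the definition should be fixed before comparing values across tools.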

Additionally, to analyze whether the identification of statement categories could be predicted by scores of relevance, appropriateness, and sincerity, a multinomial logistic regression model was fitted (multinom function from the nnet package; version 7.3–17, Venables and Ripley, 2002). Following the recommendations for this analysis (Venables and Ripley, 2002), the data were split into two datasets: the first was used to train the model (80% of the data) and the second to validate it (20% of the data). The model was calculated four times. First, with all the statements of the four categories. Then, based on the ratings of how ironic the statements were, the ironic statements were split into two categories: less ironic (statements: 1, 3, 5, 36, 44, 53, 55, Mdn = 6) and more ironic (statements: 10, 52, 15, 17, 22, 46, Mdn = 7). Using these two categories (i.e., less and more ironic), the model was calculated excluding the more ironic statements and then excluding the less ironic statements. For each category (i.e., less or more ironic), a Monte Carlo simulation with 5,000 replications of the model was run, and the mean accuracy of those simulations is reported. Finally, a Monte Carlo simulation with 5,000 replications of the model was run excluding six ironic statements at random (i.e., regardless of whether they were less or more ironic).
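The repeated 80/20 split can be sketched in Python as follows. This is only an outline under stated assumptions: the study used the multinom function in R, whereas here the learner is a hypothetical placeholder (fit_majority) that always predicts the most frequent training category, and the example rows are invented. A real multinomial logistic regression would be passed in place of the placeholder.

```python
import random
from collections import Counter


def split(rows, train_frac, rng):
    """Shuffle rows and split them into training and validation parts."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Share of rows whose predicted category equals the true category."""
    hits = sum(predict(features) == label for features, label in rows)
    return hits / len(rows)


def monte_carlo_accuracy(rows, fit, n_reps=5000, train_frac=0.8, seed=0):
    """Mean validation accuracy over repeated random train/validation splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        train, valid = split(rows, train_frac, rng)
        predict = fit(train)
        total += accuracy(predict, valid)
    return total / n_reps


def fit_majority(train):
    """Placeholder learner (an assumption, not the study's model):
    always predicts the most frequent category in the training rows."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


# Hypothetical rows: ((relevance, appropriateness, sincerity), category)
rows = ([((4, 4, 4), "literal")] * 6
        + [((3, 2, 1), "ironic")] * 2
        + [((1, 1, 1), "unrelated")] * 2)
mean_acc = monte_carlo_accuracy(rows, fit_majority, n_reps=200)
print(round(mean_acc, 2))
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition of the data.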

Results

Results indicated that all contexts were comprehensible, and that all categories met the desired psycholinguistic properties according to their operational definition. Percentage of classification for each category was as follows: ironic statements (57.14 ± 49.55), white lies (84.76 ± 35.98), unrelated (86.06 ± 34.68) and literal (95.95 ± 19.73). Regarding the psycholinguistic properties, ironic statements were identified as comprehensible (Mdn = 4, IQR = 0), relevant (Mdn = 3, IQR = 2), insincere (Mdn = 1, IQR = 1), inappropriate (Mdn = 2, IQR = 2), and with neutral emotional valence (Mdn = 3, IQR = 2). Literal statements were rated as comprehensible (Mdn = 4, IQR = 0), relevant (Mdn = 4, IQR = 1), sincere (Mdn = 4, IQR = 0), appropriate (Mdn = 4, IQR = 0), and with positive emotional valence (Mdn = 4, IQR = 0). The unrelated statements were identified as comprehensible (Mdn = 3, IQR = 2), irrelevant (Mdn = 1, IQR = 1), insincere (Mdn = 1, IQR = 1), inappropriate (Mdn = 1, IQR = 1), and with neutral emotional valence (Mdn = 2, IQR = 2). The white lies were rated as comprehensible (Mdn = 4, IQR = 0), relevant (Mdn = 2, IQR = 2), insincere (Mdn = 1, IQR = 0), inappropriate (Mdn = 2, IQR = 2), and with neutral emotional valence (Mdn = 2, IQR = 2; see Figure 2).
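The large values after the ± sign are what one would expect if each response is coded as 1 (correct) or 0 (incorrect) and the standard deviation is taken over those codes. The sketch below illustrates this under an assumption that is not stated in the text: 420 responses to ironic statements (14 statements × 30 raters), of which 240 were classified as ironic.

```python
from statistics import mean, stdev


def percent_and_sd(n_correct: int, n_total: int):
    """Mean and sample SD (both in percent) of responses coded 1/0."""
    coded = [1] * n_correct + [0] * (n_total - n_correct)
    return 100 * mean(coded), 100 * stdev(coded)


# Assumed counts for illustration only; they are not reported in the text.
pct, sd = percent_and_sd(240, 420)
print(f"{pct:.2f} ± {sd:.2f}")  # 57.14 ± 49.55
```

For a binary variable the standard deviation is largest when accuracy is near 50%, which is why the ironic category shows the widest spread.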

Figure 2. Radar chart showing the psycholinguistic properties associated with each statement category. The scores range from 1, which means less to 4, which means more. A sample of 120 participants rated the stimuli. All categories met their expected psycholinguistic properties. See text for additional details.

Results from the first multinomial logistic regression model analysis (with all the statements) showed that the model in the training dataset had a 68.06% classification accuracy, and the validation dataset had a 59.32% classification accuracy. The statement category with the highest classification accuracy was literal (training = 90.18%, validation = 93.44%), followed by unrelated (training = 78.21%, validation = 79.63%) and white lies (training = 77.51%, validation = 76.19%); ironic had the lowest classification accuracy (training = 9.47%, validation = 10.78%). Because the ironic statements had the lowest classification accuracy, the model was calculated three more times in an attempt to increase accuracy, considering the categories less and more ironic (see 2.1.1.1). The performance of the second model, excluding the more ironic statements, showed that the training dataset had a 73.36% classification accuracy, and the validation dataset had a 63.72% classification accuracy. The performance of the third model, excluding the less ironic statements, showed that the training dataset had a 71.45% classification accuracy, and the validation dataset had a 64.25% classification accuracy. The performance of the fourth model, excluding six ironic statements randomly, showed that the training dataset had a 68.07% classification accuracy, and the validation dataset had a 58.96% classification accuracy. In sum, the full model had a validation accuracy of 59.32%; validation accuracy increased when only the less ironic (63.72%) or only the more ironic (64.25%) statements were retained, and it decreased when the degree of irony was not controlled (58.96%).

Recording of acoustic stimuli to test the effect of prosody

Stimulus recording

A total of 40 statements were used, including the 14 statements from the contextual discrepancy experiment and 26 new ones created using the previously described methods (see 2.1.1). The statements were recorded by two professional actors, a man and a woman with experience in voice modulation. Each stimulus was recorded by both actors using three different intonations: ironic, literal, and unrelated. For ironic statements, the actors were asked to read with an ironic intonation; for literal statements, they were asked to read as if they really believed what the statements said; and for unrelated statements, the actors were asked to read without intonation. A total of 240 statements were recorded (40 statements × 2 actors × 3 intonations). To select the stimuli that had the expected intonation, two judges (the coauthors E.V. and C.I.), who were blinded to the classification of the statements, classified the intention of each stimulus. Of the 240 audios, 57 were excluded because they did not meet the expected intonation, according to the judges. Of the 183 remaining audios, 47 were judged as ironic (23 female voices), 66 as unrelated (27 female voices), and 70 as literal (37 female voices). This was followed by the evaluation of the acoustic parameters that characterized each intonation.

Selection of acoustic parameters

A systematic search was performed to select the relevant acoustic parameters for irony. Following the PRISMA guidelines (Moher et al., 2009), 141 articles that studied the acoustic parameters of irony were identified in the Web of Science database (Clarivate Web of Science. © Copyright Clarivate 2019). The keywords used were “irony” and “sarcasm,” combined with “prosody,” “prosodic,” and “intonation.” Seventy-seven records remained after duplicates were removed. Of the 77 records, 46 did not meet the inclusion criteria: 44 did not associate irony with prosody, and two were book chapters. Of the remaining 31 articles, nine were excluded: seven did not use acoustic markers, one was a review, and one did not use prosodic modulation. Based on the 22 remaining articles, we found that in terms of F0, six articles reported a lower F0, six found differences in range, and three articles indicated unspecified variations. For the intensity of voice, 12 articles reported an increase in intensity (Rockwell, 2001; Li et al., 2013; Peters et al., 2016; Deliens et al., 2017). Concerning speech rate, 16 articles reported a slower speech rate and three reported longer syllables. In conclusion, articles that study ironic statements consistently report changes in F0, intensity, and speech rate. Thus, these were selected as the acoustic parameters for analysis.
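The record counts at each stage of the search can be checked with a short tally. The sketch below uses only the numbers given in the text; the variable names and the grouping into screening and eligibility stages are our own labels for illustration.

```python
identified = 141        # records identified in Web of Science
after_duplicates = 77   # records remaining after duplicates were removed

# Exclusions at screening, with reasons as listed in the text.
screening_exclusions = {
    "irony not associated with prosody": 44,
    "book chapter": 2,
}
# Exclusions at eligibility, with reasons as listed in the text.
eligibility_exclusions = {
    "no acoustic markers": 7,
    "review": 1,
    "no prosodic modulation": 1,
}

duplicates_removed = identified - after_duplicates
screened_in = after_duplicates - sum(screening_exclusions.values())
included = screened_in - sum(eligibility_exclusions.values())

print(duplicates_removed, screened_in, included)  # 64 31 22
```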

Acoustic analysis

Once the acoustic parameters had been selected, noise reduction was performed using the noise reduction parameters recommended by the Audacity program (version 2.2.1) (Audacity Team, 2018). The analyses were performed in R (R Core Team, 2020) using the PraatR library (Albin, 2014), which carries out the analysis from Praat (version 6.0.37) (Boersma and Weenink, 2021). From the 183 audios, the median and range were extracted for F0 (Hz) and intensity (in decibels, dB). Speech rate was calculated by dividing the duration of the audio (seconds, s) by the number of words in the linguistic stimulus.
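The summary measures can be written out as a minimal Python sketch. It does not reproduce the PraatR pipeline; the function names and the numeric values are hypothetical and serve only to make the definitions concrete.

```python
from statistics import median


def speech_rate(duration_s: float, text: str) -> float:
    """Seconds per word: audio duration divided by the number of words."""
    n_words = len(text.split())
    if n_words == 0:
        raise ValueError("statement has no words")
    return duration_s / n_words


def median_and_range(values):
    """Median and range (max - min) of a series, e.g., an F0 track in Hz."""
    return median(values), max(values) - min(values)


# Hypothetical values for illustration; not measurements from the study.
rate = speech_rate(2.0, "You are playing very well")
f0_median, f0_range = median_and_range([180.0, 210.0, 195.0, 250.0, 175.0])
print(rate, f0_median, f0_range)
```

Because the measure is expressed in seconds per word, larger values correspond to slower speech.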

A Kruskal–Wallis test followed by Dunn’s test of multiple comparisons with Bonferroni correction showed that there were differences between statement categories in median F0 (H(2) = 54.19, p < 0.001), F0 range (H(2) = 15.68, p < 0.01), median intensity (H(2) = 16.58, p < 0.01), and median speech rate (H(2) = 51.26, p < 0.001). Intensity range did not show significant differences. The pairwise comparisons (see Figure 3) showed significant differences (p < 0.01) in F0 medians between ironic and unrelated statements, and between literal and unrelated. For the F0 range, there were differences between ironic and unrelated statements. Likewise, for median intensity, differences were found between ironic and unrelated and between literal and unrelated (p < 0.001). For speech rate, there were differences between ironic and literal, and between ironic and unrelated (p < 0.001). The results indicate that the statements can indeed be distinguished by their acoustic patterns. More specifically, F0, intensity and speech rate distinguish the ironic intonation from the unrelated, while speech rate distinguishes the ironic from the literal intonation (see Figure 3).
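For readers unfamiliar with the omnibus test, the sketch below computes the tie-corrected Kruskal–Wallis H statistic in plain Python. It is an illustration, not the R code used in the study: the example groups are hypothetical, Dunn's pairwise test is not implemented, and the p-value uses the usual chi-square approximation, which for three groups (2 degrees of freedom) reduces to exp(-H/2).

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected H statistic for a list of independent samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_df2(h):
    """Chi-square upper-tail probability for 2 degrees of freedom."""
    return math.exp(-h / 2)


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Hypothetical samples for three intonations, for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis(groups)
p = p_value_df2(h)
print(round(h, 2), round(p, 4), round(bonferroni(p, 3), 4))
```

Under this approximation, the H values reported above for median F0 (54.19) and F0 range (15.68) give p-values below 0.001 and 0.01, respectively, in line with the reported thresholds.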

Figure 3. Acoustic parameters by statement category. The statement categories can be differentiated by their acoustic pattern. Panels (A, B) show the median and range for the fundamental frequency (F0) in Hz. Panels (C, D) show the median and range for intensity in decibels. Panel (E) shows the speech rate (duration in s/number of words). Plots show the density curves and the box plots show the median (dark circle), mean (thick line), interquartile range (rectangle), and the lower/upper adjacent values (black lines stretched from the rectangle), and scatter plot. Significant differences between categories are indicated.

Selection of facial expressions