Emblem Gestures Improve Perception and Evaluation of Non-native Speech

Billot-Vasquez, Kiana; Lian, Zhongwen; Hirata, Yukari; Kelly, Spencer D.

doi:10.3389/fpsyg.2020.574418

ORIGINAL RESEARCH article

Front. Psychol. , 22 September 2020

Sec. Cognition

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.574418

This article is part of the Research Topic Gesture-Speech Integration: Combining Gesture and Speech to Create Understanding View all 17 articles

Emblem Gestures Improve Perception and Evaluation of Non-native Speech

$\r\nKiana Billot-Vasquez,$ Kiana Billot-Vasquez^1,2

Zhongwen Lian^2,3

Yukari Hirata^2,3,4

Spencer D. Kelly^1,2,3*

¹Department of Psychological and Brain Sciences, Colgate University, Hamilton, NY, United States
²Center for Language and Brain, Hamilton, NY, United States
³Linguistics Program, Colgate University, Hamilton, NY, United States
⁴Department of East Asian Languages, Colgate University, Hamilton, NY, United States

Traditionally, much of the attention on the communicative effects of non-native accent has focused on the accent itself rather than how it functions within a more natural context. The present study explores how the bodily context of co-speech emblematic gestures affects perceptual and social evaluation of non-native accent. In two experiments in two different languages, Mandarin and Japanese, we filmed learners performing a short utterance in three different within-subjects conditions: speech alone, culturally familiar gesture, and culturally unfamiliar gesture. Native Mandarin participants watched videos of foreign-accented Mandarin speakers (Experiment 1), and native Japanese participants watched videos of foreign-accented Japanese speakers (Experiment 2). Following each video, native language participants were asked a set of questions targeting speech perception and social impressions of the learners. Results from both experiments demonstrate that familiar—and occasionally unfamiliar—emblems facilitated speech perception and enhanced social evaluations compared to the speech alone baseline. The variability in our findings suggests that gesture may serve varied functions in the perception and evaluation of non-native accent.

Introduction

More than half of the world’s population is bilingual, a pattern that has only accelerated since the turn of the millennium (Grosjean, 2010). Studies focused on the treatment and perception of non-native accented speech have shown that it is consistently discriminated against, negatively affecting measures related to likeability, sociability, and intelligence (Bradac, 1990; Lindemann, 2003). In an effort to understand accented speech within a natural communicative context, the present study explores how non-native accents are perceived and evaluated in the presence of co-speech emblematic gestures. Building on research demonstrating that gesture’s semantic relationship with speech can powerfully affect language processing, comprehension and learning (Church et al., 2017), the present study asks how a gesture’s cultural relationship to speech influences cross-cultural perceptions and impressions of accented speech and speakers.

The Stigma of Accent

Many people learn their non-native language later in life—through formal education or pressures from commerce—so it is commonplace to speak a second language with a non-native accent (Johnson and Newport, 1989; Cheng, 1999). In general, a non-native accent, a term interchangeable with foreign accent, has been defined as “speech that systematically diverges from native speech due to interference from the phonological and acoustic-phonetic characteristics of a talker’s native language” (Atagi and Bent, 2017).

Unfortunately, non-native accents often carry a social stigma (Gluszek and Dovidio, 2010). Because accents are one of the most immediate, powerful and fixed cues to one’s cultural identity (Giles, 1977), they can reinforce and maintain stereotypes and prejudices between groups of people (Kinzler et al., 2007). In addition, they can be used as salient markers of socio-economic class and educational levels, which can lead native speakers to have a sense of superiority or inferiority compared to non-native accented speakers (Lippi-Green, 2012). Lippi-Green points out that this social hierarchy is so powerful that even the medical community treats the elimination of accents as an explicit goal in certain practices of speech therapy. Because native speakers and non-native speakers interact with one another more than ever (Cheng, 1999; Pickering, 2006), this leads to important questions about how this stigma plays out in social interactions and judgments within cross-cultural contexts.

Research investigating the perceptions and impressions of non-native accented speech has repeatedly shown that it is perceived less favorably than native accented speech on measures of believability (Lev-Ari and Keysar, 2010) and social preference (Kinzler et al., 2009, 2011; DeJesus et al., 2017). For example, Lev-Ari and Keysar (2010) found that people judged statements delivered by non-native accented speakers as less believable than when delivered by native accented speakers. In another study, social preference was measured by asking 5-year-old children to evaluate the likelihood of becoming friends with other children (Kinzler et al., 2009). The study found that, while American children chose the pictures of children with the same race when they were presented silently, they chose the pictures of children with the different race over those with the same race when the latter was speaking in French-accented English. Moreover, in a study that controlled for comprehensibility of non-native accents by using nonsense speech, researchers found that preschool-aged children sought and endorsed information from native accented speakers over non-native accented speakers (Kinzler et al., 2011). Because they used nonsense speech, this study revealed that comprehensibility was not a factor in the children’s choices; rather, the preference was driven solely by the sound of the speech itself. Together, these studies show that speaking with a non-native accent comes at a significant social cost.

Hand Gestures and Native Language (L1)

Research has largely focused on how native and non-native accents interact with other cues to identity, like the race of the speaker (e.g., Rubin, 1992; Kinzler et al., 2011; DeJesus et al., 2017; Hansen et al., 2017). However, there is room for more research in the fluid aspects of communication that accompany accented speech, such as bodies, hands, and facial expressions that are a ubiquitous context when people speak (Kendon, 2004). For example, co-speech hand gesture—the natural movements of the hands and arms to co-construct meaning—is an essential component of everyday communication, so much so that some have theorized it should be treated as an integral part of language itself (McNeill, 1985, 1992, 2006). This fusion between speech and gesture justifies the importance of researching the two together when investigating all aspects of speech communication.

The integrated relationship between speech and gesture in languageproduction has led many researchers to study how these two parts ofthe system work together during language comprehension (for reviews, see Hostetter, 2011; Kelly, 2017). Specifically testing McNeill’s theory, Kelly et al. (2010) advanced the integrated systems hypothesis to show that that the semantic relationship between speech and gesture affect the accuracy and speed of language comprehension. Moreover, this semantic contribution appears to be bi-directional—gesture not only clarifies the meaning of speech, but speech itself clarifies the meaning of gesture. This tight relationship between speech and gesture has been further bolstered by research showing that speech and gesture are semantically integrated in traditional language networks in the brain (Willems et al., 2007; Wu and Coulson, 2007; Dick et al., 2009; Green et al., 2009; Holle et al., 2010).

Beyond semantics, co-speech gesture also serves a lower-level perceptual function as well. Indeed, researchers have shown that hand movements play a role in motor and acoustic processes, such as vocal production (Pouw et al., 2020) and prosodic accentuation (Krahmer and Swerts, 2007). For example, Krahmer and Swerts (2007) found that producing beat gestures with speech not only enhances acoustic properties of speech production, but they also help listeners perceive words to be more acoustically prominent in sentences, even when only the audio is presented. Moreover, when viewing beats, these gestures serve to enhance how viewers perceive prosodic stress in speech. On the neural level, this perceptual focusing function of gesture is evident in neuroimaging research showing that there tight coupling of gesture and speech during early stages of speech processing (Dick et al., 2009; Hubbard et al., 2009; Biau and Soto-Faraco, 2013; Wang and Chu, 2013; Skipper, 2014). In one early study, Hubbard et al. (2009) investigated the relationship between gesture and speech in the auditory cortex and found that compared to “speech with a still body” and “speech with nonsense hand gesture,” speech accompanied by a congruent gesture elicited greater activation of auditory areas in the brain, such as the left hemisphere primary auditory cortex and the planum temporale (see also Dick et al., 2009).

This tight connection between viewing the hands and perceiving speech make gestures a useful tool in “speechreading,” the ability to use visual cues of speakers to clarify what they are saying. In a pioneering (and under-cited) study, Popelka and Berger (1971) investigated how phrases presented in varying gesture conditions—ranging from no gesture to semantically congruent and incongruent iconic and deictic gestures—affected accurate perception of spoken sentences. They found that sentences presented with congruent gestures produced higher accuracy for hearing a spoken sentence than did sentences presented with no gestures, and both produced better accuracy than sentences accompanied by incongruent gestures. More recently, Drijvers and Özyürek (2017) discovered that when the auditory information is degraded, listeners particularly benefit from iconic gestures during speech comprehension (for similar evidence with people who are hard of hearing, Obermeier et al., 2011, or with “cued speech” representing the individual sounds of words with hands, LaSasso et al., 2003). However, when auditory information is too degraded, the “additive effect” from hand gestures is lost. So, it appears that not only do co-speech gestures help with understanding the meaning of an utterance, they also facilitate lower levels perceptual identification of the speech stream itself.

Hand Gestures and Second Language (L2)

Hand gestures are just as much part of using an L2 as they are using an L1 (Neu, 1990; Gullberg, 2006; McCafferty and Stam, 2009). Indeed, Gullberg argues that, given the integrated relationship between speech and co-speech gestures, the latter should be viewed as a fundamental part of the L2 elements that learners must master when acquiring an L2. Just as there are proper ways to phonetically articulate L2 syllables and syntactically organize L2 sentences, there seem to be fitting ways to move the hands when speaking a different language (Kita, 2009; Özyürek, 2017). This appropriate use of gesture applies to more than just the nuts and bolts of L2 phonetics, vocabulary and grammar—it also has pragmatic and cultural functions. In Gullberg’s own words, “[t]he command of the gestural repertoire of a language is important to the individual learners’ communicative efficiency and ‘cultural fluency’ (Poyatos, 1983)—perhaps less in terms of misunderstandings (Schneller, 1988) than in terms of the general integration in the target culture” (Gullberg, 2006, p. 116).

Many of the experiments on this topic have focused on how L2 learners attend to information conveyed through the hands when perceiving novel speech sounds (Hannah et al., 2017; Kelly, 2017; Kushch et al., 2018; Baills et al., 2019; Hoetjes et al., 2019) and comprehending new vocabulary (Allen, 1995; Sueyoshi and Hardison, 2005; Sime, 2006; Kelly et al., 2009; Morett, 2014; Morett and Chang, 2015; Baills et al., 2019; Huang et al., 2019). For example, Kelly et al. (2009) investigated how semantic congruence of gesture and speech affected the learning of L2 Japanese vocabulary in native English speakers. Results from a free recall and recognition test showed that compared to speech alone, congruent gestures enhanced memory and incongruent gesture disrupted it (and see Hannah et al., 2017, for a similar effect in L2 phonetic processing). Based on research in this vein, Macedonia (2014) makes a strong case for why hand gestures should be a bigger part of the L2 classroom and language education more generally.

But what about the other side of the coin? How do gestures produced by L2 speakers themselves affect native speaker’s perceptions and impressions of those L2 speakers? There are a few notable studies that have addressed this question (Neu, 1990; Gullberg, 1998; Jungheim, 2001; Gregersen, 2005; McCafferty and Stam, 2009). For example, Gullberg (1998) observed that the more L2 learners produced co-speech gestures—particularly, iconic gestures—the more native speakers judged them to be generally proficient in the L2. This fits well with L1 research showing that co-speech gestures positively influence social evaluations of native speakers (Maricchiolo et al., 2009). And there is even some recent evidence that training L2 speakers to use co-speech gesture not only enhances impressions of those speakers, but also how those speakers actually produce L2 speech (Gluhareva and Prieto, 2017; Zheng et al., 2018; Hoetjes et al., 2019). For example, Gluhareva and Prieto showed that when native Catalan speakers were given training on how to pronounce English words with beat gestures, their L2 speech was judged by native English speakers to have improved significantly compared to when there was no training with beat gestures. Note that native speakers’ judgments were on L2 speech alone, where they did not see learners’ gestures. Thus, it remains to be seen if viewing L2 gestures affects how native speakers process lower level auditory aspects of L2 speech, such as, correctly hearing what was said or explicitly evaluating the non-native accent itself. In other words, it is possible that seeing L2 gestures not only helps to boost native speakers’ social impressions of an L2 learner, it may also help them make better sense out of what they are hearing.

The Present Study

The present study explores this issue by focusing on a type of gesture that plays a powerful role in cross-cultural communication: emblematic gestures. Emblems are conventionalized movements of the hands, head and body that are understood by most members of one culture (or subculture), but not necessarily another (Efron, 1941; Ekman, 1972; Kendon, 1997; Kita, 2009; Matsumoto and Hwang, 2013). For example, in Japan, the emblem for, ‘It’s spicy,” is to hold the bridge of the nose with the thumb and index finger. Without culinary knowledge that wasabi causes a (strangely satisfying) burning sensation in the sinuses, this gesture would be quite baffling.

Emblems are interesting in an L2 context for a number of reasons. For one, they can be used simultaneously with L2 speech to create multimodal signals, and this allows L2 speakers to display additional knowledge about the L2 culture (Neu, 1990; Jungheim, 2001; Gullberg, 2006; Matsumoto and Hwang, 2013). Second, even though emblems are similar to words in that both have highly conventionalized forms, most emblems are less arbitrary than spoken words and exhibit an element of iconicity that more directly maps onto their cultural meaning (as with the “spicy” example) (McNeill, 1992; Poggi, 2008).¹ This gives L2 speakers an additional opportunity to convey meaning (similar to co-speech iconic gestures), which is particularly useful if their pronunciation is below the native level. And third, compared to the phonological challenges of L2 speech, emblems are relatively simple and easy to learn, making these visual conventions very handy in cross-cultural communication (Matsumoto and Hwang, 2013).

Emblems have not received much attention in the study of L1 speech comprehension, likely because they often occur independently of speech (Goldin-Meadow, 1999). However, in an L2 context, speakers can intentionally use culture-specific emblems along with speech to supplement the meaning of their utterances, in addition to demonstrating their sensitivity and knowledge of the L2 culture. Because viewing co-speech emblems helps L2 speakers comprehend L2 utterances (Allen, 1995), it is likely that they also help L1 speakers understand the non-native speech of L2 learners.

Building on this previous work, we ask the following question: From the perspective of native speakers, how does the cultural familiarity of L2 emblems affect phonetic perception of non-native accented speech specifically, in addition to the more general social evaluation of non-native speakers? This work extends the literature in three ways. First, previous studies on the perceptual processing and social stigma of accent (e.g., Gluszek and Dovidio, 2010; Lev-Ari and Keysar, 2010; Lippi-Green, 2012) have largely excluded its natural multimodal communicative context. If appropriately using hand gestures is an integral part of learning a complete L2 repertoire, as Gullberg (2006) argues, it makes sense to expand the focus and study non-native accents in their fully embodied form. Second, because many emblematic gestures are based on distinct and learned conventions—which often vary by culture—it is possible to explore the consequences of L2 speakers producing culturally right or wrong emblems. Just as a gesture’s iconic meaning matters for L2 vocabulary learning (Kelly et al., 2009), it is possible a gesture’s cultural meaning matters for perceptions and evaluations of L2 speech. Third, although previous research has shown that producing co-speech gestures in an L2 can make a general positive impression on native speakers—for example, Gullberg (1998) showing that gestures make L2 speakers appear more proficient—no study to our knowledge has more specifically broken down how L2 hand gestures influence the processing of non-native accents per se separately from the influence of gesture on evaluations of learners themselves.

In two experiments in two different languages, Mandarin and Japanese, we investigate how different gesture-speech relationships affect the evaluation of foreign language accent and learner from the perspective of native speakers. Specifically, we created gesture-speech pairs in which emblems that accompanied L2 speech were either culturally familiar or unfamiliar to native Mandarin or Japanese speakers. For both experiments, L2 learners were filmed performing a short utterance in three different conditions: culturally familiar gesture (common in China or Japan), culturally unfamiliar gesture (uncommon in China or Japan), and speech alone. In a within-subjects design, native Mandarin speakers watched videos (across all conditions) of L2 Mandarin learners, and native Japanese participants watched videos of L2 Japanese learners. Following each video, participants were asked a set of questions targeting speech perception and social impressions of the L2 learners.

We made two predictions about how L2 learners’ gesture would affect L1 listeners’ perception of speech and social impressions.

(1) We predicted that, relative to speech alone, culturally familiar gestures would improve accuracy and foreign accent ratings of L2 speech, and would positively affect social impressions of the accented speaker.

(2) In contrast, culturally unfamiliar gestures, relative to speech alone, would decrease accuracy and foreign accent ratings of L2 speech, and would negatively affect social impressions of the accented speaker.

These predictions were based on the following two lines of research as summarized in the introduction: one line of research showing that the relationship of gestures to speech matters for phonetic and semantic comprehension in L1 (Popelka and Berger, 1971; McNeill et al., 1994; Kelly et al., 2010) and L2 (Kelly et al., 2009; Hannah et al., 2017), and another line of research showing that the presence of meaningful gestures helps manage social impressions of L1 (Maricchiolo et al., 2009) and L2 speakers (Gullberg, 1998; Gregersen, 2005; McCafferty and Stam, 2009).

Experiment 1: Mandarin

Methods

Participants

Thirty-six undergraduates (13 males and 23 females) from a small liberal arts university on the East Coast participated in Experiment 1. All participants were international students from different regions of mainland China. They were all judged by one of the authors from Beijing to be native Mandarin Chinese speakers. All of them learned English in school in China, and none grew up speaking it at home. None of their first exposure to English in the U.S. is earlier than age 15, but they scored 100 or higher in Test of English as a Foreign Language (TOEFL) at the time of admission to college. Participants received either academic credit in psychology or $5 in cash for their participation.

Materials

L2 learner stimuli

For the L2 learner video stimuli, we recruited twenty-one (14 males and 7 females) “learners” of Mandarin, who were students attending a small liberal arts university on the U.S. East Coast. None of the learners were native Mandarin speakers, and included a range of speaking Mandarin for the first time to those in intermediate and advanced level courses. The range of L2 Mandarin competency was intended to reflect varying levels of the Mandarin accent. Additionally, stimulus learners represented a wide range of racial, ethnic and gender diversity.

Video clips

The stimuli in the experiment consisted of twenty-one 2–4 s videos of Mandarin phrases that are common in everyday speech (see Appendix 1 Supplementary Materials). Each phrase was produced by a different learner in three conditions: (a) Speech + Culturally Familiar Gesture, (b) Speech + Culturally Unfamiliar Gesture, and (c) Speech Alone. The “culturally familiar” gesture was defined as emblems that were familiar and commonly understood within Northern Mainland China. For example, consider the Mandarin utterance, “ www.frontiersin.org duì buqĭ,” which means “Sorry.” The culturally familiar emblem that goes with that speech is both palms meeting below the chin of the speaker, as in the left panel of Figure 1. A list of culturally familiar gestures was created with the help of native Mandarin speakers and Gestpedia², a website that documents gestures from various locations and cultures. To generate “culturally unfamiliar” gestures, we also consulted Gestpedia to find emblems for our various phrases that were associated with various different cultures. Some of these were taken from American culture, but some other cultures include Japanese, Nigerian, Vietnamese, and Egyptian. For example, a culturally unfamiliar gesture to native Mandarin speakers was a palm touching the chest, as in the middle panel of Figure 1. After an extensive list was compiled, the gestures were screened by three separate native Mandarin speakers to assure cultural familiarity.

FIGURE 1

Figure 1. Stimuli example from Experiment 1: Sorry ( www.frontiersin.org duì bu qǐ).

During the recording phase, one of the authors, whose L1 is Mandarin Chinese, was present to ensure that learners’ pronunciation of their assigned phrases were correct enough as to not to accidentally say a different word or phrase. Each learner said only one phrase but repeated it in the three conditions—familiar gesture, unfamiliar gesture, and speech alone—and all were videotaped. The stimulus clips were edited in Final Cut Pro and background noise in the audio clips was reduced with Audacity. In addition, the video clips were edited to have the same speech across all three conditions. To do this, the audio from the speech alone condition was dubbed onto all the other two versions of a given video to equate the speech across all conditions. This was important because it is known that producing hand gestures affects vocal production (Krahmer and Swerts, 2007; Pouw et al., 2020). Equally important was the naturalness of the audio and visual coupling. For this, we tested three people who were naïve to this experiment, and found that the stimuli all looked natural, and none of them noticed the dubbing. In summary, we created a total of 63 video clips (21 speakers × 3 conditions).

To prepare for the actual presentations of these video clips, three versions were created (see Appendix 2 Supplementary Materials), with the intention that each native speaker participant would take only one version of the experiment. Version A, B, or C each included all of the 21 learners, which meant that each version included all of the 21 utterances. But within each version, a learner appeared only in one of the three conditions. The condition in which the learner appeared was counterbalanced across the three versions. This was necessary to ensure that utterance type and gesture condition were not confounded, which is particularly important because there was a large range of accents across learners. In this way, we can control for diversity of accents by having each learner serve as his or her own control.

Evaluation of Learners’ Videos

A set of eight questions was used in the questionnaire. They were grouped into two general categories of evaluation: (1) questions that measured perception of speech itself (speech evaluation) and (2) questions about social impressions of the Mandarin learners (learner evaluation).

Speech evaluation

To measure various forms of speech perception, the following questions were presented in Mandarin Chinese, which was the participants’ L1: (1) Words Misheard: “What did this person say?” (fill in the blank); (2) Accent: “How would you rate their accent?” (1 = completely foreign to 10 = completely native Mandarin); and (3) Tone Accuracy: “How would you rate their tonal pronunciation?” (1 = completely incorrect to 10 = completely correct). The third question was specific to Mandarin as a tonal language, as it is possible to mispronounce a word in Mandarin by confusing one of the four lexical tones. In addition, we gave participants (4) a Surprise Memory Test at the end of the experiment, asking them to write down any of the learner’s utterances that they could recall from the video. This was included because past research has shown that iconic gestures help disambiguate audio-degraded speech (Obermeier et al., 2011; Drijvers and Özyürek, 2017), and it is possible that this disambiguation would manifest in recall for accented speech too.

Learner evaluation

To probe for different aspects of social impression about L2 learners, the following questions were presented: (5) Confidence: “How confident was this person?” (1 = not at all confident to 10 = extremely confident); (6) Nervousness: “How nervous was this person?” (1 = not at all nervous to 10 = extremely nervous); (7) Communicative Effectiveness: “How effective would this person be at communicating with native Mandarin Chinese speakers?” (1 = not at all effective to 10 = extremely effective); and (8) Length of Study Time: “How long do you think this person has been learning and practicing Mandarin Chinese?” [sliding scale labeled “amount of time (years)” from 0 to 20; it was converted to months later to be consistent with Experiment 2].

Procedure

The participants arrived at the Center for Language and Brain lab and were given a consent form. After they read the form, we clarified any questions before they signed it. The following script was read to participants of Experiment 1: “The purpose of this research is to evaluate the effectiveness of people speaking in Mandarin Chinese. You will view a series of brief videos of students practicing Mandarin Chinese, and after each one, take a survey to evaluate their learning efforts.” The intention of this introduction was to prime the participants to treat the L2 learners in the stimulus video as students, in addition to getting participants in the mindset of providing constructive feedback to L2 speakers.

After the basic introduction of the task, the researchers encouraged participants not to spend more than 1 min responding to all eight of the video’s questions. This time limit was introduced to emulate natural face-to-face communication in everyday life, during which listeners only have a very short time to process and integrate various sources of information about phonology, semantics, syntax, and pragmatics (Hanulíková et al., 2012). Participants were then brought into individual testing rooms, each containing of a computer, monitor and Pinyin keyboard on a desk.

The study was presented on Qualtrics. Participants were shown one video at a time, with each repeated twice. After that, the video would disappear from the screen, the set of seven survey questions appeared. The order of the questions, as described in the previous section, was set to a random order, and all participants answered them in this sequence: questions (5), (1), (2), (3), (8), (7), and (6). The experiment was self-paced, so the inter-stimulus interval length varied between participants. Each video and set of questions required about 45 s to 1 min. After participants finished responding to all the video stimuli, they were given the surprise memory test (question 4). The entire experiment lasted approximately 20–25 min.

After participants completed all of the tasks, the researcher debriefed them on the purpose of the study and compensated them with either course credit or $5 in cash.

Coding and Design

Aside from the rating scales, there were two measures that required coding: Words Misheard, with the question asking, “What did this person say,” and the Surprise Memory Test at the end.

The Words Misheard question was coded by comparing the participant’s typed answer to the actual speech in the video. A correct answer received a score of 0 (no errors), and an incorrect answer in any part of the utterance received a score of 1. The Surprise Memory Test involved free recall, and a score of “1” was given to phrases identical to the words presented in the study (complete memory) and a score of “0.5” was given to partially correct scores (partial memory), such as having the same root word but incorrect ending. A “0” was given for items that were entirely omitted or could not be traced back to any utterance (incomplete memory). In this way, low values for the “misheard” dependent variable (DV) mean better perception, whereas low values for the “memory” DV mean worse recall.

The experiment had a one-factor analysis of variance, with 3 conditions: culturally familiar gesture, culturally unfamiliar gesture, and speech alone.³ Because we make non-orthogonal comparisons among our three levels of condition, we used Dunn-Šidák multiple contrasts to correct for Type I errors.

The DVs were separated into two categories. First, the L2 “Speech” evaluation includes measurements concerning (1) Words Misheard, (2) Accent, (3) Tone Accuracy, and (4) Memory Test. Second, the L2 “Learner” evaluation includes measurements concerning (5) Confidence, (6) Nervousness, (7) Communicative Effectiveness, and (8) Judgments of Length of Time Studying Mandarin.

Results

Speech Evaluation

Means and standard deviations of native Mandarin speaker responses are shown in Table 1. See the top half for the Mandarin data (see section “Experiment 1: Mandarin results).

TABLE 1

Table 1. Means and standard deviations of native speaker responses in the L2 “Speech” evaluation.

For the proportion of misheard speech, there was a significant effect of gesture, F(2,70) = 5.065, p = 0.014, $η_{p}^{2}$ = 0.16. Familiar gestures produced lower error rates than both speech alone, tDS(3,35) = 2.757, p = 0.014, and culturally unfamiliar gestures tDS(3,35) = 2.743, p = 0.030. No significant difference was found between speech alone and unfamiliar gestures, tDS(3,35) = 1.03, n.s. The left panel of Figure 2 shows the number of Mandarin words misheard in each of the three conditions (out of a total number of 756 answers = 21 utterances × 36 native listeners). The figure clearly demonstrates that the familiar gesture condition yielded the smallest number of misheard words, contrasting with the unfamiliar gesture and speech alone conditions.

FIGURE 2

Figure 2. Number of words misheard in each of the familiar-gesture, unfamiliar-gesture, and speech-alone conditions.

On the evaluation of accent, there was a significant effect of gesture, F(2,70) = 5.830, p = 0.005, $η_{p}^{2}$ = 0.143. Familiar gestures produced significantly more native-like ratings compared to speech alone, tDS(3,35) = 3.061, p = 0.006, and also compared to unfamiliar gestures, tDS(3,35) = 2.776, p = 0.014. However, there was no significant difference between unfamiliar gestures and speech alone, tDS(3,35) = 0.281, n.s. For tonal accuracy, there was a significant effect of gesture, F(2,70) = 4.206, p = 0.019, $η_{p}^{2}$ = 0.107. Familiar gestures influenced participants to attribute more correct tonal pronunciation than speech alone, tDS(3,35) = 2.791, p = 0.012. However, there were no significant differences between familiar gestures and unfamiliar gestures, tDS(3,35) = 2.085, n.s., or between unfamiliar and speech alone, tDS(3,35) = 0.670, n.s.

The surprise memory test also yielded a significant effect of gesture, F(2,70) = 5.045, p = 0.011, $η_{p}^{2}$ = 0.126, such that speech alone yielded worse recall than both culturally familiar, tDS(3,35) = 2.500, p = 0.026, and unfamiliar gestures, tDS(3,35) = 3.332, p = 0.006. However, there was no significant difference between familiar and unfamiliar gestures, tDS(3,35) = 0.552, n.s.

Learner Evaluation

Means and standard deviations of native speaker responses in the L2 “Learner” evaluations were given in the upper half of Table 2 (see section “Experiment 1: Mandarin results”).

TABLE 2

Table 2. Means and standard deviations of native speaker responses in the L2 “Learner” evaluation.

For confidence, there was a significant effect of gesture, F(2,70) = 4.859, p = 0.011, $η_{p}^{2}$ = 0.122, with speech alone lowering confidence ratings compared to both familiar, tDS(3,35) = 2.214, p = 0.049, and unfamiliar gestures, tDS(3,35) = 3.049, p = 0.012. There was no significant difference between the familiar and unfamiliar gestures, tDS(3,35) = 0.646, n.s. For nervousness, there was a significant effect of gesture, F(2,70) = 3.311, p = 0.045, $η_{p}^{2}$ = 0.086. The mean rating appeared higher, i.e., more nervous, in speech alone than in the other conditions, as shown in Table 2. However, none of the individual comparisons yielded a significant difference [familiar gestures vs. speech alone: tDS(3,35) = 2.159, n.s.; familiar vs. unfamiliar gestures: tDS(3,35) = 0.028, n.s.; and unfamiliar vs. speech alone: tDS(3,35) = 2.059, n.s.]. (Note that finding null results with our planned contrasts, despite finding a significant omnibus effect in the ANOVA, is the result of using Dunn-Šidák multiple contrasts, which adjusted the criteria more strictly than without an adjustment).

For communicative effectiveness, there was a significant effect of gesture, F(2,70) = 6.644, p = 0.003, $η_{p}^{2}$ = 0.160. Both familiar and unfamiliar gestures were judged to be more effective than speech alone [tDS(3,35) = 3.240, p = 0.005; tDS(3,35) = 2.619, p = 0.039, respectively]. Between familiar and unfamiliar gesture, however, no significant difference was found, tDS(3,35) = 1.388, n.s.

For estimates of time studying the Mandarin language, there was no significant effect of gesture, F(2,70) = 1.457, n.s.

Experiment 1 Summary

Speech evaluation

The most consistent finding in the speech evaluation measures was that familiar gestures indicated an advantage over speech alone in all dimensions: with fewer words misheard, higher “native-like” accent ratings, higher tone accuracy, and more recalled utterances in the surprised memory test (see Table 3 for a summary of Experiment 1). However, effects of unfamiliar gestures were somewhere between the other two conditions—in two evaluations (tone accuracy and memory test), unfamiliar gestures did not differ from familiar gestures, but in the other two evaluations (words misheard and accent ratings) unfamiliar gestures showed significantly less advantage than familiar gestures. Compared with speech alone, unfamiliar gestures had only one advantage, producing more recalled items in the surprised memory test than speech alone, but they did not differ in the other evaluations. Our original prediction was that unfamiliar gestures would have a more negative effect than speech alone, but none of the cases showed this.

TABLE 3

Table 3. Summary of significant differences between conditions: FAMILIAR Gesture, UNFAMILIAR Gesture, and SPEECH Alone.

Learner evaluation

Two major patterns were found for evaluation of L2 learners. First, we found that familiar and unfamiliar gestures both led to higher ratings of confidence and communicative effectiveness, compared to speech alone. In contrast, for the evaluation of nervousness and the estimate of time studying Mandarin, there were no differences among the three conditions.

Experiment 2: Japanese

Experiment 2 attempted to build on Experiment 1 by generalizing to a different language and culture: Japanese. Given that the vast majority of research in psychology has focused on Western societies and English speakers, it is important to increase diversity in the field by expanding to different cultures and languages (Henrich et al., 2010). It goes without saying that there are vast differences among Asian languages and cultures as well. This diversity is especially relevant for the topic of emblematic gestures, which by definition depend on the specific conventions of a particular culture.

Combined with the authors’ impressions and discussions with native Chinese and Japanese speakers, we reasoned that these two cultures might vary to different degrees in the use of gesture, making Japanese emblems a good candidate for the present study.

More importantly, we considered another point: The Mandarin speakers in the first experiment were enrolled in a university in the U.S. for 0.5–3.5 years at the time of testing, and they were proficient in English for undergraduate studies. This factor might have exposed them to a greater variety of linguistic and cultural elements outside their native language and culture, and it might have made them more open to difference than people who have never lived abroad. With this in mind, we sought to find college students in Japan who did not have as extensive experience abroad. This may make the interpretation of emblems relatively more uniform across these participants in Japan, which would be a nice contrast with the Chinese participants in Experiment 1.

Using the same basic paradigm as Experiment 1, we investigated the extent to which native Japanese speakers are sensitive to the cultural meaning of emblem gestures when: (1) perceiving non-native speech and (2) forming social impressions of non-native speakers.