Gesture alignment in teacher–student interaction: a study concerning office hour consultations using English as the lingua franca

Opazo, Paloma; Cienki, Alan; Oben, Bert; Brône, Geert

doi:10.3389/fcomm.2024.1457533

ORIGINAL RESEARCH article

Front. Commun., 11 November 2024

Sec. Multimodality of Communication

Volume 9 - 2024 | https://doi.org/10.3389/fcomm.2024.1457533

This article is part of the Research Topic: Multimodality in Face-to-Face Teaching and Learning: Contemporary Re-Evaluations in Theory, Method, and Pedagogy

Paloma Opazo¹,²*

Alan Cienki¹

Bert Oben²

Geert Brône²

¹Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
²Faculty of Arts, KU Leuven, Leuven, Belgium

Introduction: In this study, we explore the presence of gesture alignment in office hour consultations, a form of academic talk characterized by private face-to-face dialogues between a lecturer and a university student. Unlike classroom interactions, the topic of these consultations is initiated by the student. Our objectives were to describe the patterns of gesture alignment in these educational settings, to determine the direction of the copied behavior (i.e., who copies whom?), and to understand the temporal structuring of these instances.

Methods: We analyzed 12 office hour consultations, involving Spanish undergraduate students and lecturers from universities in England, Ireland, and Sweden. All the conversations were held in English. The annotation considered three domains: the timing of matching gestures (i.e., if the aligned gestures appeared in a Simultaneous, Consecutive, or Later manner), the form features of the aligned gestures (hand shape, movement, and orientation), and the function of the gestures (representational, deictic, or pragmatic).

Results: Although there were important differences between dyads, four general findings emerged. First, aligned gestures mostly took place in a Consecutive manner. Second, gesture alignment is shown to achieve shared understanding between interactants, but this can be manifested in different ways: from the active negotiation of meaning to the signaling of agreement. Third, paired gestures become useful in educational contexts where the teachers and students include native and non-native speakers, as they contribute to disambiguating meaning. Fourth, many cases of matching gestures happen due to the presence of recurrent gestural forms.

Discussion: Overall, our results are in line with previous evidence that has highlighted the role of gesture alignment in grounding processes, related to the achievement of mutual agreement between participants. Matching gestures are a helpful resource during office hour consultations—a form of academic talk where content is being explained and negotiated.

1 Introduction

Research has shown that speakers can repeat or imitate each other’s behaviors from one or more semiotic systems. One such system is gesture, which can be defined as bodily movements, often done with our hands, with the intention to communicate (Kendon, 2004). The copying of gestures between interactants has been observed in experimental and non-experimental environments. Although the phenomenon has received many names, we will refer to it as gesture alignment, a term that has been related to the interactive alignment framework (Pickering and Garrod, 2006). That approach assumes that the repetition of linguistic and non-linguistic behaviors between speakers is associated with an alignment on a cognitive level. The claim is that the use of similar words or gestures involves the alignment of mental representations (Pickering and Garrod, 2006). We do not necessarily subscribe to this view, and rather use alignment as a descriptive term that refers to “cross-participant repetition of communicative behavior” (Rasenberg et al., 2020, p. 1). Therefore, gesture alignment here simply indicates that a given gesture was subsequently copied by another speaker.

In an earlier study, Kimbara (2006) looked at gesture alignment, which she labeled gestural mimicry, in joint-narration tasks, where participants were asked to watch clips from cartoon episodes. After watching the clips, they were paired in dyads and had to retell the content of the cartoons in front of a camera (Kimbara, 2006). As speakers were retelling the cartoons, there were instances of gesture alignment, which Kimbara (2006) described as instances of “jointly constructed meaning” (p. 42). According to the author, cases of form-meaning mapping in gesture by one speaker may become salient for the interlocutor, who then copies them using similar features of the first gesture. Gesture alignment is thus useful when participants collaborate to establish meaning (Kimbara, 2006). As Holler and Wilkin (2011) show, mimicry at a gestural level is important in face-to-face interaction when speakers are “creating a mutually shared understanding of referring expressions” (p. 137). According to previous research, alignment at a gestural level contributes to achieving mutual agreement between participants, a process known as grounding (Clark and Brennan, 1991; Holler and Wilkin, 2011).

Specifically in education, there is some research on gesture alignment, which has mostly come from interaction studies (Arnold, 2012; Lerner, 2002; Majlesi, 2015, 2022) or is framed within Vygotskian sociocultural theory (Hudson, 2011; Rosborough, 2010; Smotrova, 2014; Smotrova and Lantolf, 2013). Through qualitative methods, such as conversation analysis and ethnography, these studies have introduced naturally occurring examples of gesture alignment between students in a classroom and between teachers and students. After studying teachers’ repetition of students’ gestures, Majlesi (2015) reached a similar conclusion to that of previous studies by indicating that matching gestures support mutual understanding. However, the author added that teachers re-use students’ talk and gestures with the intention to create “teaching and learning opportunities” (p. 42). Matching gestures highlight actions previously made by students and turn them into a “teachable moment” (Majlesi, 2015, p. 32). By re-using students’ gestures, teachers express that they have understood the students’ actions, and gestures become the focus of attention, which makes them a pedagogical instrument (Majlesi, 2015).

In this study, we analyzed 12 videos, involving Spanish undergraduate students and English-speaking lecturers from universities in England, Ireland, and Sweden, with the goal of describing instances of gesture alignment in teacher–student interaction. The videos contain dialogues of a specific type of academic talk called office hour consultations. Contrary to some other forms of teacher–student dialogues (e.g., in the classroom), these consultations involve a face-to-face interaction where the “topic is determined and initiated by the student” (Limberg, 2007, p. 178). During this form of academic talk, participants negotiate academic topics that can be related to the content of a subject or to the administrative aspects of the class (e.g., assessments or deadlines). In addition, all the videos were dialogues between Spanish undergraduate students and university lecturers who were native speakers of English; thus, these consultations are also L1–L2 dialogues.

This study seeks to characterize gesture alignment in office hour consultations as well as address the direction of alignment; that is, whether students incorporate teachers’ gestures or the other way around. In addition, we also considered the temporal dimension of the alignment. Gestural matching has mostly been studied sequentially in a rather limited temporal context (e.g., in adjacent turns). With the current study, we want to stretch that time window and study gesture alignment at different levels of temporal granularity. While this study is primarily qualitative, using descriptions based on video annotations, it also provides an overview of gesture alignment based on descriptive statistics.

The text is structured as follows: the first sections give the theoretical basis of the study by indicating the relevance of gesture in education and the main findings of previous studies on gesture alignment in various settings, including educational contexts. After this, we explain the methodology and the criteria that were used to determine if gestures were aligned or not. The results section begins with an overview of aligned gestures in the videos, and then it presents relevant examples found in the data to achieve a better understanding of the role of gesture alignment in office hour consultations. The last part deals with the main conclusions of the study and aspects to take into consideration for future research.

1.1 Gestures in education

In broad terms, it is possible to distinguish two main perspectives to conceptualize gestures: one that defines them as communicative actions and another one that considers gestures as ‘windows’ into the mind (McNeill, 2010b). In the first approach, gesture is seen as “visible bodily action” (Kendon, 2004, p. 7) that plays a role in social interaction. The second perspective emphasizes “the mental processes in individual speakers and listeners” (McNeill, 2010b, p. 139) and therefore focuses on the online processes of thinking and speaking. Despite the differences between approaches, over time they have reached many points of agreement. One of them corresponds to the relevance of gesture in teaching and learning processes. In the last few decades, extensive research has been conducted on teachers’ and students’ use of gestures in the classroom (Alibali et al., 2014; Alibali et al., 2013a; Alibali et al., 2013b; Goldin-Meadow, 2010) as well as in experimental environments (Goldin-Meadow and Singer, 2003). Studies have analyzed the role of gestures in various fields of study, such as mathematics (Alibali et al., 2019; Alibali and Nathan, 2012; Herbert, 2018; Krause, 2016; Marchant Araya, 2016), biology (Pozzer-Ardenghi and Roth, 2008), and second language teaching and learning (Gullberg, 2008; Matsumoto and Dobs, 2017; Smotrova, 2014; Smotrova and Lantolf, 2013; Stam, 2012; Tellier et al., 2021).

Some scholars have highlighted that instruction is an embodied practice, in the sense that teachers use multiple modes of communication to teach and present knowledge (Ehmer and Brône, 2021). Instructors demonstrate by using their bodies, and these demonstrations “are usually not simple ‘non-verbal’ performances or displays of the knowledge to be transferred, but highly structured social activities of sharing and distributing conceptual knowledge adjusted to their instructional purpose” (Ehmer and Brône, 2021, p. 2). Gesture, in particular, is considered one of the multiple semiotic resources that are displayed in the classroom to convey meaning (Alibali et al., 2019; Arzarello et al., 2009; Pozzer-Ardenghi and Roth, 2008) in addition to spoken and written language. Other non-verbal forms of representation include pictures, concrete objects, or symbols, as shown by previous research conducted in the classroom (Flevares and Perry, 2001; Mittelberg, 2006; Williams, 2008).

Gesture has been shown to fulfill important functions in the teaching process by fostering common ground (Alibali et al., 2019; Alibali et al., 2013a), linking technical ideas (Alibali et al., 2014), or presenting abstract ideas (Parrill and Stec, 2017). The synergy between speech and gesture becomes especially beneficial in teaching and learning processes, as explanations using both modalities take less cognitive effort to understand than those that just rely on speech (Goldin-Meadow, 2005). Furthermore, gesture can “reduce demands on the speaker’s cognitive resources (relative to speaking without gesture), and free up cognitive capacity to perform other tasks” (Goldin-Meadow, 2010, p. 12). From an embodied cognition perspective, Alibali and Nathan (2012) argue that teachers and students use different types of gestures for specific purposes. Pointing gestures, for example, would reflect “the grounding of cognition in the physical environment” (Alibali and Nathan, 2012, p. 252), while representational gestures “manifest mental simulations of action and perception” (Alibali and Nathan, 2012, p. 252).

Gesture contributes to learning through different strategies (for a review, see Goldin-Meadow, 2005). Learners’ gesture production has been addressed at different developmental stages, from childhood (Goldin-Meadow, 2010; Rowe et al., 2013) to adulthood (Cook, 2018; Dargue and Sweller, 2020). From a cognitive perspective, speech and gesture form an integrated system (Goldin-Meadow, 1998; McNeill, 1992; McNeill et al., 2010), but this integration develops over time. Goldin-Meadow (1998) has conducted extensive research on gesture-speech mismatches in children, which occur when the two modalities convey different information. These mismatches are informative for educators, as they reflect “useful information about a child’s knowledge state” (Goldin-Meadow, 2005, p. 121). Furthermore, gestures can contribute to establishing common ground during trouble spots, that is, moments during instruction where students express a lack of understanding (Alibali et al., 2013a). Teachers can use gestures to connect ideas and, when they do this effectively, the gestures have a positive impact on students’ learning (Alibali et al., 2013b). In sum, there is sufficient evidence to claim that teachers’ gestures contribute to learning but also fulfill other functions in the classroom, as Tellier et al. (2021) indicate. Their analysis of language teachers’ gestures showed the presence of three main pedagogical functions: informing about relevant content, such as vocabulary or grammar, managing the classroom, and assessing or providing feedback. In addition to teaching relevant technical ideas to students, gestures are used to give instructions or to assess the knowledge being learned.

This section has highlighted the role of gesture inside the classroom. According to the findings from previous research, gestures are one of the many semiotic resources introduced by teachers to explain content. They are useful resources to teach, and they have an impact on learning if they are presented effectively by linking technical ideas. Different scholars agree that they are a window onto the speaker’s knowledge, but they also contribute to the acquisition of new knowledge by supporting students’ learning processes (Arzarello et al., 2009). Gestures are also useful when it comes to managing the class, which is an important aspect of classroom interaction. Although this section has focused on the positive aspects of gestures, some argue that gestures can create obstacles in communication if they are “too ambiguous, abstract, or culturally embedded as is the case with metaphorics and emblems” (Tellier et al., 2021, p. 35). The cultural variation of gesture in education is an aspect that should certainly be considered when analyzing gestures.

1.2 Gesture alignment

Historically, the concept of behavior matching (Bernieri and Rosenthal, 1991) or behavioral mimicry (Vicaria and Dickens, 2016) has been used to reflect the repetition of similar behaviors across interacting partners over a short time window. Some examples of behavior matching are posture congruence (Scheflen, 1964), shared eye gaze (Oben and Brône, 2015), or matching gestures. Specifically with gestures, the copying of these behaviors between interactants has received growing attention in the last few years, but research concerning teachers and students is scattered. In this section, we give an overview of previous findings related to the matching of gestures and then look further into the repetition of gestures in educational contexts.

Cross-participant repetition of gestures has received different names depending on the theoretical approach: gesture mimicry (Chui, 2014; Holler and Wilkin, 2011; Kimbara, 2006; Parrill and Kimbara, 2006), gesture alignment (Bergmann and Kopp, 2012; Oben and Brône, 2016), gestural matching (Arnold, 2012; Lerner, 2002; Majlesi, 2015), use of a return gesture (De Fornel, 1992; Eskildsen and Wagner, 2013), gestural resonance (Warner-Garcia, 2013), or gesture repetition (Yasui, 2013). Most scholars have analyzed gesture alignment sequentially, that is, including gestures that occur in adjacent turns or with a specific time lag (Arnold, 2012; Kimbara, 2008; Parrill and Kimbara, 2006). There are some studies that have sought to analyze simultaneous alignment (Cienki et al., 2014; Lerner, 2002), but they have still considered a small temporal delay between instances.

One issue when studying gesture alignment is the great variation between studies in how they determine that two or more gestures are similar. The phenomenon has been characterized by the recurrence of formal features in the gestures, such as palm orientation, type of movement, and/or hand shape, across interactants (Parrill and Kimbara, 2006). The nature and number of formal features that are used to establish the presence of matching behaviors have varied across studies. Holler and Wilkin (2011) talk about the “same overall form” (p. 141), while Parrill and Kimbara (2006) based their decision on three features: “motion, hand shape, and location” (p. 161). Furthermore, Chui (2014) included five features to determine the similarity of gestural forms: “handedness, position, orientation, hand shape, and motion” (p. 71). By contrast, some other studies have not included formal parameters, since they have focused on the representation technique of the gesturer or, in other words, the way in which gestures depict ideas, actions, and/or concepts (Oben and Brône, 2016). There is agreement, however, that gestures do not need to share exact parameters or be an “exact duplication” (Parrill and Kimbara, 2006, p. 161) to be categorized as aligned. These gestures “are not simply reproduced but are reformulated in relation to each other” (Warner-Garcia, 2013, p. 56), which is also the case for verbal alignment, where verbal instances do not need to be exactly the same to be considered cases of repetition or alignment (Tabensky, 2002; Tannen, 2007). These reformulations at a gestural or verbal level have been called rephrasing (Tabensky, 2002) or “recycling” (Tabensky, 2002, p. 218).
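To make the idea of feature-based similarity concrete, the following is a minimal sketch of how two annotated gestures could be compared on the five features listed by Chui (2014). The class name, the feature values, and in particular the `min_shared` threshold are hypothetical and serve only as illustration; none of the cited studies specifies such a numeric cut-off, and the sketch covers form only, not shared meaning.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GestureForm:
    """Form annotation of one gesture (feature names follow Chui, 2014)."""
    handedness: str
    position: str
    orientation: str
    hand_shape: str
    motion: str


def shared_features(a: GestureForm, b: GestureForm) -> list[str]:
    """Return the names of the form features on which two gestures agree."""
    return [f.name for f in fields(GestureForm)
            if getattr(a, f.name) == getattr(b, f.name)]


def is_form_match(a: GestureForm, b: GestureForm, min_shared: int = 3) -> bool:
    """Hypothetical criterion: enough shared features, not exact duplication."""
    return len(shared_features(a, b)) >= min_shared


# Invented example annotations, not taken from the dataset.
teacher = GestureForm("right", "center", "palm up", "open hand", "arc")
student = GestureForm("left", "center", "palm up", "open hand", "arc")

print(shared_features(teacher, student))
print(is_form_match(teacher, student))
```

Keeping the threshold below the total number of features reflects the point made above that aligned gestures need not be an exact duplication of one another.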

Similarity in form is essential to assess if two gestures are repeated, but it is not the only aspect to take into consideration. It has been stated that “mimicked gestures” (Holler and Wilkin, 2011, p. 133) share meaning; therefore, for a sequence of gestures to be seen as mimicry, they would have to share similar formal features, as well as represent the same entity or meaning (Holler and Wilkin, 2011; Kimbara, 2006). For this reason, most research on matching gestures has analyzed representational or iconic gestures, that is, gestures that depict aspects presented in speech (McNeill, 2008). In other words, they embody semantic content and “present images of concrete entities and/or actions” (McNeill, 2008, p. 39).

Rasenberg et al. (2020) discuss two main approaches to explain behavioral alignment at the linguistic and non-linguistic levels: priming and grounding. From a priming perspective, Pickering and Garrod (2006) argue that alignment does not entail any sort of negotiation between speakers, since the repetition of behaviors would occur due to a priming mechanism that is automatic. In addition to this, alignment at a linguistic or non-linguistic level would express the alignment of mental representations (Pickering and Garrod, 2006). The notion of grounding, on the contrary, assumes “that alignment follows from interactive, coordinative efforts involved in joint meaning-making” (Rasenberg et al., 2020, p. 4). Other relevant studies also appear to support the claim that alignment contributes to grounding processes. Matching gestures in particular are used to maintain mutual understanding (Holler and Wilkin, 2011), to sustain common ground (Kimbara, 2006), and to accept a previous idea (Yasui, 2013). According to Chui (2014), matching gestures also demonstrate that speakers are paying attention to what is being expressed through speech and gesture. As Kimbara (2006) states, matching gestures correspond to instances of “jointly constructed meaning” (p. 42), and similar ideas are found in other studies (Chui, 2014; Tabensky, 2002). Most researchers focus on the collaborative aspect of gesture alignment by indicating that it takes place “during sequences of intense involvement in the topic being discussed” (Tabensky, 2002, p. 234), which displays the involvement of interactants. Despite this emphasis on collaboration, Warner-Garcia (2013) discusses how the repetition of gestures can also express a “problematized negotiation” (p. 70).

During face-to-face interaction, participants can repeat each other’s gestures with different communicative purposes. Following grounding approaches, matching gestures play a role in the joint construction of meaning. The acceptance of another person’s gesture may be related to agreement and mutual understanding (Chui, 2014; Holler and Wilkin, 2011; Tabensky, 2002), although other approaches might emphasize the cognitive dimension of alignment, as we previously mentioned with the notion of mental representations (Pickering and Garrod, 2006). The studies included in this section come from face-to-face dialogues, whether these took place in controlled experiments or in naturally occurring interactions. In the next section, we will address research on the communicative function of matching gestures in educational environments, but, as we will see, they have been grounded mostly in fields outside of gesture studies.

1.3 Repetition of gestures in educational environments

Studies on matching gestures in education are largely situated in interaction studies (Arnold, 2012; Eskildsen and Wagner, 2013; Koschmann and LeBaron, 2002; Lerner, 2002; Majlesi, 2015, 2022) or in Vygotskian sociocultural theory (Hudson, 2011; Rosborough, 2010; Smotrova, 2014; Smotrova and Lantolf, 2013; Van Compernolle and Smotrova, 2014). The field of second language acquisition has been fertile ground for these studies. Research on the topic has been observational, and it includes ethnography, conversation analysis, or other qualitative methods to describe the presentation of matching gestures in group interactions inside the classroom. Arnold (2012), however, collected a dataset of instructional interactions that were recorded at a bicycle-repair shop. In these recordings, a group of volunteer mechanics showed their customers how to fix their bikes. The embodied knowledge of the mechanics was taught and depicted through their hands and bodies. The author indicates that “the teaching gestures used by the volunteer mechanics visibly incorporated their bodily experience of bicycle repair, and this embodied knowledge was then transferred to the learners through practices such as dialogic embodied actions” (Arnold, 2012, p. 272). The knowledge of these practices is shared through gesture (and the body as a whole), and later repeated by learners following a leading-following pattern. Dialogic embodied action, as Arnold (2012) calls it, allows one to gain instrumental and conceptual knowledge, that is, understanding how to perform a procedure and the reasons behind said procedure.

A concept that is often used in sociocultural theory to refer to matching gestures is that of catchments. The concept was developed by McNeill (2010a), although it was based on work by Adam Kendon (1972). In 1972, Kendon gathered observations of the bodily behavior and speech of a man while he was drinking and talking at a pub in London. When analyzing the recordings, the author noticed that the man resorted to a similar gesture every time he was referring to the “main point” (Kendon, 1972, p. 204). In later years, McNeill (2010a) went back to this idea and called such gestures catchments. Following the author, a catchment can be inferred by the recurrent presentation of gestures with similar characteristics (e.g., hand shape, space, orientation, or movement) when certain discourse topics are addressed (McNeill, 2010a). McNeill connects this term with the growth point, a theoretical and minimal psychological unit that combines imagistic and linguistic content (McNeill et al., 2010). By analyzing catchments, it should be possible to better understand the origin and relationship between speech and gesture in thought (McNeill, 2010a).

Catchments are not the same as gesture alignment since with catchments, the gestures are produced by the same speaker throughout the interaction. In a way, catchments could be considered as a type of self-repetition at a gestural level. The notion of catchments is also different from alignment because it highlights the connection between gestures and discourse themes: “The logic of a catchment is that discourse themes produce gestures with recurrent features; these recurrences give rise to the catchment” (McNeill, 2010a, p. 316). Catchments were analyzed by Pozzer-Ardenghi and Roth (2008) during a biology lesson. The authors noticed that the teacher would repeat gestures when he was introducing and developing scientific ideas (Pozzer-Ardenghi and Roth, 2008). The analysis of catchments showed that recurring ideas were being explained with similar gestures. In the case of the “heart contraction” sign, the hands displayed a squeezing movement when the teacher introduced ideas about the circulatory system. “The various repetitions of this gesture constitute a catchment that presents a recurrent idea available through the particular movement that, in turn, carries meaning in conjunction with the words synchronously uttered” (Pozzer-Ardenghi and Roth, 2008, p. 396). The catchment is clear considering that the teacher depicts a specific concept using similar formal features.

Catchments, however, have also been used to characterize cross-participant repetition of gestures. Smotrova and Lantolf (2013) investigated the role of gesture and speech in two English-as-a-foreign-language (henceforth: EFL) classes in Ukraine. In one of the excerpts, students were asked to find the Russian translation of English words that they were unfamiliar with (Smotrova and Lantolf, 2013). As one of the students (S1) tried to decipher the meaning of a word (further in English, dal’she in Russian), she produced a gesture that was later repeated by another student (S2). The authors interpret the matching gesture of S2 as “apparently aligning herself with S1’s hypothesis” (Smotrova and Lantolf, 2013, p. 404). The authors connect this example with catchments, because there is a recurrent topic being depicted with the same “gestural image” (Smotrova and Lantolf, 2013, p. 404). This “recurring image” serves as a reference point to connect the information contained in speech “back to the underlying topic” (Smotrova and Lantolf, 2013, p. 410). Similar to previous studies on gesture alignment, they argue that the repetition of the gesture exhibits a co-constructed understanding of a concept.

The direction of alignment has been addressed within the classroom, but there are only a few examples where this has been an explicit research goal. Majlesi (2015) looked at a corpus of Swedish-as-a-second-language classes and analyzed cases in which teachers matched students’ gestures. By repeating students’ gestures, Majlesi (2015) argues, teachers show that they understand the contributions made by students. The recycling of the gesture “transforms the actions into a teachable moment” (Majlesi, 2015, p. 32), and it creates teaching and learning opportunities. This process is useful for different reasons: to give feedback, that is, to unpack what students have expressed; to reformulate students’ prior formulations, repeating a previous action but with minor modifications; and to explain technical content, thus, turning these gestures into pedagogical devices (Majlesi, 2015). To sum up, matching gestures are used by teachers for grounding purposes to sustain mutual understanding and to create teachable moments. Other examples of teachers’ copying students’ gestures have mostly come from class observations. Hudson (2011) analyzed the speech and gestures of an instructor in an English-as-a-second-language pronunciation class and explained these cases of gesture alignment as a way for students “to appropriate the pedagogical gestures that the instructor exhibited” (Hudson, 2011, p. 258). However, the author framed it as a form of internalization, following Vygotsky’s theory (Hudson, 2011). In addition, within the sociocultural tradition, Rosborough (2010) described cases in which a second-grade teacher matched the students’ gestures, as well as examples of the students copying the teacher’s gestures. The author argues that the teacher did this “for pedagogical and communication purposes” (Rosborough, 2010, p. 106), and the students followed this trend by mirroring the teacher’s gestures. In this setting of English as a second language, gesture “was often central in collaborative and meaning-making searches between the teachers and students” (Rosborough, 2010, p. 106).

In this sense, gestural alignment appears to highlight teachable moments, especially when teachers match the students’ gestures, but the repetition of gestures appears to be particularly useful in educational contexts in which there is an L1 speaker, who is a teacher or a volunteer, and L2 speakers. Matching gestures have been shown to disambiguate confusing or unfamiliar meanings in the L2 (Smotrova, 2014), and students use teachers’ gestures to reflect agreement and understanding of their explanations. Similarly, gesture alignment plays a role during trouble talk (Eskildsen and Wagner, 2013), a process in which teachers try to elicit L2 words from students. After analyzing Language Cafés in Sweden, where L1 speakers, all of them volunteers, sat together with L2 speakers, Majlesi (2022) noticed that gesture alignment appeared during explanation sequences or word searches. The author found a recurrent pattern in these interactions: (1) the introduction of a gesture by L2 speakers through an inquiry, (2) the answer of the L1 speaker containing the gesture previously used by the L2 speaker, and (3) the confirmation of the understanding of the L1 speaker (Majlesi, 2022). Gestural matching would not only express understanding; it also allows “L1 speakers to highlight part of the previous turn as the focus of instruction” (Majlesi, 2022, Abstract section, para. 1). Previously, Koschmann and LeBaron (2002) had already indicated that gesture alignment “is an important mechanism for establishing semantic links across turns at talk” (p. 271).

1.4 Current study

This study looks at gesture alignment in teacher–student interaction in office hour consultations, a one-to-one encounter that takes place in universities. In this form of academic talk, the topic to be discussed is usually proposed by the student (Limberg, 2010). Teachers have a prominent part in the “negotiation of academic business,” and they are asked to exhibit their “understanding of the academic issue at hand” (Limberg, 2007, p. 186). These consultations also correspond to L1–L2 dialogues: teachers were native speakers of English, and students were native speakers of Spanish. From the evidence gathered in this section, we can identify the following gaps in the literature: the available studies have based their analyses on case studies, which are descriptively enriching but fail to indicate patterns in their datasets. It becomes necessary to determine if the insights gathered by these case studies can also be found in different contexts. Additionally, the direction of alignment has been addressed in a few studies but mostly from the perspective of the teacher matching the students’ gestures. The communicative function of students matching the teachers’ gestures has been based on singular observations. Finally, the temporal dimension of alignment in educational environments has not been explicitly included as a research goal. From previous studies, it is possible to notice that there is not yet a clear parameter to assess the temporal proximity between matching instances. From this diagnosis, our research questions are as follows:

• What are the patterns of gesture alignment in teacher–student interaction during office hour consultations that are also L1–L2 dialogues?

• What is the direction of alignment in office hour consultations that are also L1–L2 dialogues?

• What is the temporal dimension of gesture alignment in office hour consultations that are also L1–L2 dialogues?

2 Materials and methods

The original dataset consisted of 27 semi-guided recordings of office hour consultations, which were collected as part of the EuroCoAT project, a 3-year research project that looked into the use of metaphor in these settings (MacArthur et al., 2015). The videos were recorded in five universities located in England, Ireland, the Netherlands, and Sweden between April and November 2012 (for more information about the project, see MacArthur, 2016; MacArthur et al., 2015). Forty-eight volunteers in total participated in the study, all of whom received remuneration for taking part. Participants were informed of the aim of the study, which was related to metaphor use in one-to-one academic consultations in English, before giving their consent. They agreed to being recorded and knew their right to stop the recording at any time (MacArthur, 2016). Students were Spanish undergraduates who were spending time abroad on an Erasmus program. All the dialogues were held in English as an academic lingua franca. Participants belonged to different disciplines, such as Hispanic studies, health and safety, journalism and new media, biomedical sciences, and more.

The complete dataset included lecturers who were native English speakers (L1) and lecturers who were native in other languages, such as Greek, Spanish, or Dutch. Additionally, some lecturers participated on more than one occasion. We analyzed the 27 consultations, but, due to the heterogeneity of the original sample, the findings presented here focus on a subset of 12 office hour consultations. The criteria for selecting these videos were: (1) having Spanish undergraduate students and native English speakers, (2) participating only once in the study, and (3) having good-quality images in order to analyze the gestures. Nine of the 12 consultations were opposite-sex dyads, and three were same-sex dyads. In terms of the countries, five interactions were recorded in England, six in Ireland, and one in Sweden. More information about the country, sex, age, discipline distribution, and the duration of the dialogues can be found in Appendix 1.

2.1 Procedure

All the video-recordings consisted of naturally occurring academic consultations. Every conversation took place in the office of the lecturer, which meant, as a result, that every dialogue involved a different layout: the lecturer and the student could be seated face-to-face, but it was common to find multiple objects between them, such as computers, documents, or a table. With the intention of discussing academic topics, researchers asked students to prepare a few questions beforehand on topics related to: “written or other assignments that they had completed or were in the process of completing; the systems of assessment used at the host university for that particular subject; and/or difficulties being experienced in understanding the course contents” (MacArthur et al., 2015, p. 190). Although teachers were not aware of the students’ specific questions in advance, they knew about these guidelines. The researchers in charge of doing the recording set up the equipment and left the room. The researchers established a duration of 10 min for each dialogue, but ultimately this time limit was just used as a minimum. After 9 min, they would knock on the door to let participants know that the 10 min were almost up. However, participants were allowed to keep on talking as much as they wanted. For this reason, the video-recordings have different durations (MacArthur, 2016; MacArthur et al., 2015); information on this can be found in Appendix 1.

2.2 Annotation

The present study mostly uses qualitative methods. The videos were annotated using the software ELAN (Sloetjes and Wittenburg, 2008; Wittenburg et al., 2006), a tool commonly used for analyzing audiovisual materials. The operationalization of gestural alignment was based on Oben’s (2015) method, which looked at interactional prime and target pairs as the main unit of analysis. Although the terms of prime and target may give the impression that an interactive alignment approach was adopted, these notions only seek to reflect the presence of a behavior that is later copied. Other commonly used concepts to refer to the same dynamic are lead and follow (Arnold, 2012) or source and result (Tabensky, 2002). Following Oben (2015), alignment “always involves a behaviour by a first speaker (which we call prime) followed by that same behaviour by a second speaker (which we call target)” (p. 12).

Table 1 presents the codes of the annotation. The first tier deals with gesture alignment, and it was used to identify the first gesture (the prime) and a paired gesture (the target) together with its temporal dimension. The other two tiers are related to gesture form and gesture function, categories that are explained in the following sections.

Table 1

Table 1. Codes for the annotation.

2.2.1 Temporal dimension

Manual annotations were used to determine if the gestures were copied with no time delay (henceforth, Simultaneously), with a delay of a few seconds (Consecutively), or with a longer delay (Later). There are important differences between existing studies when it comes to defining time-windows for matching behaviors. While synchronization or coordination studies usually incorporate time lags ranging from 0.04 to 4 s (Ayache et al., 2021), research on behavior matching, alignment, or mimicry has tended to consider longer stretches of time. Holler and Wilkin (2011) “imposed no restrictions” (p. 140) when they analyzed the temporal distance of matching gestures, which is in line with Kimbara (2008), who did not impose specific temporal criteria to determine the gesture pairs. Louwerse et al. (2012) annotated different behaviors, including gesture, and they determined time lags that did not exceed 25 s.

Considering the diverse existing criteria to address gesture alignment, we included three different time-windows: 250 ms or less (Simultaneous), between 250 ms and 10 s (Consecutive), and between 10 s and 60 s (Later). Instances of gesture alignment beyond 60 s were not considered in the analysis. The time window was counted from the onset of the preparation phase of the gestures. By using an expanded time lag for the Consecutive category, we sought to consider the turn-taking dynamics of office hour consultations, in which turns extend beyond just a few seconds. The Later category was added to avoid missing cases of gesture alignment (i.e., if we had only included temporally adjacent gestures).
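The three time windows can be read as a simple decision rule. The following Python sketch is only an illustration of that rule, since the annotation itself was done manually. The function name, the use of seconds as the unit, and the handling of the exact boundary values (a lag of exactly 10 s is assigned to Consecutive here) are assumptions that the text does not specify.

```python
from typing import Optional


def temporal_category(prime_onset_s: float, target_onset_s: float) -> Optional[str]:
    """Classify a prime-target pair by the lag between the onsets of the
    preparation phases of the two gestures (both given in seconds)."""
    lag = target_onset_s - prime_onset_s
    if lag < 0:
        raise ValueError("the target cannot start before the prime")
    if lag <= 0.25:       # 250 ms or less
        return "Simultaneous"
    if lag <= 10.0:       # between 250 ms and 10 s
        return "Consecutive"
    if lag <= 60.0:       # between 10 s and 60 s
        return "Later"
    return None           # beyond 60 s: not considered in the analysis


# Hypothetical onsets, for illustration only
print(temporal_category(0.0, 0.2))
print(temporal_category(0.0, 3.0))
print(temporal_category(0.0, 45.0))
print(temporal_category(0.0, 75.0))
```

Returning `None` for lags beyond 60 s mirrors the decision to leave those instances out of the analysis rather than assign them to a fourth category.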

2.2.2 Gesture form

One important dimension to determine if two or more gestures are similar is gesture form. This study systematized this category following the gesture notation developed by Bressem (2013). According to Bressem (2013), the notation system focuses “solely on gestures’ physical appearance, directs the attention to the different facets of a gesture’s form, and focuses on its detailed characterization” (p. 1080). Starting from a linguistic perspective, the system gives a detailed description of form that is independent of speech. In the present study, three parameters were included under the label of gesture form: hand shape, orientation, and movement. When it comes to hand shape, there are basically four categories involved: fist, flat hand, single fingers, and combinations of fingers (Bressem, 2013). The next parameter is palm orientation, and we only considered the basic four descriptors of orientation: palm facing up, palm down, palm lateral, and palm vertical (Bressem, 2013; McNeill, 1992). The third and last parameter is movement, a category described by Bressem (2013) as the most difficult one to code. We focused on the type of movement, that is, if the repeated gesture had a similar motion pattern to the first gesture. Within the basic movement types, Bressem (2013) gives the following descriptions: straight movement, arced movement, circle, spiral, zigzag, and s-line.

Every time a prime and target were annotated, the formal parameters mentioned above had to be specified. In other words, the annotation indicated the number of parameters that were shared between prime and target. The repeated gesture could have three, two, or one parameter(s) in common. In addition to sharing at least one parameter, gestures had to belong to the same category of gesture function, which will be described in the next section.
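The alignment criterion described here (same gesture function and at least one shared formal parameter) can be sketched as follows. This is a simplified illustration, not the annotation procedure: the class, the field names, and the example labels are hypothetical, and comparing labels for strict equality is an assumption, since in the actual coding the similarity of movement was a matter of annotator judgment.

```python
from dataclasses import dataclass
from typing import List

FORM_PARAMETERS = ("hand_shape", "orientation", "movement")


@dataclass(frozen=True)
class Gesture:
    function: str     # "representational", "deictic", or "pragmatic"
    hand_shape: str   # e.g., "flat hand"
    orientation: str  # e.g., "palm down"
    movement: str     # e.g., "straight"


def shared_parameters(prime: Gesture, target: Gesture) -> List[str]:
    """Return the formal parameters on which prime and target carry the same label."""
    return [p for p in FORM_PARAMETERS if getattr(prime, p) == getattr(target, p)]


def is_aligned(prime: Gesture, target: Gesture) -> bool:
    """Aligned = same function and at least one shared formal parameter."""
    return prime.function == target.function and len(shared_parameters(prime, target)) >= 1


# Hypothetical prime and target, for illustration only
prime = Gesture("representational", "flat hand", "palm down", "straight")
target = Gesture("representational", "flat hand", "palm down", "arced")
print(shared_parameters(prime, target))
print(is_aligned(prime, target))
```

In this hypothetical pair, the gestures share hand shape and orientation but not movement, so they would count as aligned with two parameters in common.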

2.2.3 Gesture function

Gestures were taken as being aligned if they shared the same function and at least one formal parameter. There are many existing categories for gesture function, but we used three common functions in the literature: representational, deictic, or pragmatic. Previously, Holler and Wilkin (2011) included similar categories in their analysis, that is, representational gestures (which they called iconic and metaphoric gestures), deictic, and pragmatic gestures (they referred to them as interactive gestures). However, they decided to focus on iconic and metaphoric gestures because deictic gestures “usually adopt a very limited range of forms anyway” (Holler and Wilkin, 2011, p. 139). A similar argument was used with interactive gestures, as they would “usually involve the palm facing upwards, the hand being open and directed towards the addressee” (Holler and Wilkin, 2011, p. 139).

Representational gestures have a referential function, as they “may represent some feature(s) of a referent in the verbal utterance” (Cienki, 2017, p. 139). These gestures can express a physical entity, an idea, an action, or a relation by showcasing a high degree of iconicity. These gestures, according to Kita (2000), display similarities, to a certain extent, “between the shape of the gesture and the entity that is expressed by the gesture” (p. 162). For example, if a gesture refers to an action verb such as “pushing,” it is likely that the hand movements will simulate the concrete action of pushing an object. McNeill referred to these gestures as iconic (McNeill, 1992, 2008), but in his work, these only deal with the representation of physical objects or actions. According to McNeill (1992, 2008), the representation of the abstract is displayed by metaphoric gestures. Previous studies on gesture alignment have mostly focused on iconic and metaphoric gestures (Holler and Wilkin, 2011; Kimbara, 2008; Majlesi, 2022; Rasenberg et al., 2022), because these gestures accompany and complement the information presented in speech.

Deictics or pointing gestures also fulfill referential functions (Cienki, 2017; Kita, 2003). Similar to representational gestures, referents can be physical or abstract objects, and pointing involving the latter has also been referred to as abstract deixis (McNeill, 2010b). When pointing is directed toward physical referents, it “can be accomplished with a motion in the direction of the intended referent (or to put it more precisely: in the direction of where one conceptualizes the physical referent to be, as in pointing at a building that one cannot see at the moment)” (Cienki, 2017, p. 140). During abstract deixis, pointing can be used to refer to abstract ideas, meaning that there is no concrete target. Speakers use these deictics while pointing to a “seemingly empty space in front” (Kita, 2003, p. 6). Previous research has shown the cultural, biological, and semiotic complexities of these gestures (Kita, 2003).

Finally, there are two types of gestures that have been placed under the label of pragmatics: those with a discourse-related function and gestures with pragmatic functions. In the first category, Kendon (2017) specifies how manual gestures or, as the author calls it, “kinesic action” (p. 168) appear “to make distinct different segments or components of the discourse, providing emphasis, contrast, parenthesis, and the like, or where it marks up the discourse in relation to aspects of its structure such as theme-rheme or topical focus” (p. 168). As an example, when speakers want to give prominence to certain stretches of speech, it is common to see the presence of “batonic movements of the hand” (Kendon, 2017, p. 171) to express it. Thus, discourse-related gestures are the ones that speakers use to parse their discourse and structure it through the display of gestures (Cienki, 2017; Kendon, 1995). The second category corresponds to gestures with pragmatic functions (Kendon, 1995, 2017), which have also been labeled as interactive gestures (Bavelas et al., 1992; Kendon, 1995) and recurrent gestures (Bressem and Müller, 2014; Ladewig, 2014). These categories, however, are not as straightforward as they seem since every author might include different gestures as belonging to them. Pragmatic gestures are those in which the gesture primarily performs a speech act (Cienki, 2017) or, as Kendon (1995) would argue, gestures that “appear to give a visible expression to the illocutionary act intended by the speaker” (p. 264). On the same note, Bressem and Müller (2014) identified a catalog of German recurrent gestures, all of which “manifest the speech acts or illocutionary force of what a speaker is saying” (Kendon, 2017, p. 171).

In conclusion, the annotation of gesture alignment was based on two main criteria: gesture form and gesture function. When it came to gesture form, three parameters were considered: hand shape, orientation, and movement. Gesture function was characterized by three broad categories: representational, deictic, and pragmatic. It is generally acknowledged that gestures can serve more than one function at the same time. For this reason, the annotation process sought to determine the most predominant function in each case by annotating each video more than once, discussing the ambiguous cases, and resorting to a second coder to analyze a subset of the data. This process will be discussed next.

2.3 Reliability

A second coder analyzed three videos, that is, 25% of the total dataset (12 videos). The second coder was an external researcher with experience and knowledge of gesture studies. A set of guidelines was written to explain and provide examples of the categories included in the annotation. The videos did not contain any annotations, and the second coder was asked to carefully watch the videos and determine the presence of gesture alignment considering the abovementioned criteria. Once she had finished annotating the videos, a Cohen’s kappa was obtained for the three videos. Cohen’s kappa is a statistical measure that is used to quantify the level of agreement between raters, and it goes from −1 to +1. A kappa coefficient of 0.0 shows a level of agreement that is due to chance, whereas a kappa of 1.0 represents a perfect agreement (Holle and Rein, 2013). The analysis was performed using ELAN, based on the work of Holle and Rein (2013), and the annotations needed to have a 50% overlap to match. All the values can be found in Table 2. Various authors have specified that the quality of the reliability can be expressed in different categories: from poor to almost perfect (Landis and Koch, 1977) or from virtually none to substantial (Shrout, 1998). For this reason, we have added a column indicating the strength of the agreement. Raw agreements were also included, but these do not consider chance agreement (Holle and Rein, 2013).
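To make the measure concrete, the standard form of Cohen’s kappa is (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each coder’s label frequencies. The Python sketch below computes it for two lists of labels. It is only an illustration under simplifying assumptions: the labels are invented, and the sketch presupposes that the annotations of both coders have already been paired one to one, whereas the ELAN procedure based on Holle and Rein (2013) also deals with overlap and unmatched annotations.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders need the same, non-zero number of labels")
    n = len(coder_a)
    # Observed agreement: proportion of items with identical labels
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per category
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (R = representational, P = pragmatic, D = deictic)
first_coder = ["R", "R", "P", "P", "D", "D"]
second_coder = ["R", "P", "P", "P", "D", "R"]
print(round(cohens_kappa(first_coder, second_coder), 2))  # 0.5
```

In this invented example, the coders agree on four of six items (raw agreement of about 0.67), but a third of that agreement is expected by chance, which lowers kappa to 0.5. This is why raw agreement and kappa can diverge.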

Table 2

Table 2. Inter-coder reliability of 25% of the data.

As can be seen from Table 2, most dimensions obtained a low agreement (Holle and Rein, 2013), which happened due to many cases of unmatched annotations. These unmatched annotations could be explained by the following factors: (1) the second annotator obtained files without annotations in them, which shows the difficulty of analyzing alignment “in the wild”; (2) the second annotator watched the videos two times, whereas the first author watched them on several occasions; (3) the presence of different categories that needed to be applied. After Cohen’s kappa was calculated, the first author and second annotator discussed each video to reach agreement in the annotations. In this sense, while the Cohen’s kappa is low, many of the unmatched instances were addressed in the discussion, and cases in which both parties could not reach agreement were not considered in the final analysis. The annotations from the videos that were not checked by the external annotator were reviewed after the meeting in order to apply similar criteria to the rest of the sample.

After finishing the first round of annotations, it became clear that the identification of gesture alignment in the Later category presented different challenges, especially when dealing with pragmatic gestures, as most of these gestures are performed with the palm open, either facing up or down. When these gestures appeared after the 10-s time window, it was difficult to determine the prime and target due to their pervasiveness in communication. For this reason, it was agreed that the Later category would focus on representational and deictic gestures, as these gestures fulfill referential functions, and it was possible to assume that the gestures shared a common referent or topic. The Later category was still applied with pragmatic gestures, but only if these were different from palm-open gestures.

3 Results

3.1 Overview of the results

The office hour consultations lasted an average of 840.9 s, that is, close to 14 min (SD = 242.11). In total, we found 143 prime–target pairs during the consultations with an average of 11.9 prime–target pairs (SD = 3.8) per dyad. Table 3 presents the number of gesture pairs in every video. It also considers the amount of time in the consultation that the teacher and student were talking. These values were taken from ELAN, and they consider the stretches of time in which speakers were actively talking. It is possible that participants were holding the floor for longer, because significant pauses in speech were not considered in these values.

Table 3

Table 3. Number of gesture pairs per video.

As the table shows, teachers spoke more than students in most of the consultations with teachers speaking, on average, for 561.6 s, that is, 9 min and 22 s (SD = 219.69). The longest time in which a teacher talked happened in video UI6 with 1144.4 s or 19 min and 4 s. When it comes to students, they spoke on average 264.9 s, that is, 4 min and 25 s (SD = 69.2). Video UE5 was the only one in which a student spoke longer than a teacher: 304.94 s (5 min and 5 s) by the student versus 290.7 s (4 min and 51 s) by the teacher. A two-sample Kolmogorov–Smirnov test indicates that the distribution of the students’ and teachers’ time is not the same (D = 0.91667, p < 0.01). This is in line with what MacArthur (2016) had previously indicated about this corpus of office hour consultations: the “corpus as a whole comprises 55,718 words, of which 38,384 were uttered by the lecturers and 17,280 by the students” (p. 31). The number of words and the time spent talking reflect the relevant role of teachers in office hour consultations.

In the next sections, we describe our findings following the categories included in the annotation, that is, the temporal dimension of the alignment, the direction of the copying gestures, and a characterization of the gestures according to their function and form. The first sections give an overview based on the frequency and percentages of each category. After this, we present an in-depth analysis of relevant examples of gesture alignment. Through these cases, we introduce some roles that gesture alignment plays in this specific form of teacher–student interaction.

3.1.1 Temporal dimension

The analysis considered three categories: Simultaneous, Consecutive, and Later. Out of the 143 pairs, 105 gesture pairs (73.4%) appeared within the 10-s time lag in the Consecutive category, 29 pairs (20.3%) were in the Later category, and 9 pairs (6.3%) were coded in the Simultaneous category. Figure 1 shows the temporal dimension of the gestures. As can be seen, there is variation between consultations regarding the temporal dimension of alignment. While most interactions have Consecutive gestures, there are some with a higher frequency of gestures in the Later category (see UE5 or UI5).

Figure 1

Figure 1. Frequencies of Consecutive, Later, and Simultaneous categories in each consultation.

3.1.2 Direction of alignment

In terms of the direction of alignment, 81 (56.6%) of the gestures were copied by teachers, and 62 gestures (43.4%) were copied by students. However, a two-sample Kolmogorov–Smirnov test indicates no significant difference in the distribution of the direction of alignment (D = 0.3, p > 0.05). As Figure 2 shows, there are important differences between consultations: in some cases, teachers tended to copy students more (see, e.g., UI2 or US1 in Figure 2), and in others, students tended to copy teachers more (see UE1 or UE2).

Figure 2

Figure 2. Frequencies of the direction of alignment per consultation.

We also analyzed a subset of the data to see if there were differences in who copied whom, depending on the temporal dimension. We looked at the Consecutive cases and saw that, out of the 105 pairs, 57 gestures were copied by teachers and 48 gestures were copied by students. However, the difference in distribution was not significant (χ²(1) = 0.77, p = 0.37). We also looked at the Later cases, where we found 29 gesture pairs in total, and 22 of them (75.9%) involved copying by teachers versus 7 cases (24.1%) of copying by students. The difference between both distributions was significant (χ²(1) = 7.75, p < 0.05). It is not surprising that more gestures were copied by teachers in the Later category considering that teachers speak significantly more than students in office hour consultations.
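The reported statistics can be approximated with a chi-square goodness-of-fit test against an equal split between teachers and students. The text does not state which variant of the test was used, so the equal-split null hypothesis, the function name, and the absence of a continuity correction in the Python sketch below are assumptions. Under these assumptions, the statistic comes out at about 0.77 for the Consecutive cases and about 7.76 for the Later cases, close to the reported values.

```python
import math
from typing import Tuple


def chi_square_equal_split(count_a: int, count_b: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test (1 degree of freedom) of two counts
    against the null hypothesis that both are equally likely."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts reported in the text: gestures copied by teachers vs. by students
for label, teachers, students in [("Consecutive", 57, 48), ("Later", 22, 7)]:
    stat, p = chi_square_equal_split(teachers, students)
    print(f"{label}: chi2(1) = {stat:.2f}, p = {p:.3f}")
```

The closed-form p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, which avoids any dependency beyond the standard library.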

3.1.3 Characterization of gesture alignment

The annotation included three categories for gesture functions: representational, pragmatic, and deictic gestures. From the 143 gesture pairs, we found 52 representational gestures (36.4%), 57 pragmatic gestures (39.9%), and 34 deictic gestures (23.8%). Figure 3 displays matching gestures according to their gesture function in each consultation. Pragmatic gestures are present in most of the dialogues, as well as representational gestures. In the case of deictics, there is significant variability, as some students and teachers used them on various occasions (see, e.g., UI8), while in others they are not present (UE2, UE4, or UE5).

Figure 3

Figure 3. Frequencies of aligned gestures per consultation according to their function.

We checked if similar results were obtained by analyzing subsets of the data based on the temporal categories (see also Table 4). The prime–target pairs that appeared in the Consecutive category showed interesting differences between gesture functions. Out of 105 gestures in this dimension, 49 of them were pragmatic (46.7%), 26 gestures were representational (24.8%), and 30 gestures were deictic (28.6%). The use of pragmatic gestures was the highest in terms of frequency, contrary to the overall results, where representational and pragmatic gestures were similar in frequency. The copying of pragmatic gestures in the Consecutive category was done almost equally by teachers and students, concerning 27 gestures (55.1%) and 22 gestures (44.9%), respectively. Deictics were used less frequently than pragmatics, but they followed the same trend in the direction of alignment, since teachers copied 17 of the students’ pointing gestures (56.7%) and students repeated 13 of teachers’ deictics (43.3%). Representational gestures were equally likely to be copied by teachers (N = 13, 50%) and students (N = 13, 50%).

Table 4

Table 4. Frequencies of aligned gestures per temporal dimension according to their function (* reflects a significant result).

The sample of gesture pairs along the Later category was small, with 29 gestures, and most of these gesture pairs involved representational gestures (N = 22, 75.9%). There were also three pragmatic gestures (10.3%) and four deictic gestures (13.8%). Out of the 22 pairs of representational gestures, 16 instances (72.7%) were copied by teachers, and 6 were matched by students (27.3%). It appears that certain representational gestures introduced initially by students were brought up by teachers to reply to their questions. It could be that these gestures had specific characteristics that made them more salient to teachers and, therefore, they reused them in the consultation. To sum up, the analysis of gesture function in this dataset showed that pragmatic gestures were the most frequent, followed by representational gestures. The direction of alignment was similar in both cases, but there were differences when we considered the temporal dimension of the copying. For a clearer overview of these frequencies in the Consecutive and Later temporal dimension, see Table 4.

In addition to the gesture function, gestures needed to share formal parameters (handshape, orientation, movement) to be considered as aligned. Table 5 presents an overview of the formal parameters of the 143 gesture pairs. Most gestures shared three parameters (51 gestures) or two parameters, such as handshape and orientation (24 gestures) and handshape and movement (20 gestures). Handshape on its own and combined with other parameters was a relevant feature to determine the presence of alignment. Movement was important to determine 18 cases of gesture alignment, and, though smaller in frequency, there were some cases of orientation and movement, and orientation alone. A summary of aligned gestures considering their function and form can be found in Appendix 2.

Table 5

Table 5. Overview of formal parameters in matched gestures.

This section has introduced an overview of the 143 prime–target pairs found in the 12 office hour consultations with the goal of answering the research questions of the study regarding the direction of alignment, the temporal dimension of matching gestures, and the description of the main patterns of gesture alignment in this form of teacher–student interaction. Section 3.2 provides examples of the communicative role of gestural matching during consultations, and its relevance when meaning is being negotiated.

3.2 Case studies

Gesture alignment in educational settings has been found to play a role in the co-construction of meaning or in the establishment of mutual understanding. There are many ways in which this can be seen in office hour consultations. The following examples depict different layers of these meaning-construction processes, which can be summarized as follows: (1) gesture alignment as a tool to negotiate content and signal mutual agreement, (2) gesture alignment as a helpful resource for overcoming L1- and L2-related issues, and (3) gesture alignment as default due to the presence of recurring gestural forms in interaction.

3.2.1 Gesture alignment as a tool to negotiate content

Gesture alignment has been said to reflect mutual understanding between speakers, and, especially in the classroom, it has been shown to highlight teachable moments (Majlesi, 2015). The analysis of the aligned gestures used in office hour consultations evidenced that shared understanding can be expressed in various ways: from the active construction of knowledge to the presentation of similar gestures to state agreement. The following example illustrates how gestures can be a relevant educational resource when teachers and students are actively constructing meaning. In the following figures, letters in bold show when the gestures are used, from preparation to retraction, and underlined letters indicate an emphasis in speech. The speech section includes the intonation units of each speaker (Chafe, 1994), which are separated by a slash mark. According to Chafe (1994), these units act as “the linguistic expression of information that is, at first, active in the consciousness of the speaker” (p. 69).

The excerpt in Figure 4 comes from an office hour consultation between a female lecturer and a male student in the discipline of Business Studies. In the sequence, the teacher answers a student’s question regarding the materials needed to prepare for the exam; the student notes that he has already read the tutorial and material from the lectures, but that it was difficult for him to read them in English. In frame A, the teacher replies by saying that the student needs to read the class readings, because they provide a foundation for the content (“but I do need you to have that foundation level first”). In the bold portion of her speech, she produces a gesture with both hands open and facing down. One hand is placed on top of the other, thus representing the idea of one thing resting on the “foundation” of the other, which can be seen as a metaphorical gesture in the sense that she is representing having theoretical underpinnings, an abstract idea, by showing a physical foundation.

Figure 4

Figure 4. Teacher and student matching gestures referring to the “base” or “foundation” (A–D).

The lecturer resorts to variations of this gesture on several occasions (see frames A and B), a common strategy when teachers explain concepts, as we explained through the notion of catchments (McNeill, 2010a; Pozzer-Ardenghi and Roth, 2008). The self-repetition of the gesture reinforces the idea of an abstract foundation, and, at the same time, the metaphor is being enriched by these variations. Her hands represent the readings and other course materials, which are important elements of the theoretical foundation. Frame B illustrates the last time she uses the gesture, as she says: “you have to have that done first.” In frame C we see the presentation of another metaphor related to the idea of the foundation, but different from the previous one, because she represents a pyramid with her hands. Instead of using the gesture of the pyramid, the last gesture to be depicted by the teacher, the student goes back to the gesture representing the foundation (see frame C). His palm is facing down, and he does a horizontal movement that traces the “base” of the foundation. At the speech level, the student only says “the,” but, a few seconds later, he repeats a similar gesture while adding, “but have the base” (see frame D). The student reuses the gesture and, although there are only a few seconds in between frames C and D, there are differences between the gestures: he adds information via speech (“the base”) and, contrary to the first gesture, his fingers are bent. Both gestures related to the base are a way of elaborating on the teacher’s ideas and expressing shared understanding. However, this mutual understanding is actively being negotiated with the self-repetition of the teacher, the matching gesture of the student, the self-repetition of the student, and the verbal reformulation of the student (“the base”). Thus, although the teacher is the one providing the answer, the student also participates in this co-construction of knowledge.

Following McNeill (2010a), the notion of catchment refers to self-repetition. However, the concept has also been used to describe both phenomena: self-repetition and other repetition. From the previous example, we notice that catchments and alignment are related, in the sense that both processes—self-repetition and other repetition—can interact in environments where the main goal is achieving common understanding. Future research could determine if the presence of self-repetition can have an impact on the addressee’s gesture production. Oben and Brône (2016) have already described how cumulative priming promotes gesture alignment between speakers. The notion of cumulative priming, according to the authors, is based on the hypothesis that “the more the interlocutors hear/see a word/gesture, the more likely it will be that they align to that word/gesture” (p. 39).

In the previous case, the student uses the teacher’s conceptualization of the course materials, which were represented as the foundation of the class. The next example shows the opposite direction of alignment, since the teacher is the one using the student’s gestures. Figure 5 contains a sequence extracted from a consultation between a male teacher and a male student in the discipline of English Studies.¹ The teacher asks the student if he likes the division between lectures and tutorials that they implement at the university in Ireland, and, while asking this question, he opens his hands with the palms facing each other (frame A in Figure 5) and distinguishes the lectures and tutorials as two imaginary objects in space, one located on the left and the other on the right (frame B). The teacher uses a demonstrative (“do you prefer that or do you prefer…”) that is subsequently reused by the student in his speech (“I prefer that”). However, instead of displaying the same gestures previously introduced by the teacher (i.e., the representation of lectures and tutorials as two objects in space), the student uses a deictic to highlight his preference for the Irish system (see frame B) by pointing toward the ground. The deictic gesture goes in line with the speech as a beat gesture in frame C, performing similar downward movements with each intonation unit. In frames D and E, the student distinguishes between the system in Ireland (pointing to the ground) and the one in Spain (pointing to the right side). Frame E presents a deictic gesture in the form of a palm up open hand as he says, “what I actually do in Spain.” The teacher asks a follow-up question, seeking to clarify the student’s response (see frame F). He uses the demonstrative once again (“you prefer this”) with both palms facing down. Finally, in frames G and H, the student makes two similar deictic gestures, reinforcing the previous idea that he prefers the Irish system.

Figure 5. Pointing gestures used to establish a comparison between Ireland and Spain (A–H).

The deictic gestures shown in Figure 5 are cases in which concrete deixis can reflect abstract referents. In this case, interactants present an opposition between here and there, where “here” is the place where the interaction is taking place, and, at the same time, it refers to the Irish educational system. By contrast, “there” is a place located at the right side of the student, which corresponds to the Spanish educational system. Figure 5 is a clear example of how gestures and speech can be recycled to clarify information. The sequence begins with the teacher asking “do you prefer that” (frame A), referring to the Spanish system, but the student expresses a mismatch by replying “I prefer that” (frame B) and pointing to the ground. In this sense, they are repeating the same words, but with different referents, which creates the misunderstanding. Speech (the verb to prefer and the demonstratives this/that) and gesture (deictics) are used by both speakers in order to reach understanding about the opposition between the Irish and the Spanish system. Once again, we see cases of self-repetition that interact with other repetition. The student uses the same gestures in frames B–C and G–H, which go in line with his intonation units. While the location of the gestures in frames B–C and G–H is not the same, since gestures in frames B–C are on the right side of the student, and gestures in frames G–H are more toward the left, this aspect was not considered in our annotation. We also notice changes in the direction of alignment, as the student copies the teacher, but then the teacher copies the student, and so on.

Alignment is introduced on a turn-by-turn basis in Figures 4 and 5, and various resources can be used to achieve shared meaning. However, mutual understanding can be expressed in different ways. The following sequence (Figure 6) is an example of how paired gestures can be used to elaborate on ideas later in the interaction. This excerpt comes from a consultation in the discipline of English Studies between a female teacher and a female student. The teacher is explaining to the student that she needs to read the seminar notes and readings to check if there are any open questions or debates. When she refers to the different opinions that people might have, both of her palms are open and facing up, but her fingers are curved as if she were holding an object in each hand (frame A in Figure 6). Her hands perform upward and downward movements, as if she were weighing different options in her hands. The student, then, asks another question: she has read the course material, which includes the slides, seminars, and essays, but after going through the material she has not found similarities between the lectures and seminars (frame B). In that same frame, we notice that, when she utters the word “similarities,” the student makes a gesture similar to the one presented by the teacher, but this time 20 s after the first gesture. The topic, however, is not an exact reproduction of the one discussed by the teacher, but the student builds on what the teacher previously expressed regarding finding topics in the course material that spark debate. The student is not able to follow the teacher’s advice, because she has not been able to find a connection between all the elements of the course material. Both gestures are almost identical, but the first gesture highlights the differences of opinions that can be found in the course material, and the second one expresses the similarities or connection between lectures and seminars.

Figure 6. Teacher and student using metaphorical gestures to convey the meaning of differences and similarities (A, B).

The example in Figure 6 is different from the previous examples due to two main aspects: (1) the overall topic, because they are not referring to the same ideas, and so the student uses matching gestures to focus on the similarities instead of the differences; and (2) the timing, because these gestures are presented sequentially within the Later time window. Despite the differences, the repeated gesture performed by the student could also be considered as a way to construct common ground between speakers.

As a fourth, and final, example, we include a common sequence found in the dataset, where matching gestures are used to signal agreement in a unimodal (gestures alone) or multimodal (gestures and speech) way. Contrary to previous sequences, participants do not repeat each other’s gestures to clarify information or construct their own understanding of the content; they use similar gestures to express obviousness or agreement.

Figure 7 comes from a consultation in the discipline of Hispanic Studies between a male student and a male teacher. Previously, the student asked about the assessment of the class, and, in frame A, the teacher explains the elements that are considered in the evaluation. At the end of his response, the teacher opens both hands with the palms facing up, a version of the palm up open hand (henceforth: PUOH; frame A in Figure 7). Simultaneously, the student makes a similar gesture, although with one hand and with fingers bent (frame B in Figure 7).

Figure 7. Palm-up open-hand gesture to signal agreement (A, B).

The PUOH is pervasive in communication and has been shown to fulfill various functions (Bavelas et al., 1992; Cooperrider et al., 2018; Müller, 2004). In this example, the first PUOH could be presenting information, whereas the second PUOH gesture appears to indicate agreement or obviousness, one of the many semantic fields associated with this gesture (Cooperrider et al., 2018; Kendon, 2004). However, it could also be that both PUOH gestures are indicating obviousness. According to Cooperrider et al. (2018), the implicit question being expressed in the latter case is: “How could it be otherwise?” or “What else could one say?” The idea of obviousness or agreement seems consistent with the speech, since the student is repeating “yeah yeah” as he is performing the gesture. We can argue that gesture alignment, in this case, is only used to signal agreement with what was said by the teacher.

In this section, we have described four examples in our dataset in which gesture alignment is used to construct, clarify, or express shared understanding. We have included sequences in which representational, deictic, and pragmatic gestures are shown to be important to construct agreement in office hour consultations. Matching gestures have been shown to be useful in various ways, from discussing and negotiating content, to signaling agreement or obviousness. However, these consultations were also L1–L2 dialogues, and so we want to highlight cases where gesture alignment contributes to disambiguating meaning and, more generally, to overcoming L1–L2 issues.

3.2.2 Gesture alignment as a helpful resource for overcoming L1- and L2-related issues

Research on gesture alignment in educational contexts has shown that matching gestures play a role in dialogues between native and non-native speakers (Majlesi, 2022; Smotrova, 2014; Smotrova and Lantolf, 2013), especially when they contribute to disambiguating words in the L1 language. In the literature, most conclusions come from second language acquisition classrooms or language cafés. The consultations included in this study did not belong to those specific contexts, but the fact that teachers were L1 English speakers and students were L1 Spanish speakers with varying levels of English created instances in which gesture alignment proved to be an essential tool to reach understanding. Once again, the goal was to reach agreement, but instead of dealing with course-related issues, gestures are copied to solve communication problems or fill in the gaps at the speech level.

Figure 8 contains a sequence taken from an office hour consultation in the discipline of Health and Safety between a male lecturer and a female student. As they are reaching the end of the consultation, the student asks what happens if she fails the exam, because she is in her last semester and is an Erasmus student, so she will not be at that university for the repeat exam. Before answering her question, the teacher asks if she is in her fourth year (frames A and B). In both frames, the lecturer uses more gestures to be clear about what he is asking: “you” with a deictic gesture (frame A) and “four” showing the number four with his hands (frame B). The gesture representing the number four is positioned in front of the student, making it easier for the student to see it, to which the student replies with a “yeah” (frame C) and proceeds to do a palm-up gesture with her fingers showing the number four. Frame C also shows the moment in which both teacher and student are using the same gestural handshape.

Figure 8. Teacher and student representing the number four with their hands (A–C).

Similar to a previous example, the PUOH signals agreement, but, at the same time, the student is replying to the teacher’s question by representing the number four with her hands. The gesture replaces a complete spoken answer, such as “yes, I am in my fourth year.” Gestural matching can be useful in L1–L2 dialogues when there are differences in language skills between interactants. This relates to the concept of foreigner talk, a form of talk that takes place when native speakers interact with a non-native speaker and adjust how they speak to be understood better (Beebe and Giles, 1984; Tellier et al., 2021). The excerpt in Figure 8 can be considered an example of this type of talk because the teacher accompanies his speech with prominent gestures that are consistent with it. The deictic performed by the teacher as he is uttering “you” and the representational gesture with the number four act in unison to formulate a question that the student understands. In conclusion, we have included an example that shows the use of gesture alignment to disambiguate communication in L1–L2 dialogues. During these consultations, gesture alignment was also used to clarify or facilitate communication between speakers, which certainly interacts with other concepts, such as foreigner talk.

3.2.3 Gesture alignment as the default due to the presence of recurring gestural forms in interaction

Many types of gestures appeared in our dataset: various types of palm open gestures, deictics toward concrete or abstract referents, concrete representational gestures, and metaphorical gestures, just to name a few. We have sought to provide a representative overview of the data, but it is important to mention that some cases of alignment happened as the default due to the presence of recurrent gestural forms in interaction. Figure 9 describes a sequence in which teacher and student make use of a recurrent gesture, that is, one of the gestures with recurrent forms that “have undergone processes of conventionalization” (Ladewig, 2014, p. 1560).

Figure 9. Precision grip being used by teacher and student (A, B).

The excerpt comes from an office hour consultation in the discipline of Business Studies between a female lecturer and a male student. Both speakers display a gesture called the precision grip with the thumb and forefinger, which sometimes takes the form of a ring (Bressem and Müller, 2014; Kendon, 1995, 2004). Scholars agree that the semantic core of this gesture is the indication of precision, so it can be used to specify, clarify, or emphasize something said by the speaker (Bressem and Müller, 2014; Kendon, 1995). In the example, the student asks about the assessment of the exam, and the teacher replies that he needs to respond to the main points, adding that “that’s what’s important” (frame A in Figure 9). During this portion of her speech, she makes a precision grip with her left hand (frame A), highlighting the “most important” aspect. After more than 30 s, and based on the answer given by the lecturer, the student asks about the marking scheme. He wonders what happens with the assessment score if he does not write the exact statements that appear in the marking scheme. While he is saying “I do not have the exact” (frame B), he uses another precision grip gesture. On this occasion, the gesture highlights the idea of “exact statements,” similar to the notion of “most important,” which was previously expressed by the teacher. The gesture was one of the few cases of pragmatic gestures that happened in the Later category.

Other recurrent gestures that appeared in the interactions were cyclic gestures (Ladewig, 2011), which involve a continuous circular movement of the hand and have been found to express cyclic continuity, a process, and duration (Bressem and Müller, 2014). The addressee might resort to a similar gesture for multiple reasons: it could be influenced by the first gesture (or prime); it could be used to achieve mutual agreement, as in previous examples, or elaborate on what was previously indicated by the speaker; or it could be that these recurrent gestural forms are present in these dialogues simply due to their pervasiveness in communication. Especially in the Later cases, one might wonder if these are cases of alignment or rather recurrent gestures that are highly productive in this setting.

4 Discussion

Gesture alignment, or gestural matching, has been studied for decades in experimental and naturally occurring settings. However, the literature shows that research within psychology and (cognitive) linguistics has tended to focus on iconic gestures (including metaphorical gestures) and has controlled the referents that participants need to discuss, whether these are cartoons, tangrams, or items in some sort of controlled task. These studies have also given priority to some dimensions related to meaning that have not been explicitly considered in this research. Furthermore, research on gestural matching in formal or informal educational contexts has been based mostly on observational studies from a perspective of interaction studies or Vygotsky’s sociocultural theory. There are extensive examples of gesture alignment in classroom interactions, but the results are scattered in terms of their findings due to the use of different terminologies and theories to explain gestural matching (e.g., catchments, internalization, or dialogic embodied action). The notion of catchments appears as a relevant concept that needs to be differentiated from alignment. Although we have summarized it as self-repetition, we have also seen examples in the literature where both concepts have been used interchangeably. Similarly, there seems to be a relationship between self-repetition and other repetition, as we have previously discussed in the selected case studies, which has also been described by Oben and Brône (2016).

The present study had a descriptive goal, as it has sought to present the main trends of gesture alignment, understood as the cross-participant repetition of gestures, in a specific type of teacher–student dialogue: office hour consultations. As we have stated, office hour consultations are known as spaces where students ask their course-related questions, and lecturers provide answers to these questions. These consultations were held between L1 speakers of English, the lecturers, and L2 speakers of English, the students. The topics covered by these consultations were diverse, from the course assessments to technical concepts of the given discipline. Our analysis included 12 videos out of a corpus of 27 dialogues, and we were able to characterize gesture alignment in the following way.

Our first research question dealt with the patterns of alignment in teacher–student interaction. Most aligned gestures in the consultations were pragmatic gestures, and they were mostly introduced by students, even if they spoke significantly less than teachers did. Representational gestures were oftentimes used to negotiate or clarify content, and a similar process happened with deictics when they referred to abstract referents. We included examples of these gestures in our in-depth descriptions. Other gestures also appeared in our dataset, such as deictic gestures when they are pointing to concrete referents. Especially in educational contexts, teachers use resources available in the space around them to explain ideas. Office hour consultations are not an exception, considering that teachers tend to point at computers, essays, and guidelines, among other materials. The function of deictics in teaching has been explained by different scholars (Alibali and Nathan, 2012; Majlesi, 2014), but future research could address this in the context of gesture alignment.

We presented examples of the different functions of gesture alignment in educational discourse. The first one relates to the construction of meaning, and we showed different manifestations of shared understanding. The initial sequences dealt with the negotiation or clarification of content using representational and deictic gestures. These cases introduced gestures that were presented on different occasions to reach an agreement about the topic being discussed. Mutual understanding was also expressed with the presentation of palm-up open-hand gestures, which are useful resources to signal agreement. At the same time, these gestures can be performed with or without speech. The second case highlighted instances where gesture alignment was useful to solve issues in L1–L2 dialogues. As discussed in previous sections, most findings of gesture alignment in educational interaction come from second language acquisition classrooms. One characteristic of these examples is that gestures are introduced to help speakers reach understanding. In our examples, gestures are performed to summarize sentences, as we saw when the student recycled the gesture with four fingers extended instead of saying, “I’m in my fourth year.” In the third and final case, we included an example of gestural matching through the precision grip, which showed the recurrence of specific shapes in certain communities. Recurrent gestures, such as cyclic gestures or palm-up open-hand gestures, could be used to express mutual understanding, but they might also occur because of their pervasiveness in communication. This study did not seek to test a theoretical hypothesis regarding alignment, but these findings seem to support previous studies that have highlighted the role of gesture alignment in grounding processes (Chui, 2014; Cienki et al., 2014; Holler and Wilkin, 2011; Majlesi, 2015; Tabensky, 2002).

Our second research question was related to the temporal dimension of alignment. In this sense, the time between prime and target proved to be an important parameter for identifying aligned gestures. Although we were able to find repetition over longer stretches of time, our results showed that most gestures appeared within a 10-s time window. The identification of gesture pairs should include the temporal proximity or adjacency between prime and target, as other scholars have already stated (Louwerse et al., 2012; Rasenberg et al., 2020). In the case of the Simultaneous category, that is, gestures being performed within a time window of 250 ms or less, matching gestures were rare in our dataset. Most cases of the Later category were found during the explanations given by teachers, which makes sense if we consider that teachers talked more in the consultation. In relation to this, our third research question addressed the direction of alignment. Our results showed that both teachers and students copied each other, but there was important variation between consultations. Especially in the Consecutive category, both teachers and students copied each other for different reasons, which were described in our qualitative studies.
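As an illustration only, the temporal categories and the direction of alignment could be operationalized roughly as follows. This is a minimal sketch, not the annotation procedure used in the study: the 250 ms limit for the Simultaneous category comes from the text, whereas the 10-s boundary between Consecutive and Later, the measurement of lag from onset to onset, and all function and variable names are assumptions made for the example.

```python
from collections import Counter

SIMULTANEOUS_MAX_S = 0.25  # stated in the text: 250 ms or less
CONSECUTIVE_MAX_S = 10.0   # assumption: boundary between Consecutive and Later


def classify_lag(prime_onset_s, target_onset_s, consecutive_max_s=CONSECUTIVE_MAX_S):
    """Assign a prime-target pair to a temporal category.

    The lag is measured onset to onset, which is an assumption of this sketch.
    """
    lag = abs(target_onset_s - prime_onset_s)
    if lag <= SIMULTANEOUS_MAX_S:
        return "Simultaneous"
    if lag <= consecutive_max_s:
        return "Consecutive"
    return "Later"


def direction(prime_speaker):
    """Describe who copies whom, given who produced the prime gesture."""
    copier = "student" if prime_speaker == "teacher" else "teacher"
    return f"{copier} copies {prime_speaker}"


# Hypothetical annotated pairs: (prime speaker, prime onset in s, target onset in s)
pairs = [
    ("teacher", 12.0, 12.2),
    ("teacher", 30.0, 34.5),
    ("student", 50.0, 53.0),
    ("teacher", 60.0, 80.0),
]

# Tally pairs by temporal category and by direction of alignment
summary = Counter(
    (classify_lag(prime, target), direction(speaker))
    for speaker, prime, target in pairs
)
```

Keeping the Consecutive boundary as a parameter makes it easy to check how sensitive the resulting counts are to the chosen time window.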

These findings contribute to the field of gesture studies by systematizing the existing scattered studies on gesture alignment in teacher–student interaction. They also address gaps in the literature by explicitly studying the temporal dimension and direction of the alignment. There are, however, aspects that should be considered by future research. We applied descriptive statistics to the data, but we did not include a statistical baseline of gestural matching. Studies with an emphasis on quantitative analysis usually include this baseline to determine whether the alignment happens by chance. For this reason, we have tried to describe trends, and have avoided making claims about what counts as low or high levels of alignment. Future studies could take this into consideration to obtain more robust conclusions about teacher and student dynamics. Related to this, future research on alignment could also find ways of normalizing the data, if natural interactions are being used. The dataset considered teachers and students, who were also L1 and L2 speakers. It would be interesting to see whether similar findings emerge with teachers and students with the same L1.
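To make the idea of a chance baseline more concrete, one possible approach is a simple permutation test. The sketch below was not applied in this study and is not drawn from the cited literature: the data, the form labels, the 10-s window, and the function names are all hypothetical. It counts cross-speaker gesture pairs that share a form label within a time window, then compares that count with the counts obtained after repeatedly shuffling one speaker's form labels across that speaker's own time stamps.

```python
import random


def count_matches(teacher, student, window_s=10.0):
    """Count cross-speaker pairs with the same form label within window_s seconds."""
    return sum(
        1
        for t_time, t_form in teacher
        for s_time, s_form in student
        if t_form == s_form and abs(t_time - s_time) <= window_s
    )


def permutation_baseline(teacher, student, n_perm=1000, window_s=10.0, seed=0):
    """Return the observed match count and a permutation-based p-value.

    The student's form labels are shuffled over the student's time stamps,
    which keeps gesture rate and timing fixed while breaking any alignment.
    """
    rng = random.Random(seed)
    observed = count_matches(teacher, student, window_s)
    times = [time for time, _ in student]
    forms = [form for _, form in student]
    at_least_observed = 0
    for _ in range(n_perm):
        shuffled = forms[:]
        rng.shuffle(shuffled)
        if count_matches(teacher, list(zip(times, shuffled)), window_s) >= observed:
            at_least_observed += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (at_least_observed + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical gesture annotations: (onset in seconds, form label)
teacher = [(1.0, "PUOH"), (20.0, "grip"), (40.0, "point")]
student = [(3.0, "PUOH"), (24.0, "grip"), (60.0, "cyclic")]

observed, p_value = permutation_baseline(teacher, student, n_perm=200)
```

With very few gestures per dyad, as in this toy example, such a test has little power, so it would mainly be informative for longer interactions or for data pooled across dyads.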

As we previously mentioned, the original purpose of these consultations did not include the analysis of gestures or alignment. The goal of the videos was to analyze metaphor in speech; therefore, there could be many aspects explaining the presence or absence of gesture alignment, such as the gender of the dyads, asymmetry between interactants, the topics being discussed, the length of the interactions, individual differences, and so on. A follow-up study could include and control for these aspects in a deliberate way. Another limitation was the low agreement between raters at the beginning, despite the gesture expertise of the second annotator and the presence of clear guidelines to support the annotation. The low agreement was addressed by discussing every case in 25% of the videos, but clearer criteria are needed to overcome the inherent difficulties of analyzing gesture alignment in natural interactions. In this sense, the analysis of natural conversations should always consider an external coder to discuss the annotation criteria. While some cases of alignment were straightforward when it came to their identification, others required further discussion, which shows that there might be degrees of alignment (i.e., how aligned are two instances?). The presence of parameters makes the identification procedure clearer.

We have focused on gesture alignment in educational settings, but we have highlighted the specific dynamics that take place in office hour consultations. This means that these findings might differ in other environments, such as group interactions inside the classroom. It is possible to speculate that more dialogical approaches inside the classroom might enable more or different cases of gesture alignment, but this could only be answered by future studies. Finally, this paper has described one level of alignment, but the concept of multimodal alignment has been used by scholars when they analyze two or more semiotic levels (Louwerse et al., 2012; Oben and Brône, 2015, 2016; Rasenberg et al., 2022). The role of lexical and gesture alignment in educational environments should be considered by future research.

Data availability statement

The data analyzed in this study is subject to the following licenses/restrictions: Participants signed a consent that indicates that images of them will be anonymized. Requests to access these datasets should be directed to Fiona MacArthur.

Ethics statement

The studies involving humans were approved by Ethics Committee—University of Extremadura (Spain). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

PO: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review & editing. AC: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing. BO: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing. GB: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported by the Becas Chile Scholarship Program, ANID, Chile (grant no. 72190473) to follow doctoral studies abroad. The first author wishes to thank the VU Amsterdam for providing financial support during the COVID-19 pandemic, and the program Erasmus+ for supporting a 6-month research stay in KU Leuven, which benefited this research. The second author’s contribution was supported by a fellowship at The Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcomm.2024.1457533/full#supplementary-material

Footnotes

1. The video belongs to the extended dataset of 27 videos, because we only included one video per lecturer in the analysis.

References

Alibali, M. W., and Nathan, M. J. (2012). Embodiment in mathematics teaching and learning: evidence from learners’ and teachers’ gestures. J. Learn. Sci. 21, 247–286. doi: 10.1080/10508406.2011.611446
