Teaching as evolutionary precursor to language

Gärdenfors, Peter

doi:10.3389/fcomm.2022.970069

ORIGINAL RESEARCH article

Front. Commun., 05 December 2022

Sec. Psychology of Language

Volume 7 - 2022 | https://doi.org/10.3389/fcomm.2022.970069

This article is part of the Research Topic Challenges in Language Evolution Research

Peter Gärdenfors^1,2^*

¹Department of Philosophy and Cognitive Science, Lund University, Lund, Sweden
²Paleo-Research Institute, University of Johannesburg, Johannesburg, South Africa

The central thesis of this article is that the evolution of teaching is one of the main factors that led to increasingly complex communicative systems in the hominin species. Following earlier analyses of the evolution of teaching, the following steps are identified: (i) evaluative feedback, (ii) drawing attention, (iii) demonstration and pantomime, (iv) communicating concepts, (v) explaining relations between concepts, and (vi) narrating. For each of these steps the communicative and cognitive demands will be analyzed. The focus will be on demonstration and pantomime, since these seem to be the evolutionarily earliest uniquely human capacities. An important step is the transition from pantomime for teaching to pantomime for informing and how this in turn leads to communicating concepts. As regards explaining relations between concepts, the focus will be on the role of generics in teaching and communication. Analyzing these topics involves combining cognitive science with evolutionary theory, archaeology and theories of communication. Two factors are important as a background: (i) the evolution of prospective planning, that is, planning for future goals, and (ii) the evolution of a theory of mind. These capacities are central in explaining how more advanced forms of teaching, communication and cooperation emerged along the hominin line.

Introduction

The central thesis of this article is that the evolution of teaching is central for the evolution of increasingly complex communicative systems in the hominin species. Building on analyses of the evolution of teaching (D'Errico and Banks, 2015; Kline, 2015; Gärdenfors and Högberg, 2017, 2021), the focus will be on the roles of demonstration and pantomime as stepping stones to forms of communication that go beyond what is found in non-human animals.

Background

There exist several theories that attempt to explain the evolution of language. It is, however, not sufficient to propose an evolutionary account of the benefits of linguistic communication. The question is not just why the hominins evolved language, but, equally importantly, why chimpanzees and other apes did not. Since we share a relatively recent common ancestor, the task is to show that language provided adaptive benefits for proto-humans but not for proto-chimpanzees. Many theories concerning the evolution of language fail this “chimp test” (Bickerton, 2002, 2009; Gärdenfors, 2004; Johansson, 2005; Dessalles, 2007). For example, Bickerton (2007, p. 514) argues that many species are social, so social dynamics is not enough to explain why language is unique to humans. He therefore rejects Dunbar's (1996) thesis that language replaces grooming in the larger hominin groups.

When explaining why language is unique to the hominin line, one must, ultimately, rely on the differences between the ecology of the early hominins and that of proto-chimpanzees. One such ecological factor is that the hominins gradually changed their habitat to more open landscapes. As a consequence, the hominins came to use a wider range of foods than other apes. The availability of these food sources was more transient and scattered in the landscape than the mainly vegetarian food that was collected by the other apes.

Analyzing these topics involves combining cognitive science with evolutionary theory, archaeology and theories of communication. In earlier work, I have argued that the ecological factors led the hominins to evolve more advanced prospective planning, that is, to plan for future needs, and to more advanced forms of cooperation (Gärdenfors, 2003, 2013; Osvath and Gärdenfors, 2005; Gärdenfors and Osvath, 2010; Gärdenfors et al., 2012; Geurts, 2020). Cooperating about future goals presumes that one is able to refer to non-present objects and actions—what is sometimes called the displacement (Hockett, 1960) or detachment (Gärdenfors, 1995) of communication. Furthermore, advanced forms of cooperation depend on a more developed theory of mind (ToM) (Tomasello, 1999; Gärdenfors, 2004, 2007). The evolutionary pressure generated from the need for cooperation has thus been one factor that drove the evolution of language (Brinck and Gärdenfors, 2003; Gärdenfors, 2004; Gärdenfors et al., 2012; Sterelny, 2014). A more advanced ToM is also required for intentional teaching.

Then the question becomes how the hominin communication systems evolved from a signaling system that we find in other animals to a system of signs—gestures or words—that makes it possible to cooperate about the future. I shall argue that the evolution of teaching, which can be seen as a form of cooperation, can be used to explain many aspects of the emergence of advanced communication. I will not argue, however, that teaching is the sole evolutionary root of language.

Only humans have natural pedagogy, that is, the capacity to transmit cultural and other forms of knowledge through communication (Csibra and Gergely, 2006). To find a clue to what aspects of hominin life have led to this capacity, two other aspects of the adaptation to foraging in open landscapes should be considered: (i) the manufacture and use of tools that are more advanced than those used by other apes; and (ii) the development of food preparation techniques, in particular cooked food (Wrangham, 2009). In order to be reliably transmitted between generations, these cultural expansions depend on teaching.

When comparing different forms of teaching and different forms of communication I apply the following methodological principle:

Cognitive parsimony: If the cognitive capacities required for an activity or technique A are included in those required for an activity or technique B, then, A is evolutionarily prior to B (unless the additional capacities required for B are evidenced synchronously or earlier than A)¹.

Even though this principle does not make any dating possible, it is at least possible to provide evidence that an activity or a technique is evolutionarily older than another.

I shall first present a summary of the steps in the evolution of teaching proposed by Gärdenfors and Högberg (2017, 2021). I agree with Sterelny (2012, p. 2014) who writes that “If language (or protolanguage) evolved as a system of gesture, the evolution of elaborated manual skill and the evolution of gestural communication would support each other”. Then I analyze demonstration and pantomime and compare them, arguing that cognitive parsimony indicates that demonstration is evolutionarily older than pantomime. I then provide arguments indicating that pedagogical uses of pantomime are evolutionarily prior to the more communicative uses. The teaching of concepts and their relations requires further developments of expressive capacities. Finally, some ideas on how pantomime and gesture evolved into narratives are outlined.

The evolution of teaching

Only among humans is teaching intentional, socially structured, and linguistically mediated. Compared to individual learning or trial-and-error activities, the cost of teaching is low in relation to its benefits. Several studies underline intentional teaching as central for cultural transmission when learning complex, cognitively opaque skills such as the making of stone tools (Stout and Chaminade, 2012; D'Errico and Banks, 2015; Kline, 2015; Gärdenfors and Högberg, 2017).

Imitation, emulation, and facilitation

Before analyzing different forms of intentional teaching, I present three activities that are related to teaching. The first two come from Tomasello's (1999) distinction between learning by emulation, where the learner tries to reach the same outcome as the model's actions, and learning by imitation, where the learner tries to perform the same actions as those of the model (see also Tehrani and Riede, 2008; Caruana et al., 2013).

The “artificial fruit” experiments by Whiten et al. (2005) investigated the differences between the two types of learning. At the beginning, it seemed that chimpanzees emulate while children imitate. More recent studies, however, indicate that the situation regarding chimpanzee learning is more complex (Whiten et al., 2009). Imitating familiar motor actions in novel situations seems to be easier for chimpanzees than copying new motor actions (Myowa-Yamakoshi and Matsuzawa, 1999). A difference may be that chimpanzees as well as children often perceive the intention of the model and therefore emulate in such situations (Froese and Leavens, 2014).

Thirdly, an individual facilitates the learning of another by scaffolding the environment so that the learning individual learns faster than it would otherwise have done. An example is meerkats that bring back scorpions, often disabled by removal of their sting (Caro and Hauser, 1992). As the pups grow older, the meerkat adults give the pups increasingly intact prey. However, the adults do not gauge the competence of the pup, but only rely on changes in pup begging calls (Thornton and McAuliffe, 2006; Hoppitt et al., 2008). This suggests that the adults cannot evaluate the competence of the pups and that their behavior is non-intentional.

Levels of teaching

I next present a summary of the six levels of teaching analyzed in Gärdenfors and Högberg (2017, 2021) (for related classifications of teaching, see D'Errico and Banks, 2015; Kline, 2015). The ordering of the levels reflects the degree of ToM that is required as well as the increase in the complexity of the communication between teacher and learner.

(1) Evaluative feedback: The teacher approves or disapproves of the learner's behavior (Castro and Toro, 2004).

Animal data on this form of teaching include chimpanzee mothers, who take away poisonous food from infants, and gorilla, chimpanzee and macaque mothers, who facilitate and encourage infants' locomotion (Maestripieri, 1995, 1996; Whiten, 1999). The teaching is typically performed by approving or disapproving signals, but is sometimes non-intentional.

(2) Drawing attention: Here, the teacher's intention is that the learner focuses on a particular object, feature or action.

Among humans, drawing attention is often achieved by declarative pointing (Bates et al., 1975; Brinck, 2004b; Tomasello, 2009) which means that the teacher intends to communicate. Methods other than pointing can also be used. For example, human infants draw attention to an object by showing it even earlier than they point (Bates et al., 1975). Among non-human animals, attention is often achieved via alarm calls. In most cases, however, such signals are presumably non-intentional and not dependent on the knowledge of conspecifics (although see Crockford et al., 2012 for an intentional case).

For both these levels of teaching, one finds non-intentional as well as intentional forms. They do not require any form of ToM, either on the part of the teacher or on the part of the learner.

(3) Demonstration and pantomime: These forms of teaching involve intentionally showing somebody else how to perform a task or how to solve a problem.

This level seems to be where humans depart from other animals². Demonstrating builds on ToM since it presumes that the teacher understands the lack of knowledge in the learner and that the learner perceives that something can be learnt. Demonstration also requires that the teacher and the learner jointly attend to the teacher's actions (Bruner, 1995; Tomasello, 1999; Gärdenfors, 2017). In pantomime the intended action is not really performed but just represented.

The abilities to demonstrate and pantomime constitute a breakthrough in the teaching and transmission of culture among hominins. Stout (2018, p. 257) argues that “the emergence of a human technological niche increasingly reliant on cooperation sharing, and the intergenerational reproduction of complex subsistence skills” was particularly important [see also the “technological pedagogy hypothesis” in Stout and Chaminade (2012)].

(4) Communicating concepts: Unlike animal calls, this involves communicating something that is general about a category and checking that the learner understands the extension of the concept.

Teaching how to categorize plants or animals is a central example of how cultural knowledge is transmitted. Among modern humans, the main method to teach a concept is to use a word (or a gesture) which refers to the concept together with some technique, such as pointing, for drawing attention to examples of what the concept covers. This involves highlighting characteristic properties. For example, learning that champignons have brown or pink lamellae makes it possible to separate champignons from the deadly poisonous destroying angel mushroom.

Concept teaching also involves ToM since it requires that the learner understands that the teacher is intentionally using a word, sound, or gesture as a communicative sign, that is, that the word, sound, or gesture is used to “stand for” something else (Zlatev et al., 2005). Purely symbolic communication is, however, not required for this level, although it makes communication about concepts more efficient.

(5) Explaining relationships between concepts: The teacher's intention in explaining is typically that the learner understands the semantic or causal relationship between two concepts.

Examples of causal relationships are “if you eat this kind of berry, you will become very sick” and “a track that looks like this is made by an eland antelope”. In Section Narrating, I argue that generics fulfill this function. Unlike the previous ones, this level presumes that communication is detached, that is, it refers to things not perceivable in the communicative environment.

(6) Narrating: This involves chaining descriptions of events into a causally coherent whole.

In all human cultures, narration plays a central role (Barnard, 2011; Ferretti et al., 2017). Typical uses of narratives are gossip and entertainment and such narratives cannot be seen as direct intentional teaching. Narratives often have a moral, however, that is intentionally conveyed and hence it can be seen as a type of teaching.

It is important to note that narratives need not involve spoken language. Pantomiming is sufficient for narration (Sibierska, 2017). However, what distinguishes narration is that it represents a globally coherent sequence of events (Ferretti et al., 2017). Thus, it presupposes a well-developed event cognition (Radvansky and Zacks, 2014) which in turn builds on advanced causal thinking (Gärdenfors, 2021).

I next turn to a more detailed analysis of levels (3)–(6) since my thesis is that these forms of teaching have contributed to the evolution of language.

Demonstration

Demonstration and pantomime form the first level of teaching where such teaching activities seem to be unique to humans and it is therefore natural to focus on this level. This and the following section present analyses of demonstration and pantomime, respectively. My aim is to show that they have been evolutionary precursors to later forms of communication and, in the end, of language. In brief, “showing how” is a stepping-stone for advanced forms of communication.

The structure of demonstration

Demonstrating can be defined as a “teacher” intentionally showing a “learner” how to perform a task. Demonstration is central in so-called “natural pedagogy” and it can be found in all human societies (Csibra and Gergely, 2009, 2011). Showing a child how to tie shoelaces, how to ride a bike, and how to use a smartphone are well-known everyday examples.

When the teacher demonstrates how a certain set of actions should be performed, for example knapping a Levallois flake or making a fire, the following characteristic criteria can be identified (Gärdenfors, 2017):

(a) The demonstrator actually performs the actions involved in the task.

(b) The demonstrator makes sure that the learner attends to the series of actions.

(d) The demonstrator exaggerates and slows down some parts of the actions in order to help the learner perceive important features.

In comparing demonstration with non-human animal communication, it is easy to find distinctive differences (Wacewicz and Zywiczyński, 2021). Demonstration is:

(e) Voluntary, in the sense that the action is deliberately performed. Donald (2012) says that the action is autocued.

(f) Intentional, since the teacher wants the learner to imitate what is demonstrated.

(g) Honest, since there exists no reason to deceive a learner.

(h) Directed to one or a few individuals³.

Unlike ordinary action, demonstration also satisfies the following criterion:

(i) Demonstration is both an action and a representation since the motions involved in the act correspond to some action (maybe involving objects), namely the action the teacher wants the learner to perform, but at the same time the demonstration is differentiated from it by the teacher and the learner.

Therefore, the teacher's actions “point to” what the learner is meant to copy afterwards.

Often the learner gives some form of feedback, indicating that he or she has grasped the teacher's intention and what is being demonstrated. When the learner tries to perform the desired actions, the teacher approves or disapproves (level 1 of teaching). A demonstration is sometimes accompanied by verbal comments, but this is often not necessary.

In connection with criterion (b), Csibra and Gergely (2006, p. 149) note that “human communication is often preceded, or accompanied, by ostensive signals that (i) disambiguate that the subsequent action (for example, a tool-use demonstration) is intended to be communicative and (ii) specify the addressee to whom the communication is addressed”. Gergely et al. (2007) have shown experimentally that the ostensive nature of the teacher's actions is central to demonstration.

A consequence of criteria (b) and (c) is that demonstrating presumes advanced ToM for both the teacher and the learner. The best way to satisfy (b) is that the teacher and the learner jointly attend to what is demonstrated. (c) presumes that the teacher realizes that the learner lacks relevant knowledge and also that the learner understands the teacher's intention.

Demonstration is, however, not restricted to direct teaching but is also applicable in other situations. One important case is when a learner can demonstrate to a teacher that she has mastered the actions of a task. For example, a child can show that she now knows how to tie her shoelaces. Another example, typically occurring in a legal case, is that some forms of narratives can be expressed by a witness demonstrating how somebody behaved (the borderline between demonstration and pantomime may not be sharp in such cases).

As regards (d), note that demonstration involves learning by imitation rather than by emulation. The goal of the demonstration is not the most important aspect, but rather the actions which lead to it. By emphasizing the initial and final states of an action, the teacher assists the learner when segmenting the action sequence. Apart from establishing joint attention, the highlighting adds to the ostensive behavior that a teacher uses to make the learner attend to the demonstration (see Gergely et al., 2007).

Furthermore, demonstration can be used in autocued rehearsal (think of a dancer in front of a mirror). Here Donald's (2012) mimesis hypothesis is applicable as a starting point. It states that “the ability to produce conscious, self-initiated, representational acts that are intentional but not linguistic” (Donald, 1991, p. 168) mediated between those of our common ancestor with the apes and modern humans. He claims that mimesis is a main adaptation since it improved tool production. It is clear that demonstration presumes the capability of mimesis.

Archaeological evidence

From an archaeological point of view, one may ask when in hominin history one can find some evidence for demonstration. Gärdenfors and Högberg (2017) propose an answer by arguing that even the teaching of the techniques used to make Oldowan tools depends on demonstration (see also Morgan et al., 2015, p. 5). Demonstration would then be at least 2.5 million years old.

The argument is based on so-called core maintenance. This technique is mastered by knapping flakes from the core so that it will be possible to continue striking further flakes. Delagnes and Roche (2005) show experimentally that core maintenance depends on planning. For an apprentice to learn core maintenance, a teacher must demonstrate how a flake can be detached in a way so that the knapping of another flake will be possible, which in turn should make it possible for the next flake to be detached, etc. For this purpose, the teacher must demonstrate (or pantomime) (i) how to best hold the core, (ii) the best striking angle, and (iii) the correct movement of the arm and hand holding the hammer stone when knapping. The learner should then practice, often for an extended period. In support of this position, Stout (2018, p. 260) writes: “For Oldowan knapping, the salient demands are at the level of elementary movements, perhaps comparable to the articulatory/phonological level of speech (or sign) processing. Consistent with this, Oldowan knapping by modern human subjects recruits portions of ventral premotor cortex […] neighboring those involved in speech production and perception”.

Wynn et al. (2011) have argued that the techniques used by hominins in making Oldowan tools can also be achieved by apes. In support of the claim they refer to the knapping behavior of Kanzi and Panbanisha, two bonobos who have been trained to knap by humans. However, they never reached the skill level of Oldowan knappers (Toth et al., 1993). A reason for this is that the bonobos did not rehearse the techniques that had been shown to them, but merely engaged in knapping when encouraged by the teachers, or when a reward was given. Crucially, their knapping showed no signs of core maintenance.

Pantomime

In current research concerning the origins of human communication, pantomime is proposed as a step in the evolution of symbolic language (Arbib, 2012, 2018; Gärdenfors, 2017, 2020; Abramova, 2018; Zywiczyński et al., 2018; Brown et al., 2019). Pantomime is often viewed as a form of communicative gesturing. One definition of pantomime comes from Arbib (2012, Ch. 8) who writes that it “involves expressing a situation, object, action, character, or emotion without words, and using only gestures, and other movements”. Brown et al. (2019) define it as follows: “Pantomime refers to iconic gesturing that is done for communicative purposes in the absence of speech”. Zywiczyński et al. (2018) provide a broad presentation of different uses of the concept, concluding with the following characterization: “[W]e take pantomime to be a non-verbal, mimetic and non-conventionalized means of communication, which is executed primarily in the visual channel by coordinated movements of the whole body, but which may incorporate other semiotic resources, most importantly non-linguistic vocalizations. Pantomimes are acts of improvised communication that holistically refer to a potentially unlimited repertoire of events, or sequences of events, displaced from the here and now.”

In contrast to the communicative function highlighted in these definitions, my aim in this section is to argue that pantomime has developed out of demonstration (Gärdenfors, 2017, 2020) so that pantomime emerges from its use in teaching. This position is in agreement with Sterelny (2012, p. 2146) who writes that “once we have the capacity to demonstrate and to practice, it is available for the stimulus independent production of iconic representations”. My definition of pantomime goes as follows: an intentional pattern of movements of the body or parts of it (voice not excluded), the intention of which is to represent an action or a sequence of actions. Notice that my focus is on representing actions, which means that not all forms of gesturing count as pantomime. Pantomime is often restricted to action involving the whole body (e.g., Gullberg, 1998, p. 97), but I take a broader perspective and allow that only parts of the body are used in a pantomime. I also allow non-verbal vocalizations to be parts of a pantomime (Zywiczyński et al., 2018)⁴. The advantage of vocal imitation or iconicity might have been a selective force in driving the control of the vocal apparatus (Sterelny, 2012). On my view, pantomime is the most central form of the mimetic ability that Donald (1991, 2001, 2012) considers to be crucial in the evolution of the human mind.

The structure of pantomime

Pantomime is similar to demonstration in that it fulfills the criteria above except for (a) and, in the communicative case, (g). The crucial difference is that (a) is replaced by the following, since in pantomime the actual actions are not performed, but only simplified versions of them:

(a') The mimer performs the movements of the actions in the task without actually performing the actions.

In many teaching situations, the teacher cannot perform the action that the learner is supposed to perform because then the learning opportunity is forgone. For example, teaching somebody how to knap a Levallois flake when only one core is available cannot be done by demonstration because once the flake is made the earlier state of the core cannot be reproduced.

It is cognitively more challenging to grasp the intention of a pantomime than to understand a demonstration. The role of a demonstration is evident as soon as the learner understands that it functions as a form of teaching. With a pantomime, in contrast, the learner must also understand that the teacher intends the pantomime to stand for a real action and that the teacher wants the learner to understand this. Unlike demonstration, pantomime is not primarily an action, but a representation of an action. Pantomime therefore fulfills the following criterion (Zlatev et al., 2005):

Communicative sign function: The agent intends for the act to stand for some action, object or event for an addressee, and for the addressee to realize that the act is a representation.

The communicative sign function involves a second-order intention: the agent intends that the addressee understand the communicative intentions (Gärdenfors, 2003, section 6.3; Bar-On, 2013; Moore, 2018). In line with this, Arbib (2012, pp. 217–218) writes: “Where imitation is the generic attempt to reproduce movements performed by another, whether to master a skill or simply as part of a social interaction, pantomime is performed with the intention of getting the observer to think of a specific action or event.” Similarly, Mittelberg (2019) claims that pantomimes are inherently metonymic and they therefore create associations to already available knowledge about actions.

Beyond invitations to imitate, demonstration and pantomime may also help the learner to perceive new effectivities, that is, new actions that the learner can perform (Zukow-Goldring and Arbib, 2007). For example, when bringing a new toy to a child, a parent often demonstrates how the toy functions and how it can be used in play. Zukow-Goldring and Arbib (2007) call such processes “assisted imitation”. A more archaeologically directed example is that if you have a stone knife that you know how to use to cut meat, someone can show you how to use it to open mussels.

A central feature of pantomime is that it is open-ended. The autocuing that led to the skill for pantomiming provided a rich source of gestures that could form the basis for a great variety of meanings (see also Stout, 2018, p. 262). In line with this, Arbib (2012, p. 217) writes:

“[B]uilding on the skill for complex imitation, pantomime provided the breakthrough from having just a few gestures to the ability to communicate freely about a huge variety of situations, actions, and objects. Where imitation is the generic attempt to reproduce movements performed by another, whether to master a skill or simply as part of a social interaction, pantomime is performed with the intention of getting the observer to think of a specific action or event.”

Pantomime is a form of pretense. The pantomimes that are part of pretense play are enactments of a story that is created by the players during their interactions. In this way, play stories are participatory rather than detached narratives that are separated from the play activities.

Pretense is maybe the most basic form of implementing the communicative sign function. The reason for this is that one uses two representations of the same object or action when pretending—one's perception of the object or action and an imagined version of it (Leslie, 1987). For example, when a boy pretends that a red box is a fire engine, he knows that it is a box but at the same time he “sees” it as a fire engine that he can interact with. By suppressing his perception, he can use his fantasy instead. His image is a false representation of his perception of the real thing. Leslie (1987) claims that such imagined objects or actions are necessary to be able to pretend. He writes that small children's pretense “is an early symptom of the human mind's ability to characterize and manipulate its own attitudes to information. […] In short, pretense is an early manifestation of what has been called a theory of mind” (Leslie, 1987, p. 416).

In contrast, demonstration does not build on the double worlds that are used in pantomime. Pantomime can therefore be seen as using pretense to extend demonstration. For this reason, demonstration is less cognitively demanding to perform. According to the principle of cognitive parsimony, it follows that demonstration should appear evolutionarily earlier than pantomime—at least in teaching contexts.

From pantomime for instruction to pantomime for communication

The previous subsection argued that pantomime has its evolutionary origin in teaching, as a development of demonstration⁵. However, pantomime is also used for other directly communicative aims, for example, in describing a plan, in narrating (Sibierska, 2017) or as part of telling a joke. In this section, I argue for the thesis that pantomime for communication is an exaptation of pantomime for instruction, and thus that the teaching function is evolutionarily primary⁶.

Haiman (2018, p. 46) makes a distinction between direct (dramatic) discourse, which shows what is communicated, and indirect (narrative) discourse, which tells about it [the distinction was originally introduced by Rimmôn-Qênān (1983)]. He makes the distinction in the context of the use of ideophones, but it also applies to demonstration and pantomime. The distinction between showing and telling emerges when determining what is the intention of a pantomime. Firstly, I can pantomime an action that I want you to copy. This is a form of showing and it is typical in a teaching situation. Secondly, I can pantomime an action as part of a message (request, command, warning, narrative, play element, etc.). Gesture researchers have focused on the second use of pantomime (communicative act). A demonstration is indeed a gesture, but it only conveys the first type of intention: demonstrating is showing. However, when pantomime is used for narration, it is also telling.

When showing somebody how to perform something, you cannot be dishonest. However, when telling about something, you can. Hence demonstration and pantomime for teaching are honest forms of communication, while pantomime for communication need not be.

Another important difference between demonstration and pantomime is that pantomime is often detached (Zywiczyński et al., 2018, Sect. 3.8). I can, for example, pantomime how to set up the spring trap that I want you to construct tomorrow. In narrating, the reference of a pantomime can, of course, be far detached in space and time, and even refer to imaginary events. Using pantomime for communication therefore entails a broadening of the mental horizons in space and time, and hence requires more extended cognitive capacities.

Yet another difference is that pantomiming functions as an instruction when it is used for teaching, while pantomiming narratively is part of forming a common ground (Clark, 1992; see also Tomasello, 2009, p. 67). Hence, when a pantomime is used for teaching, it is proto-imperative, whereas when it is used communicatively, it is proto-declarative.

Brown et al. (2019) distinguish between egocentric and allocentric pantomimes⁷. In egocentric pantomime, the reference is determined in relation to the body of the miming person. When a pantomime is used in teaching, the agent that is supposed to perform the action that is pantomimed need not be specified since this is implicitly assumed to be the learner. In contrast, when pantomime is used in its communicative function, it is often allocentric. For example, you can pantomime a fight between two persons that you witnessed in the street. Then it must be specified who is the agent of the action and sometimes also what object is acted on or where the action is performed (Brown et al., 2019). Detaching the actions from the mimer makes the pantomime more difficult to interpret for the recipient. One way to supply the missing specification is to add signs (gestures or sounds) that refer to the agent, the object and the location.

Representations of events typically contain information about causes and effects. In demonstration both cause (the action) and effect (the result) are manifest, while in pantomime for instruction only the cause is included. If the effect needs to be expressed, another gesture is normally required. For example, if I pantomime how to turn the key in the lock in a box, I must also show that the lid then springs up automatically. And, in pantomime for communication, the actor need not be the agent of the cause. In that case the agent must be specified by other means in order to determine who performs the depicted action.

From an evolutionary point of view, one may ask whether non-human animals, in particular apes, have the ability to pantomime. Some researchers are skeptical, for example Zuberbühler (2013, p. 136), who writes that “pantomiming is conspicuously absent, apart from isolated anecdotes”. On the other hand, Russon and Andrews (2011) have presented evidence for pantomiming in orangutans. Their subjects lived in a rehabilitation camp and were used to communicating with humans. They conclude that “pantomime could have been within the grasp of the common human-great ape ancestor” (p. 316). Their notion of pantomime is broader than the one used here and includes cases of deception. Furthermore, most of the evidence they analyze agrees with Gibson's (2013, p. 209) observation that apes only gesture about actions requested of the addressee. In line with this, 17 out of 18 examples in Russon and Andrews (2011) were classified as imperative (see e.g., Boesch and Tomasello, 1998). Hence these cases are pantomimes for instruction. Only one case, enacting a shared memory, is classified as declarative: this is the case of the female Kikan acting out how the person next to her had doctored her foot when it was cut. This episode, however, may also be interpreted as an apprentice showing that she has learnt what to do. Hence, it is not unequivocal that it is a case of pantomime for communication (see also Ferretti et al., 2017, p. 10 for a similar criticism).

Table 1 summarizes the arguments of this section⁸. Pantomime for instruction is contrasted with, on the one hand, demonstration, and, on the other hand, pantomime for communication. All ten factors suggest that the cognitive abilities involved in pantomime for communication are at least as extensive as those for pantomime for instruction. The arguments presented here therefore support, via the principle of cognitive parsimony, the thesis that demonstration is evolutionarily prior to pantomime for instruction, which in turn is evolutionarily prior to pantomime for communication.

Table 1. Ten factors distinguishing demonstration, pantomime for instruction, and pantomime for communication.

Pantomime compared with other forms of gesturing

There exist many proposals for what characterizes gestures, some of which are very general. Most of them are not based on an evolutionary perspective⁹. Kendon's (2004, p. 15) definition restricts gestures to “utterance uses”, that is, a communicative function, often performed together with speech. As I argue, pantomime has its primary uses in instructional contexts not connected to utterances. Therefore, a more inclusive definition should be sought. Following Warglien et al. (2012, p. 23), I therefore consider as gestures “goal-directed communicative body movements, i.e., such that require interpretation from an audience for achieving the gesturer's goal”¹⁰. The following quotation from Kendon (2017, p. 168) supports this definition: “In my view, this suggests that gestures can best be understood as forms of action derived from how one uses one's hand to show or change the shape of form of things—to pick things up, let them drop from one's hands, place one's hands around an object, grasp an object, do something with an object, carry out patterns of action, and so forth.”

Most researchers have treated pantomime as a form of gesture. McNeill (2013, p. 483) describes pantomime as gesture without speech. However, vocal sounds can be parts of a pantomime. Communication is, in its nature, multi-modal (Gillespie-Lynch et al., 2014). For example, I can, by modulating the pitch of my voice, pantomime an up-and-down movement (Ekström et al., 2022) or imitate the sound of an animal while pantomiming its movements.

A basic distinction can be made between indexical gestures, where the semantic ground is spatio-temporal contiguity (for example, pointing) and iconic gestures, where the ground is similarity (for example, pantomime).

My position that pantomimes characteristically express actions can be amplified by considering the semantic domains of different types of gestures. One can distinguish three kinds of representational gestures which correspond to three types of domains:

(i) Location or direction. This involves the domain of physical space, which is the characteristic referential domain for indexical gestures.

Indexical gestures are fundamental for drawing attention (level 2 of teaching). Pointing is the primary example, but directing attention can be done by other deictic gestures such as giving, showing (with object in hand), requesting (open palm), and by using gaze direction.

(ii) Actions. In previous work (Gärdenfors, 2007, 2014a, Ch. 8; Gärdenfors and Warglien, 2012), actions have been represented as patterns of forces.

A pantomime can then be characterized as a gesture the meaning of which is based on the force domain. This entails that the represented action is iconically enacted. This analysis accords with that of Kendon (2004, p. 160), who identifies pantomime with enactment that is oriented toward actions.

(iii) Object properties. Gestures can represent the shape, size, length, height, depth and maybe other properties of an object.

These properties each belong to an object category domain (Gärdenfors, 2014a, Ch. 6). Brown et al. (2019, p. 6) argue that tracing the outline of an object with a hand (or the hands) is a common way of describing an object with gestures.

It is, of course, difficult to say anything about the evolutionary order of these types of gestures. From a developmental viewpoint, however, pointing develops early in children, so they can then communicate about the spatial domain. As regards actions vs. objects, Ortega et al. (2017) note that signing children have a bias to interpret signs as actions. When children have two signs for the same concept to choose from, they initially prefer an action-based variant. This gives them the opportunity to link a label to schemas grounded in their action experiences.

From pantomime to protolanguage

I have characterized pantomime as a way of depicting actions. The fact that the force patterns of a pantomime can simultaneously contain information about the properties of an object complicates the domain analysis presented above. For example, a gesture showing how a bottle is placed on a desk can combine the placing movement and a hand-shape indicating the shape of the object (Gullberg, 2011).

A more inclusive view is to see pantomime as a combination of gestures for actions with gestures for object properties (or objects) and places. Such a proposal can be found in Zywiczyński et al. (2018, Section 3.7), who write that “pantomimic acts are ‘the size of' propositions or utterances rather than smaller component units; rather than being elements of a larger communicative whole, they express complete, self-contained communicative acts”.

When pantomimes were exapted from their original function as a request to copy in a teaching situation to having a declarative function, as part of a narrative, or as an element in a play sequence, they formed the seeds for a larger set of conventionalized gestures. Arbib (2012, p. 224) writes that “[p]antomime is not itself part of proto-sign but rather a scaffolding for creating it”. Here proto-sign refers to a communication system that is conventional and combinatorial.

Dividing gesture references into locations, object properties and actions allows the three categories to be seen as proto-demonstratives, proto-adjectives and proto-verbs. I thus propose that the three types of semantic domains for gestures constitute the seeds for three of the main word classes¹¹. Gestures for proto-nouns would, in general, develop by conventionalization of characteristic properties of objects, but they can also develop out of verbs. A gesture for a property or a type of action that is characteristic for a person may by metonymy become a proto-name for the person¹². For example, a gesture describing the fluffy hair or the limping of a person could come to serve as a reference to the person. Similarly, a gesture for a property or a type of action that is characteristic for a category of objects may, in the same way, become a proto-noun.

Once these proto-word classes are in use, it is natural that gestures are combined so that more complex messages can be communicated. These would be the first steps toward the compositional communication of a protosign (Arbib, 2012) or protolanguage. Note that the process outlined here deviates from Arbib's (2012) theory about holophrases. Here the pantomime is taken to be the original holophrase, which is then complemented with signs for further thematic roles that are part of the event that is communicated, rather than broken down into smaller elements.

Communicating concepts and their relations

Conventionalization

Several authors have noted that pantomime is detached. Pantomime is, however, not conventional. Corballis (2014, p. 190) proposes a next step in the evolutionary process:

“Pantomime, though, is inefficient, and over the course of the Pleistocene, the pressure toward a more efficient and compact system may have driven the process of conventionalization. Iconic or pantomimic gestures were replaced by simpler signals whose meanings were acquired through association rather than through pictorial representation. Meaning is then carried through cultural transmission, rather than in the signal itself.”

It should be noted that in contrast to pantomime, the conventions of protosign (or protolanguage) must be learned.

Bickerton (2007, p. 513) claims that animal calls cannot predicate. So, what is it in human communication that makes it possible for us to predicate, that is, say something about entities? The answer is that a communicative convention results in a label (gesture or word)¹³. Gelman and Roberts (2017, p. 7900) analyze the role of labels in cultural inheritance. They argue that “category labels work in an almost paradoxical way to ensure stability in the transmission process, but simultaneously to permit and even foster conceptual change”. Because young children act as if their own knowledge state is the same as that of others, this will improve learning. Sabbagh and Henderson (2007) also argue that children's understanding that word meanings are conventional makes their word learning more efficient. They write that “these limitations in children's abilities to reason about others' epistemic mental states may actually promote the development of an appreciation of conventionality” (Sabbagh and Henderson, 2007, p. 33). Labels for objects also license inferences that depend on underlying causal features of objects. As will be seen in the section on the role of generics below, such connections are taught via generics.

What has been called the principle of conventionality (Clark, 1993) states that words are efficient tools for communication when the form–meaning associations are known, shared, and expected within a language community. Already one-and-a-half-year-old children exploit this principle in their learning of language (Graham et al., 2006). Given the limited ToM of small children, it is likely that they believe that labels reflect what objects are called rather than what people call objects.

The gestures of great apes are sometimes learned via ontogenetic ritualization where individuals learn a gesture via regularly occurring dyadic interactions (see Arbib, 2012, Ch. 3 for an analysis of ritualization, and also Abramova, 2018). Non-human ritualizations never become labels that are shared within a society, but they remain signs that are used within a dyad of individuals, typically a mother and an infant. Therefore, the principle of conventionality may be a foundational block in the evolution of human communication that distinguishes it from that of other species.

Nevertheless, the “chimp test” requires an explanation of why conventionalization has not evolved among great apes or other species. One possible explanation, comparing hominins to chimpanzees, is that the hominins increasingly engaged in collective breeding. This means that more children spent time together and with adults other than their mothers, which would make it easier for the children to adopt the principle of conventionality (Volterra et al., 2018, p. 230). This explanation is supported indirectly by two other forms of conventionalization. Firstly, when Nicaraguan home signers met with other deaf children, a conventional sign language quickly emerged (Senghas and Coppola, 2001). Secondly, computer simulations of how signaling systems become shared among artificial agents show that the more communicators are involved, the faster the conventionalization occurs (Steels, 1998). In passing, it should be noted that collective breeding makes teaching more effective since more learners can be taught simultaneously and be taught by teachers that are more skilled than the average parent (Henrich, 2004).

Communicating concepts

Within the cognitive sciences, researchers disagree on how the notion of a concept should be characterized. It is, however, sufficient for my purposes that having a concept involves the ability to recognize a pattern (Gärdenfors and Lindström, 2008). Some patterns representing concepts are perceptual and others, for example kinship relations, are more abstract.

Evolutionarily, concepts have been exploited for everyday decision making, for example learning to recognize edible mushrooms, distinguishing the tracks of a hyena from those of a leopard, and recognizing an appropriate platform in Acheulean tool manufacture (Sterelny, 2012, p. 2142).

The fourth level of teaching, communicating concepts, presumes a system of conventional labels where categories and perhaps places can be referred to in addition to the actions depicted in pantomime. In human societies, the main method for teaching a concept for a category is to use a label (sound or gesture) standing for the concept together with pointing or some other technique for drawing attention to what is characteristic of the concept.

Csibra and Gergely (2009, 2011) argue that natural pedagogy is unique to humans, because other animals are not capable of learning generic knowledge from communication. Animal communication is confined to the here and now. Mattos and Hinzen (2015) suggest that natural pedagogy is one of the main functions of language. They write that humans have a “specific capacity to acquire, through communication, different kinds of information—respectively, knowledge about kinds and knowledge about particular events, actions and state of affairs which we will call here simply ‘knowledge about facts'¹⁴.” They also argue that children learn knowledge about kinds earlier than they learn knowledge about facts.

An archaeological example

As an example of a concept that was necessary to communicate at an early stage, Gärdenfors and Högberg (2017), following Stout (2011), analyze what is required for teaching the concept of a “platform” that is required for manufacturing Acheulean hand axes. Importantly, the subgoals of the manufacturing cannot be perceived directly from the action sequence of a teacher and they are therefore very difficult to learn via imitation. The subgoal features can therefore not be identified by the learner by drawing attention or demonstration, but they must be taught with the aid of concepts. The more convoluted a technology is, the more subgoals are involved (Stout, 2011; Lombard and Haidle, 2012; Stout and Chaminade, 2012; Mahaney, 2014). The subgoals involve understanding how the action sequences should be chunked. Such a chunking is difficult to communicate without using concepts. Another example related to stone tool production comes from Wynn and Coolidge (2012, p. 70), who argue that the “distal convexity” of a core is a necessary concept to communicate in Levallois technology.

The ability to communicate with concepts is a central component in the evolution of a protosign or a protolanguage. If the above argument is correct, it follows that Homo erectus had this ability. Even though we cannot present any clear evidence that they could also communicate relations between concepts, I believe this is very likely (see the following subsection) and I thus submit that Homo erectus communicated with protosign or a protolanguage. The same conclusion has been reached by other researchers, for example Bickerton (2009) and Everett (2017) (also Barham and Everett, 2021), albeit their arguments are different from mine. For example, Everett (2017, p. 99) argues that Homo erectus needed language to be able to travel.

Explaining relationships between concepts—The role of generics

The fifth level of teaching—explaining relationships between concepts—requires a fully detached communication system that also can express causal connections.

I claim that generics form the main communicative tool for explaining relations between concepts. In particular, they provide information about the semantic structure of concepts. Gärdenfors and Osta Velez (2022) distinguish between two kinds: (a) Property generics dealing with characteristic properties of objects (“Ducks lay eggs”), and (b) causal generics (“Sharks kill people”). To this, one can add normative generics (“Boys don't cry”).

Generics have been attested in all languages. Although, on the surface, they look like quantified sentences, generics are not ordinary sentences. Gärdenfors and Osta Velez (2022) argue that they should be seen as expressing expectations about relations between concepts. Linguistically, they can simply be formulated as relations between labels rather than as sentences, for example, “finders, keepers; losers, weepers”.

Generics form an efficient method of transmitting information about categories and their relations. In this way they speed up cultural transmission (Gelman and Roberts, 2017). Generics have a central role in teaching, in particular in what is called “natural pedagogy” (Csibra and Gergely, 2009). We tell our children, already at a young age, things like: “cats say meow, dogs say woof, and cows say moo¹⁵”. Later in school, they learn generics like “tigers have stripes”, “copper conducts electricity” and “democracies have freedom of speech”. Such property generics are a way of presenting characteristic properties of various categories (Leslie, 2008). Learning about categories is primarily done via their characteristic properties. And when it comes to causal generics such as “smoking causes cancer” and “dogs bite people,” they function as guidelines for caution in actions (Sterken, 2015). Archaeologically and anthropologically, detached communication about the relations between animals, their tracks and their behavior is difficult without using some form of generics (MacDonald and Roebroeks, 2013). Although speculative, this connection could be an explanation of why generics are so central in language.

Several studies argue that young children understand generics and can distinguish them from non-generics on the basis of several types of cues. Gelman and Pelletier (2010) show that already by about the age of two and a half, children start producing generics and at the age of four generics constitute 3% of the sentences produced. This may not seem like a high rate, but it should be considered in the context of all other sorts of utterances that children can make. Children rely on generics when they learn about concepts and they use them when drawing inferences. Cimpian and Scott (2012, p. 429) write about how children use generics to build up common knowledge: “Children's realization that generic facts are widely known may be a crucial step in this direction—that is, a crucial step toward learning what knowledge can reasonably be presupposed when communicating with others.” In line with the position that generics are not ordinary sentences, children also interpret generics as different from quantificational sentences containing “all”, “some” or “most”.

Narrating

There is one linguistic unit that is not required for the previous levels of teaching, but which is central to narration: the sentence. A narrative consists of a sequence of sentences. The sixth level—narrating—therefore requires a protolanguage. As mentioned previously, Sibierska (2017) argues that pantomime (in the extended form) is sufficient for narration. Therefore, I agree with Ferretti et al. (2017, p. 8) that the origin of narratives coincides with pantomime (for communication).

A central question is what role sentential structures play in the evolution of communication. My proposal is that the primary function of sentences is that they describe events¹⁶. According to an earlier analysis of events (Gärdenfors and Warglien, 2012; Warglien et al., 2012; Gärdenfors, 2014a,b), events are based on causal relations: an event typically contains information about an agent who is the cause of an action that leads to a result related to a patient¹⁷. While causal generics express causal relations between concepts, sentences express causes or effects in single events. My hypothesis is that in the transition from protosign to protolanguage, the holophrases represented by a pantomime are supplemented by other components which represent core parts of the event (Gärdenfors, 2014a; Gärdenfors and Osta Velez, 2022).

Apes do not narrate. At best, they can express declaratives, mostly in the form of single signs (Lyn et al., 2011). During hominin evolution, causal cognition has been extended considerably, from understanding of the causation of your own actions, to understanding hidden causes of different kinds, such as physical causes, medical causes, and the intentions and beliefs of other individuals (Lombard and Gärdenfors, 2021). This extension of causal cognition is a necessary requirement for the capacity to narrate.

The ToM involved in understanding narratives includes communicative sign function and joint beliefs. As regards communication, a narrative is typically detached (Corballis, 2015). This holds in particular for gossip. Gossip normally contains expressions of the type “X did A to Y,” which involves identifying thematic roles such as agent, action and patient. Therefore, narration presumes a communicative system that contains at least a minimal level of syntax (Gärdenfors, 2012).

It is obvious that narrating has become a central part of human culture. It is, however, less clear that teaching has been the main driving factor behind the evolution of narration. Firstly, narration is important for planning for cooperation. Without sentential structures involving agents and objects, it is difficult to express a sequence of actions that is to be performed by a group of individuals.

Secondly, it has been argued that a key factor in cooperation in humans that distinguishes us from other species is the role of reputation (Nowak and Sigmund, 2005; Gärdenfors et al., 2012). One important mechanism in determining the reputation of an individual is gossip. Gossip is a form of narrative that often contains information about whether an individual can be trusted or not, which is a crucial factor in selecting who to cooperate with. Furthermore, myths are often presented in the form of narratives and they function as carriers of common knowledge (joint beliefs). They contain much of the knowledge accumulated by a society (Donald, 1991).

Even though teaching may not have driven the evolution of narration, narration is closely connected to teaching. Narrative teaching is present in all cultures (Csibra and Gergely, 2009; Kline, 2015). Narratives are useful in conveying information about causal relations, in particular when the causes are not perceptually present.

My aim in this article has been to analyze the role of teaching in the evolution of communication. As a final remark, I note that a similar trajectory can be found in the development of children's communication. Following Mattos and Hinzen (2015), one can distinguish three levels in the early development of children's communication: (i) infants' initial vocabulary is linked to the different categories of objects they point to; (ii) they express generic information about categories; and (iii) they describe events using a sentential structure. It should be noted that there is a clear parallel between these levels and the communication required for levels 4, 5 and 6 of teaching. This parallel provides some indirect support for the ordering of the levels of teaching that has formed the backbone of my analysis.

Conclusion

My main thesis in this article is that the evolution of teaching is central for the evolution of increasingly complex communicative systems in the hominin species. Building on the argument that demonstration is evolutionarily prior to pantomime for instruction, which in turn is evolutionarily prior to pantomime for communication (Gärdenfors, 2021), I have shown that such a series can provide an evolutionary explanation of different functions of communication.

I have focused on the role of demonstration and pantomime since this is the level of teaching where it seems that the hominins separated from the forms of teaching found in other species. In particular, this is the level where the communicative sign function appears. This is also the level where communication becomes detached from the here and now, which is a necessary condition for planning for cooperative actions.

I am not claiming, however, that teaching is the only evolutionarily driving force. More generally, my position is that the evolution of communication is connected to advanced cooperation that is unique to the hominins (Gärdenfors, 2003; Gärdenfors et al., 2012). In line with this, teaching can be seen as a form of preparation for future cooperation.

Data availability statement

All relevant data is contained within the article or in the references.

Author contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Funding

This research was funded by Lund University.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

1. ^Progovac (2016, p. 3) proposes a similar principle for the evolution of syntactic structures.

2. ^One observation of a chimpanzee showing somebody else how to perform an action concerns a mother who shows her infant how to hold a stone in order to crack a nut (Boesch, 1991). This behavior is, in my opinion, better seen as facilitating. Another possible example, of demonstration in elephants, is presented in Bates et al. (2010). Russon and Andrews' (2011) data on spontaneous demonstration in orangutans will be discussed below.

3. ^Some forms of animal communication also have some (but not all) of these properties.

4. ^Cf. Orwell (1948/1968, p. 11): “Primitive man, before he had words, would rely upon gesture, and like any other animal he would cry out at the moment of gesticulating, in order to attract attention. Now one instinctively makes the gesture that is appropriate to one's meaning, and all the parts of the body follow suit, including the tongue”. And Armstrong and Wilcox (2007, p. 68) write that “there never was a time when visible gestures were unaccompanied by vocalizations”.

5. ^My characterization of pantomime therefore differs from that of Brown et al. (2019) since they delimit it to “iconic gesturing that is done for communicative purposes in the absence of speech”.

6. ^In line with this thesis, Zywiczynski et al. (2017) distinguish between cognitive and communicative forms of pantomime.

7. ^Brown et al. (2019) suggest that this distinction should replace the distinction between character viewpoint and observer viewpoint that has been used by McNeill and others (Cassell and McNeill, 1990; McNeill, 1992; Cartmill and Goldin-Meadow, 2012).

8. ^The table is an extension of the table in Gärdenfors (2020).

9. ^See Brinck (2004a) for an analysis of the origins of pointing.

10. ^Here I only consider representational gestures, so that, for example, beat and emblem gestures are excluded (Kendon, 2004, Chs. 9–11).

11. ^Indirect support for the role of proto-verbs comes from Aussems and Kita (2021), who show that seeing gestures that depict verb referents helps children generalize verb meanings and learn more verbs from the same subcategory.

12. ^This is how “person signs” are introduced in sign languages.

13. ^A label is an icon or a symbol in the terminology of Peirce (1932).

14. ^This distinction is the same as the distinction between “knowing what” and “knowing that” in Gärdenfors and Stephens (2018).

15. ^Children's picture books of animals and other object categories highlight the characteristic properties of the categories.

16. ^I consider states to be special cases of events.

17. ^This theory builds on conceptual spaces (Gärdenfors, 2000, 2013, 2014a) where actions are modelled as force vectors (or patterns) and results as vectors describing a change in some property of the patient.

References

Abramova, E. (2018). The role of pantomime in gestural language evolution, its cognitive bases and an alternative. J. Lang. Evol. 3, 26–40. doi: 10.1093/jole/lzx021