Systems of Communication: Aspects of Culture and Structure in Speech Surrogates

James, Lucas

doi:10.3389/fcomm.2021.653268

ORIGINAL RESEARCH article

Front. Commun., 09 August 2021

Sec. Psychology of Language

Volume 6 - 2021 | https://doi.org/10.3389/fcomm.2021.653268

This article is part of the Research Topic: Surrogate Languages and the Grammar of Language-Based Music

Lucas James*

Undergraduate Program in Linguistics, Dartmouth College, Hanover, NH, United States

The practice of speech surrogacy is used for communication across many cultures. Previous work has historically engaged with the study of speech surrogates as part of anthropological or ethnomusicological inquiry; more recently, scholars have explored aspects of the formal relationship between spoken and surrogate linguistic structures. How speech surrogates function as systems of communication is not yet well understood. Based on evidence from an interdisciplinary corpus of documentation, characteristics of culture and discourse, as well as features of linguistic structure, are shown to play a role in fostering communicability in speech surrogates. Cultural constraints are linked to the development of a speech surrogate-mediated discourse within a community of practice, facilitating comprehension of the surrogate system. Moreover, specific structures including formulas, enphrasing, and framing devices are identified as common to various speech surrogate traditions, suggesting a common function as aids to communication. This analysis points to the need to investigate speech surrogates as linguistic systems within a discursive context.

Introduction

Speech surrogacy is (broadly) the practice of imitating verbal speech without the use of the larynx, often by means of whistling or through the use of musical instruments. The best-known examples include Silbo, the whistled Spanish of the Canary Islands, as well as “talking drums” throughout West Africa, but from the 19th century onwards hundreds of speech surrogates have been attested spanning every inhabited continent (James et al., 2021).

Scholarship on speech surrogates has been historically incidental to documentary work in anthropology or ethnomusicology. Bagemihl (1988), asking “But is it language?”, noted that speech surrogates were at the time a marginal subject within linguistics. He concluded that “virtually all of the sources which I have examined … fall into the category of descriptive, nontheoretical studies … There is an obvious reluctance to delve into an area which smacks of the nonlinguistic” (26). Since many speech surrogates are integrated into larger cultural and musical traditions, a typical early approach was to analyze them as cultural performance or a “musical process” (Kaminski 2008), with a few notable exceptions (Stern 1957; Carrington 1949; Nketia 1971/1976). This situation recalls past attitudes on writing systems in linguistics: “writing [being] clearly a cultural rather than biological endowment … seemed accordingly less interesting [to linguists] than spoken language” (Sampson 2015, 47), though, unlike those of writing, the origins of speech surrogacy are as yet unclear.

In recent years, speech surrogates have more commonly been taken seriously as a part of language, since researchers have recognized formal similarities between spoken and surrogate modalities. Analyses suggest that speech surrogates can reproduce aspects of speech including acoustic and phonetic properties (e.g., Rialland, 2005; Meyer, 2008), speech rhythm (e.g., Seifart et al., 2018), phonemic structure (e.g., Villepastour, 2010; McPherson, 2018), and morphosyntactic processes (e.g., Winter 2014). Such contributions show how speech surrogates rely on practitioners’ linguistic competence. This makes linguistics well-suited to the task of understanding speech surrogate structure. To adapt terminology from folkloristics: within cross-disciplinary speech surrogate studies, nonlinguistic perspectives may focus on texts and context, while linguistics provides the tools to study texture, the actual structures that make up surrogate speech (cf. Dundes, 1980, 20).

Advancements in the structural accounts of speech surrogates have not yet been matched with a flourishing of insight into functional questions: How are they used and understood as linguistic communication systems? How do they fit into the discursive lives of their listeners and practitioners? In-depth studies on speech surrogate communicability have been rare in linguistics, though some authors have explored aspects of perception and discourse (see Previous Work on Speech Surrogate Communication).

Surrogate language communication, embedded as it is within larger systems, may at first seem to stray from the domain of language proper. But Vigliocco et al. (2014) challenge the distinction between “language proper, i.e., language as a structured system amenable to linguistic analysis and communication, i.e., the broader context of language use, which includes the use of other channels of information” (1). They argue that “[t]he majority of language studies have been firmly focused on language proper, to the exclusion of context and multimodal expression that contribute to utterance and meaning construction” (1). This paper makes the case that the study of speech surrogates is relevant to linguistics not only because it illuminates the former category (“language as a structured system”), but also because it is part of the latter (“context and multimodal expression”). In other words: speech surrogates both tell us about language and are language.

This paper focuses on the relationship between form and function in speech surrogates. There is a significant body of evidence that linguistic structure is shaped by its communicative niche (Coupé et al., 2019). An analysis of communication in speech surrogate systems suggests the same functional pressures on spoken and signed languages also affect speech surrogates. Studying these functional pressures can help us distinguish modality effects from universal tendencies, bettering our understanding of language writ large.

Defining Speech Surrogacy discusses the definitions crucial for the typology of speech surrogate communication. Previous Work on Speech Surrogate Communication provides a review of existing work on perception and discourse in speech surrogates. Challenges for Speech Surrogate Communication presents the challenges to communicability that speech surrogates pose, and Compensation Strategies in Speech Surrogate Communication analyzes some attested strategies to counteract these challenges. Discussion presents a discussion and avenues for further research.

Defining Speech Surrogacy

What is a “speech surrogate”? There is no broad scholarly consensus on how to define the phenomenon, or even what to call it, and the definitions rarely have the same scope from author to author. I prefer “speech surrogate” only because it is in common usage, probably because of Stern (1957) and because it is in the title of Sebeok and Umiker-Sebeok’s (1976) two-volume collection of articles, an oft-cited source on the subject.

For Sebeok and Umiker-Sebeok, “speech surrogates” are “specimens of one species of transmutation, one which is 1) a true substitutive system, 2) a first-order rather than a second-order system, and 3) in the acoustic modality” (XIX). “First-order” refers to a direct relationship between language and sign, rather than one mediated via another system (such as Morse code’s reliance on the alphabet). In other words, speech surrogacy is a practice of 1) systematically replacing utterances in 2) a spoken language with 3) other sounds. An alarm bell (say) does not count, as it violates 1) by having no systematic connection to language; Morse code does not count, as it violates 2) by referring to an alphabet rather than to speech itself; and alphabets in turn do not count, as they violate 3) by being silent. What is included is the practice of communicating by manipulating pitch, rhythm, and timbre in lieu of speech sounds, as typified by the “talking drum” and “whistled speech”. Based on the predominance of those two forms, Sebeok and Umiker-Sebeok often refer specifically to “drum and whistle surrogates” in the text.
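The three criteria can be read as an ordered checklist. The following Python sketch is only an illustration of that reading: the `System` class, its field names, and the `first_violation` helper are hypothetical, and the attribute values are assumptions chosen to match the verdicts given above. For the alarm bell and Morse code, only the criterion named in the text is asserted; the remaining values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class System:
    """A communication system described by the three criteria (hypothetical model)."""
    name: str
    substitutive: bool  # 1) systematically replaces utterances
    first_order: bool   # 2) refers to spoken language directly, not via another system
    acoustic: bool      # 3) realized in sound


def first_violation(system: System) -> Optional[int]:
    """Return the number of the first criterion the system fails, or None if it passes all."""
    checks = [system.substitutive, system.first_order, system.acoustic]
    for number, passed in enumerate(checks, start=1):
        if not passed:
            return number
    return None


def is_speech_surrogate(system: System) -> bool:
    """A system counts as a speech surrogate only if no criterion is violated."""
    return first_violation(system) is None


# Attribute values are illustrative assumptions, not claims from the source.
EXAMPLES = [
    System("alarm bell", substitutive=False, first_order=True, acoustic=True),
    System("Morse code", substitutive=True, first_order=False, acoustic=True),
    System("alphabet", substitutive=True, first_order=True, acoustic=False),
    System("talking drum", substitutive=True, first_order=True, acoustic=True),
    System("whistled speech", substitutive=True, first_order=True, acoustic=True),
]

for example in EXAMPLES:
    print(example.name, first_violation(example))
```

Returning the first failed criterion, rather than a bare yes or no, mirrors the way the exclusions are argued in the text: each counterexample is ruled out by naming the criterion it violates.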

Sebeok and Umiker-Sebeok’s definition contains a controversial assumption: how can one know if a system traces back to a spoken language? A number of articles included in the collection describe systems with no apparent connection to the phonology of their corresponding spoken languages. Instead, they arbitrarily signify words, phrases, or perhaps concepts; this is what Stern (1957) calls “lexical ideograph” systems, in contrast with phonologically-based “abridgment” systems (172). Sebeok and Umiker-Sebeok write:

“In some drum and whistle systems, particularly those observed in Oceania and South America, the symbolic principle appears to be dominant over the iconic … More work is necessary on this sort of drum and whistle surrogate before it will be possible to discuss in detail its exact semiotic nature” (XIX).

I am not aware of any theoretical work that solves this puzzle. Some scholars, including Nketia (1971), exclude lexical ideograph systems entirely from the domain of surrogate speech. For the purposes of this analysis, I find it necessary to include them. As I describe in Compensation Strategies in Speech Surrogate Communication, the line between abridging and lexical ideograph systems is harder to draw than it appears. Moreover, many of these systems are used in functionally identical circumstances. So, for a full description of the speech surrogate communicative niche (that is, using implements or whistling as a genuine replacement for the speech act), I think it is important to account for these systems.

This leads to a second point of clarification: what is meant by “communication”? This paper cannot take a broad view of all the things that speech surrogates mean. For a not insignificant (though diminishing) population scattered around our planet, speech surrogates are woven into the fabric of life among their families and neighbors. Speech surrogates are meaningful insofar as they are shared; as such, they are never separate from the identity of the communities that practice them, often part of their music, spirituality, orality, and entertainment. Even for generations losing their grip on the practice, the sound of a surrogate language can signify a lot: shame at a perceived failure to uphold tradition (Coulter 2007), longing for the past or a homeland left behind (Poss 2005). This territory of meaning, linked to tradition and identity, is vast and unsuited for the analysis presented here.

Instead, this paper is focused on how people use speech surrogates to talk to (and about) each other. As meanings attributable to surrogate language go, this is an important one. Speech surrogates of all kinds are used to praise (Kaminski 2008), insult (Vercelli 2006), ridicule (Ames et al., 1971), or court others (Armstrong 1953; Catlin 1982; Dugast 1955); to tell stories (Armstrong, 1953), to refer to people’s names (Dugast, 1955) or their titles (Nketia, 1971; Kaminski, 2008), as well as the names of places (Nketia, 1971), clans (Seifart et al., 2018), and ancestors (Nketia, 1963); to ask for help (Burridge, 1959) or money (Strand, 2009); to address announcements to a whole community (Cloarec-Heiss, 1999) or to a specific person (Damm, 2003); to broadcast calls for celebration (Wojtylak, 2016), mourning (Lewis, 2018), and worship (Neeley, 1999); to coordinate hunting (Blench, 1987), warfare (Gourlay, 1982), agriculture (Wilken, 1979), and group performances (Stone, 1972); and so on. In other words, speech surrogates are often used the same way we all use language around other people.

This is true regardless of a system’s structural properties. A Canarian whistler, a Bora manguaré drummer and an ‘Are’are conch player use radically different methods to communicate, and these differences must have an impact on the way their messages are produced and understood. But all three practitioners have a common purpose: to expand the human communicative palette beyond its natural boundaries. A speech surrogate message might sound louder or travel farther than the human voice, or deliver a heightened register to a culturally meaningful text; it may represent the voice of authority, or simply enliven otherwise plain language. It is this quality—the expansion of communicative possibility—that represents the significance of speech surrogate traditions, and that attracts this investigation into their structure and function.

Previous Work on Speech Surrogate Communication

Speech surrogate documentation usually gives some indication of the system’s use in communication; detailed theoretical accounts are much rarer. Background knowledge for this analysis comes from a thorough review of the sources listed in the Online Database of Speech Surrogates (ODSS) (James et al., 2021), which catalogues information on roughly 200 speech surrogates of over 100 language varieties. These include recent and very early literature alike, from this decade to the late 19th century. Of those, about 60 have descriptions contained in Sebeok and Umiker-Sebeok (1976). Another dozen or so are found in Meyer (2015), which provides a detailed overview of the typology of documented whistling systems, and is my primary source on that topic. I give preference to substantial works over brief descriptions, and avoid impressionistic comments for which I cannot elaborate on all relevant examples individually. All transcriptions are included verbatim.

I recognize the challenge of comparing studies from across more than a century of developments in linguistic theory. Though early transcriptions usually lack the theoretical grounding that would permit a retrospective phonological analysis, properties of syntax and discourse are often included. Moreover, some of the systems under discussion have now fallen from use and are thus out of the reach of further fieldwork. Many speech surrogates are extinct or endangered because of cultural and technological changes to communication around the world. Along with redoubling our efforts to document living systems, we need strategies to mine the existing literature.

One body of evidence on speech surrogate communication concerns the ability of listeners to understand abridging systems analytically; that is, how they associate individual sounds of surrogate speech with phonemes in their spoken language. Experimental work on this subject has been performed for several whistling systems: whistled Béarnaise in Aas, France (Busnel et al., 1962), whistled Turkish in Küskoy, Turkey (Busnel, 1970; Moles, 1970), and the Silbo Gomero of the Canary Islands (Rialland 2005). These are thoroughly reviewed by Meyer (2015), and I will not reiterate them here. All of these experiments tested the comprehension abilities of practitioners, finding cross-linguistically that whistled phrases could be identified at greater-than-chance rates, though with wide variations in accuracy depending on the type of utterance (from sentences and words to nonsense syllables).

In addition to phrase-identification tasks, several neurolinguistic studies have been performed in the past two decades, all on whistled speech processing. Carreiras et al. (2005) used fMRI technology to analyze neural activity in a processing task of Silbo Gomero. While listening to recorded whistled speech, skilled practitioners experienced activation in the left superior temporal gyrus and right-hemisphere superior–midtemporal region, cortical areas associated with speech processing. No such activation was recorded for control participants unfamiliar with Silbo Gomero. The authors posit that “the language-processing regions of the human brain can adapt to a surprisingly wide range of signalling forms” (31). The experimental group included only skilled whistlers rather than participants merely familiar with (but not practitioners of) the whistling system.

Güntürkün et al. (2015) and Villar González et al. (2020) used dichotic listening tasks to probe the localization of language processing in whistled Turkish and Silbo Gomero, respectively. Participants were tasked with identifying a whistled signal after listening to two simultaneous stimuli, one in each ear. Both papers conclude that the standard attested effect of left-hemisphere superiority in language processing tasks is not reproduced for whistled speech, but Meyer et al. (2019) identify methodological issues in the experiments that may have downplayed any effect of left-lateralization. Like Carreiras et al. (2005), both studies relied on proficient whistlers.

Poss (2012) performed a phrase-identification study of an instrumental speech surrogate. The study’s participants were Hmong speakers who were knowledgeable listeners (but not necessarily skilled practitioners) of the raj speech surrogate, an aerophone abridgment system based on Hmong tone and consonant types. The experiment tasked participants with translating raj speech, with stimuli drawn from common phrases in the repertoire. Participants were highly successful at identifying raj phrases. Moreover, “incorrect” translations frequently had similar tone patterns to the expected value, suggesting that listeners were sensitive to the surrogate system’s tone encoding rules; no such pattern was identified with the system’s encoded consonant types. This indicates that Hmong listeners take the surrogate language’s grammar into account when processing individual words, but only selectively. The author also notes that “there is evidence that some subjects responded in terms of phrases rather than individual words” (Poss 2012).

Each of these studies focused on a very granular level of speech surrogate comprehension: the linguistic processing aspect. A more global analysis, one that can help to relate speech surrogate structures to their cultural and discursive context, calls for other methods. Particularly useful in the literature have been interviews with practitioners alongside analyses of corpora of speech surrogate performance. One substantial source is the work of Paul Neeley (1994, 1996, 1999), which describes slit-log drumming in Mekomba, an Ewondo-speaking community of Cameroon. Neeley (1999) provides a book-length analysis of a single genre of drummed Ewondo performance: calls to Christian worship as performed by the community’s then catechist Antoine Owono over a 4-month period in 1988. The book breaks down the abridging system, which uses pitch and rhythm on the slit-log drum (nkul) to reproduce the underlying tonal phonology as well as (variably) the structure of consonant clusters in the spoken language (64). It also presents an interdisciplinary set of discourse, textual, and rhetorical analyses. Neeley characterizes the Mekomba drumming tradition as a fixed, conventional, and unidirectional transaction between drummer and audience, the latter being all Mekomba residents within earshot of the signal. The catechist gave a twice-weekly performance to convince residents to attend Christian church services; the exact verbal content of the drummed messages was not widely understood by residents, but the general meaning of the performance was universally understood.

Works by Cloarec-Heiss (1986, 1999), Arom and Cloarec-Heiss (1976), and Arom (2007) examine another slit-log drum system of Central Africa: the Banda-Linda lenga of the Central African Republic. As in the Ewondo system, the lenga is used to encode the tonal phonology and certain segmental features of the spoken language. The data come from Ippy, a small community in the eastern Central African Republic where “any Banda-Linda speaker can understand drum messages … [but] the actual social function of drum language restricts its use to a few emergency situations requiring one or more people to go to the place from which the message originates” (146–148). The authors present an information-theoretical account of the way that Banda-Linda surrogate speech is encoded by the drummer and decoded by the listener. They identify patterns in the drummed messages that make them easier to decode, including fixed phrases and a formula for message organization. They also point out that decoding is a retroactive process relying on short-term memory to continuously recast the interpretation of the signal based on new stimuli.

Seifart et al. (2018) analyze a corpus of slit-log drum messages from a Bora community in the northwest Amazon. They demonstrate that the communication system encodes both the tonal phonology and speech rhythm of the Bora language. Drummed texts are shown to contain several kinds of “enphrasing”, or conventionalized elaborations that make words and phrases more identifiable. Small distinctions in vowel-to-vowel timing intervals on the drum are shown to systematically correlate with speech rhythms; these distinctions are observed to be informative when decoding drummed messages. A formulaic structure for organizing messages is also argued to reduce ambiguity in the system.

Sicoli (2016) studies the pragmatics of whistled conversation in a Chinantec-speaking community of Oaxaca, Mexico. The whistling system reproduces the surface realization of Chinantec tone as well as glottal stops and stress patterns. Sicoli argues that “this yields a very productive and flexible morphophonological system for both the spoken and the whistled registers” (413). The system is shown to be “generative … making it possible to chat, conduct business, and make plans” (413). Sicoli gathered a corpus of 40 whistled conversations in the community of San Pedro Sochiapam consisting of both naturalistic conversations and the results of an experimental communication task. Practitioners in both settings were highly successful at communicating pertinent information using short, simple utterances. However, localized communication failures were attested, which conversation partners repaired using a limited set of standard questions and interjections. Sicoli argues that the whistled modality limits conversation to single-proposition utterances, and that the attested conversational repairs help regulate this constraint.

It can be inferred from this review that experimental work on speech surrogate communication is rare for instrumental systems and wholly unavailable for lexical ideograph systems. Whistling is logically the first avenue for experimental methods, since some systems are very well described, and the practice itself is not cumbersome to record and analyze. However, corpus-based work points to intriguing similarities and differences between whistled and instrumental communication. Future experimental work should include instrumental systems. Furthermore, what continues to be revealed through neuroimaging and linguistic processing studies should be balanced with interview and corpus-based approaches that can place these findings into context.

Challenges for Speech Surrogate Communication

Modern implements used to directly extend the range of the human voice, like the telephone, sound recordings, or the public address system, are designed to preserve as much of its acoustic detail as possible. Given the importance of the visual signal in language (both spoken and, of course, signed), even more faithful is the video broadcast. A comparison is often drawn between these implements and speech surrogates, which scholars have called “telegraphic instruments” (Church 1898; Verbeken 1922), “drum-telephones” (Verbeken, 1922), “loudspeaker[s]” (Neeley 1999), “ancient text messages” (Villepastour 2010), and “musical newspapers” (Bebey 1999). Practitioners of some systems have drawn the same connection: “Nekgini speakers playfully liken their slit-gongs to a telephone system, and it is common to hear one person say to another, ‘ring me on a slit-gong’” (Leach 2002). A 1970s Solomon Islands newspaper, The Solomons News Drum, was so named in reference to the islands’ indigenous drum-signaling practices (Linton 2013). But the analogy is not total. The acoustic fidelity of a cellular phone, allowing it to transmit a close acoustic rendering of the human voice, is a technological innovation that postdates the origins of the known speech surrogate traditions.

In its absence, practitioners face a serious challenge. Natural languages are complex systems, endowed with detail, nuance, and (in principle) limitless possibility. The tasks we entrust to our natural linguistic faculties are not easily undertaken by artificial means. Any literate person knows how hard it can be to express oneself clearly in the written word, and to interpret the writing of others, from the literary journal to the instruction manual or the text message. That is in spite of the sophistication of writing itself, which has only been invented three or so times in human history (Daniels and Bright 1996) and for which even the basic principles require years of schooling to acquire. The system of English writing employs, at minimum, 26 distinct two-dimensional forms; the conventions of capitalization add another 26, and punctuation at least nine more. Speech surrogates are practiced in channels of much narrower bandwidth, some of the most common being slit-log drums, trumpets of ivory or bamboo, and wooden flutes that produce only a handful of distinct tones. Even the anatomical whistle, which is produced by the vocal articulators just as in verbal speech, is far simpler than the human voice timbrally and occupies a more limited frequency range (Meyer 2015). There are of course some differences between the orthographic and auditory reductions that compensate in the opposite direction—for one, writing lacks the ability to distinguish meaning through small gradient variations in rhythm, as can be found in certain speech surrogate systems. Nevertheless, producing a wide range of sounds suitable for the nuances of human language can be a tall order.

This, then, is the essential problem facing speech surrogate practitioners: the possibility for meaningful contrast is less in speech surrogates than in natural language.

As a result, speech surrogates cannot and do not account for all of natural language’s complexities. This is especially clear in the phonological domain. As Stern (1957) explains, “an abridging system, while preserving some phonic resemblance to the base utterance, represents only part of its phonemic qualities” (125). For systems based on tonal languages, that usually means stripping away segmental features to primarily reproduce aspects of tonal phonology [as in Kele drumming (Carrington 1949)]. A handful of segmental features can be encoded in addition to tone, such as vowel length and syllable structure (McPherson 2018b). Non-tonal languages may be reduced to other prosodic features such as pitch-accent (Caughley, 1974), or to segmental features like vowel formants and consonants (Rialland, 2005). Seifart et al. (2018) find for Bora drumming that gradient rhythmic contrasts reintroduce some phonetic detail alongside categorical ones, but the overall effect of segment loss is still large. Lexical-ideograph systems do away with phonology altogether, leaving only lexical units that themselves tend to be arranged in simpler syntactic forms than in verbal speech. Translating from natural language to speech surrogate inevitably diminishes the linguistic content of the signal.

This process has several consequences for speech surrogate communicability. An important one is the need for acquisition: as a rule, surrogate speech does not superficially resemble verbal speech, and therefore needs to be acquired in addition to it. This is obviously true for lexical-ideograph systems, for which the acoustic signal diverges freely from verbal signals. But it is also true in varying degrees for abridgment systems. The sound of a slit-log drum, which is common in abridging speech surrogate traditions of Central Africa and South America, consists of transients with sharp attacks and quick decays, generally lacking all of the timbral variations produced by the oral cavity. Other instruments make closer approximations of speech sounds: the Yorùbá “talking drum”, which permits the drummer to regulate gradient pitch, follows the contours of Yorùbá post-lexical tone (Akinbo, 2020). These are nevertheless lacking in the timbral and articulatory contrasts making up segmental phonetics. Whistling systems have an easier route to phonetic detail, since they are produced from the vocal tract. Even so, as Meyer (2015) describes:

[whistling] is very different from the human voice both in its mechanism of production and in its acoustic form. A whistle consists of a simple narrowband melodic line modulated in frequency and amplitude … the voice shows a complex distribution in frequency. (73)

Abridging systems can also diverge from the surface realization of a base utterance, further masking sonic resemblance. The Sambla balafon surrogate system encodes the underlying tonal phonology of the language (Seenku) as well as aspects of syllable structure (McPherson 2018b). Post-lexical contour tone simplification is present in the spoken language but not emulated on the balafon. Characteristic intonational patterns of downdrift and declination are also omitted from the balafon surrogate system, which is limited to discrete pitches. The surrogate forms are therefore perceptually distinct from the realization of tone in spoken Seenku.

Similarly, Ewondo drumming ignores downstep in the spoken language, reducing contrastive pitches on the surface from three (high/low/downstepped high) to two (high/low) (Neeley 1999). James et al. (2021) note the same pattern in at least two other African speech surrogates: the Efik double bell and Luba slit-log drumming. Both systems, which make use of only two pitches, were originally analyzed as simplifications of three- or four-toned languages (Simmons, 1955/1976; Burssens, 1936). The languages were later reanalyzed as having two tones and a downstep process that is not encoded in the speech surrogate (NKongola and Maddieson, 1973; Glewwe, 2019).
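The loss of downstep can be pictured as a many-to-one mapping from surface tones to drum pitches. The Python sketch below is an illustration only: the tone labels (`H`, `L`, and `!H` for a downstepped high), the `SURFACE_TO_DRUM` table, and the example sequences are invented, and it assumes that a downstepped high is played on the same pitch as a plain high.

```python
# Hypothetical mapping from three surface tones to two drum pitches.
# Assumption: the downstepped high ("!H") is struck on the high pitch.
SURFACE_TO_DRUM = {
    "H": "high",
    "L": "low",
    "!H": "high",
}


def drum_pitches(surface_tones):
    """Map a sequence of surface tones to the two-pitch drum signal."""
    return [SURFACE_TO_DRUM[tone] for tone in surface_tones]


# Two invented tone sequences that differ in speech only by downstep.
with_downstep = ["H", "!H", "L"]
without_downstep = ["H", "H", "L"]

print(drum_pitches(with_downstep))
print(drum_pitches(without_downstep))
print(drum_pitches(with_downstep) == drum_pitches(without_downstep))
```

Under these assumptions the two sequences are distinct in speech but identical on the drum, which is one reason the surrogate signal cannot simply be heard as a copy of the spoken surface form.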

As a result, even phonology-based surrogate speech is not easily intuited from knowledge of the spoken language. To those unfamiliar with the system, speech surrogates sound like musical instruments being played or pitches being whistled. This is borne out in the neurological evidence that listening to whistled speech does not activate speech processing areas in the brains of naïve listeners (Carreiras et al., 2005). As I discuss in Compensation Strategies in Speech Surrogate Communication, this means surrogate speech is a skill that must be learned independently from fluency in the base language.

Another challenge is ambiguity. The loss of contrast means (for abridging systems) that distinct units in speech translate to homophonous sequences in the surrogate channel. In a speech surrogate based solely on tone, two words with the same tone melody are identical, even if they are distinguished by segmental contrasts in speech. This effect has been noted extensively (e.g., Carrington, 1976, 1949/1976; Herzog, 1945/1976; Simmons, 1955/1976; Stern, 1957) to significantly increase the number of homophonous items in the speech surrogate lexicon. Homophony is rarely a problem for communication in everyday speech, but it is also rarely so widespread as it is in surrogate speech.
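The way a tone-only abridgment creates homophones can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the lexicon consists of invented placeholder words rather than data from any attested language, tones are limited to `H` and `L`, and `abridge` and `homophone_sets` are illustrative names.

```python
from collections import defaultdict

# Invented toy lexicon: each word is a list of (segments, tone) syllables.
# These are placeholder forms, not words of any real language.
TOY_LEXICON = {
    "bata": [("ba", "H"), ("ta", "L")],
    "soko": [("so", "H"), ("ko", "L")],
    "lemi": [("le", "L"), ("mi", "H")],
    "tuna": [("tu", "H"), ("na", "H")],
    "kiso": [("ki", "L"), ("so", "H")],
    "mapa": [("ma", "L"), ("pa", "L")],
}


def abridge(syllables):
    """Tone-only abridgment: keep each syllable's tone and discard its segments."""
    return "".join(tone for _segments, tone in syllables)


def homophone_sets(lexicon):
    """Group words by the surrogate signal they would produce."""
    groups = defaultdict(list)
    for word, syllables in lexicon.items():
        groups[abridge(syllables)].append(word)
    return dict(groups)


groups = homophone_sets(TOY_LEXICON)
# Signals shared by more than one word are ambiguous in the surrogate channel.
ambiguous = {signal: words for signal, words in groups.items() if len(words) > 1}
print(ambiguous)
```

Even in this six-word toy lexicon, four words fall into two ambiguous pairs once segments are removed. With only two tones, an n-syllable word has at most 2^n possible melodies, so collisions multiply quickly as a lexicon grows.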

A third challenge, less commonly discussed in the literature, is in processing. Surrogate speech comprehension is taxing on the cognitive faculties. Both abridging and lexical ideograph systems require listeners to interpret complex auditory signals under much more difficult circumstances than everyday speech. Diminished redundancy means a listener must make use of every detail of the signal in order to interpret the message. A speech surrogate signal may be produced beyond a listener’s sightline, eliminating the visual and contextual clues that aid in the interpretation of natural language. Local ambiguities mean that listeners have to hold larger chunks of information in their short-term memories at once rather than interpreting a signal segment by segment (Cloarec-Heiss 1999); unlike writing, a speech surrogate message cannot be revisited and reinterpreted indefinitely. These factors are all compounded in systems that have an overall lower frequency of use than natural language. There are no attested examples of communities for which surrogate speech is the primary means of verbal communication. In some areas, participants such as women or children are unwilling or forbidden to practice the system themselves, and are given no formal instruction, yet acquire fluency in the system nevertheless.

Given these challenges, it is tempting to say that the comprehension of surrogate speech is overreported, and that speech surrogacy is typically most meaningful as musical performance or traditional pastime rather than as language. To be sure, early scholarly accounts erred on the side of exaggeration in suggesting that speech surrogates were as expressive as verbal speech and could permit (fantastically) an unbroken line of communication from one end of a continent to another. As Goodwin (1937) observed: “the drum language of West Africa has been built up by careless journalism into one of the wonders of the world” (234). But there is yet evidence that speech surrogates around the world are commonly used for communication, from relaying news and sending invitations (Cloarec-Heiss 1999) to cracking jokes and swapping insults (e.g., Taaken Sàmàari, Ames, Gregersen, and Neugebauer 1971). As Previous Work on Speech Surrogate Communication shows, this evidence is increasingly being supplemented by experimental and neurolinguistic methods showing how speech surrogacy manifests linguistically in the brain. We are therefore left with the question not of whether speech surrogates can be effective linguistic communication systems, but how.

Compensation Strategies in Speech Surrogate Communication

In this section, I present a preliminary typology of compensation strategies used to overcome the surrogate modality’s shortcomings. As we shall see, the pathway to speech surrogate communication runs through cultural and structural factors alike. It blurs the formal lines between abridgment and lexical ideograph systems. Hearkening back to Vigliocco et al. (2014), speech surrogates rely on cultural context (Constructing a Discourse: Cultural conventions) as well as linguistic form (Frames, Formulas, and Enphrasing: Structural constraints) in the construction of meaning.

Constructing a Discourse: Cultural conventions

Speech surrogates, abridging and lexical ideograph types alike, are bounded by conventions that cannot be derived from their base languages. As discussed in Challenges for Speech Surrogate Communication, a speaker of a language is not automatically qualified to practice a speech surrogate based on their language. That is not to say that any person cannot develop one—from my own experience, if a reader does not already whistle a (spoken) language natively, they can choose to begin at any time, and may be fluent in minutes. For that matter, any musician ought to be equally capable of developing a speech surrogate on their instrument. But this does not mean that they will be understood or understand the surrogate speech of others, including those who speak the same language. Instead, inclusion in the discursive ecosystem of a particular speech surrogate requires insider knowledge of how it is used (Internalizing the Rules) and what it is used for (Topic Limitations).

Internalizing the Rules

Surrogate speech varies from community to community, even for abridging systems based on the same language. Clarke (1934) remarks for the Congo that “[s]trangers going into a new locality, although their spoken language may have only slight differences, do not as a rule understand the language of the drum” (418).

Neeley makes a similar observation about two drummers in neighboring Ewondo-speaking communities, where the drumming systems are based on the spoken Ewondo language:

Though residing only a few kilometers apart, the two drumming catechists have idiosyncratic ways of drumming and of verbally interpreting the drum patterns. Each one is oriented towards a local speech community and will probably be only partially understood anywhere else. (Neeley 1999)

The same is evidently even true of whistled speech, despite its roots in spoken articulatory phonetics (Meyer, 2015). As Wilken (1979) observes in his study of the whistled Spanish of Tlaxcala, Mexico:

[D]ifferent villages have distinctive methods of whistling, in essence whistle dialects, even though all are whistling Spanish. Thus residents of Tetlatlahuca say that they can more or less follow the whistle speech of other pueblos in the município but can pick up only a word or two in a sentence whistled in Tepeyanco (884).

Likewise, Classe (1957) notes two distinct varieties of Silbo Gomero which are “not always mutually intelligible … one expert silbador informed me that it took him three or four days to become sufficiently familiar with the style of whistlers from other parts of the island to understand everything they whistled” (974).

Clearly, then, one can only become a member of a speech surrogate community if one learns its conventions—what applies for one community does not apply to another. While we still know little about speech surrogate acquisition, it seems that it can be learned by passive exposure or explicit pedagogy. The former is observed in vibrant traditions, especially whistling systems, which are mastered in childhood (Cowan, 1948; 1976; Hurley 1968; Stern 1957; Wilken 1979). In communities where whistling is commonplace, non-practitioners may develop working knowledge of whistling out of necessity (Meyer, 2015). However, even in such environments, comprehension and production do not develop apace with natural language; children understand whistled speech only several years after they begin to talk (134).

For most systems, anything more than basic knowledge requires conscious effort to acquire, even when input is abundant. The residents of Mekomba, an Ewondo-speaking community studied by Neeley (1999), heard twice-weekly calls to church by the catechist, an expert speech surrogate drummer. As a result, “[m]any people have the receptive competence to understand roughly any short phrase that is commonly drummed” (154), but “the verbal formulas [of the drum language] are not well understood by the majority of Mekomba residents” (164), and “few people understand all the intended words” (161).

For the Sambla balafon surrogate system, Strand (2009) says that time spent away from the village and individual interest dictate a Sambla listener’s understanding. “Most Sambla can understand their name and at least a handful of common phrases” (224). However, at Sambla musical celebrations, only a “core group” of attendees “approach the baan [balafon] between songs and engage in instrumental-verbal exchange with the soloist” (234).

In the Reite communities of Papua New Guinea, who use a slit-log drumming system of lexical ideographs, a similar distribution is reported:

Only a few men and women are skilled in using the full range of beat combinations which enable one to say such complex things as, “the whiteman will come to eat banana in [a particular] hamlet tomorrow afternoon, as long as there is no rain”. Everyone, however, is able to hear their own name, and simple instructions (as in the favourite, “Hurry up!”) (Leach 2002).

Carrington (1976) reports that Lokele children began to understand drummed Kele at “five or six years old” (620), but it is not clear how often that was true in practice by the 1940s. His 1943 survey of Lokele schoolboys found only 36 percent could reproduce their own drum names on the drum, indicating a “marked decrease in drum-signaling … among the Lokele people” (552) in that period. He suggests this was because “Lokele youths and boys are becoming less and less anxious to learn the drum language,” partly due to the rise of literacy and telecommunication.

According to Poss, limited receptive competence in the Hmong raj tradition is widespread among Hmong-Americans raised in Thailand or Laos. “Many native speakers of Hmong who claim not to understand tshuab raj can pick out certain common expressions” (Poss 2005). The most knowledgeable listeners are “highly successful at interpreting musical messages even when they are taken out of context” (Poss 2012). This level of competence involves some casual pedagogy but is mostly attained through extensive listening: “The process of learning to play and understand words on the raj was informal. Relatives or friends might demonstrate a few phrases, but much learning took place through the observation of performances” (146).

Evidently, a roughly binary hierarchy can be found in communities of practice consisting of 1) skilled members who learn the conventions of the system as a whole, and 2) unskilled members who learn the most common messages in the system by heart. In Meyer’s (2015) typology of whistlers, these categories correspond to “‘fluent whistlers’ … [who] have mastered the production and perception of whistling … [and] ‘canonical whistlers’ … [who] know set phrases understood by nearly everyone” (57). All systems must have unskilled members, since that population includes those who have yet to learn it. But the vitality of a speech surrogate is a function of their proportion in the community: as Meyer points out, “when the population is mostly composed of canonical whistlers, the whistled language is nearly dead” (57).

Skilled members’ competence comes from mastering the conventions of the tradition. McPherson’s (2018a, 2019, 2020) work shows that a skilled practitioner of the Sambla balafon tradition can easily produce novel elicited forms, translating systematically between spoken language and surrogate speech. This is quite different from how the system is used in practice, where it appears in brief, mostly predictable verbal exchanges during traditional gatherings, as well as in instrumental adaptations of Sambla vocal music (McPherson and James, in press).

My preliminary study with Benjamin Nimbatara, a skilled Birifor gyil practitioner, offers additional evidence. The Birifor gyil of Ghana’s Upper West region has some commonalities with the Sambla balafon tradition. Both are found within Africa’s “western xylophone belt” (Mensah, 1982). Both are traditionally played in group ensembles at important cultural gatherings like weddings and funerals (Vercelli 2006; Strand 2009). Both reproduce the tonal phonology of their respective spoken languages on gourd-resonator xylophones. However, the “talking mode” found in the Sambla system is absent among the Birifor. Birifor surrogate speech is limited to traditional texts adapted into music via fixed rules of tonal text-setting, similar to the “singing mode” of the Sambla balafon (Vercelli, 2006). I nevertheless found that Benjamin translated most elicited words onto the gyil easily. His competence with the Birifor surrogate tradition was extensible to an impromptu “talking mode” he had never used before. While more research is needed to know how these elicited forms differ from traditional ones, this points to the way that skilled practitioners acquire fluency in the conventions of their speech surrogate.

We can conclude that, to comprehend surrogate speech, members of a speech surrogate community need first to acquire an understanding of its form. For unskilled members, attention is paid to the surface form of messages, with an emphasis on the most frequent; skilled members gain a mastery over the system’s underlying organizing principles.

Topic Limitations

Another strategy delivered at the cultural level is the construction of a conventional discourse wherein the same topics come up many times. This is an important tool for combatting the problem of ambiguity, since it allows listeners to use context to their advantage when interpreting a message. This strategy is expressed differently depending on the practice: certain implements are better suited to particular discourses.

For instance, loud implements used in community settings are less often used for private or sensitive information. Instead, they are “generally intended for use in community life as a means of stimulating or guiding social action or social behavior” (Nketia, 1971). This is a healthy restraint on the universe of possible messages, since listeners can expect only those that have some relevance to them: announcements relating to local public life, information on prominent members of the community, and other topics on which they already have background knowledge. This is a general feature of public discourse in oral cultures (Ong 1977).

The arrangement is aided by the specialization of surrogate speech in a given community. Part of the problem of ambiguity is that long-distance communication strips away information about the identity and context of the interlocutors. If there are only a few practitioners, or if the context of use is regulated, the signals are more easily attributed to a source. An example of both conditions is the Ewondo drumming of Mekomba, Cameroon. During the time of Neeley’s study, “a handful of older men [were] recognized as having extensive ability on the nkul as a speech surrogate; several others [had] limited ability” (41). Moreover, there was only one catechist. Calls to church consisted of a single genre performed by a single individual at a predetermined time. This made messages heard at that time easy to interpret:

When Mekomba residents awaken to the sound of the nkul on Tuesday and Friday mornings, they quickly construct in their minds a contextual configuration. They assign to it a field, recognizing that a scheduled, public communication is being drummed at dawn. They assign to it a tenor, recognizing the personal relationships involved. They assign to it a mode, recognizing specific meaning and broad intention communicated by the speech surrogate formulas. Through this mental construction of the contextual configuration, the audience knows what action is expected of them. (164)

Whistled speech and other quieter speech surrogates are not usually constrained by public discourse in the same way that louder implements are. But the topics are, probably to a greater degree, bounded by their immediate context of use. As Cowan (1948) describes for Mazatec whistling:

In spite of the high probability of ambiguity, the actual instances where confusion occurs are amazingly few. This is due to the fact that whistling is most frequently (though not necessarily) concerned with topics immediately obvious to both parties to the conversation, and used in situations where cultural context plays a much greater part than in the spoken language. (1,390)

The context of use itself is mediated through social convention. Whistling is a “specialized version of a language used for specific purposes in particular social circumstances” (Sicoli 2016). More specifically, whistling systems are usually localized to outdoor environments and to the occupations of hunting and pastoralism (Sicoli, 2016), so topics will trend towards these areas. In Chepang whistling, for example:

There is considerably more ambiguity in whistled communication than in the spoken equivalent. But the very strong limitations on cultural context means that most of these ambiguities can be resolved. In fact, not only is whistle speech limited to certain situations, those of animal and bird catching, but within these it appears to be used only for relatively few, more essential, communications, particularly those relating to movements of the prey. (Caughley, 1974)

Whistling and quieter instruments are more amenable to private or secretive conversation, since they have a limited range of transmission and may conceal meaning from casual or unskilled listeners. While the universe of topics included in “private conversation” is limitless, in speech surrogates it often means something more specific: courtship. This theme is common to speech surrogates across Southeast Asia and the Americas, especially whistling (Hurley 1968), woodwind instruments (Hmong raj, Catlin 1982; Gavião kotiráp, Moore and Meyer 2014), and jaw harps (Proschan 1994; Pugh-Kitingan, 1982).

The prominence of this theme is unsurprising. Social theory tells us that (along with public discourse) courtship is among the most conventionalized social practices; participants choose to enact or negotiate sexual scripts which “encourage[] the conservative, highly ritualized, or stereotyped character that sexual behavior often takes” (Simon and Gagnon 1986). Speech surrogates aid in the enactment of sexual scripts, since they distinguish courtship interactions from normative conversation (Catlin, 1982). But it can equally be said that they are aided by the script: messages limited to the careful constructions of a romantic exchange will tend towards being frequent and predictable, and therefore more effective implements for communication.

Frames, Formulas, and Enphrasing: Structural constraints

The strategies described thus far have been established on the cultural level: members of a speech surrogate community understand speech surrogate messages by gaining knowledge about the system’s properties (Internalizing the Rules) and its uses (Topic Limitations). In this section, I examine the structure of individual messages in speech surrogates.

I find these strategies tend to operate in one or both of the following domains: roteness and elaborateness. Some put constraints on the universe of possible content, making speech surrogate messages more predictable and therefore easier to produce and interpret. I classify these as roteness-oriented strategies. Moreover, some systems regulate the complexity of messages, generating messages that are longer and more elaborate than the base language to eliminate ambiguities, or limiting their length to make them easier to parse. I classify these as elaborateness-oriented strategies. Several strategies operate in both domains.

Framing

What I call “framing” is a strategy for organizing messages in a surrogate system. It encourages messages to be structured in predictable, fixed forms. As such, it is roteness-oriented. However, in some cases, a framing strategy mandates additional elements—essentially discourse markers—that segment the flow of new content, making messages less ambiguous. This use is elaborateness-oriented.

Framing is widely attested in speech surrogates, displaying several cross-linguistic commonalities. Messages in many systems conventionally include an opening signal (Coulter, 2007), a closing signal (Sicoli, 2016), or both (Heepe, 1920; Carrington, 1944; Arom and Cloarec-Heiss, 1976; Goethem 1976; Neeley 1999). In discourse analysis, these signals correspond to the “aperture” and the “finis”, which demarcate the termini of a discourse (Longacre 1996). Within these two brackets, the structure of a message may be bracketed further, with definite “slots” for the names of the addressed party (Rialland 2005; Burridge, 1959), the source of the signal (Burridge, 1959), or markers that identify a particular kind of message content (Seifart et al., 2018). Carrington (1976) provides a representative example of some of these types in Kele drumming:

A drummer usually begins with the call: ki, kɛ, repeated two or three times. Then follows the name of the person or persons to whom he wishes to “speak”. His business follows. He concludes by drumming out the name of the person for whom the call is made, and then a series of beats on the low note terminates the communication. (546)

Seifart et al. (2018) identify a similar pattern for the Bora manguaré system. Manguaré messages begin with a choice of two sequences corresponding to the message “type”: íkʲòòkáré tsà-ʔíhkʲà “Come now!”, or íkʲòòkáré tsíβà-ʔíhkʲà “Bring now!”. The “type” is followed by the name of the “addressee”, itself divided into several components associated with clan identifications. The “message content” follows, and the message then terminates with the “end” sequence (Seifart et al., 2018).
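The slot structure described for Kele and Bora messages can be sketched, very roughly, as a fixed frame around variable content. The sketch below is a simplification under stated assumptions: the slot names, the signal labels (`CALL`, `END`, `NAME:A`), and the `parse_frame` helper are hypothetical, and the sender-name slot reported for Kele is omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical slot names, loosely following the descriptions above:
# opening signal, addressee, message content, closing signal.
@dataclass
class FramedMessage:
    aperture: str
    addressee: str
    content: list
    finis: str

def parse_frame(signals, aperture, finis):
    """Split a flat list of signal labels into frame slots.

    Assumes a message is: aperture (possibly repeated), addressee name,
    one or more content signals, finis. Raises ValueError otherwise.
    """
    if not signals or signals[0] != aperture:
        raise ValueError("message does not begin with the opening signal")
    if signals[-1] != finis:
        raise ValueError("message does not end with the closing signal")
    i = 0
    while i < len(signals) and signals[i] == aperture:
        i += 1  # the opening call may be repeated
    body = signals[i:-1]
    if len(body) < 2:
        raise ValueError("expected an addressee and some content")
    return FramedMessage(aperture, body[0], body[1:], finis)

# Invented example message; the labels stand in for drummed sequences.
msg = parse_frame(["CALL", "CALL", "NAME:A", "come", "now", "END"], "CALL", "END")
print(msg.addressee, msg.content)
```

The point of the sketch is only that a listener who knows the frame can assign each stretch of signal to a slot before interpreting it, which narrows what any given stretch could mean.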

Lexical-ideograph systems use framing in much the same way. For example, in the non-phonological Alamblak nrwit (slit-log drum) system of Papua New Guinea, messages begin with an aperture: an initial, indefinite striking of the drum that “alerts people within hearing range that someone is about to say something on the nrwit” (Coulter 2007). Then, the drummer plays the signal corresponding to the place of the message’s intended recipient. That name itself constitutes a frame, comprising a signal “to inform the hearers that a place name will follow immediately” (Coulter 2007), a signal for the local clan’s totem, and a specific regional identifier. The drummer then plays the addressee’s name, also composed of distinct identifiers. Finally, the actual message content is given; once the drummed discourse is over, the nrwit player may add an optional “coda” to mark its completion (97).

In these cases, framing essentially imposes a “surrogate syntax”: a set of rules more restrictive, and more predictable, than the base language. With regard to whistling systems, then, an important distinction must be made. Unlike many instrumental speech surrogates, whistled languages tend to adhere closely to the vocabulary and syntax of everyday speech. In fact, on these grounds Meyer (2008) asserts that “[c]ontrary to a “language surrogate”, whistled speech does not create a substitute for language with its own rules of syntax” (70).

However, there are some instances where whistling systems do deviate from the syntax of the base language using framing. For example, Chinantec whistling employs an utterance-final particle réi¹³ as a finis, working “similarly to “over” in radio operator talk” (Sicoli 2016). Réi¹³ is not found in standard Chinantec speech contexts. Similarly, according to Rialland (2005), Silbo Gomero uses the vowel a as an aperture:

The vowel a plays a special role in Silbo in that it provides a reference point for whistlers, who usually begin their messages by whistling a followed by the name of the addressee: a Bernardo, a Maria, a Sebastian, a Domingo. (5)

Moreover, though whistling systems adhere more closely to a language’s syntax, they appear to make use of less of it. Sicoli (2016) shows that Chinantec whistling displays only a few of the forms of conversational repair evident in the spoken language, selecting against those that produce syntactically complex forms. He argues that whistled speech in conversation is usually limited to short, semantically simplex utterances, disallowing “multiple actions and complex embeddings” in a single turn (427). This echoes Cowan’s (1948) observation for Mazatec whistling that “single utterances tend to be short” (5). This trend towards simplicity may even extend to morphology: Wilken (1979) notes that the whistled Spanish of Tlaxcala is mostly limited to the present tense (883).

This, too, is an elaborateness-oriented strategy: ruling out complex utterances and stripping away redundant linguistic features regulates the elaborateness of an utterance downward. While this likely makes whistled speech easier to produce and process, further research is required to assess the precise effects of discourse markers and pragmatic constraints in speech surrogate communication, particularly in whistled speech.

Formulas

In surrogate speech, formulas are groups of words combined into fixed, conventional phrases. To a large degree, formulas are characteristic of speech surrogate texts cross-linguistically. They have been attested for a great number of the documented abridging systems, particularly the speech surrogates of Central Africa (Hulstaert 1935; Carrington, 1949; Dugast, 1955; Neeley 1999) and of West Africa (Herzog, 1945; Nketia, 1958; Ames et al., 1971), as well as in Hmong surrogate speech (Falk, 2004; Poss, 2005) and others. The arbitrary signals of lexical-ideograph systems are also essentially formulaic. With the evident exception of the Diola (Moreau, 1997) and Abuʔ-Wam (Nekitel, 1985) whistled languages, which apparently combine phonological encoding with more arbitrary conventional signaling, formulas are much less characteristic of whistling systems.

This strategy operates on the message level, crystallizing words, phrases and sentences so that they can be produced and processed in whole form. This makes it roteness-oriented, since it constrains the form a message can take. Since much of value has been written on this feature of surrogate speech, I refer the reader to Neeley (1999) and Ong (1977) for detailed analysis of the general properties of formulas; in Enphrasing, I discuss a variant of the formula in depth. This section is limited to addressing a point raised in those works that must be interpreted with caution.

Neeley (1999) and Ong (1977) each point to the oral-formulaic theory of Milman Parry and Albert Lord as a natural analogy for studying speech surrogate texts. Oral-formulaic theory, first popularized in Lord’s work The Singer of Tales (1960), has had a profound impact on the study of oral poetry and related fields. Lord advanced a cross-cultural analysis that identified the formula as the defining feature of epic poetry. The comparison is compelling: formulas are the building blocks from which epic poetry is formed, making up a repertoire that defines a whole tradition. Likewise, for many speech surrogates, formulas are the atoms of meaning from which all discourse is assembled. However, this analogy only goes so far, at least without abandoning some of Parry and Lord’s thesis.

In oral poetry, the formula is a “group of words which is regularly employed under the same metrical conditions to express a given essential idea” (Lord, 1960). Formulas are constitutive of the texts of epic poetry in large part because they are the key to producing novel verses at great length: copious spontaneity is effortful, and improvisation is made more challenging by the constraints of poetic meter. Thus, rote phrases with the right metrical properties are relied upon to ease the poet’s creative burden over hundreds or thousands of lines.

Lord’s framing of this theory privileges the effort of the performer. For an epic poet in performance, the most important question is: what will I say next? The decision is made with immense pressure on the cognitive faculties, since it requires the working memory to fit every new development into the procession of a narrative which might last hours or days. Formulas are cognition-economizing devices, allowing a performer to readily develop verse under challenging conditions. Evidence from psycholinguistics confirms that formulas are effective tools for maintaining speech fluency when memory resources are tight (Kuiper, 2004).

We are currently without the benefit of psycholinguistic studies on surrogate speech production; we do not yet know the precise mental pathways practitioners take when they produce messages. Therefore, we cannot be sure if the performance context of speech surrogates is (always) comparably taxing to that of epic oral poetry. In my view, there is reason to think it is not.

One reason is that speech surrogate messages do not always share the epic poem’s discursive context: long-form solo performance. Some speech surrogate traditions, such as the qeej funeral performances of the Hmong people (Falk, 2004), do approach the length and narrative involvement of the Serbo-Croatian epic poetry studied by Lord, in which individual songs numbered in the hundreds or even thousands of lines. But speech surrogate messages can also be short and simple, even while making use of formulaic language—the “speech mode” of Bora drumming in the Amazon, for instance, employs formulaic constructions in messages that amount to only a few phrases (Seifart et al., 2018). Moreover, as Neeley (1999) argues for Ewondo drumming, speech surrogate performance is not always subject to metrical constraints. To be sure, speech surrogates used in musical contexts often adhere to established metrical patterns (Nketia, 1963) or even reproduce interactions of meter and lyric from vocal music (as in the “sung mode” of the Sambla balafon; McPherson and James forthcoming). But other abridging speech surrogates can be based entirely on speech rhythms, as determined by categorical factors like syllable structure (Neeley, 1999) in addition to gradient factors like vowel-to-vowel syllable timings (Seifart et al., 2018). In these cases, the rhythmic content of a message is determined purely by the choice of semantic content, not the reverse.

Surrogate speech may well require more cognitive effort to produce than everyday speech, given its unusual linguistic properties and its greater reliance on motor control to produce sounds. But it is not clear that this is what drives the prominence of the formula in surrogate speech, as Lord describes in the case of the epic poem. Lord writes: “the repeated phrases were useful not, as some have supposed, merely to the audience if at all, but also and even more to the singer in the rapid composition of his tale” (30). On the contrary, for the speech surrogate, the former seems more likely: formulas are of great benefit to the listener, as they do the essential work of disambiguating messages.

Enphrasing

Enphrasing is a special case of the formula that deserves special attention. The term is a coinage from Stern (1957), who defines it simply: “the lexical unit is replaced by a phrase” (127). Like other terms found in that paper, its definition has been (aptly) a bit elaborated by subsequent authors. A representative rewording can be found in Seifart et al. (2018): ““enphrasing”, i.e., elaborating words and sentences to make them longer and less ambiguous” (2). Enphrasing is a strategy that operates on the roteness and the elaborateness of a message at once.

Variations on this description are attested throughout the literature of Central and West African drumming systems from early sources onwards. For example, Heepe (1920/1976) observed that, while some units in the phonological Ewondo drumming system were literal transpositions of a single word, others were figurative circumlocutions: for instance, Ewondo awü “death” corresponded to the drummed Abüo (∼abo) äsi äfgm “He lies very quietly in the earth” (333). Herzog ([1945] 1976) writes that the phonological Jabo drumming in Liberia consists largely of “periphrastic formula[s]” (555). Long lists of enphrasing texts are available for Kele drumming (Carrington, 1949, Carrington, 1976) and Beti drumming (Hulstaert, 1935). Outside of Central and West Africa, the phenomenon has been more recently described in Southeast Asia (Bradley, 1979; Poss, 2005) and the Amazon (Seifart et al., 2018).

Like other formulas, enphrasing is understood to be a strategy for targeting the ambiguity problem in abridging systems: “short words that would come out as homophones in drumming are replaced by longer, less ambiguous expressions” (Seifart et al., 2018). Carrington (1949) provides several examples to this effect in the phonological surrogate texts of Kele drumming. For instance, the word songe “moon” is conventionally replaced with a drummed sequence encoding songe li tange la manga “the moon looks down on the earth” (Carrington, 1949). Carrington explains that songe (a word with two high-toned syllables) would be represented directly on the drum as a sequence of two high-pitched strikes; without enphrasing, it would be indistinguishable from koko “little bird”, also represented by a sequence of two high notes. However, songe li tange la manga is tonally distinct from the enphrasing given to koko: koko olongo la bokiokio “the fowl, the little one which says ‘kiokio’”, so their drummed representations are likewise distinct. This example shows that enphrasing is oriented to both roteness and elaborateness: the strategy transforms basic words into fixed, elaborate sequences to help the listener interpret otherwise ambiguous messages.
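The logic of the songe/koko example can be made concrete in a short sketch. That both bare words carry two high tones follows Carrington’s description; the tone melodies assigned to the two enphrasings below are invented placeholders rather than Carrington’s transcriptions, and the helper names are assumptions of this illustration.

```python
# Tone melodies: H = high, L = low. The two-high-tone melodies for the bare
# words follow the description above; the phrase melodies are invented
# placeholders, NOT attested transcriptions.
WORD_TONES = {"songe": "HH", "koko": "HH"}

ENPHRASINGS = {
    "songe": ("songe li tange la manga", "HH L HL L LL"),   # hypothetical tones
    "koko": ("koko olongo la bokiokio", "HH LHL L LHLH"),   # hypothetical tones
}

def drum(melody):
    """Flatten a melody to the beat sequence heard on a two-pitch drum."""
    return melody.replace(" ", "")

def is_ambiguous(forms):
    """True if at least two entries share a drummed form."""
    drummed = [drum(m) for m in forms.values()]
    return len(set(drummed)) < len(drummed)

enphrased = {word: tones for word, (_, tones) in ENPHRASINGS.items()}
print(is_ambiguous(WORD_TONES), is_ambiguous(enphrased))
```

Under these assumed melodies, the bare words collide on the drum while their enphrasings do not, which is the pattern Carrington reports for the real forms.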

In its response to functional pressures, enphrasing makes the formal division between abridging and lexical ideograph systems less distinct. Just as framing creates its own independent “surrogate syntax”, enphrasing systems constitute a “surrogate lexicon” which refers only conventionally to the base language. When Kele drummers want to refer to the moon, and more specifically, to the word songe, they must check the word against the speech surrogate lexicon to find its equivalent. They then drum that phrase, following the rules of the abridging system. The resulting drum sequence matches only the phrase in the surrogate lexicon in terms of phonology; relative to the base utterance, it is arbitrary. While Carrington’s songe and koko enphrasings both begin with the words themselves, that is not a requirement: Kele lotika “orphan” is enphrased as “wana ati la saŋgo la nyaŋgo ‘the child has no father or mother’” (541). The surrogate phrase is therefore abstracted from the sound of the original word, even while retaining a connection to phonological structure. This echoes more typical lexical ideograph systems like the Tangu garamut signal-drumming of Papua New Guinea, where signals are acoustically abstracted from their referents, though they might still have roots in “linguistic [or] quasi-poetic rhythms” (Burridge, 1959).

Lord says that for the epic poet, “the formula means its essential idea” (65). In other words, an elaborated expression denotes only its most generic meaning: “The ‘drunken tavern’ means ‘tavern’” (65). This applies equally to enphrasing. Regardless of what the enphrasing means in a literal sense, in the speech surrogate it denotes whatever word it is coindexed with in the base language.

Alternative interpretations for enphrasings are ruled out by fiat. The songe/koko pair provides a neat example: since songe and koko share the same tonal pattern, the first word of each enphrasing should sound the same. Therefore, the former could just as well be koko li tange la manga “the little bird looks down on the earth”. Neither the phonological structure nor the semantic content rules out the alternate interpretation on the drum, since a bird in flight may look down on the earth, as can the Moon. It is easy to see how other such semantic ambiguities could pose a problem when common referents have similar properties, such as animal species. If “rhinoceros” and “elephant” were homophonous in the surrogate language, it would be prudent to avoid an enphrasing meaning “the big gray creature”, since the two animals could appear in interchangeable contexts. I can find no evidence that confusions of this sort ever happen in practice. With few attested exceptions, the surface form of an enphrasing is ascribed one meaning.

As a result, an enphrasing need not “make sense” in the base language. The meanings of the words in an enphrasing may change and fade. Carrington observes that “[a] second characteristic of the gong-phrases is that of the tendancy [sic] to use derogatory or diminutive words” (1949, 4), even for generic objects that are neither subpar nor small. For example, “the Kele gong-word for « fishing-net » is: biléme yáwɛ́ŋgo, which is translatable as: « a little bit of an old fishing net »” (4); similar constructions are attested in other drum languages of the Western/Central African group (Heepe, 1920; Hulstaert, 1935; Jacobs, 1954). Likewise, in Ewondo drumming “[a] few nouns are expanded within Antoine’s performance paradigm with diminutive and pejorative descriptions: man ntu eyie ‘small old cloth’… man etug nkul ‘small, old, broken drum’ … otutu dza ‘poor little village’” (Neeley, 1999). Bora drumming has a similar pattern, using pejorative markers for nouns, as well as a repetitive morpheme for verbs. According to Seifart et al. (2018), “Bora speakers have no intuitions why elements with the literal meanings ‘deceased’, ‘repeated’ and ‘damaged’ should be used in manguaré messages” (12). They argue that these constructions “do not carry any semantic value, but function purely to identify the preceding sequences of beats as representing nouns or verbs” (12). Interestingly, this echoes the evidence that diminutive markers aid in word segmentation during spoken language acquisition (Kempe et al., 2007).

By the same token, an enphrasing need not be a phrase used in the spoken language. Carrington (1949) notes that enphrasings in various Bantu drum languages of the Congo often take the form of bare noun-noun compounds, often synonyms: a Kele drum phrase meaning “news” encodes “mbóli sango … two words for « news »” (50), the Mbɔle drum phrase meaning “bird” encodes “tofulú átɔnɔli … two words for « little birds »” (52), and the Olombo drum phrase meaning “oil” encodes “sókó mainá … two words for « palm oil »” (53). Hulstaert (1935) also reports concatenations of two fish species names to refer to fish generally, and three antelope species names as the general term for animals (660) in Nkundo drumming.

To be sure, this kind of construction is not uncommon in spoken language. Co-hyponym compounds, constructions which name a category by compounding a few of its constituents, are attested cross-linguistically, as are synonym compounds (Wälchli, 2005). Such compounds are most common in the languages of East Asia (Wälchli, 2005), but examples in English include “sol-fa” (Renner, 2008) and “subject matter”. Spoken Hmong is rich with co-hyponym and synonym compounds, particularly among the “elaborate expressions” of its formal repertoire: Mortensen (2004) reports lab “monkey” + cuam “gibbon” → lab-cuam “simian” and ncauj “mouth” + lu “mouth” → ncauj-lu “mouth”, the latter “only used in flowery or ritual speech” (4). Unsurprisingly, these constructions appear in the Hmong surrogate languages. Poss (2005) lists elaborate compounds like leej niam leej txi “mother/father”, used to “create characteristic melodic contours that skilled listeners recognize immediately [and] can then be employed formulaically” (129).

But no such elaborate spoken repertoire is attested in Carrington and Hulstaert’s accounts. This is unsurprising, since such forms would be highly unusual for the language family. In Bantu languages NN compounding is usually unproductive and restricted to a few ancestral forms (Basciano et al., 2011). We can see, therefore, that enphrasings can include forms disallowed by a practitioner’s own synchronic grammar.

In spoken language, lexical ambiguities are usually resolved through context and pragmatic cues. Still, in certain contexts, redundant elaborations have the same function in natural languages as in speech surrogates: making an utterance less ambiguous. In English, we sometimes use synonymous collocations when two senses of a word are easily confused. An everyday example is the pair “funny-haha/funny-peculiar” (Timothy Pulju p.c.), but this usage is most common in technical contexts that demand precision of language: “sanction approval” in regulatory documents and “handicap (dis)advantage” in golf are both examples where the second word is synonymous with one of the two ambiguous meanings of the first. Synonymous binomials like “cease and desist” or “last will and testament” are also prevalent in English legal language, where they were historically used to forestall alternative interpretations of medieval English and Latin or French synonyms (Crystal 2005) and now help to clarify a word’s standard meaning in specific areas of law (Mellinkoff 2004).

Ong (1977) provides another example of the parallel processes between natural and surrogate language: Middle Chinese, an exception to the rarity of pernicious homophony in language. It is generally agreed that the ∼1,500 years between Old Chinese and present-day Mandarin Chinese saw a phonological simplification that vastly reduced the number of contrasting syllables. Since the Old Chinese lexicon was largely made up of monosyllabic words, this meant that most syllables acquired new homophones. Moreover, because Chinese is an isolating language, morphology could not usually differentiate homophones in context. As Sampson (2013) argues, this produced widespread ambiguity in the spoken language (though not the writing system), to the extent that “Classical Chinese read aloud cannot be understood, without sight of the text, even by scholars who are very familiar with that language” (8).

Consequently, something had to change to keep the language intelligible. What resulted was a large-scale shift from monosyllabic words to disyllabic words through compounding, especially synonym compounding. For instance: Mandarin péngyǒu “friend” derives from Old Chinese forms of péng “friend” and yǒu “friend”; Sampson notes that “péng is seven ways ambiguous as a morpheme of Mandarin and yǒu is three ways ambiguous, but the compound péngyǒu is unambiguous” (Sampson 2013, 9). This process overhauled the lexicon, pushing the rate of disyllabicity from roughly 20% in Old Chinese to over 80% in Modern Chinese (Shi 2002, 70–72). As a result, utterances in the language are longer and (diachronically) more redundant, but less ambiguous—just as in enphrased surrogate speech.
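
One way to picture why the compound is unambiguous is to treat each morpheme as a set of candidate senses and the synonym compound as the overlap of those sets. The sketch below does this in Python under loudly stated assumptions: only the counts (seven and three) and the shared sense “friend” come from Sampson's example; every other sense label is an invented placeholder, and modelling a compound as a set intersection is a simplification for illustration, not a claim about how Mandarin compounds are actually interpreted.

```python
# Only the sense counts (7 and 3) and the shared sense "friend" come from the
# text. All other labels are invented placeholders for unspecified senses.
PENG_SENSES = {"friend"} | {f"peng_other_{i}" for i in range(1, 7)}
YOU_SENSES = {"friend"} | {f"you_other_{i}" for i in range(1, 3)}


def compound_senses(*morpheme_senses):
    """Senses compatible with every morpheme of a synonym compound.

    Simplifying assumption: a listener keeps only the senses that all
    morphemes in the compound have in common.
    """
    return set.intersection(*morpheme_senses)


print(len(PENG_SENSES), len(YOU_SENSES))         # 7 3
print(compound_senses(PENG_SENSES, YOU_SENSES))  # {'friend'}
```

The same overlap logic applies to a surrogate listener, who hears a longer sequence and can discard every reading that fits only part of it.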

We can conclude that enphrasing lexicons are partially independent from spoken language, though they respond to functional pressures in much the same way. This independence makes it more difficult to draw the distinction between enphrasing systems and systems of lexical ideographs. Like Kele drumming, systems like the signal drumming systems of Oceania have a surrogate lexicon that runs parallel to the base language. Regarding the Tangu slit-log drumming of Papua New Guinea, Burridge (1959) writes:

A child grows up knowing the call-signs of individuals, pigs, and dogs, and the phrases for localities and events, in the same way as he comes to know names and becomes aware of social situations. Standard signals and the events they refer to are associated in his mind as “whistle and train”, “chime and clock”, “hooter and factory”, “church bell and prayers” are associated in the mind of a European. Special signals come into his understanding as do family jokes, sporting metaphor, and the rest. (1,238–9)

In the same way, practitioners of enphrasing systems learn to associate words of their base language with their lengthier, fixed equivalents. The orientation towards roteness and elaboration, used to render surrogate speech less ambiguous, supersedes the formal distinctions between the systems.

Discussion

As Vigliocco et al. (2014) remind us, the study of linguistic structure cannot be disentangled from the structures that surround it—those of modality, culture, and context. In the case of speech surrogates, this holds a dual meaning. Speech surrogates are part of the linguistic ecosystem of the world; to fully understand language’s “broader context of use”, we must account for the surrogate modality. We cannot be surprised that the communicative pressures that shape speech apply their force in equal (or indeed greater) measure to the surrogate modality. At the same time, speech surrogates are a communication system of their own. To analyze speech surrogates in their full form, we need to account for their cultural context and structure in the same breath.

Shore (1991) writes: “[a] theory of meaning construction should account for the historical and local variability of conventional meaning systems” (11), and indeed it seems that speech surrogates are subject to the culture-specific negotiations that govern systems of meaning. What characterizes one speech surrogate does not apply to all, since such negotiations may well proceed differently from place to place. But more research is required to assess how exactly those negotiations take place. What accounts for the distribution and variety of speech surrogate structures around the world? That distribution does not seem amenable to an explanation from innate factors, or strictly as a function of linguistic structure. The most common features (tone-based abridgement, elimination of consonant contrasts, and so on) are nevertheless not universals. Their productive mechanisms, structures, and contexts of use all vary from culture to culture. If, say, tone is uniquely coded for speech surrogacy in the human language faculty, why are the surrogate systems of Papua New Guinea largely not based on tonal (or other) phonology, despite the presence of tonal languages in the region? And since whistling is a physically simple, effective means of communication, and is found in tonal and non-tonal languages alike, why do so few cultures have whistled speech? Meyer’s (2015) argument, sensibly, is that the practice is only developed by people for whom it is especially useful. As it stands, the evidence suggests that speech surrogates have a conventional component that must not be ignored.

Evidence for the profound effects of functional pressures on speech surrogates ought to provoke reflection on all language use. If free, sophisticated communication were trivial in the surrogate modality, it would pose a challenge to functional explanations for language structure: why should language be so complex if its duties can equally be performed by a conch shell? Such a finding would lend credence to the theories of language origin by genetic accident or co-evolution. Instead, there is a wealth of evidence that speech surrogate communication is difficult, and that practitioners need to significantly adapt their communicative behavior to compensate for the challenge. Speech surrogates show how our linguistic faculties are molded by the circumstances of their use.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: ODSS: speechsurrogates.org/map.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Funding

The Dartmouth College Presidential Scholars Program provided a $2000 stipend during the course of this research.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Akinbo, S. (2020). Representation of Yorùbá Tones by a Talking Drum: An Acoustic Analysis. Vancouver, BC: PsyArXiv. doi:10.31234/osf.io/43gf6

Lord, A. B. ([1960] 2000). The Singer of Tales. Cambridge, MA: Harvard University Press. Available at: http://archive.org/details/singeroftales00lord_0.

Ames, D. W., Gregersen, E. A., and Neugebauer, T. (1971). Taaken Sàmàarii: a Drum Language of Hausa Youth. Africa 41 (1), 12–31. doi:10.2307/1159675

Armstrong, R. G. (1953). “Talking Instruments in West Africa,” in Speech Surrogates: Drum and Whistle Systems. Editors T. A. Sebeok and D. J. Umiker-Sebeok (The Hague: De Gruyter Mouton). Available at: https://www.degruyter.com/view/book/9783110804423/10.1515/9783110804423-017.xml.