An acoustic study of vocal expression in two genres of Yoruba oral poetry

Akinbo, Samuel K.; Samuel, Olanrewaju; Alaga, Iyabode B.; Akingbade, Olawale

doi:10.3389/fcomm.2022.1029400

ORIGINAL RESEARCH article

Front. Commun., 15 December 2022

Sec. Psychology of Language

Volume 7 - 2022 | https://doi.org/10.3389/fcomm.2022.1029400

This article is part of the Research Topic Science, Technology, and Art in the Spoken Expression of Meaning

A correction has been applied to this article in:

Corrigendum: An acoustic study of vocal expression in two genres of Yoruba oral poetry

Samuel K. Akinbo¹*

Olanrewaju Samuel²

Iyabode B. Alaga²

Olawale Akingbade²

¹Department of Linguistics, University of British Columbia, Vancouver, BC, Canada
²Department of Linguistics and African Languages, University of Ibadan, Ibadan, Nigeria

This pilot study proposes an acoustic study of the vocal expressions in Ìjálá and Ẹ̀sà, two genres of Yorùbá oral poetry. For this study, we conducted an experiment involving the vocalization of an original poem in speech mode, Ìjálá and Ẹ̀sà. The vocalizations were recorded and analyzed acoustically. The results of the study show that cepstral peak prominence (CPP), Hammarberg index and Energy of voiced sound below 500 Hz distinguish between Ẹ̀sà, Ìjálá and speech but are not as reliable as F0 height and vibrato. By comparing the pitch trajectories of the speech tones and poetic tunes, we show that poetry determines tone-to-tune mapping but can accommodate language when it is feasible. The results of our investigation are not only in line with previous impressionistic observations about vocal expression in Yorùbá oral poetry but also contribute new findings. Notably, our investigation supports vocal tremor as the historical origin of vibrato in Ìjálá. As a result, we strongly recommend the instruments of phonetic science for the study of vocal expression in African oral poetry.

1. Introduction

One of the major challenges in the study of African oral poetry is the lack of instrumental analysis, even when the analysis of vocal expression is based on terms with phonetic correlates (Babalọla, 1963; Ajuwon, 1977; Ọlabimtan, 1977; Ọlátúnjí, 1979). In this preliminary study, we address this issue by proposing an acoustic analysis of vocal expression in two genres of Yorùbá oral poetry.

Studies show that vocal expressions, such as pitch raising and increased loudness, communicate emotions (Scherer, 1985; Juslin and Laukka, 2003). For instance, high pitch and increased loudness are associated with high-arousal emotions such as anger, happiness and excitement (Banse and Scherer, 1996; Juslin and Laukka, 2003; Johnstone et al., 2005; Goudbeek and Scherer, 2010; Lindquist et al., 2016; Scherer, 2021). The use of vocal expression is not limited to affective communication. For example, in many African oral traditions, various genres of verbal art are distinguished based on the vocal expressions that are associated with them (e.g., Uzochukwu 1981 on Igbo elegiac poetry of Nigeria; Boadi 1989 on Akan praise poems of Ghana; Brown 1995 on indigenous South African oral poetry; Ọlátúnjí 1979 on Yorùbá oral poetry). The acoustic correlates of vocal expressions in affective speech have been extensively investigated (see the reviews in Scherer, 1986; Juslin and Laukka, 2003; Kamiloğlu et al., 2020), but vocal expression in African oral traditions has mostly been analyzed without the instruments of phonetic science.

The present paper describes an acoustic study of vocal expressions in Ìjálá and Ẹ̀sà, two genres of Yorùbá oral poetry. We will argue that the instruments of acoustic phonetics can offer valuable insights into the verbal aesthetics of African oral poetry.

Before turning to the details of this study, we present the basic sound inventory of Yorùbá in Section 2. The discussion in Section 3 focuses on the features of Ìjálá and Ẹ̀sà in the context of Yorùbá oral poetry. To understand the phonetic correlates of vocal expression in Ìjálá and Ẹ̀sà, we conducted a production experiment. The details of the experimental procedure are presented in Section 4. The results of the experiment are presented in Section 5. The discussion and conclusion are presented in Section 6.

2. Background on Yorùbá sound inventory

Yorùbá is a Volta-Niger language spoken in West Africa, most prominently in South-Western Nigeria (Blench, 2019). This section presents a description of the relevant sound patterns in Standard Yorùbá, which is the basis of this work.

2.1. Tone in speech

Yorùbá is a tone language, which means pitch contrasts bring about lexical or grammatical distinctions in meaning (Yip, 2002; Hyman, 2018). The language has three contrastive tones, namely H(igh), M(id) and L(ow) (Akinlabi, 1985; Pulleyblank, 1986).

(1) Yorùbá: Tonal minimal set

H bá ‘to meet'

M ba ‘to braid'

L bà ‘to land'

As shown in (1), H tone is marked with an acute accent and L tone with a grave accent. However, M tone is unmarked (Bamgboṣe, 1965; Awóbùlúyì, 1978). Throughout this work, this tone-marking convention in Yorùbá orthography is adhered to.

2.2. Vowels

The vowel inventory of Yorùbá contains seven oral vowels and three nasal vowels, which are presented in (2) (Bamgboṣe, 1966; Awóbùlúyì, 1978; Pulleyblank, 1988).

(2) Yorùbá vowels (Pulleyblank, 2009, p. 868)

The phonetic transcription of the vowels is in square brackets and accompanied by the Standard Yorùbá orthography. Except in graphs and tables, the orthographic transcription is used throughout this paper. The low nasal vowel “an” is often pronounced as “ọn” (Bamgboṣe, 1966; Awóbùlúyì, 1978). Considering that the difference between “ọn” and “an” is phonetic, Pulleyblank (1988, p. 237) analyzes the free variation between the vowels as “a low-level phonetic effect.” This phonetic distinction between “ọn” and “an” is not crucial to the goal of this paper. We now turn to how the tones and vowels are vocalized in Ìjálá and Ẹ̀sà.

3. Basic description of Ìjálá and Ẹ̀sà

Ìjálá and Ẹ̀sà are two of the genres of oral poetry in Yorùbá culture. Most genres of Yorùbá oral poetry are associated with deities and ancestral devotion. Ìjálá is associated with Ògún, the Yorùbá deity of metallurgy and metal-related works (Babalọla, 1963; Ajuwon, 1977; Ọlátúnjí, 1979). Ògún is considered the patron of people who engage in metal works such as blacksmiths, goldsmiths, hunters and professional drivers. The devotees are obliged to pay homage to Ògún. One of the ways of paying homage to Ògún is through the chanting of oríkì, which is the embodiment of the eulogy and epithets about an entity, in this case Ògún. The mode of performance of this oríkì in chant form is referred to as Ìjálá. In Yorùbá oral history, there are four hypotheses about the origin of Ìjálá. All the hypotheses point to the Yorùbá deity of metallurgy Ògún, but only two trace the origin of the vocal expression in Ìjálá to Ògún's geriatric voice and his alcohol consumption (see Babalọla, 1963, pp. 6–12).

Ẹ̀sà, which is also known as Ìwí Egúngún, is associated with the Egúngún creed of ancestral veneration (Olajubu, 1974; Adedeji, 1978). Periodically, the ancestral spirits physically manifest as the Egúngún masquerade. To pay homage to the spirits of the departed ancestors, the devotees chant praises in the Ẹ̀sà poetic mode.

Unlike most genres of music in the culture, most genres of Yorùbá oral poetry are not danceable and may not be accompanied by a drum performance. However, instrumental or vocal music can occur during the intermissions of the poetry performance (Babalọla, 1963; Ajuwon, 1977; Adedeji, 1978; Fámúle, 2018). The instrumental music might involve the representation of Yorùbá phrases with a talking drum (Euba, 1990; Villepastour, 2010; Akinbo, 2019, 2021). Depending on the genre or the mood of the performer, the oral performance may be closer to speech or to music; these genres are therefore considered semi-musical verbal arts (Babalọla, 1963; Ọlátúnjí, 1979; Ògúndejì, 1991). Similar to Yorùbá, the genres of oral poetry in other African societies are also semi-musical (Uzochukwu, 1981; Boadi, 1989; Okpewho, 1992; Finnegan, 2007, 2012; Purvis, 2009).

Studies suggest that Yorùbá oral poetry can be identified based on the contents of the poem, the identity of the performer and vocal expression (Gbadamosi and Beier, 1959; Ọlátúnjí, 1979). The traditional contents of Ìjálá and Ẹ̀sà are eulogy and epithets, but the contents may include historical events, personal eulogy of hunters and non-hunters, social commentaries, humor and all aspects of human existence (see Babalọla, 1963; Ọlátúnjí, 1979; Idamoyibo, 2006). Most importantly, the texts of a specific genre (e.g., Ìjálá) can be used for other genres of Yorùbá oral poetry (e.g., Ẹ̀sà) (Gbadamosi and Beier, 1959; Ọlátúnjí, 1979). As a result, textual contents are not reliable in distinguishing various genres of Yorùbá oral poetry. For example, the popular Ìjálá chanter, Ògúndáre Fọ́yánmu, is widely known for incorporating contemporary socio-political issues in his poems (Olaniyan, 2013). A noteworthy example is the Ìjálá chant of Fọ́yánmu about the historic Nigerian tax war of 1969 (see Adeniran, 1974)¹. Other examples come from the syncretic practices of Yorùbá Christians and Muslims. Although Ìjálá is traditionally associated with Ògún, Yorùbá Christians and Muslims often incorporate Ìjálá chants (and analogously other genres of Yorùbá verbal art) in their religious worship (Idamoyibo, 2006, 2011; Ajibade, 2007; Olátúnjí, 2012; Dada, 2014). Thus, the content of a poem and the identity of the performer are not reliable in identifying genres of Yorùbá oral poetry.

There is consensus that various genres of Yorùbá oral poetry are best distinguished or classified based on vocal expression (e.g., Ọlátúnjí, 1979; Yai, 1989, etc.). For example, the contents of Ìjálá are always chanted in vibrato (Babalọla, 1963). Ẹ̀sà does not involve vibrato like Ìjálá, but a pattern of vowel insertion and lengthening (Olajubu, 1974; Adedeji, 1978). Regardless of the subject matter, a triply long vowel [ooo] at the end of the first poetic line is a recurrent characteristic of Ẹ̀sà (Olajubu, 1974; Olajubu and Ojo, 1977; Adedeji, 1978). When Christian and Islamic musicians incorporate Yorùbá oral poetry into their religious worship, the genres of oral poetry are recognized, not through the contents of the poem or the identity of the chanters, but through the distinctive vocal expression that is associated with the chant.

Ìjálá shares the same vocal expression with Ìrèmọ̀jé, which is a funeral dirge for hunters (Babalọla, 1963; Ajuwon, 1977). Although Ìjálá can be adapted to suit any content, Ìrèmọ̀jé is restricted to funeral rites. As a result, there is an on-going debate as to whether Ìrèmọ̀jé is a distinct genre or a sub-genre of Ìjálá (see Babalọla, 1963; Ajuwon, 1977; Olajubu, 1984; Idamoyibo, 2006). The features of vocal expressions in Ìjálá have also been described in terms of rhythm (Babalọla, 1963; Ọlabimtan, 1977), but we do not consider rhythm in this work. Previous studies suggest that vocal expression in Yorùbá oral poetry involves stress, given that it involves loudness and pitch raising (Siertsema, 1959; Babalọla, 1963; Ọlabimtan, 1977; Ọlátúnjí, 1979). The vocal expression in Yorùbá oral poetry and stress have loudness and pitch raising in common, but these phonetic properties are not the result of stress in the oral poetry. Unlike stress, which involves one syllable being more prominent than the others in a word (Liberman and Prince, 1977; Halle and Idsardi, 1995; Kager, 2007), all the syllables of the words in Yorùbá oral poetry are produced with loudness and pitch raising (Babalọla, 1963). Most importantly, Yorùbá is a tone language, not a stress-timed language (Akinlabi, 1985; Pulleyblank, 1986; Kenstowicz, 2006).

In this work, we investigate the phonetic correlates of vocal expression in Ìjálá and Ẹ̀sà. Yorùbá, the textual basis of the poem, is a tone language, which means pitch contrasts bring about lexical distinctions. Considering that the melody of verbal arts such as chanting depends on pitch contour, we also investigate how poetic melodies interact with the linguistic demand of tone contrast. To answer these questions, we conducted an experiment. The details of the experiment are presented in the next section.

4. Methodology

4.1. Stimuli, participant and procedure

The stimulus in this work is an original poem which was composed by the third author of this paper. As shown in (3), the poem is written in Standard Yorùbá orthography. Oral performance is usually from memory, so in order to make it easier for the consultant to memorize, we selected the oríkì for its brevity. By selecting an original poem instead of a widely known traditional poem as the stimulus, we were able to control for the effect of content and familiarity.

One male native speaker of Standard Yorùbá was recruited as a volunteer for this study. This consultant (age 28) was a fourth-year undergraduate of the Yorùbá study program at the Department of Linguistics and African Languages, University of Ibadan. The consultant had training in chanting various genres of Yorùbá oral poetry, including Ìjálá and Ẹ̀sà. A week before he was scheduled for a recording session, the consultant memorized the oríkì that was composed for the study. At the recording session, he was asked to recite the poem six times in normal speech mode. After reciting the poem in speech mode, he chanted the poem six times in Ìjálá and six times in Ẹ̀sà.

The renditions of the poem in speech mode and Ìjálá mode were recorded in a quiet room at the sampling rate of 48.1 kHz in wav format. Following Babalọla (1963, p. 121), each stretch “of utterances after a breath pause” is grouped as a line of the poem. In line with the observation in Ọlátúnjí (1979), the utterances within each pair of breath pauses form a meaningful whole. Based on the chanting of the poem in Ìjálá and Ẹ̀sà, the text of the poem was grouped into four lines.

(3) An original Yorùbá poem

Line 1 Adédùntán Àbẹ̀jẹ́ ọmọ Bàbálọ́jà

“Adeduntan, the child of Babaloja”

Line 2 ẹyinjúu Ọmọladùṇ baríọlá ọmọba Lépolóyin

“the eyeball of Omoladun, the honorable princess of Lepoloyin”

Line 3 tẹ́ẹ́rẹ́ gbajó ọmọọ̀dọ̀ àgbà, ìdílẹ̀kẹ̀ ẹlẹ́rìn-ín ẹ̀yẹ

“suitably slim for dance, a wise child with a bead-befitting waist”

Line 4 dúdú wù mí, dúdú dá'mi l'ọ́rùn máa wolẹ̀ máa rọra olówó tí ń f'owó ṣàánú

“(your) blackness is alluring, walk cautiously (you) benevolent rich”

The tones and vowels of the text in speech and poetic modes were manually annotated in Praat (Boersma, 2001). For the three tones in the language (i.e., H, L, and M), F0(Hz) values of the pitch contour were extracted at the 25%, 50%, and 75% points of each tone. To replicate the pitch contours as they appear in Praat windows for data visualization and analysis, each tone is labeled in serial order, in this case T1.1, T1.2, T1.3...T2.1, etc., as shown in Figure 1. For each serially labeled tone, F0(Hz) values were extracted at twelve intervals. We extracted F0 values, intensity, formants and three spectral measurements (namely CPP, Energy below 500 Hz and Hammarberg index) from the annotations, using the Praat scripts written by Riebold (2013) and Xu (2013). Using the script tremor.praat (Brückl, 2021), we measured vibrato rate (rate of frequency tremor). The Praat script only works on segments that are >3 s long, but the duration of all the vowels vocalized with vibrato is <1 s. To make each vibrato vocalization at least 3 s long, each vibrato vowel was concatenated with copies of itself until it was six times its original length. It is from this sextupled form that we extracted vibrato rate. See Brückl (2021), Riebold (2013), and Xu (2013) for more details on the scripts.
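The lengthening step for the vibrato vowels can be illustrated with a short sketch. It is not the procedure used with tremor.praat; the function name `repeat_to_min_duration`, the representation of a vowel as a plain list of samples, and the example sampling rate are hypothetical choices made only for illustration.

```python
import math


def repeat_to_min_duration(samples, sample_rate_hz, min_duration_s=3.0):
    """Concatenate a segment with itself until it lasts at least min_duration_s.

    Returns the lengthened sample list and the number of copies used.
    """
    if not samples:
        raise ValueError("segment is empty")
    duration_s = len(samples) / sample_rate_hz
    # Smallest whole number of copies that reaches the minimum duration.
    copies = max(1, math.ceil(min_duration_s / duration_s))
    return samples * copies, copies


# Hypothetical 0.5 s vowel at a hypothetical 1 kHz sampling rate.
vowel = [0.0] * 500
lengthened, copies = repeat_to_min_duration(vowel, sample_rate_hz=1000)
```

A fixed factor of six, as used in this study, reaches 3 s for any vowel of 0.5 s or longer. Plain concatenation can leave a discontinuity at each junction, so the sketch should be read as a description of the idea rather than a recommended signal-processing recipe.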

Figure 1. Annotations of vowels, tones, tone-sequences, and poetic lines.

In the next section, we discuss the motivation for each of the acoustic measurements that were utilized in this work. Our data and the R code of our statistical analysis are in the Supplementary material that is attached to this article.

4.2. Motivation for acoustic measurements in this work

Nine acoustic parameters were measured for the annotated vowels in order to detect the acoustic correlates of vocal expression. The parameters are fundamental frequency, intensity, cepstral peak prominence, energy below 500 Hz, Hammarberg index, formant 1, formant 2, duration and frequency tremor. The parameters were selected based on the description of Ìjálá and Ẹ̀sà in previous studies. In this section, each of these parameters is described.

4.2.1. Fundamental frequency and intensity

We start the discussion with fundamental frequency (F0), which primarily depends on the vibration rate of the vocal cords. F0 is measured in hertz (Hz). The perceptual correlate of F0 is pitch (Ladefoged and Johnson, 2015). Pitch contrasts that bring about lexical or grammatical meaning distinctions are called tone (Yip, 2002; Hyman, 2018). As shown in Section 2, Yorùbá has three tones. Given that the vocal expressions in Yorùbá oral poetry involve pitch raising, it is crucial to investigate how pitch contours of speech melodies are mapped to poetic melodies. For this reason, we extracted the F0(Hz) values of the tones in speech and poetic modes. To capture the full time course of the pitch contours, we extracted F0(Hz) values at twelve intervals for each tone. Recall that increased loudness is also a property of vocal expression in Yorùbá oral poetry. The perceptual correlate of intensity is loudness, but the relationship is not linear. Consequently, we also measured the intensity of all vowels in speech and poetic modes.
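Sampling a tone at a fixed number of points makes contours of different durations comparable on a proportional time axis. The sketch below shows one way this could be done. It assumes equally spaced sampling points that include both edges of the tone and linear interpolation between pitch readings; neither detail is stated for the scripts used here, and the function name `sample_contour` and the example readings are hypothetical.

```python
def sample_contour(times, values, n_points=12):
    """Linearly interpolate a contour at n_points equally spaced times.

    times must be strictly increasing; values are the F0 readings in Hz.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need two or more paired time/value readings")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    sampled = []
    j = 0
    for i in range(n_points):
        t = min(start + i * step, end)
        # Advance to the pair of readings that brackets t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        sampled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return sampled


# Hypothetical pitch readings (seconds, Hz) for a single tone.
contour = sample_contour([0.00, 0.05, 0.10, 0.15], [180.0, 190.0, 185.0, 170.0])
```

Each call returns twelve values regardless of how long the tone is, which is what allows tones from speech and chant to be overlaid on the same proportional time scale.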

4.2.2. Cepstral peak prominence: CPP (dB)

Cepstral peak prominence (CPP) is the difference between the maximum cepstral peak value occurring within the boundaries of the expected phonational quefrencies and the corresponding value on the regression line fitted on the cepstrum (Hillenbrand et al., 1994; Hillenbrand and Houde, 1996). The degrees of glottal closure and vocal fold tension directly correspond to the values of CPP(dB) (Kim, 1970; Iverson and Salmons, 1995; Inwald et al., 2011). Given that glottal closure reduces the level of noise in the vocal signal, CPP(dB) reflects the level of noise in a vocal signal: the higher the noise, the lower the value of CPP and vice versa. In languages where the degree of glottal opening determines breathiness and aspiration, CPP(dB) is a reliable acoustic parameter for breathy and aspirated sounds (Hillenbrand et al., 1994; Blankenship, 2002; Esposito and Khan, 2012; Khan, 2012; Seyfarth and Garellek, 2018; Berkson, 2019). CPP(dB) was originally developed for measuring breathiness (Hillenbrand et al., 1994), but its usage has been extended to the evaluation of dysphonia. Studies show that the severity of dysphonia correlates with lower CPP values when compared to normal voice (Hillenbrand and Houde, 1996; Heman-Ackah et al., 2002, 2003, 2014; Awan and Roy, 2005; Awan et al., 2009; Fraile and Godino-Llorente, 2014, etc.). As reported in Wolfe and Martin (1997), CPP(dB) values in hoarse and breathy voice are lower when compared to strained voice. As a result of the findings in various studies, the American Speech-Language-Hearing Association (ASHA) recommends CPP(dB) as a tool for “measuring the overall level of noise in the vocal signal” and as “a general measure of dysphonia” (Patel et al., 2018).

The use of CPP(dB) values has also been extended to the evaluation of effortful speech production and emotive speech. For example, increased loudness and sustained vowel production in effortful speech production correlate with higher CPP(dB) values when compared to normal speech production (Rosenthal et al., 2014; McKenna and Stepp, 2018; Phadke et al., 2020). Given that chanting involves increased loudness and high vocal demand, we could understand the vocal features of the poetic modes by measuring CPP(dB) values. Considering that nasality can decrease CPP(dB) values (see Madill et al., 2019), we only extracted CPP(dB) values for oral vowels.
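The definition of CPP given at the start of this section can be made concrete with a small sketch. It starts from an already computed cepstrum and is not the script used in this study. The function name `cepstral_peak_prominence`, the default search range of 60–300 Hz, and the choice to fit the regression line over all supplied bins are assumptions made for illustration.

```python
def cepstral_peak_prominence(quefrencies_s, cepstrum_db,
                             f0_min_hz=60.0, f0_max_hz=300.0):
    """Peak-minus-regression-line estimate of CPP in dB.

    quefrencies_s: quefrency of each cepstral bin, in seconds.
    cepstrum_db:   cepstral magnitude of each bin, in dB.
    """
    n = len(quefrencies_s)
    if n != len(cepstrum_db) or n < 2:
        raise ValueError("need two or more paired quefrency/level values")
    # Least-squares regression line fitted over all supplied bins (assumption).
    mean_q = sum(quefrencies_s) / n
    mean_c = sum(cepstrum_db) / n
    sxx = sum((q - mean_q) ** 2 for q in quefrencies_s)
    sxy = sum((q - mean_q) * (c - mean_c)
              for q, c in zip(quefrencies_s, cepstrum_db))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_q
    # Look for the peak only where a voice pitch period is expected.
    q_low, q_high = 1.0 / f0_max_hz, 1.0 / f0_min_hz
    candidates = [(c, q) for q, c in zip(quefrencies_s, cepstrum_db)
                  if q_low <= q <= q_high]
    if not candidates:
        raise ValueError("no bins inside the expected quefrency range")
    peak_db, peak_q = max(candidates)
    return peak_db - (slope * peak_q + intercept)


# Synthetic cepstrum: a falling baseline with a 15 dB peak at 8 ms (125 Hz).
quefrencies = [i * 0.0005 for i in range(1, 41)]
levels = [20.0 - 500.0 * q for q in quefrencies]
levels[15] += 15.0
cpp = cepstral_peak_prominence(quefrencies, levels)
```

In the synthetic example the result is slightly smaller than the 15 dB that was added, because the peak itself pulls the fitted line upward. A noisier signal would have a lower peak relative to the line and therefore a lower CPP.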

4.2.3. Energy below 500 Hz

Another acoustic parameter which is used in this work is the proportion of spectral Energy below 500 Hz (dB). This measurement is often used for evaluating vocal quality in affective speech (Tolkmitt et al., 1982; Johnstone et al., 2005; Scherer et al., 2017). Low values of the Energy below 500 Hz are associated with the tensioning of vocal cords (Tolkmitt et al., 1982; Scherer et al., 2002, 2017; Johnstone et al., 2005). The values of Energy below 500 Hz (dB) were only extracted for oral vowels.
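As a rough illustration of this measurement, the sketch below computes the share of spectral energy that lies below a cutoff from a power spectrum supplied as two lists. The function names, the use of a simple sum over spectral bins, and the decibel conversion are assumptions; the exact computation in the Praat script used here may differ.

```python
import math


def low_band_energy_proportion(freqs_hz, power, cutoff_hz=500.0):
    """Share of total spectral energy found below cutoff_hz (0 to 1)."""
    if len(freqs_hz) != len(power) or not power:
        raise ValueError("need paired, non-empty frequency/power lists")
    total = sum(power)
    if total <= 0:
        raise ValueError("total energy must be positive")
    low = sum(p for f, p in zip(freqs_hz, power) if f < cutoff_hz)
    return low / total


def proportion_to_db(proportion):
    """Express an energy proportion on a decibel scale (assumed convention)."""
    if proportion <= 0:
        raise ValueError("proportion must be positive")
    return 10.0 * math.log10(proportion)


# Hypothetical four-bin power spectrum.
share = low_band_energy_proportion([100.0, 300.0, 700.0, 1500.0],
                                   [4.0, 4.0, 1.0, 1.0])
```

A voice with more energy in the higher harmonics yields a smaller share below 500 Hz, which is the direction associated above with tensing of the vocal cords.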

4.2.4. Hammarberg index (dB)

We extracted the values of Hammarberg index for evaluating vocal expression. The Hammarberg index is defined as the difference between the energy maximum in the 0–2,000 Hz frequency band and in the 2,000–5,000 Hz band (Hammarberg et al., 1980). Studies suggest that an increase in loudness and F0(Hz) correlates with lower values of Hammarberg index (Scherer et al., 2017; Hakanpää et al., 2021; Sundberg et al., 2021). As an additional measurement for pitch raising and increased loudness, we measured Hammarberg index for all oral vowels in speech and poetic modes.
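The definition above translates into a short calculation on a power spectrum. The following is a minimal sketch, assuming the spectrum is available as paired lists of frequencies and linear power values and that the boundary at 2,000 Hz belongs to the upper band; the function name `hammarberg_index_db` is hypothetical.

```python
import math


def hammarberg_index_db(freqs_hz, power, split_hz=2000.0, upper_hz=5000.0):
    """Strongest peak below split_hz minus strongest peak above it, in dB."""
    low = [p for f, p in zip(freqs_hz, power) if 0.0 <= f < split_hz]
    high = [p for f, p in zip(freqs_hz, power) if split_hz <= f <= upper_hz]
    if not low or not high:
        raise ValueError("spectrum must cover both frequency bands")
    peak_low, peak_high = max(low), max(high)
    if peak_low <= 0 or peak_high <= 0:
        raise ValueError("peak power must be positive")
    # A difference of dB levels equals 10*log10 of the power ratio.
    return 10.0 * math.log10(peak_low / peak_high)


# Hypothetical spectrum whose low-band peak is 100 times the high-band peak.
index_db = hammarberg_index_db([250.0, 1000.0, 2500.0, 4000.0],
                               [100.0, 10.0, 1.0, 0.5])
```

When a louder or higher-pitched voice puts relatively more energy above 2,000 Hz, the high-band peak rises and the index falls, which matches the direction reported in the studies cited above.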

4.2.5. Formant frequencies

Formants are defined as “a resonating frequency of the air in the vocal tract” (Ladefoged and Johnson, 2015, p. 315). The first two formants, namely formant 1 (F1) and formant 2 (F2), are important in determining the quality of vowels. Specifically, F1(Hz) is mostly determined by vowel height and F2(Hz) is determined by vowel frontness or backness. The values of F1 increase in loud and effortful speech and verbal arts, but the values of F2 are not consistent under the same conditions (Huber et al., 1999; Traunmüller and Eriksson, 2000; Huber and Chandrasekaran, 2007; Koenig and Fuchs, 2019, etc.). To understand the effect of vocal expression on the acoustics of vowels, F1(Hz) and F2(Hz) values were extracted for all the oral vowels.

4.2.6. Vibrato rate (frequency tremor)

We also measured the rate of frequency modulation or tremor. When frequency modulation occurs as a result of alcohol withdrawal syndrome (Koller et al., 1985; Anouti and Koller, 1995), aging (Gregory et al., 2012; Martins et al., 2015), or neurological disorders that cause involuntary movement of muscles in the throat, larynx (voice box), and vocal cords, it is called vocal tremor (Deuschl et al., 1998; Hlavnička et al., 2020). When used intentionally in singing, frequency tremor is called vibrato (Seashore, 1938; Dromey et al., 2003; Nix et al., 2016). The typical values of vibrato rate range from 4 to 7 Hz (Seashore, 1938; Dromey et al., 2003; Nix et al., 2016). In neurological diseases, vocal tremor frequencies are categorized into slow (< 4 Hz), intermediate (4–7 Hz) or rapid (>7 Hz) (Deuschl et al., 1998; Charles et al., 1999). The slow tremor frequencies are prominent in all neurological disorders, but the intermediate and rapid tremor are mostly found in a subset of neurological disorders (Deuschl et al., 1998; Hlavnička et al., 2020).
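The three-way categorization of tremor frequencies described above can be written as a simple lookup. This is only a sketch of the thresholds as stated in this section; the function name `classify_tremor_rate` and the example rates are hypothetical.

```python
def classify_tremor_rate(rate_hz):
    """Label a tremor or vibrato rate using the thresholds given above."""
    if rate_hz < 0:
        raise ValueError("rate must be non-negative")
    if rate_hz < 4.0:
        return "slow"
    if rate_hz <= 7.0:
        return "intermediate"
    return "rapid"


# Illustrative rates in Hz.
labels = [classify_tremor_rate(r) for r in (1.8, 5.5, 7.5)]
```

Under these thresholds the typical vibrato range of 4 to 7 Hz coincides with the intermediate category, so a rate below 4 Hz falls outside the range usually reported for sung vibrato.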

4.3. Statistical analysis

A linear mixed-effects model was fitted to each acoustic parameter for each vowel and tone, to determine whether the vocalization mode has a significant effect. In this case, the fixed effect is the mode (i.e., speech, Ìjálá and Ẹ̀sà), and the random effect is each iteration of the poem in all modes. For the tones, the random effect is the tone-bearing unit, in this case the vowels. This was done using the package “lme4” in R (Bates et al., 2014). We ran post-hoc pairwise comparisons for the mixed-effects model using the package “emmeans” (Lenth and Lenth, 2018).

To calculate the correspondence between pitch trajectories of speech and poetic modes, we used the Pearson correlation coefficient R, which measures the strength and direction of a linear relationship between two variables. The value of R is always between +1 and −1. The closer the value of R is to +1, the stronger the positive relationship between the two variables. However, the closer the value of R is to −1, the stronger the negative relationship between the two variables. If the value of R is 0, it means there is no linear relationship between the two variables (see Rumsey, 2009, for a basic description of this statistical measurement). A regression line in a scatter plot describes the strength and direction of the linear relationship between the variables under consideration.
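For readers who want to see the calculation, the coefficient can be computed directly from two equally long series of F0 values. The sketch below uses the standard formula; the function name `pearson_r` and the F0 values are hypothetical and are not taken from our data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (2 or more)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical F0 values (Hz) at matching time points in speech and chant.
speech_f0 = [110.0, 120.0, 100.0, 130.0]
chant_f0 = [220.0, 240.0, 200.0, 260.0]
r = pearson_r(speech_f0, chant_f0)
```

In this made-up case the chant contour is the speech contour scaled by two, so the coefficient is at its maximum. This also illustrates that R captures the shape of the trajectory and not the overall F0 height, which is why the two are examined separately.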

The null hypothesis is that there is no difference between speech and poetic modes for all the acoustic parameters. If p ≤ 0.05, there is a statistically significant effect of speech or poetic modes on the acoustic parameters, and we therefore have strong evidence against the null hypothesis. A p > 0.05 indicates weak evidence against the null hypothesis (Rumsey, 2009; Wasserstein and Lazar, 2016). In the next section, we present the results of the acoustic analysis.

5. Results

We discuss the phonological attribute of vocal expression before turning to the results of the acoustic analysis. In Ẹ̀sà, the triply long vowel [ooo] is inserted at the beginning of the first word in line 1 even though the text does not have such a vowel. If we recall that this is a recurrent attribute of Ẹ̀sà, we can say the long vowel is an attribute of vocal expression in Ẹ̀sà.

We now turn to the results of the acoustic analysis. One syllable in each of the first three lines was lengthened and vocalized with vibrato in the Ìjálá mode, but at no point was Ẹ̀sà vocalized with vibrato. All the syllables targeted for vibrato (except for one) fall within the last or penultimate word of each poetic line. The syllable that was consistently targeted for vibrato in each iteration of the poem contains the sequence [ba], but the syllables with the vowels [ɪ, ɛ] were variably vocalized with vibrato. Considering that the other vowels were not consistently targeted for vibrato, we only measured the duration of the vowel that was consistently targeted for vibrato, as shown in Figure 2A. The vowel targeted for vibrato in the Ìjálá mode is significantly longer than the corresponding vowel in Ẹ̀sà (p < 0.001), which in turn is longer than the corresponding vowel in speech mode. However, the distinction between Ẹ̀sà and speech modes for the vibrato [a] is not statistically significant.

Figure 2. (A) Vibrato [a] in Ìjálá and the corresponding non-vibrato [a] in speech and Ẹ̀sà modes; (B) Vibrato rate(Hz) of the relevant vowels in Ìjálá mode.

All the vowels that were produced with vibrato in Ìjálá have a vibrato rate in the range 1.6–2 Hz, as shown in Figure 2B. However, the vowels [ɛ] and [a] have vibrato rates of 4.45 Hz and 6.5 Hz, respectively, in one of their repetitions. Thus, the variation cannot be attributed to vowel-type. Considering that the vibrato rates of 4.45 Hz and 6.5 Hz are only found in two tokens, they are considered outliers.

Compared to Ìjálá, F0 values at 25%, 50%, and 75% intervals are higher in Ẹ̀sà for the three tones, as shown in Figure 3. Also at every interval, the values of F0(Hz) for each tone are higher in the poetic modes than the speech mode. The vocalization modes have a significant effect on the values of F0(Hz) for speech, Ìjálá and Ẹ̀sà comparisons (p < 0.001). This shows that the vocal expression targets all the tones in the language. The results also show that the three tones in the language have distinct F0(Hz) values regardless of the vocalization mode.

Figure 3. F0(Hz) values of H, L, and M tones in speech and poetic modes at 25%, 50%, and 75% intervals.

The pitch trajectory of the tone sequences in each poetic line is compared in Figure 4. In the figure, the y-axis contains the acoustic measurement of pitch contour in F0(Hz), and the x-axis contains the proportional time of tone sequences. The dark line is for the pitch contours of Ẹ̀sà, the dark gray for Ìjálá and the light gray line is for the pitch contour of speech. There are four panels in the graph, where each panel is for the sequence of tones in each poetic line.

Figure 4. Pitch contours of each line in speech and poetic modes.

As shown in Figure 4, the values of F0(Hz) are higher in poetic modes when compared to the speech mode. The values of F0(Hz) are higher in Ẹ̀sà than Ìjálá. Figure 4 also shows that the pitch trajectory of each poetic line in speech mode is similar to that of the corresponding poetic line in Ẹ̀sà and Ìjálá.

To check the degree of similarity between the pitch contours of speech and poetic modes, we applied the Pearson correlation coefficient to the pitch contours. To investigate whether the linear relationship between speech and poetic modes varies by poetic line, the correlation coefficient was applied to each of the four poetic lines for every speech, Ìjálá and Ẹ̀sà comparison. The results are shown in Figure 5.

Figure 5. Correlation between the pitch trajectories of each line in speech and poetic modes: (A) Ìjálá vs. speech; (B) Ẹ̀sà vs. speech; (C) Ẹ̀sà vs. Ìjálá.

Figure 5 shows that there are statistically significant positive correlations between the trajectories of speech and poetic melodies (p < 0.001), but the degree of correlation varies by poetic lines and genres. Comparing the lines in Figures 5A,B, we see that the correlation between the pitch contours of Ẹ̀sà and speech is higher when compared to the correlation between the pitch contours of Ìjálá and speech. Figure 5 also shows that the pitch contours of Ìjálá and Ẹ̀sà are closer than they are to the pitch contours of speech.

Similar to the F0(Hz) values, the intensity of oral vowels is significantly higher in poetic modes than in speech mode (p < 0.001), as shown in Figure 6. However, the distinction between the intensity of Ẹ̀sà and Ìjálá varies depending on vowel-type. As shown in Figure 6, the intensity is higher in Ẹ̀sà than Ìjálá for all vowels, except for the vowel [u]. The difference between the intensity of Ẹ̀sà and Ìjálá is only significant (p ≤ 0.004) for the vowels [o, ɔ, a]. The values of the Energy below 500 Hz(dB) are significantly higher in speech than poetic modes for all vowels (p ≤ 0.004), except the vowels [e, o, ɔ]. The distinction between the Energy below 500 Hz(dB) of Ẹ̀sà and Ìjálá is only significant for [o, ɔ]. We now turn to the results of CPP(dB) and Hammarberg index(dB), which are presented in Figure 7.

Figure 6. Intensity and energy below 500 Hz of oral vowels in poetic and speech modes.

Figure 7. CPP(dB) and Hammarberg index(dB) of oral vowels in poetic and speech modes.

The results of the statistical analysis indicate that the mean values of CPP(dB) are significantly higher in poetic modes when compared to speech mode (p < 0.001). However, the difference between the CPP(dB) values of Ẹ̀sà and Ìjálá is not significant. The values of Hammarberg index are significantly lower in poetic modes than speech mode (p < 0.001). The distinction between the Hammarberg index(dB) values of Ẹ̀sà and Ìjálá is only statistically significant for the vowels [i, u, ɔ] (p ≤ 0.013). The results of the formant values are presented in Figure 8.

Figure 8. Formant plots of oral vowels in poetic and speech modes.

There is an effect of vocal expression on vowel formants. As shown in Figure 8, the values of F1 for all oral vowels are higher in poetic modes than speech mode. For the values of F1(Hz), the distinction between poetic modes and speech mode is significant for all vowels (p ≤ 0.019), except the vowel [u]. The graph in Figure 8 also shows that, for all vowels except [u], the values of F1 are slightly higher in Ẹ̀sà than Ìjálá. However, the F1 distinction between Ẹ̀sà and Ìjálá is only significant (p ≤ 0.026) for the vowels [o, ɔ, a]. We now turn to the values of F2(Hz). There is no obvious distinction between the values of F2(Hz) for speech, Ẹ̀sà and Ìjálá, except for the vowels [e, ɛ], which have lower F2(Hz) in poetic modes. Even in this case, the distinction is only significant for the vowel [ɛ].

In the next section, we discuss the results and their implications for the analysis of vocal expression in Ẹ̀sà and Ìjálá.

6. Discussion and conclusion

We set out in this work to understand the acoustic correlates of vocal expression in Ìjálá and Ẹ̀sà under an experimental condition. The results of our investigation show that Hammarberg index(dB), Energy below 500 Hz(dB) and F1(Hz) distinguish the speech mode from each of the poetic modes for some vowels but are not as reliable as vibrato, F0(Hz), CPP(dB) and intensity(dB), which distinguish the speech mode from the poetic modes for all vowels. For the Ẹ̀sà vs. Ìjálá, Ẹ̀sà vs. speech and Ìjálá vs. speech comparisons, the most reliable acoustic parameters are vibrato and F0(Hz), given that vibrato only features in Ìjálá and that there is a significant effect of vocalization mode on F0(Hz), regardless of tone and poetic line. The results also show that there is a correspondence between the pitch trajectories of speech tones and poetic tunes, but the degree of correspondence varies by poetic line and genre. Another feature that distinguishes Ẹ̀sà from Ìjálá is the epenthesis and lengthening of the vowel [o] in the first poetic line.

The acoustic correlates of vocal expressions in Ẹ̀sà and Ìjálá are consistent with increased vocal effort, given that higher values of F1(Hz), intensity and CPP are associated with increased vocal effort (e.g., Jessen et al., 2005; Rosenthal et al., 2014; McKenna and Stepp, 2018). The lower values of Energy below 500 Hz in poetic modes are also consistent with vocal tensing found in effortful speech. An increased vocal effort is expected as a feature of both genres, considering that vocal performance in a large space requires high vocal effort (Sundberg, 1977; Beechey et al., 2018) and that Ìjálá and Ẹ̀sà are typically performed to a large audience in an open space (Babalọla, 1963; Adedeji, 1978; Yai, 1989). It is probably vocal effort that previous research mischaracterised as stress.

The range of vibrato rate (1.6–2 Hz) in Ìjálá is atypical of the vibrato rate (4–7 Hz) in singing but consistent with vocal tremor as the historical origin of vocal expression in Ìjálá. Although the range of the vibrato rate reported in this work is prominent in all neurological diseases, it is difficult to tell whether the vibrato in Ìjálá historically developed from the vocal symptoms of alcohol withdrawal, aging or both. The pitch-height distinction between Ìjálá and Ẹ̀sà cannot be attributed to vibrato considering that the vibrato and non-vibrato sections of Ìjálá have lower pitch height than Ẹ̀sà, as shown in Figure 4.
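To make the notion of vibrato rate concrete, the following minimal Python sketch estimates the rate of a periodic F0 modulation by counting upward zero crossings of the mean-centred F0 contour. It is an illustration only and not necessarily the measurement procedure used in this study; the function names, the frame rate of 100 F0 samples per second and the synthetic contours are all assumptions introduced here. The sketch also assumes a contour without overall pitch drift, which real chanted lines would not guarantee.

```python
import math


def estimate_vibrato_rate(f0_contour, frame_rate):
    """Estimate modulation rate (Hz) from upward zero crossings.

    f0_contour: list of F0 values in Hz, sampled at `frame_rate` per second.
    Returns None when fewer than two upward crossings are found.
    """
    mean_f0 = sum(f0_contour) / len(f0_contour)
    centered = [value - mean_f0 for value in f0_contour]
    crossing_times = []
    for i in range(1, len(centered)):
        if centered[i - 1] < 0 <= centered[i]:
            # Linearly interpolate the crossing time between two frames.
            frac = -centered[i - 1] / (centered[i] - centered[i - 1])
            crossing_times.append((i - 1 + frac) / frame_rate)
    if len(crossing_times) < 2:
        return None
    cycles = len(crossing_times) - 1
    return cycles / (crossing_times[-1] - crossing_times[0])


def synthetic_contour(rate_hz, duration_s=5.0, frame_rate=100,
                      base_f0=200.0, extent_hz=5.0):
    """Build an artificial sinusoidal F0 contour (hypothetical test data)."""
    n_frames = int(duration_s * frame_rate)
    return [base_f0 + extent_hz * math.sin(2 * math.pi * rate_hz * i / frame_rate)
            for i in range(n_frames)]


slow = estimate_vibrato_rate(synthetic_contour(1.8), 100)  # tremor-like rate
fast = estimate_vibrato_rate(synthetic_contour(5.5), 100)  # singing-like rate
print(round(slow, 2), round(fast, 2))
```

On such artificial input, a rate inside the 1.6–2 Hz range reported for Ìjálá is clearly separable from a rate inside the 4–7 Hz range typical of sung vibrato.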

Another notable finding of this study concerns tone-tune mapping. Studies on tone-tune mapping indicate that song melodies in a tone language are not determined by language, but music can accommodate language when it is musically feasible (Ho, 2006; Schellenberg, 2009, 2013; McPherson and Ryan, 2018). The results of our study are in line with the findings of studies on tone-tune mapping in singing, given that the correspondence relations between the pitch contours of speech tones and poetic tunes vary by genre and poetic line. As shown by the correlation coefficients in Figure 5, the tune of Ẹ̀sà is closer to the speech-tone melody than the tune of Ìjálá. This indicates that Ẹ̀sà is closer to speech than Ìjálá in terms of tone-tune mapping. It remains to be seen whether this makes the chants of Ẹ̀sà more intelligible than those of Ìjálá.
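The idea of quantifying tone-tune correspondence with a correlation coefficient can be sketched as follows. The example assumes a Pearson coefficient computed over time-aligned F0 values (one per syllable) of the spoken and chanted versions of a line; whether this matches the exact coefficient and alignment used for Figure 5 is an assumption. The function name and all pitch values are invented for illustration and are not data from this study.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long pitch trajectories."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a flat trajectory has no defined correlation")
    return covariance / (spread_x * spread_y)


# Invented per-syllable F0 values in Hz (NOT measurements from the study).
speech_tones = [120, 150, 110, 150, 120]
tune_following_tones = [200, 240, 190, 235, 205]   # higher, but same shape
tune_ignoring_tones = [220, 215, 225, 210, 230]    # shape unrelated to tones

print(pearson_correlation(speech_tones, tune_following_tones))
print(pearson_correlation(speech_tones, tune_ignoring_tones))
```

Because the coefficient is insensitive to overall pitch height, a tune can be chanted far above the speech register and still correlate strongly with the speech-tone melody, which is why pitch raising and tone-tune correspondence can be assessed separately.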

Studies on the affective use of vocal expression find that pitch raising and increased loudness are the most reliable cues for high levels of arousal, such as excitement, fear and anger (Banse and Scherer, 1996; Juslin and Laukka, 2003; Johnstone et al., 2005; Goudbeek and Scherer, 2010; Lindquist et al., 2016; Scherer, 2021). It remains to be seen whether the pitch raising and increased loudness in Ìjálá and Ẹ̀sà are also associated with high-arousal states such as excitement and happiness. Considering that Ìrèmọ̀jé is a dirge with vocal expressions similar to those of Ìjálá, future research involving more participants should compare the acoustic cues of vocal expression in Ìjálá and Ìrèmọ̀jé.

The major limitation of this work is that it is based on data from one participant. As a result, we cannot tell whether the acoustic correlates of vocal expression reported in this work are specific to the single participant or apply to other Yorùbá chanters. Thus, future research should replicate the present study on a larger population of Yorùbá chanters. Another limitation of this research is that we did not specifically look at the effect of vibrato on each lexical tone. To the best of our knowledge, the effect of vibrato on tone has not been studied, so future research on singing or chanting in a tone language should investigate the interaction of tone and vibrato.

In sum, our study supports the observation in previous studies that vocal expressions, such as pitch raising, vowel epenthesis and lengthening, distinguish Ìjálá, Ẹ̀sà and speech. Contrary to the previous impressionistic observations, increased loudness as vocal expression does not distinguish Ìjálá from Ẹ̀sà but rather distinguishes the poetic modes from speech. In addition, we have shown that the vocal expression in Yorùbá oral poetry might be attributed to high vocal effort. Our analysis of tone-tune mapping in the poetic modes indicates that the poetic tunes correspond to the melody of speech tones, but the degree of correspondence varies by poetic line and genre. In addition to its analytical importance, the present study also supports vocal tremor as the historical origin of vocal expression in Ìjálá. It is important to note that the properties of vocal expression reported in this work could not be captured through older impressionistic observation methods. Given that properties of vocal expression in oral poetry are better captured in phonetic terms, we strongly recommend the instruments of phonetic science as valuable tools for the study of African verbal arts.

Author's note

This research was carried out when the first author was a student working under an SSHRC insight grant (435-2016-0369) awarded to Douglas Pulleyblank at the University of British Columbia. For comments and suggestions on various aspects of this work, we thank Douglas Pulleyblank, Rose-Marie Dechaine, Adélékè Adéèkó, Tolu Odebunmi, Akosua Addo, and the reviewers. Errors of fact or explanation are our own responsibility.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

Author contributions

SKA designed the experiment, analyzed the data, and wrote up the background, results, and discussion. OS collected the data, analyzed the data, and edited the manuscript. IBA set up the stimulus and wrote up the background. OA collected the data and edited the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcomm.2022.1029400/full#supplementary-material

Footnotes

1. An audio recording of the Ìjálá poem by Fọ́yánmu can be found at this link: https://www.youtube.com/watch?v=KzTBR7VJknQ.

References

Adedeji, J. (1978). The poetry of the Yoruba masque theatre. Afr. Arts 11, 62–100. doi: 10.2307/3335415


Adeniran, T. (1974). The dynamics of peasant revolt: A conceptual analysis of the Agbekoya Parapo uprising in the western state of Nigeria. J. Black Stud. 4, 363–375. doi: 10.1177/002193477400400401


Ajibade, G. O. (2007). New wine in old cups: Postcolonial performance of Christian music in Yorùbá land. Stud. World Christian. 13, 105–126. doi: 10.3366/swc.2007.13.2.105


Ajuwon, B. (1977). Funeral Dirges of Yoruba Hunters (Ph.D. Thesis). Indiana University.
