Speech vs. singing: infants choose happier sounds

Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

doi:10.3389/fpsyg.2013.00372

ORIGINAL RESEARCH article

Front. Psychol., 26 June 2013

Sec. Emotion Science

Volume 4 - 2013 | https://doi.org/10.3389/fpsyg.2013.00372

This article is part of the Research Topic Expression of emotion in music and vocal communication

Marieve Corbeil¹*

Sandra E. Trehub¹,²

Isabelle Peretz¹*

¹International Laboratory for Brain, Music and Sound Research, Department of Psychology, Université de Montréal, Montréal, QC, Canada
²Music Development Laboratory, Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada

Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age.

Introduction

There is considerable debate about similarities and differences in the processing of language and music (e.g., Pinker, 1997; Patel, 2008; Jackendoff, 2009; Peretz, 2009). Because the greatest differences arise from the presence of propositional meaning in language but not in music, comparisons in the early pre-verbal period are of particular interest (Trehub et al., 1993; Chen-Hafteck, 1997; McMullen and Saffran, 2004; Brandt et al., 2012), notably when both modes of parental communication are used to regulate infant attention and affect (Fernald, 1992; Papoušek, 1994; Kitamura and Burnham, 2003; Trehub et al., 2010). To date, however, the only study comparing young infants' behavioral responsiveness to speech and singing (Nakata and Trehub, 2004) used audiovisual stimuli, obscuring the relative contributions of auditory and visual expressiveness to infants' greater engagement with maternal music. Another study found no difference in newborns' neural responses to happy-sounding speech and singing (Sambeth et al., 2008). The present investigation examined infants' attentiveness to speech and singing on the basis of auditory cues alone.

Whereas verbal aspects of speech convey propositional meaning, non-verbal or prosodic aspects such as intonation and rhythm convey the speaker's affective intent and emotional state (Frick, 1985). Mothers across cultures speak and sing to their pre-verbal infants in the course of providing care (Fernald, 1992; Trehub and Trainor, 1998; Dissanayake, 2000; Trehub, 2000). Their manner of speaking or singing to infants (infant-directed or ID) differs dramatically from their manner in other contexts (adult-directed or AD; self-directed or non-ID) (Ferguson, 1964; Jacobson et al., 1983; Fernald and Simon, 1984; Trainor et al., 1997; Trehub et al., 1997a,b), with notable variations across cultures (Grieser and Kuhl, 1988; Fernald et al., 1989; Kitamura et al., 2002). In general, ID speech features higher pitch, expanded pitch contours, slower speaking rate, longer vowels, larger dynamic range, and greater rhythmicity and repetition than AD speech (Stern et al., 1982, 1983; Fernald and Simon, 1984; Fernald et al., 1989). These features, especially high pitch, expanded pitch contours, rhythmicity, repetition, and reduced speaking rate, make ID speech sound much more musical than AD speech (Fernald, 1989, 1992). High pitch, expanded pitch contours, and large dynamic range also reflect the heightened affective quality of typical ID speech, which contrasts with the affective restraint of typical AD speech (Trainor et al., 2000). Nevertheless, ID speech is finely tuned to the infant's age and needs, with mothers using relatively more comforting speech for 3-month-olds, more approving speech for 6-month-olds, and more directive speech for 9-month-olds (Kitamura and Burnham, 2003). Approving speech, with its higher pitch and greater pitch range, receives higher ratings of positive affect by adult listeners (Kitamura and Lam, 2009).

Unlike speech, singing is constrained by the prescribed pitch and rhythmic form of the material (i.e., specific songs). Nevertheless, ID versions of singing are also characterized by higher pitch and slower tempo than non-ID versions of the same songs by the same singers (Trainor et al., 1997; Trehub et al., 1997a,b). While repetition is an important aspect of ID speech, it is central to music in general (Kivy, 1993; Trainor and Zatorre, 2008) and to songs for young children in particular (Trehub and Trainor, 1998).

The available evidence indicates that infants find ID singing more engaging than non-ID singing (Trainor, 1996; Masataka, 1999) just as they find ID speech more engaging than AD speech (Fernald, 1985; Werker and McLeod, 1989; Pegg et al., 1992). One possible source of infants' enhanced engagement is the heightened positive expressiveness of typical ID speech and singing (Trainor et al., 2000; Trehub et al., 2010; Nakata and Trehub, 2011). In fact, infants exhibit preferential listening to speech that sounds happy rather than sad or inexpressive regardless of the intended audience (Kitamura and Burnham, 1998; Singh et al., 2002). For example, infants listen longer to happy AD speech than to affectively neutral ID speech even when the latter is higher in pitch (Singh et al., 2002). Note, however, that happy ID vocalizations are closer to AD vocalizations described as high-arousal joy/happiness or elation than to low-arousal joy/happiness (Banse and Scherer, 1996; Bänziger and Scherer, 2005). Infants also exhibit more positive affect to ID expressions of approval than to disapproval or prohibition even when the utterances are low-pass filtered (Papoušek et al., 1990) or presented in an unfamiliar language (Fernald, 1993). The general consensus is that positive vocal emotion, especially the high arousal variety, makes a substantial contribution to infants' interest in ID speech. Nevertheless, one cannot rule out alternative explanations such as the attention-getting potential of expanded pitch and dynamic range and the attention-holding potential of repetition. When these acoustic factors are controlled, however, infants exhibit preferences for the happier speech version (Kitamura and Burnham, 1998; Singh et al., 2002), suggesting that these acoustic features make secondary contributions to infant preferences. Infants' interest is also affected by their age and corresponding needs. For example, 3-month-old infants exhibit greater attention to comforting than to approving ID speech (Kitamura and Lam, 2009).

The influence of ID pitch contours is seen in infants' preferential listening for sine-wave replicas of ID speech that preserve the pitch contours (and timing) with uniform amplitude over those that preserve the timing and amplitude with unvarying pitch (Fernald and Kuhl, 1987). Despite the fact that infants display greater positive affect to approving than to disapproving ID utterances, they listen longer to the former only if they exhibit greater F0 modulation (Fernald, 1993). Interestingly, pitch modulation also makes important contributions to the differentiation of emotions in music and in AD speech (Scherer, 1986, 1995; Laukka et al., 2005). Across cultures, happy-sounding speech and music feature high mean pitch, large pitch variability, relatively high mean amplitude, and rapid rate or tempo (Juslin and Laukka, 2003). Smiling elevates pitch and increases amplitude by altering the mouth opening and shape of the vocal tract, contributing to the vocal qualities associated with happiness (Tartter, 1980). Tender speech and music, by contrast, have lower mean pitch, pitch variability, mean amplitude, and slower rate or tempo than happy speech and music (Juslin and Laukka, 2003).

Perhaps the two classes of songs for infants, lullabies and play songs, are caregivers' expressions of tenderness and happiness, respectively, as well as tools for soothing or amusing infants. In line with their soothing function, lullabies feature very slow tempo, low pitch, falling pitch contours, limited amplitude variation, and soothing tone of voice (Unyk et al., 1992; Trehub et al., 1993; Trehub and Trainor, 1998), properties that are shared with soothing ID speech (Papoušek and Papoušek, 1981; Fernald, 1989). Lullabies are also soothing to adult listeners, so it is not surprising that they are used, at times, as laments (Trehub and Prince, 2010) and in palliative care (O'Callaghan, 2008). Although play songs are commonly sung to Western infants, they are not universal, as lullabies are (Trehub and Trainor, 1998).

Maternal speech melodies are considered central to the expression of maternal affect and the regulation of infant attention and arousal (Fernald, 1992; Papoušek, 1994). Is it possible that musical melodies would be equally effective or even more effective in regulating infant attention and arousal? The melodies or pitch contours of expressive speech differ from those in music (Zatorre and Baum, 2012). In music, pitches are discrete and sustained, and steps from one pitch level to another are generally small, most commonly, one or two semitones, with larger pitch jumps being much less frequent (Vos and Troost, 1989). By contrast, pitches in speech glide continuously over a larger range (Patel et al., 1998), which is even larger in ID speech (Ferguson, 1964; Stern et al., 1982, 1983; Fernald and Simon, 1984). Moreover, pitches have precise targets in music but not in speech (Zatorre and Baum, 2012).

If the expanded pitch and dynamic range of ID speech underlies infants' greater attention to ID than to AD speech (e.g., Fernald, 1993), then infants could show more interest in ID speech than ID singing. If rhythmicity and predictability are relevant (e.g., McRoberts et al., 2009), then infants might exhibit more attention to ID singing than to ID speech. If positive emotion is the critical feature (Kitamura and Burnham, 1998; Singh et al., 2002), then infants could show greater interest in the stimulus expressing more positive affect regardless of whether it is speech or music. For adults, music generates a range of positive emotions from tranquillity and tenderness to joy and euphoria (Blood and Zatorre, 2001; Menon and Levitin, 2005; Zentner et al., 2008; Salimpoor et al., 2011). Some scholars contend that the expression of emotion by some form of music (e.g., protomusic) preceded language (Darwin, 1871; Mithen, 2005). Others regard speech, even at present, as a type of music, especially when considered in developmental perspective (Brandt et al., 2012). If the status of speech is privileged, as some contend (Vouloumanos and Werker, 2004, 2007; Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010), then ID speech would be favored over forms of singing that exclude speech. Obviously, the aforementioned factors are not independent. Nevertheless, comparisons of infants' responsiveness to speech and music are a first step toward the long-range goal of identifying the acoustic features that attract and hold infants' attention. Such features may differ for infants of different ages, as reflected in age-related changes in listening biases for ID speech with comforting, approving, or directive tones of voice (Kitamura and Lam, 2009) and for regular or slowed ID speech (Panneton et al., 2006).

It is difficult to assess infants' degree of engagement with music and even more difficult to ascertain their aesthetic preferences. Instead of overt affective responses to music, infants commonly exhibit interest or attention, sometimes accompanied by reduced motor activity (Nakata and Trehub, 2004). The usual assumption is that longer listening to one of two auditory stimuli reflects preference or greater liking for that stimulus (e.g., Fernald and Kuhl, 1987; Trainor, 1996; Vouloumanos and Werker, 2004). In general, such “preferences” are assessed with the head-turn preference procedure, which is used with infants as young as 2 or 3 months of age (e.g., Trainor et al., 2002; Shultz and Vouloumanos, 2010). The procedure involves pairing one auditory stimulus with a visual display and a contrasting auditory stimulus with the same visual display, at the same or different locations, on a series of trials. Infants control the procedure in the sense that looking away from the visual stimulus terminates the auditory stimulus. In other words, they can choose to listen to one stimulus longer than another. The interpretation of longer or shorter listening times as positive or negative aesthetic evaluations is questionable in the absence of positive or negative affective displays (Trehub, 2012). At times, infants listen longer to familiar stimuli and, at other times, to novel stimuli (e.g., Rose et al., 1982; Volkova et al., 2006; Soley and Hannon, 2010). Even when infants show positive affect to one auditory stimulus and negative or neutral affect to another, their listening times to the stimuli may not differ (Fernald, 1993). Unquestionably, looking or listening times indicate infants' listening choice or relative attention to the stimuli, but the factors that contribute to such attention are unclear. Some listening biases may be innate, arising from the salience of biologically significant stimuli (e.g., human vocal sounds) or biologically significant parameters of sound (e.g., loud or unexpected). Other listening biases may arise from acquired salience, as in preferential responding to the sound of one's name (Mandel et al., 1995) or to a stimulus heard previously (Zajonc, 2001). Attention biases, regardless of their origin, are likely to facilitate learning (Vouloumanos and Werker, 2004).

In addition to the well-documented listening bias for ID over AD speech, there are reported biases for vocal over non-vocal sounds (Colombo and Bundy, 1981; Vouloumanos and Werker, 2004, 2007), speech over non-human vocalizations (Vouloumanos et al., 2010), speech over human non-speech vocalizations (Shultz and Vouloumanos, 2010), musical consonance over dissonance (Trainor and Heinmiller, 1998; Zentner and Kagan, 1998), and familiar over unfamiliar musical meters (Soley and Hannon, 2010). Infants also exhibit considerable interest in vocal music (Glenn et al., 1981), but their exposure to music is much more limited than their exposure to speech (Eckerdal and Merker, 2009). To date, however, there has been little exploration of infants' relative interest in speech and singing. In the single study that addressed this question directly (Nakata and Trehub, 2004), 6-month-old infants watched audio-visual recordings of their mother singing or speaking from an earlier interaction. Infants showed more intense and more sustained interest in singing than in speech episodes, as reflected in greater visual fixation coupled with reduced body movement. Infants' heightened interest in these maternal singing episodes could stem from mothers' propensity to smile more when singing than when talking to infants (Plantinga et al., 2011). In the present study, we used the head-turn preference procedure to assess infants' interest in speech and singing with unfamiliar materials and voices. As noted above, the procedure provides information about infants' listening choices or relative attention rather than their aesthetic preferences.

In line with age-related changes in infants' attention to the affective tone of ID speech (Kitamura and Lam, 2009), developmental changes might be evident in infants' responsiveness to ID speech and song. Accordingly, infants in the present research, who were 4–13 months of age, were divided into three age groups to explore the possibility of comparable age-related changes. In Experiment 1, infants were exposed to ID or happy-sounding speech syllables and soothing hummed lullabies produced by the same woman. The principal question concerned the relative efficacy of soothing hummed song and happy ID speech for attracting and maintaining infants' attention. In other words, is vocal music compelling for infants, as it is for adults, even in the absence of speech or properties associated with heightened arousal? If infants listened longer to hummed lullabies than to simple ID speech, it would challenge the prevailing view that infants have an innate or early developing preference for speech over any other auditory stimulus (Vouloumanos and Werker, 2004, 2007; Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010). Experiments 2 and 3 narrowed the differences between speech and singing stimuli by comparing the same verbal materials that were spoken or sung with comparable or contrasting affective intentions. Specifically, infants in Experiment 2 heard sung vs. spoken renditions of the lyrics of a Turkish children's song, both in an ID/joyful manner. Infants in Experiment 3 heard the ID children's song vs. a spoken version of the lyrics in an AD or affectively neutral manner.

All of the stimuli in the present study were portrayed or acted rather than being recorded during actual interactions with infants and adults. Early research on infants' responsiveness to ID and AD speech (e.g., Fernald, 1985) used recordings of women's interactions with their infant and with an adult experimenter. Such stimuli differed dramatically in content as well as expressiveness, making it difficult to identify the factors contributing to infants' responsiveness. Later research used portrayals of ID and AD speech (e.g., Singh et al., 2002; Kitamura and Lam, 2009) so that the content could be carefully controlled across speech registers. When studying infants' responsiveness to ID and non-ID singing (e.g., Trainor, 1996; Masataka, 1999), it is possible to use recordings of mothers singing the same song in the presence or absence of their infant. Comparisons of natural ID speech and singing (e.g., Nakata and Trehub, 2004), however, necessarily differ in content as well as form. Because the features of ID speech and singing have been described extensively (e.g., Ferguson, 1964; Trainor et al., 1997), it is possible to create relatively natural portrayals of those stimuli. For practical as well as ethical reasons, most of the research on vocal emotion (e.g., Scherer, 1986, 1995; Juslin and Laukka, 2003) has used portrayals of various emotions rather than emotional expressions produced in natural contexts.

Experiment 1

The goal of the present experiment was to examine the possibility that infants might be more responsive to vocal music than to happy ID speech even for vocal music lacking the acoustic features (e.g., highly variable pitch and dynamics) and expressive intentions (high-arousal happiness) that have been linked to infant preferences for ID speech (e.g., Fernald, 1985; Singh et al., 2002). By using hummed songs, it was possible to generate vocal music without speech. Humming, usually with closed mouth, can be used to generate melodies with sustained nasal sounds that have low spectral amplitude (Kent et al., 2002). Because humming constrains amplitude modulation, it provides reduced scope for expressing high-arousal emotions. There are speculations, however, that humming played an important role in early hominid evolution, functioning like contact calls in other species (Jordania, 2010). At present, humming may be the most common type of informal, solitary singing.

We considered lullabies the musical genre of choice because of their suitability for humming, their universal use in caregiving (Trehub and Trainor, 1998), and their stark contrast with happy ID speech in acoustic features and affective intentions. As noted, lullabies transmit positive affective qualities such as tranquillity and tenderness both in their musical features and vocal tone. The ID speech stimuli approximated those used in previous research on infants' listening biases for speech (Vouloumanos and Werker, 2004, 2007). They consisted of nonsense syllables with typical exaggerated pitch contours and happy voice quality. For adults, it is likely that the lullabies, although unfamiliar, would have high aesthetic appeal, while the repetitive, high-pitched nonsense syllables would sound boring or worse. Nevertheless, the speech combined the exaggerated pitch contours and joyful expressiveness that have been linked to infant preferences in contemporary urban cultures (Fernald and Kuhl, 1987; Kitamura and Burnham, 1998; Singh et al., 2002). If infants share adults' aesthetic appraisals or favor universal forms, they would listen longer to the hummed versions of traditional lullabies. On the basis of previous research with Western infants, however, one might expect them to listen longer to the arousing and joyfully rendered speech.

Method

Participants

The sample consisted of 50 healthy, full-term infants who were 4.3–13.1 months of age (M = 8.6 months, SD = 2.6) divided into 3 age groups: 4–6 months (M = 5.5, SD = 0.48; n = 16), 7–9 months (M = 8.6, SD = 0.87; n = 16) and 10–13 months (M = 11.5, SD = 0.74; n = 18). No infant had a family history of hearing loss or personal history of ear infections, and all were free of colds or ear infections on the day of testing. An additional five infants failed to complete the test session because of fussiness. This experiment and others in this report were approved by the Arts and Sciences ethics committee of the University of Montreal, and written informed consent was obtained from all participating parents.

Stimuli

The speech stimulus, which was comparable to that used by Vouloumanos and Werker (2004) except for a different speaker, consisted of 12 variations of each of two nonsense syllables (lif and neem) spoken with ID prosody. Varied repetitions of each syllable had rising, falling, and rising-falling (i.e., bell-shaped) pitch contours. There were two versions of the syllabic sequence, differing only in the order of elements. Each sequence consisted of a semi-random ordering of syllables, with the constraint that any four consecutive syllables contained two instances each of lif and neem. Syllables were separated by silent inter-stimulus intervals (ISIs) of 300–500 ms, and the order of ISIs was randomly distributed, with a mean of 450 ms, as in Vouloumanos and Werker (2004). Each sequence was approximately 20 s in duration, and was repeated for an overall duration of 40 s. The music stimulus consisted of a hummed version of a lullaby. There were two traditional lullabies, one Chilean (in duple meter, AA form) and one German (in triple meter, AB form), each approximately 40 s in duration and each assigned to half of the infants. Hummed and spoken stimuli were produced by a native speaker of English who had considerable music training, singing experience, and experience with children. She was instructed to produce the nonsense syllables in a lively ID manner and to hum the melodies as if lulling an infant to sleep. She listened to many samples of ID speech and singing beforehand (including the Vouloumanos and Werker syllables) and used pictures of infants to help induce the appropriate mood for her speaking or lulling. Sample stimuli are presented in Supplementary Materials.
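The ordering constraint on the syllable sequence can be illustrated with a short Python sketch. It is not the authors' procedure, which the text does not describe in detail. It assumes the strictest reading of the constraint, in which every sliding window of four consecutive syllables holds two instances of each syllable; this forces the syllable identities to repeat with a period of four. A looser reading (balanced non-overlapping blocks of four) is also possible. The function names and the numbering of variations from 1 to 12 are hypothetical.

```python
import random

SYLLABLES = ("lif", "neem")
VARIATIONS_PER_SYLLABLE = 12  # 12 varied renditions of each syllable

def make_sequence(rng):
    """Build one 24-item sequence of (syllable, variation) pairs.

    Assumption: under the sliding-window reading, a random balanced
    four-item pattern is repeated, and the 12 variations of each
    syllable are assigned in random order, each used exactly once.
    """
    pattern = ["lif", "lif", "neem", "neem"]
    rng.shuffle(pattern)
    variation_order = {
        s: rng.sample(range(1, VARIATIONS_PER_SYLLABLE + 1),
                      VARIATIONS_PER_SYLLABLE)
        for s in SYLLABLES
    }
    sequence = []
    for i in range(len(SYLLABLES) * VARIATIONS_PER_SYLLABLE):
        syllable = pattern[i % 4]
        sequence.append((syllable, variation_order[syllable].pop()))
    return sequence

def windows_balanced(sequence):
    """Check that every run of four consecutive syllables has two of each."""
    names = [syllable for syllable, _ in sequence]
    return all(names[i:i + 4].count("lif") == 2
               for i in range(len(names) - 3))

if __name__ == "__main__":
    seq = make_sequence(random.Random(1))
    print(seq[:4], windows_balanced(seq))
```

The inter-stimulus intervals are left out of the sketch because the text gives only their range and mean, not the distribution from which they were drawn.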

Acoustic features of the stimuli, which were measured with Praat software (Boersma and Weenink, 2010), are shown in Table 1. Because pitch extraction software is prone to octave errors, it is common to manually specify a minimum and maximum fundamental frequency (F0 in Hz) or to use a formula for setting the F0 range of each sound such as that suggested by De Looze and Hirst (2008): floor = q25 × 0.75; ceiling = q75 × 1.5. We used this formula for acoustic analyses in the present study. Mean F0 was higher for singing (M = 280.2 Hz) than for speech (M = 244.2 Hz, difference of 2.46 semitones), but speech was more variable in F0, amplitude, and timing. The standard deviation (SD) of F0, a measure of pitch variability, was 3.81 and 3.40 semitones for speech and singing, respectively. As can be seen in Figure 1, which depicts the F0 contours, changes in pitch were larger and more abrupt for the speech than for the humming stimuli. Amplitude variation (SD), measured in the voiced portions of each sound, was 9.31 dB for speech and 4.46 dB for singing. The timing of the syllables was varied deliberately as in Vouloumanos and Werker (2004).
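The range-setting formula cited above, and the conversion of a frequency ratio to semitones, can be sketched in a few lines of Python. This is an illustration only: it assumes a first-pass list of F0 estimates in Hz is already available, the sample values are invented rather than taken from the stimuli, the function names are hypothetical, and the quartile method of Python's statistics module may differ slightly from the one used in a Praat-based workflow.

```python
import math
import statistics

def f0_search_range(f0_hz):
    """Return (floor, ceiling) in Hz from first-pass F0 estimates.

    Applies floor = q25 * 0.75 and ceiling = q75 * 1.5, the formula
    attributed in the text to De Looze and Hirst (2008).
    """
    q25, _median, q75 = statistics.quantiles(f0_hz, n=4)
    return q25 * 0.75, q75 * 1.5

def semitone_interval(f_high_hz, f_low_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

if __name__ == "__main__":
    # Invented first-pass F0 values (Hz), for illustration only.
    first_pass = [200, 220, 240, 260, 280, 300, 320]
    floor_hz, ceiling_hz = f0_search_range(first_pass)
    print(floor_hz, ceiling_hz)
    print(semitone_interval(440.0, 220.0))  # one octave
```

In practice the resulting floor and ceiling would be passed to a second pitch-extraction pass, which is the step that reduces octave errors.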

Table 1. Acoustic features of stimuli.

Figure 1. Fundamental frequency (F0) contours of 5-s excerpts from each sound type. (A) hummed lullaby (Chilean) and syllable sequence, (B) ID sung and spoken lyrics of Turkish play song, (C) ID sung and AD spoken lyrics of Turkish play song.

Apparatus

Testing was conducted in a sound-attenuating booth (IAC) 4 by 4 m in size. Infants were seated on their parent's lap facing a central computer monitor at a distance of 127 cm, with two identical monitors to the right and left side of the central monitor and at a distance of 152 cm from infants. Parents wore earphones (ER-4 MicroPro with reusable ER-4S eartips) with an approximate attenuation of 35 dB and earmuffs (Peltor H10A, Optime 105) with an approximate attenuation of 30 dB. They heard continuous music through the headphones to mask the sounds presented to infants. The walls and table for the monitors were covered with black cloth to reduce visual distraction and optimize attention to the target stimuli. A camera immediately above the central screen provided a continuous record of infant visual behavior on a monitor outside the booth. Two loudspeakers (Genelec 8040A) located behind the lateral monitors transmitted the sounds at a comfortable listening level, approximately 60–65 dB (A). The procedure was controlled by customized software on a computer (Mac Pro 8 cores) located outside the booth.

Procedure

The head-turn preference procedure (Kemler Nelson et al., 1995) was used. Infants remained seated on their parent's lap throughout the procedure, and parents were asked to minimize their own movement. Infants were randomly assigned to one of the two speech sequences and one of the two hummed lullabies. The speech and singing stimuli were presented on 10 alternating trials, with order of stimuli (speech or singing first) and side of presentation (left or right) counterbalanced across infants. On each trial, the infant's attention was attracted to one monitor by a flashing red square. As soon as the infant looked at that monitor, one sound stimulus was presented together with a visual animation of a carousel. When the infant looked away from the monitor for more than 2 s, the visual and sound stimuli were terminated. The infant's attention was then attracted to the other monitor. Looking at that monitor initiated the same visual stimulus but the contrasting auditory stimulus, which continued until the infant looked away for 2 s. On each trial, the stimulus was always presented from the beginning (i.e., beginning of the lullaby or syllable sequence). The experimenter outside the booth, who had no access to sound (auditory stimuli or infant vocalization) and no information about test conditions, observed the infant's behavior on the external monitor and continuously recorded looking toward or away from each monitor in the booth by means of key codes on a computer keyboard. Looking times during the presentation of each stimulus type were computed automatically. Typically, infants completed the procedure in approximately 5 min.
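The 2-s look-away criterion can be made concrete with a minimal Python sketch of how a trial's looking time might be derived from the experimenter's key codes. The customized software is not described in the text, so the event format, the function name, and the decision to count only intervals coded as looking (excluding brief look-aways) are all assumptions made for illustration.

```python
LOOK_AWAY_LIMIT_S = 2.0  # a look-away longer than this ends the trial

def trial_looking_time(events, limit_s=LOOK_AWAY_LIMIT_S):
    """Accumulate looking time for one trial.

    events: chronologically ordered (timestamp_s, is_looking) pairs,
    a hypothetical encoding of the key codes. Each pair marks the
    start of a state that lasts until the next timestamp; the final
    pair only marks the end of the record.
    """
    total = 0.0
    for (start, looking), (end, _next_state) in zip(events, events[1:]):
        span = end - start
        if looking:
            total += span          # count time coded as looking
        elif span > limit_s:
            break                  # criterion met: trial terminates
    return total

if __name__ == "__main__":
    coded = [(0.0, True), (5.0, False), (6.0, True),
             (10.0, False), (13.0, False)]
    print(trial_looking_time(coded))  # 5 s + 4 s of looking
```

Summing such values over the test trials for each stimulus would give the cumulative looking times analyzed in the Results.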

Results

Infants often look disproportionately long on the initial trial of a novel stimulus, so it is common to exclude the first two trials (i.e., initial exposure of each stimulus) from data analysis (e.g., Vouloumanos and Werker, 2004; Volkova et al., 2006), a procedure followed here. These initial trials can be considered familiarization rather than test trials. Missing values from infants (4 incomplete trials: 1–2 trials from 3 infants) were replaced with the multiple imputation method (Graham, 2012) implemented with NORM software (Schafer, 1999). Substitution or omission of those values yielded similar results. A preliminary analysis of variance revealed that the effect of age (4–6, 7–9, 10–13 months) on looking time was not significant. Age, considered as a continuous variable in a regression analysis, also made no contribution to looking time. Consequently, age was excluded from the main analysis. A paired-samples t-test on cumulative looking time across the four trials with each stimulus revealed a significant difference between speech and singing [t(49) = 3.35, p < 0.01, two-tailed]. Infants looked longer during the syllable sequences (M = 77.93 s, SD = 53.98 s) than during the hummed lullabies (M = 50.14 s, SD = 29.58 s) (see Figure 2). A binomial test revealed that of the 50 infants in the sample, 36 (72.0%) had longer looking times for speech, z = 3.11, p < 0.01.
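The reported binomial result can be checked with a short Python sketch. It assumes the z value comes from the normal approximation to the binomial distribution without a continuity correction, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def binomial_z(n_favoring, n_total, p_null=0.5):
    """Normal-approximation z for a binomial (sign) test.

    Assumption: no continuity correction is applied.
    """
    expected = n_total * p_null
    sd = math.sqrt(n_total * p_null * (1.0 - p_null))
    return (n_favoring - expected) / sd

if __name__ == "__main__":
    z = binomial_z(36, 50)
    print(f"{36 / 50:.1%} of infants, z = {z:.2f}")
    # prints "72.0% of infants, z = 3.11"
```

With 36 of 50 infants favoring speech, the expected count under the null hypothesis is 25 and the standard deviation is about 3.54, giving z = 11 / 3.54 ≈ 3.11, which matches the reported value.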

Figure 2. Cumulative looking time in seconds (s) for singing and speech. Error bars are standard errors (**p < 0.01; *p < 0.05).

Discussion

Infants exhibited greater attention to the ID speech syllables than to the hummed lullabies despite the greater coherence and continuity of the lullabies. Although our findings are consistent with the speech bias that has been proposed for young infants (Vouloumanos and Werker, 2004, 2007), there are a number of alternative interpretations. The stimuli contrasted in other respects than the presence or absence of speech or syllabic content. For one thing, the speech was considerably more variable than the humming in pitch and amplitude. Typical ID speech has much more continuity than the present sequence of disconnected syllables, each of which had the properties of stressed syllables. Moreover, each of the syllables had the exaggerated pitch contours that are considered critical in capturing infant attention (e.g., Fernald and Kuhl, 1987), and these contours were highly variable. The speech stimuli also had bursts of very high-pitched sound at irregular time intervals (see Figure 1), which could have functioned as salient alerting signals. Hummed speech produces less neural activation than natural speech (Perani et al., 2011), so one would expect hummed music to produce less cortical activation than other types of vocal music.

The affective qualities of the stimuli also differed dramatically, with the speech having the properties of high-arousal happiness or joy and the hummed lullabies being tranquil and soothing. Joyful or happy speech reliably attracts and maintains the attention of Western infants (Singh et al., 2002), and joyful music may do likewise. In contrast to Western mothers, who engage in lively vocal and non-vocal interactions with their infants, mothers in many other cultures interact in ways that are primarily soothing rather than arousing (Toda et al., 1990; Trehub and Schellenberg, 1995; Trehub and Trainor, 1998). It is possible that infants who are accustomed to soothing vocal interactions would distribute their attention differently from the infant participants in the present experiment. Nevertheless, the youngest infants in the present study, who might experience more soothing interactions than the older infants (Kitamura and Burnham, 2003), responded no differently than the older infants.

Finally, the stimuli in the present study were atypical in a number of respects. The speech stimulus had the usual exaggerated pitch contours and happy affect of Western mothers in the unusual context of two single, non-contiguous syllables that repeated with variable renditions (following Vouloumanos and Werker, 2004, 2007). In other words, it was dramatically different from conventional ID speech. Although lullabies, sung or hummed, are common in non-Western cultures, they are used infrequently in Western cultures (Trehub and Trainor, 1998). If Western infants are exposed to lullabies, such exposure typically occurs when they are sleepy or distressed rather than awake or alert. For those reasons, we used more conventional stimulus materials in subsequent experiments, namely the lyrics of foreign children's songs that were spoken or sung.

Experiment 2

The goal of the present experiment was to ascertain the relative efficacy of speech and singing for maintaining infant attention when verbal or syllabic content and affective intentions are similar across vocal modes. Infants were presented with a sung and spoken version of an unfamiliar Turkish play song, both produced in an ID or joyful manner. The same lyrics ensured comparable phoneme sequences despite their different realization in speech and singing. Although the overall affective intentions were joyful in both cases, the means of achieving those intentions differ in speech and singing, with unknown consequences.

In research with ID and AD speech, the stimuli are often drawn from natural interactions with infants and adults (e.g., Kitamura and Burnham, 1998) so that verbal content and speaking style differ. At other times, actors portray ID and AD speech with the same verbal content (e.g., Singh et al., 2002). No previous study used the texts of play songs, which include words and nonsense syllables that are distinctive and memorable as well as alliteration, assonance, and rhyme. As a result, the spoken ID version was closer to a spoken nursery rhyme than to conventional ID speech, reducing many of the usual differences between spoken and sung material for infants. Differences between speech and singing still remained, however, with speech being more variable in its pitch patterns and amplitude and also lacking the steady beat of music. If the expanded pitch range and greater pitch variability of speech drive infant attention (e.g., Fernald and Kuhl, 1987; Fernald, 1992), then infants could be expected to attend longer to the spoken lyrics. If happy affect is primarily responsible for infants' listening choices, as is the case for speech style (Singh et al., 2002), then infants might respond no differently to happy ID speech and singing with comparable verbal content.