Congenital Amusia (or Tone-Deafness) Interferes with Pitch Processing in Tone Languages

Tillmann, Barbara; Burnham, Denis; Nguyen, Sebastien; Grimault, Nicolas; Gosselin, Nathalie; Peretz, Isabelle

doi:10.3389/fpsyg.2011.00120

ORIGINAL RESEARCH article

Front. Psychol., 17 June 2011

Sec. Auditory Cognitive Neuroscience

volume 2 - 2011 | https://doi.org/10.3389/fpsyg.2011.00120

Barbara Tillmann¹,²,³,⁴*

Denis Burnham⁴

Sebastien Nguyen⁵,⁶

Nicolas Grimault¹,²,³

Nathalie Gosselin⁵,⁶

Isabelle Peretz⁵,⁶

¹ CNRS, UMR5292; INSERM, U1028; Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Lyon, France
² University Lyon, France
³ University Lyon 1, Villeurbanne, France
⁴ MARCS Auditory Laboratories, University of Western Sydney, Sydney, NSW, Australia
⁵ International Laboratory for Brain, Music and Sound Research (BRAMS), Université de Montréal, Montréal, QC, Canada
⁶ Department of Psychology, Université de Montréal, Montréal, QC, Canada

Congenital amusia is a neurogenetic disorder that affects music processing and that is ascribed to a deficit in pitch processing. We investigated whether this deficit extended to pitch processing in speech, notably the pitch changes used to contrast lexical tones in tonal languages. Congenital amusics and matched controls, all non-tonal language speakers, were tested for lexical tone discrimination in Mandarin Chinese (Experiment 1) and in Thai (Experiment 2). Tones were presented in pairs and participants were required to make same/different judgments. Experiment 2 additionally included musical analogs of Thai tones for comparison. Performance of congenital amusics was inferior to that of controls for all materials, suggesting a domain-general pitch-processing deficit. The pitch deficit of amusia is thus not limited to music, but may compromise the ability to process and learn tonal languages. Combined with acoustic analyses of the tone material, the present findings provide new insights into the nature of the pitch-processing deficit exhibited by amusics.

Introduction

A highly debated question is to what extent music and language share processing components (e.g., Patel, 2008). Our study contributes to this debate by investigating pitch processing across domains in congenital amusia (or tone-deafness). Congenital amusia refers to a lifelong disorder of music processing that occurs despite normal hearing and other cognitive functions as well as normal exposure to music. We investigated here whether the impaired musical pitch perception typically found in congenital amusia might reflect a domain-general deficit that also affects pitch processing in speech.

Pitch processing is crucial in music, but also in speech processing, notably for discriminating questions from statements and for conveying emotional expression in non-tone intonation languages (e.g., English, French); in tone languages (e.g., Mandarin, Thai, Vietnamese), it serves all these functions and additionally distinguishes word meaning. Tone languages comprise 70% of the world’s languages (Yip, 2002) and are spoken by more than 50% of the world’s population (Fromkin, 1978). In these languages, tone variations (comprising predominantly F0 height and contour parameters) at the syllabic level have the same effect on word meaning as do vowel and consonant variations in non-tone languages. For examples, see Figure 1. In the present study, we used natural speech samples of tone languages to investigate whether the perception of pitch variations in speech might be affected by the previously described musical pitch deficit in congenital amusia.

Figure 1. (A) Fundamental frequency contours of the four Mandarin tones (spoken by a female speaker of Mandarin). Each tone on the syllable /ma/ represents a different lexical item. “ma1” is the level tone, “ma2” the rising tone, “ma3” the dipping tone, and “ma4” the falling tone. (B) Fundamental frequency contours of the five Thai tones (spoken by a female speaker of Thai). Each tone on the syllable /ma/ represents a different lexical item. “ma0” is the mid level tone, “ma1” the low level tone, “ma2” the falling tone, “ma3” the high level tone, and “ma4” the rising tone.

Expertise or training in a tonal language can facilitate pitch perception and production with musical material: Mandarin, Vietnamese, and Cantonese speakers have been found to be more accurate at imitating musical pitch and discriminating intervals than English speakers (Pfordresher and Brown, 2009; see also Hove et al., 2010), an advantage that can also be reflected in subcortical pitch tracking (e.g., Krishnan et al., 2005). The influence of tone-language background has been mostly observed for relative pitch processing (e.g., intervals), and it might even lead to difficulties in pitch contour processing when non-speech target sounds resemble features of linguistic tones (Bent et al., 2006). However, listeners with a tone-language background did not differ from listeners with a non-tone-language background for absolute pitch discrimination of non-speech sounds (e.g., Bent et al., 2006; Pfordresher and Brown, 2009). Interestingly, in musically trained participants, there is a link between tone-language background and single pitch processing: absolute pitch (i.e., the ability to label a tone without a reference pitch) is more prevalent among tone-language speakers than among non-tone-language speakers (Deutsch et al., 2006, 2009).

Conversely, it has been shown that musical training or expertise can improve pitch perception not only in musical contexts, but also in speech contexts. For example, musicians show improved pitch processing for the prosody of non-tonal language material (Schön et al., 2004; Magne et al., 2006) and for tone-language material, such as Thai tones (Burnham and Brooker, 2002; Schwanhäußer and Burnham, 2005) and Mandarin tones (Alexander et al., 2005; Wong et al., 2007; Lee and Hung, 2008; Delogu et al., 2010; Bidelman et al., 2011).

Previous research has thus shown some positive influences between music and speech due to expertise in music or in tone languages, and these effects suggest common pitch-processing mechanisms in music and speech. For example, musical training might shape basic sensory circuitry as well as corticofugal tuning of the afferent system, which is context-general and thus also has positive side-effects on linguistic pitch processing (e.g., Wong et al., 2007). Similar findings suggesting experience-dependent corticofugal tuning have been recently reported for the effects of tone-language expertise on musical pitch processing (Bidelman et al., 2011). In parallel to this previously observed training-related improvement of pitch processing from one domain to the other, the experiments reported here investigate the influence of a pitch perception deficit for music, as observed in congenital amusia, on pitch perception in speech.

Until recently, congenital amusia was thought to result from a musical pitch-processing disorder. Individuals with congenital amusia have difficulties recognizing familiar tunes without lyrics and detecting an out-of-key or out-of-tune note. They are impaired in perceiving pitch direction for pure tones (Foxton et al., 2004) and in detecting pitch deviations smaller than two semitones in sequences of piano notes (Hyde and Peretz, 2004) as well as in note pairs (Peretz et al., 2002). Initial reports suggested that the deficit was restricted to pitch processing in music and did not extend to pitch processing in speech material. Individuals with congenital amusia have been reported to be unimpaired in language and prosody tasks, such as learning and recognizing lyrics and classifying a spoken sentence as a statement or a question based on final falling or rising pitch information (e.g., Ayotte et al., 2002; Peretz et al., 2002).

Peretz and Hyde (2003) suggested that the difference between pitch perception in speech and music is related to the relative size of relevant pitch variations. In speech (of non-tonal languages), pitch variations are typically coarse (e.g., more than 12 semitones in the final pitch rise indicative of a question; see Fitzsimons et al., 2001), whereas in music, these are more fine-grained (1 or 2 semitones; Vos and Troost, 1989; see Figure 2). Accordingly, amusics’ pitch deficit would affect music more than speech not because their deficit is music-specific, but because music is more demanding in pitch resolution. Thus, congenital amusia would represent a music-relevant deficit, not necessarily a music-specific deficit. However, when pitch changes of spoken sentences were embedded in a non-speech context (i.e., musical analogs preserving gliding-pitch changes or transforming these into discrete steps), amusics failed to discriminate these pitch changes – in contrast with their high performance level for the same pitch changes in the sentences (Patel et al., 2005). Conversely, recent data have shown that, for some amusic cases, the pitch-processing deficit can also affect the processing of speech intonation in the amusics’ mother tongue (Patel et al., 2008; Jiang et al., 2010; Liu et al., 2010). In particular, a slow rate of gliding-pitch change might have deleterious effects on pitch perception in English, but not in French speech (Patel et al., 2008), although the influence of glide rate has not been replicated in a subsequent study for English in British amusics (Liu et al., 2010).

Figure 2. Fundamental frequency (F0) contours of (A) the spoken sentence “She forgot her book” (with the intonation pattern of a question (?) and a statement (.), typically ranging over more than 12 semitones), and (B) the song “Happy Birthday” (with most pitch variations ranging between 0 and 2 semitones) [Reprinted from Trends Cogn. Sci. 7, Peretz, I., and Hyde, K. L., What is specific to music processing? Insights from congenital amusia, 362–367, Figure 2, copyright (2003), with permission from Elsevier].

In addition to differences in the size of pitch changes, musical and linguistic materials differ in their use of discrete, segmented events versus continuous pitch changes (i.e., glides), respectively. Foxton et al. (2004) have shown that congenital amusics have higher thresholds for segmented tones (exceeding one semitone) than for continuous tone glides (below one semitone). These higher thresholds would affect music perception more markedly, because pitch in music is based upon discrete notes, while the better thresholds for glides might lead to less impairment for pitch processing in speech signals, with their gliding, continuous pitch changes. However, this is unlikely since amusics perform equally poorly on tone analogs of sentences made of pitch glides and of discrete events (Patel et al., 2005). In a recent study using discrete segmented events (a tone for the musical material, the syllable /ka/ for the verbal material), we have observed that fine-grained pitch discrimination (i.e., 25 cents) can be impaired in amusics not only for musical sounds, but also for verbal sounds. Interestingly, pitch discrimination is better when the pitch is carried by verbal material than by musical material (Tillmann et al., submitted).
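The thresholds discussed above are expressed in semitones and cents, which are logarithmic units of frequency ratio (12 semitones per octave, 100 cents per semitone). The following is a minimal sketch of that conversion, not part of the study's materials; the function names and the example frequencies are illustrative assumptions.

```python
import math


def semitones_between(f_start_hz, f_end_hz):
    """Signed pitch distance in semitones between two F0 values (in Hz).

    Uses the standard equal-tempered definition: 12 semitones per octave,
    where one octave is a doubling of frequency.
    """
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def cents_between(f_start_hz, f_end_hz):
    """Signed pitch distance in cents (100 cents = 1 semitone)."""
    return 100.0 * semitones_between(f_start_hz, f_end_hz)


# Illustrative values only (not taken from the stimuli):
# a rise from 220 Hz to 440 Hz spans one octave, i.e. 12 semitones.
octave_rise = semitones_between(220.0, 440.0)

# A 25-cent change starting at 440 Hz corresponds to a frequency of
# 440 * 2 ** (25 / 1200), roughly 446.4 Hz.
target_hz = 440.0 * 2 ** (25 / 1200)
small_change = cents_between(440.0, target_hz)
```

On this scale, the 25-cent changes mentioned above are a quarter of a semitone, well below the one-semitone thresholds reported for segmented tones.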

Pitch perception in congenital amusia might thus be affected by the size of pitch changes and the nature of the material (verbal, musical). The present study investigates amusics’ pitch processing in tonal language material in order to address the question: do congenital amusics show deficits for lexical tone perception, thus for speech material with continuous (rather than discrete) pitch changes, and with pitch changes larger than those that are relevant in music, but smaller than those used in statements/questions in their mother tongue (see Figures 1 and 2; Fitzsimons et al., 2001)? We tested French-speaking amusics for their perception of Mandarin tones (Experiment 1) and Thai tones (Experiment 2). We here used monosyllabic words (in contrast to sentences or phrases in previous studies) to keep memory load relatively low, in particular as amusic individuals show impaired short-term memory for pitch information (Gosselin et al., 2009; Tillmann et al., 2009; Williamson et al., 2010). Experiment 2 additionally tested the perception of the same pitch changes in non-verbal, musical analogs. Furthermore, for both Experiments 1 and 2, we present acoustic analyses of the tone-language stimulus materials, and compare the acoustic features of the stimulus materials with participants’ behavioral performance, in order to locate the critical acoustic information used by amusic and control participants.

The overall objective of our study is to further understand the nature of the pitch-processing deficit experienced by individuals with congenital amusia, particularly because congenital amusia is now known to have neurogenetic correlates (Drayna et al., 2001; Peretz et al., 2007), and these may not be music-specific, but may also apply to speech, as suggested by recent work on amusics’ perception of pitch in speech of their mother tongue (e.g., Liu et al., 2010; Nan et al., 2010). Testing the perception of tone-language materials allowed us to use natural speech that contained smaller pitch differences than those occurring in native non-tonal speech (English or French), as in the sentences used in Patel et al. (2005) and Liu et al. (2010). Another advantage of using non-native pitch variations in speech is that, as the words have no meaning for the participants, they are free to respond to the acoustic parameters of the speech without any added complication of semantic significance. This also has the advantage that the non-native tones can be presented both as speech, which has full speech-shaped spectral information albeit devoid of semantic significance, and as non-speech, in which the same tones are converted to musical stimuli. In this way, the same acoustic aspects of speech, notably here the pitch information, can be presented in speech and non-speech contexts in which the main difference between speech and more musical non-speech stimuli is the difference in spectral make-up. In addition, testing amusics who were non-native speakers with tone-language speech materials also allowed us to aim for converging evidence with findings on tone-language processing recently reported for amusics who were native speakers (Jiang et al., 2010; Nan et al., 2010). Note that previous research has shown that non-native (non-amusic) listeners still engage a speech listening mode (as reflected by the linguistic constraints of their mother tongue) when processing non-native tone-language materials (e.g., Burnham et al., 1996; Burnham et al., submitted).

Experiment 1

Mandarin Chinese uses four tones characterized by their pitch trajectories, traditionally numbered as tones 1–4: tone 1 is high level, tone 2 is mid-rising, tone 3 is mid-dipping (or mid-falling–rising), and tone 4 is high-falling (see Figure 1A). Tone 1 has little fundamental frequency (F0) movement (and so is often referred to as a level tone), whereas tones 2–4 have more F0 movement (and so are often referred to as contour or dynamic tones; Abramson, 1978): tone 2 has a rising F0 pattern, tone 3 a falling–rising F0 pattern, and tone 4 a falling F0 pattern.

Experiment 1 tested native French-speaking congenital amusics for their pitch discrimination with this unfamiliar language material. Even though normal French listeners do not perceive tone contrasts categorically, they are sensitive to tone contour variations (Hallé et al., 2004). A same-different paradigm using monosyllabic Mandarin Chinese words was employed; it was taken from Klein et al. (2001), who showed that normal English- or Mandarin-speaking participants reached high levels of performance (although English speakers performed at a level slightly below that of Mandarin speakers, 93 versus 98% accuracy). If pitch-processing mechanisms are domain-general, then amusics’ musical pitch deficit should lead to impaired performance in this discrimination task.

Method

Participants

The amusic group and the control group each comprised 20 adults who were native French speakers (from Canada and France). The groups were matched for gender, age, education, and musical training (see Table 1). All participants completed the Montreal battery of evaluation of amusia (MBEA; Peretz et al., 2003), which is currently widely used in research investigating congenital amusia. The MBEA involves six tests that aim to assess the various components that are known to contribute to melody processing. The stimuli are novel melodic sequences, played one note at a time on a piano; they are written in accordance with the rules of the tonal structure of the Western idiom. These melodies are arranged in various tests so as to assess abilities to discriminate pitch and rhythmic variations, and to recognize musical sequences heard in prior tests of the battery. Peretz et al. (2003) tested a large population and defined a cut-off score (78%) below which participants are classified as amusic and above which they are considered normal. Participants’ individual scores for the full battery were below cut-off for the amusic group, but not for the control group (Table 1). One amusic participant reported living in China from age 6 to 10; he indicated that he took lessons in Mandarin Chinese, but with difficulty, and that he spoke either French (he attended a French school) or English, not Mandarin, during that period. Note that 16 of the 20 amusics also participated in the experiment testing pitch change detection in verbal and non-verbal materials (Tillmann et al., submitted).

Table 1. Number of participants per group (amusic/controls) for Experiments 1 and 2, followed by mean age (years), gender distribution, mean education (years), level of musical instruction as well as mean scores obtained on the Montreal battery of evaluation of amusia (MBEA), for the entire test (global score), and the subtests.

Material

The 98 recordings of monosyllabic Mandarin words (produced by a native female Mandarin speaker) from Klein et al. (2001) were used (see Table 3 for acoustic descriptors). In all there were 51 different words (13 words with level tone, 10 words with rising tone, 15 words with dipping tone, and 13 words with falling tone), with multiple recordings of 25 words for use in same-word pairs (thus leading to acoustic variability between words used in the same-word condition). These words consisted of various consonant–vowel (CV) combinations (e.g., /nju/, /kuaI/, /t∫uən/). Words were presented in 49 pairs: 24 composed of word pairs with the same CV combination but differing in tone, and 25 composed of different renditions of the same words, and so having the same tone¹. For all participants, word combinations presented in a pair (and word order within each pair) were the same. Each of the 98 recordings was used once in the task. The experiment was run with E-Prime software (Schneider et al., 2002).

Procedure

Within each pair, the first word was followed by a silent period of 350 ms, followed by the second word. Following each pair, listeners were asked to judge whether the two words of the pair were the same or different, by pressing one of the indicated keys on a computer keyboard. Participants were not explicitly told that the relevant dimension for discrimination was pitch. Listeners were first familiarized with the task by means of three practice pairs followed by error feedback, and then moved to the 49 experimental pairs without feedback. After participants’ responses, the next pair was presented after a delay of 2 s. The order of presentation of word pairs was randomized for each participant. The experimental session lasted for about 10 min.
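The trial structure described above can be summarized in a short sketch. The experiment itself was run in E-Prime, so the Python code below is only an illustration of the design (49 pairs, 24 different and 25 same, in a fresh random order per participant); the names `build_schedule`, `pair_id`, and `expected`, and the use of a seed, are assumptions introduced here.

```python
import random

# Timing values stated in the procedure.
ISI_MS = 350            # silence between the two words of a pair
INTER_TRIAL_MS = 2000   # delay between a response and the next pair


def build_schedule(n_different=24, n_same=25, seed=None):
    """Return the experimental pairs in a random order for one participant.

    Pair composition is fixed across participants; only the presentation
    order is shuffled. The seed is a hypothetical convenience for
    reproducing one participant's order.
    """
    trials = [{"pair_id": i, "expected": "different"}
              for i in range(n_different)]
    trials += [{"pair_id": n_different + i, "expected": "same"}
               for i in range(n_same)]
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials


# One hypothetical participant's presentation order.
schedule = build_schedule(seed=1)
```

Keeping the pair list fixed and shuffling only its order mirrors the statement that word combinations within pairs were identical for all participants while presentation order was randomized.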

Results

Performance was analyzed by calculating proportions of Hits (number of correct responses for different trials/number of different trials) minus False Alarms (FAs; number of incorrect responses for same trials/number of same trials). The amusic group performance was significantly below that of the control group, F(1,38) = 11.63, p = 0.002. Nevertheless, as can be seen in Figure 3A, there was substantial overlap between the groups. Only three amusic individuals performed more than 2 SD below average control performance. The amusic participant who had spent some time in China reached a performance level of 0.46, i.e., in the lower performance range of the amusic group and within 2 SD of average control performance.
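As an illustration of the scoring just described, the sketch below computes Hits minus False Alarms for one participant and applies a 2 SD criterion relative to a control group. It is not the authors' analysis code: the response counts and control scores are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which SD estimate was used.

```python
from statistics import mean, stdev


def hits_minus_fa(n_hits, n_different, n_false_alarms, n_same):
    """Hit rate minus false-alarm rate for one participant.

    n_hits: "different" responses on different pairs
    n_false_alarms: "different" responses on same pairs
    """
    if n_different <= 0 or n_same <= 0:
        raise ValueError("trial counts must be positive")
    return n_hits / n_different - n_false_alarms / n_same


def is_below_2sd(score, control_scores):
    """True if score lies more than 2 SD below the control mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    cutoff = mean(control_scores) - 2 * stdev(control_scores)
    return score < cutoff


# Hypothetical participant: 18 hits on the 24 different pairs and
# 2 false alarms on the 25 same pairs.
example_score = hits_minus_fa(18, 24, 2, 25)

# Hypothetical control scores, for illustration only.
hypothetical_controls = [0.7, 0.8, 0.9]
flagged = is_below_2sd(0.5, hypothetical_controls)
```

Subtracting the false-alarm rate penalizes a participant who simply responds "different" on most trials, so the score reflects discrimination rather than response bias alone.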

Figure 3. Performance [expressed as Hits − False Alarms (FA)] for amusic and control participants in Experiment 1 with the Mandarin material (A) and in Experiment 2, with the Thai material (speech) and its musical analogs [music; (B)]. Dots represent individual participants.

Correlations² between performance and the six subtests of the MBEA reached significance only for the interval subtest in amusics, r(18) = 0.47, p = 0.038.

To specify whether the observed group difference was associated with a reduced sensitivity to lexical tone pitch changes, or rather to a propensity to judge a “same” pair to entail a change, we ran two two-sided, independent t-tests for Hits and FAs, respectively. Amusics made fewer hits (0.66) and more FAs (0.09) than controls (0.79 versus 0.05), t(38) = 2.70, p = 0.010, and t(38) = 2.23, p = 0.031, respectively. Amusics performed more poorly than controls by both failing to discriminate pairs that were different, and erroneously judging same-word pairs to be different. In general then, it appears that amusics had a less clear grasp of this speech-based pitch discrimination task than did controls.

An additional analysis separated tone pairs as a function of the different tone comparisons (Level–Rising, Level–Dipping, Level–Falling, Rising–Dipping, Rising–Falling, Dipping–Falling). A 6 × 2 ANOVA on Hit–FA rates with tone comparison as within-participants factor and group (amusics/controls) as between-participants factor revealed a main effect of tone comparison [F(5,190) = 10.81, p < 0.0001, MSE = 0.05] and a main effect of group [F(1,38) = 11.31, p = 0.002, MSE = 0.17], but no interaction (p = 0.39). Control participants generally performed better than amusic participants, but both groups showed higher performance for the pairs comparing level and dipping tones (0.68 for amusics and 0.87 for controls) and for pairs comparing dipping and falling tones (0.71 for amusics and 0.86 for controls) than for the other tone pair comparisons (0.48 for amusics and 0.66 for controls).
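The design of this analysis can be checked with simple arithmetic: four tones yield six unordered comparisons, and a 6 × 2 mixed design with 40 participants gives the reported degrees of freedom. The sketch below assumes the standard degrees-of-freedom formulas for a mixed ANOVA without sphericity correction; the function name `mixed_anova_df` is introduced here for illustration.

```python
from itertools import combinations

TONES = ["Level", "Rising", "Dipping", "Falling"]

# All unordered pairs of distinct tones: C(4, 2) = 6 comparisons.
comparisons = list(combinations(TONES, 2))


def mixed_anova_df(n_within_levels, n_groups, n_participants):
    """Degrees of freedom for a one-within, one-between mixed ANOVA.

    Assumes no sphericity correction is applied.
    """
    df_within = n_within_levels - 1
    df_between = n_groups - 1
    df_between_error = n_participants - n_groups
    df_within_error = df_within * df_between_error
    return {
        "within": (df_within, df_within_error),
        "between": (df_between, df_between_error),
        "interaction": (df_within * df_between, df_within_error),
    }


# 6 tone comparisons, 2 groups, 20 + 20 participants.
df = mixed_anova_df(len(comparisons), 2, 40)
```

With these inputs the within-participants effect has (5, 190) degrees of freedom and the group effect has (1, 38), matching the F statistics reported above.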

Discussion

Experiment 1 revealed that amusics, who were speakers of a non-tonal language (French), encountered difficulties in Mandarin lexical tone discrimination (in comparison to their matched controls). In addition, amusics’ performance correlated with their performance in the interval test of the MBEA, which requires the discrimination of tone sequences differing by interval sizes: the lower their performance on the interval test with melodies, the lower their performance in the lexical tone discrimination for Mandarin. These findings support the conclusion that amusics’ pitch deficit in melodies extends to the perception of pitch in speech material.

While amusics’ average performance was below the group performance of controls, there was considerable overlap in performance ranges between the two groups. The relatively comparable performance of the amusics might be due to some pitch variations in the Mandarin tones (or the comparisons in some of the pairs) that might exceed amusics’ thresholds (e.g., larger than two semitones). To increase the difficulty of the task and to assess the generality of the findings, Experiment 2 tested amusics and controls with a same-different paradigm using Thai tones.

Experiment 2

Standard Thai uses five tones: three level tones (low, mid, high) and two contour tones (rising and falling), referred to as static and dynamic tones respectively (see Figure 1B; Abramson, 1962). The tone systems of Thai and Mandarin are different in relation to the number of tones as well as their pitch height, durations and start/end points. They also show some similarities. For example, both Thai and Mandarin have one rising and one falling contour tone and at least one level tone. Thai, however, contains five (not four) different tones, and these tones are based on smaller pitch changes together with weaker contribution of durational differences than in Mandarin. Previous studies have shown that English-speaking children and adults can discriminate Thai tones (Burnham et al., 1996; Burnham and Francis, 1997; Burnham and Brooker, 2002), and Experiment 2 tested for the first time congenital amusics on this material.

In addition, to further our understanding of domain-generality versus -specificity of pitch processing, Experiment 2 compared amusics’ performance for the lexical tones to their performance on musical analogs thereof (pitch variations of the Thai tones applied to a violin sound). Previous studies have shown that (1) normal, English-speaking listeners performed worse for the speech signals than for musical analogs or low-pass filtered versions of the speech signals (Burnham et al., 1996; Burnham and Brooker, 2002), and (2) musical training boosted overall performance levels, with musicians without absolute pitch performing better than non-musicians, and musicians with absolute pitch performing the best (Burnham and Brooker, 2002). Based on these positive transfer effects of musical training, Burnham and Brooker (2002) concluded that speech and music perception are not independent, and that musical training and absolute pitch ability can affect speech processing. Thus, amusics’ musical deficit would predict impaired processing for pitch in the speech material here, in line with data of Experiment 1 as well as some previous work on amusics’ perception of their mother tongue, whether non-tonal (Patel et al., 2008; Liu et al., 2010) or tonal (Nan et al., 2010).

Impaired pitch processing for speech material has also been reported for a task requiring fine-grained pitch change detection in sequences of a repeated syllable (/ka/) (Tillmann et al., submitted). Amusics’ performance, although impaired, was less impaired for these syllable sequences than for sequences of repeated tones (carefully matched to the syllables for their acoustic features). This performance difference between speech and musical sounds might be linked to differences in the energy distribution of the sounds’ spectrum, notably the presence versus absence of formants, and/or to higher-level processing related to strategic influences (see below for further discussion). In the Tillmann et al. (submitted) study, the to-be-detected pitch changes were instantiated between syllables (or tones), thus between segmented events, and not within a given event with continuous pitch changes as in the materials of Experiments 1 and 2 here. In Experiment 2, we thus hypothesized that for the processing of pitch in Thai tones, amusics’ performance should benefit from the speech signal, leading to some boost in pitch processing (compared to musical analogs), at least for the most severely impaired amusics (Tillmann et al., submitted). Controls, however, should perform better for the musical material (Burnham and Brooker, 2002; Tillmann et al., submitted).