Differential Difficulties in Perception of Tashlhiyt Berber Consonant Quantity Contrasts by Native Tashlhiyt Listeners vs. Berber-Naïve French Listeners

Hallé, Pierre A.; Ridouane, Rachid; Best, Catherine T.

doi:10.3389/fpsyg.2016.00209

ORIGINAL RESEARCH article

Front. Psychol. , 01 March 2016

Sec. Psychology of Language

Volume 7 - 2016 | https://doi.org/10.3389/fpsyg.2016.00209

This article is part of the Research Topic New advances on the perception and production of non-native speech sounds View all 8 articles

Differential Difficulties in Perception of Tashlhiyt Berber Consonant Quantity Contrasts by Native Tashlhiyt Listeners vs. Berber-Naïve French Listeners

Updated

A correction has been applied to this article in:

Corrigendum: Differential Difficulties in Perception of Tashlhiyt Berber Consonant Quantity Contrasts by Native Tashlhiyt Listeners vs. Berber-Naïve French Listeners
1. Read correction

$\r\nPierre A. Hall,,*$ Pierre A. Hallé^1,2,3^*

Rachid Ridouane¹

Catherine T. Best^3,4

¹Laboratoire Phonétique et Phonologie, Centre National de la Recherche Scientifique, Paris, France
²Laboratoire Mémoire et Cognition, Institut National de la Santé et de la Recherche Médicale, Paris, France
³Haskins Laboratories, New Haven, CT, USA
⁴MARCS Institute and School of Humanities and Communication Arts, University of Western Sydney, Sydney, NSW, Australia

In a discrimination experiment on several Tashlhiyt Berber singleton-geminate contrasts, we find that French listeners encounter substantial difficulty compared to native speakers. Native listeners of Tashlhiyt perform near ceiling level on all contrasts. French listeners perform better on final contrasts such as fit-fitt than initial contrasts such as bi-bbi or sir-ssir. That is, French listeners are more sensitive to silent closure duration in word-final voiceless stops than to either voiced murmur or frication duration of fully voiced stops or voiceless fricatives in word-initial position. We propose, tentatively, that native speakers of French, a language in which gemination is usually not considered to be phonemic, have not acquired quantity contrasts but yet exhibit a presumably universal sensitivity to rhythm, whereby listeners are able to perceive and compare the relative temporal distance between beats given by successive salient phonetic events such as a sequence of vowel nuclei.

Introduction

Cross-linguistic studies of nonnative speech perception—which usually bear on the perception of sublexical units—help us understand the mechanisms that underpin the early stages of pre-lexical speech perception. Pre-lexical processes are the least likely to be biased by such top-down effects as lexical feedback. Some early-stage mechanisms seem to hold universally across languages in their principles but may vary from one language to the other in their specific tunings. Such is the case of categorical perception. And indeed, most of the work accomplished so far in the domain of nonnative speech perception has dealt with the issue of categorization. The models elaborated to account for the observed patterns of nonnative speech perception generally try to formalize how the nonnative phones are categorized or not in terms of native categories (PAM: Best, 1995; L2LP: Escudero, 2005, 2009), and accordingly, how well various nonnative phonemic contrasts may be discriminated, or how difficult it may be to acquire new phonetic categories in the process of learning a second language (SLM: Flege, 1995; PAM-L2: Best and Tyler, 2007; L2LP: Escudero, 2009). Interestingly, the nonnative speech enterprise has not focused equally on the various dimensions of speech sounds. For one thing, the main focus has generally been on consonants, more so than on vowels or tones (see Tyler et al., 2014, for an overview of this asymmetric situation; for tones, see Hallé et al., 2004; So and Best, 2014). Second, most studies have dealt with the perception of single segments rather than segment sequences such as clusters (but see Hallé and Best, 2007; Best and Hallé, 2010).

Yet another dimension may be viewed as somewhat neglected: segmental quantity, that is, vowel or consonant distinctive duration. Most of the cross-linguistic studies on the perception of phonemic geminate consonants concern second language learning, for example the difficulties encountered by English or Korean learners of Japanese with Japanese geminates in either production or perception (e.g., Hayes, 2002; Hardison and Saigo, 2010; Sadakata and McQueen, 2011; Sonu et al., 2013). The majority of psycholinguistic or phonologically-oriented perceptual studies on gemination are within-language studies of native speakers (Pattani Malay: Abramson, 1986; Kelantan Malay: Hamzah, 2013; Cypriot Greek: Muller, 2001; Swiss German: Kraehenmann, 2001; Tashlhiyt Berber: Ridouane and Hallé, 2010). The situation is similar for vowel quantity contrasts, with most of the cross-linguistic studies on native vs. second language learners of languages with contrasting vowel quantity such as Japanese, Swedish, Finnish, etc. (e.g., McAllister et al., 2002; Hirata, 2004; Ylinen et al., 2005), Note that some studies investigated both vowel and consonant quantity. In particular, one large cross-linguistic study investigated the interaction between vowel and consonant duration in the perception of consonant quantity (Kingston et al., 2009).

This paper contributes to the study of nonnative perception of consonant quantity contrasts by examining naïve listeners rather than L2 learners. We explore how Berber-naïve French listeners, for whom all segmental quantity contrasts are, in principle, phonemically nonnative, discriminate a particular set of Tashlhiyt Berber (henceforth, Tashlhiyt) singleton-geminate consonant contrasts. We begin with (1) a brief sketch of the state of affairs with respect to the implementation of segmental quantity in French, and (2) a brief review of a few previous studies that have explored French listeners' perception of either native or nonnative quantity contrasts.

Segmental Quantity in French

French has no duration quantity contrasts in its phonemic repertoire, whether for consonants (geminate vs. singleton consonants) or vowels (long vs. short vowels). That is, there are, in principle, no minimal pairs of French words that would differ purely in consonant or vowel duration. However, quantity contrasts do occur at the margins of lexical phonology: this is the case of “fake” geminates. Geminate consonants or vowels may appear across word or morpheme boundaries in French just as in English, German, and many other languages, as in (1a–b). Geminate consonants may also result from schwa deletion, as in (2). Finally, gemination of the /r/ consonant is also observed in inflected verb forms, as in (3). This latter case may seem to produce “true” lexical minimal pairs, but it should rather be viewed as a case of vowel-deletion: courrai < *courirai < courir (cf. finirai < finir). Diachronically, these forms derive from the deletion of atonic vowels in Latin (e.g., je mourrai/ mur.rε/ < mor(i)raio < morīre habeo). Similar forms obtain, synchronically, from schwa deletion in –er verbs whose stem ends with r (e.g., il déclarerait /deklar.rε/ < /deklarərε/; see Meisenburg, 2006, for more examples).

(1a) il frappe pas /frap.pa/ ‘he doesn't hit’ vs. il frappa /fra.pa/ ‘he hit’

(1b) à apprendre /aapr $\tilde{α}$ dr/ ‘to be learned’ vs. à prendre /apr $\tilde{α}$ dr/ ‘to be taken’

(2) là dedans /lad.d $\tilde{α}$ / ‘in there’ vs. la dent /la.d $\tilde{α}$ / ‘the tooth’

(3) il courrait /kur.rε/ ‘he would run’ vs. il courait /ku.rε/ ‘he was running’

Whereas, (1–3) show examples of geminate-singleton minimal pairs at the sentence level, there are many other occurrences of geminate consonant utterances that are not minimal pairs, as in (4). Geminate consonants thus do occur in French. In most cases, two identical consonants are brought into contact, or concatenated, due to word juxtaposition or to schwa deletion. In other words, French geminates are concatenation, or “fake,” geminates (Hayes, 1986; Schein and Steriade, 1986).

(4) robe beige /rɔb.bεӡ/ ‘beige dress’; netteté /nεt.te/ </nεtəte/ ‘sharpness’

As for vowels, standard French (as spoken in the Île de France area) has no vowel quantity contrast, except simple vs. concatenated vowels at the sentence level as in (1b). French has been argued to have instead a vowel tense-lax contrast (Jakobson et al., 1952; Tranel, 1987), i.e., marked by vowel quality as well as duration differences. Yet, some regional varieties of French have maintained a final vowel duration contrast to mark grammatical gender, as in ami-amie (“boyfriend” vs. “girlfriend”) pronounced [ami]-[ami:]. This is the case of French spoken in Switzerland and Belgium but not of, for example, Parisian French.

Given these properties of standard French, we may ask whether Parisian French listeners can distinguish between short and long vowels and between short and long consonants.

French Perception of Segmental Quantity

For vowels, Duncan (1975) manipulated vowel duration of naturally spoken French words and showed that French listeners (of mixed regional origin) were sensitive to vowel duration for pairs such as mettre-maître (“to put” vs. “master”) in which /ε/ may be pronounced longer in maître than mettre. She found that a 32 ms lengthening of /ε/ in mettre was sufficient to lower the initially dominant rate of “mettre” judgments down to 50% (More detailed quantitative data were unfortunately not reported in her study). Grosjean et al. (2007) found that Parisian French listeners did quite poorly at identifying words ending with a long vowel (such as amie [ami:] as opposed to ami [ami]) compared to Swiss French listeners who performed at near ceiling level. As for perception of vowel quantity contrasts in languages other than French, Parisian French listeners have been shown to discriminate Japanese vowel quantity contrasts less well than native Japanese listeners (Dupoux et al., 1999). However, the Parisian French subjects in the Dupoux et al.'s (1999) study nonetheless achieved rather high-level performance: from 80 to 90% correct, depending on the experimental condition.

For consonants, the literature is rather scarce on French sensitivity to consonant duration. Delattre (1971a,b) examined the production and perception of fake and true geminates in English, German, Spanish, and French. For example, he compared noun phrase pairs such as la ville limite /lavil.limit/ (“the extreme town”) vs. la vie limite /lavilimit/ (“the extreme life”) (French), or word pairs in Spanish such as pero (“but”) vs. perro (“dog”) (Spanish). He found that consonant duration was the main cue to gemination for both production and perception, with intensity changes and preceding vowel duration as secondary cues. Interestingly, on the basis of intensity curves compared for geminate vs. singleton consonants, Delattre proposed that gemination involved a double articulation, or a “rearticulation,” an idea followed up by Lehiste et al. (1973), who used electromyographic data to conclude that both phonemic and concatenation geminates are produced with a double articulation in Estonian and in English. The perceptual data reported in Delattre (1971b) unfortunately was mostly qualitative. Delattre synthesized continua between minimal pairs of French utterances (e.g., between elle aime /εlεm/ ‘she loves’ and elle l'aime /εl.lεm/ ‘she loves her/him’) by varying the duration of the critical consonant (/l/, /n/, or /s/). This is a crucial manipulation, as we will see later, because prosodic factors other than duration were kept constant. He ran French, English, German, and Spanish listeners on these stimuli, using a 2AFC identification test. According to Delattre, the non-French listeners “understood enough French to distinguish between simple pairs of utterances, yet …had not lost their native habits of speaking and hearing”. In Delattre's report of the results, “the geminates were separated from the single consonants by a wide range of ambiguous durations” (Delattre, 1971b, p. 100), presumably meaning rather low categorization precision (i.e., shallow slope at categorical boundary). Delattre noted slight differences between listener groups. For example, German and Spanish subjects switched from single to geminate consonant categorization of /s/ at shorter durations than English and French subjects. Delattre also found that “the duration of the preceding vowel was not a factor in the perception of consonant gemination” (Delattre, 1971b, p. 112). This conclusion, which is contrary to that of Kingston et al. (2009), was however based on production data, not perception data. Delattre's conclusion that duration is the main factor for the perception of consonant gemination therefore remains unwarranted.

Since these early efforts to examine the perception of fake geminate consonants, by French listeners among others, we are only aware of one more recent study by Meisenburg (2006). Meisenburg, just like Delattre (1971a,b), looked at the production and perception of a few fake gemination minimal pairs, namely, frappe pas /frap.pa/ vs. frappa /fra.pa/, courrait /kur.rε/ vs. courait /ku.rε/ [examples (1a) and (3) above], and il l'a dit /il.ladi/ ‘he said it’ vs. il a dit /i.ladi/ ‘he said.’ Each stimulus was produced once by 12 French speakers, instructed to avoid prosodic marking, and presented to 16 native French listeners, who were administered a 2AFC forced choice identification task. In Table 1, we summarize the production (segment duration) and perception (response accuracy) data reported in more detail in Meisenburg (2006). We added a correlation measure between segment duration and response accuracy (“D × A” correlation): This correlation is positive for geminate and negative for singleton consonants, except in courait. The correlation strength measures the importance of segment duration in the singleton-geminate judgments made by the subjects. As can be seen in Table 1, the identification performance was about 80% for two contrasts out of three. The performance was lowest for courait-courrait.

TABLE 1

Table 1. Meisenburg's (2006) data.

To summarize the available data on the French listeners' perception of speech segment quantity in French (and in Japanese for vowels), it seems well above the chance level most of the time, for both consonant and vowel quantity, although vowel quantity may be more difficult than consonant quantity since it does not convey a linguistic distinction, at least in the standard, Parisian variety of French.

Thus far, we have reviewed data bearing directly on the perception by French listeners of duration quantity distinctions. Other data suggest that French listeners are sensitive to subphonemic consonant duration differences distinguishing, for example, plain consonants and liaison consonants (Spinelli et al., 2003, also see Spinelli et al., 2007, for similar findings with French elision; (Snoeren et al., 2008), for plain vs. assimilated consonants). In those cases, subphonemic duration differences are very modest (e.g., liaison /r/ was found to be 59–64 ms long as compared to 71 ms for plain onset /r/ in Spinelli et al., 2003). Yet, they result in differential associative semantic priming such that listeners tend to recover the speaker's intended meaning.

In this study, we focus on the French perception of consonant quantity contrasts at a pre-lexical level of perception, using for that purpose a discrimination task on nonnative quantity contrasts. We compare different types of singleton-geminate contrasts in terms of consonant type and within-utterance position, as explained in more detail in the following. For all contrast types, performance of native listeners is taken as the reference for optimal perception, against which the French performance can be compared.

The Present Study

We cited in the Introduction Section a within-language study by Ridouane and Hallé (2010), which tested Tashlhiyt listeners on contrasts between singleton and geminate consonants in Tashlhiyt word minimal pairs. They found near ceiling Tashlhiyt performance for word-initial voiced stop and fricative as well as for word-final voiceless stop contrasts, but rather poor performance on word-initial voiceless-stop contrasts (between 55 and 70% correct discrimination). The Tashlhiyt listeners' performance only slightly improved with audio-visual stimuli. Unpublished data collected for French listeners' discrimination on these same voiceless-stop stimuli showed that French performance was only slightly poorer than for Tashlhiyt listeners. These contrasts are presumably difficult since, as shown in Ridouane (2007), Tashlhiyt utterance-initial geminate and singleton voiceless stops do not reliably differ acoustically, although they clearly differ in their articulation as reflected in electropalatographic measurements of tongue-to-palate contact: in particular, the durations of their closure, which cannot be perceived word-initially from the acoustic signal, are in a ratio close to 2:1. These word-initial voiceless stop contrasts therefore would not allow a classic cross-linguistic comparison between native control listeners performing near ceiling on native contrasts and French listeners expected to perform more poorly on nonnative contrasts.

We therefore chose for the present study the easier Tashlhiyt singleton-geminate contrasts, on which native listeners have been reported to perform near ceiling (Ridouane and Hallé, 2010). These contrasts are substantiated by clear acoustic differences in duration: for example, differences in frication duration for fricatives, or in pre-release voiced murmur duration for voiced stops. In this paper, we thus focus on these easier contrasts and examine their perception by French vs. Tashlhiyt listeners. One reason to use Tashlhiyt as the target language (other than the fact that French listeners are unlikely to have been exposed to Tashlhiyt) is that singleton-geminate contrasts in this language are almost exclusively marked by duration differences (Ridouane, 2007), which are directly relevant to quantity distinctions. Although the primary acoustic cue to gemination is always duration (Lahiri and Hankamer, 1988; Hankamer et al., 1989; Ridouane, 2010), there are some languages in which other phonetic or prosodic cues participate in the distinction between geminate and single consonants: for example, accentual cues in Pattani Malay (Abramson, 1986), VOT differences in Cypriot Greek (Armosti, 2009), following vowel quality differences in Japanese (Kawahara, 2006). Because we address here the issue of the perception of nonnative consonant duration quantity contrasts, a target language such as Tashlhiyt, for which the possible confounds with other dimensions than duration are minimized, is highly desirable. The rather good French performance on fake geminate consonants reported above (Delattre, 1971b; Meisenburg, 2006), at least on /p:/, /l:/, /n:/, and /s:/, which is presumably due solely to durational differences, might predict rather good performance by French listeners on the Tashlhiyt data as well.

We chose Tashlhiyt contrasts that cover a rather wide spectrum of obstruents in terms of acoustic intensity. The acoustic substance of their constriction or closure portion varied from silence (word-final voiceless stops: e.g., fit-fitt), to low-intensity voicing murmur (word-initial voiced stops: e.g., bi-bbi), with strident frication (word-initial fricatives: e.g., sir-ssir) in between. Our initial guess was that singleton-geminate discrimination would be easier when carried by higher-intensity portions of acoustic signal. More precisely, we predicted that, whereas native speakers of Tashlhiyt likely perform near ceiling on all these contrasts, French listeners should encounter the greatest difficulty with voiceless stops (silence) and the least difficulty with voiceless fricatives (strident frication). We will see that this prediction was not borne out and that the acoustic intensity differences among the contrasted consonants was definitely not the sole factor determining nonnative performance.

Experiment 1

We used natural utterances of Tashlhiyt minimal-pairs as singleton-geminate contrasts in a cross-language AXB discrimination test, comparing native speakers of Tashlhiyt and naive French listeners with no exposure to Tashlhiyt or similar languages. The AXB paradigm was chosen because it taps into a sufficiently abstract level of processing to disclose potential difficulties with nonnative contrasts, without imposing heavy memory load or letting subjects to respond on the basis of low-level auditory-acoustic differences and similarities.

Methods

Participants

Twelve native speakers of French, students or teachers at Paris 3 University, aged 21–57 years (mean 33.4, SD 13 years), and 23 Tashlhiyt native speakers, students at Ibnou Zohr University in Agadir, aged 19–37 years (mean 26.1, SD 4.9 years), volunteered to participate in the experiment. French and Tashlhiyt participants were tested in Paris and Agadir, respectively. None of the 12 French participants had had any exposure to Tashlhiyt or any language using word-initial and word-final geminate-singleton contrasts. None of the French or Tashlhiyt participants reported having any hearing deficits or any kind of language impairment.

Stimuli and Design

Seven geminate-singleton contrasts were used: three voiced stop contrasts in word-initial position (bi-bbi, diR-ddiR, and gar-ggar)¹, two voiceless fricative contrasts in word-initial position (fit-ffit, and sir-ssir), and two voiceless stop contrasts in word-final position (fit-fitt, and hat-hatt). The distribution of consonants within the three contrast-types reflects the distribution observed in Tashlhiyt in general (Ridouane, 2014), in order to hold constant the distributional input that native speakers of Tashlhiyt naturally experience. There were thus a total of 14 items. Four repetitions of each item, produced in isolation by a native speaker of Tashlhiyt, were retained as experimental stimuli. In Tashlhiyt, just as in French, voiced stops are realized as phonetically voiced in any position, that is, with a voiced closure portion whose acoustic realization is a voicing murmur (i.e., pre-voicing: voicing prior to stop release). Geminated voiced stops differ from their singleton counterparts essentially by a longer voiced closure portion. Note that the closure duration distinction holds as well for Tashlhiyt voiceless stops, even in absolute initial position (Ridouane, 2007), although of course it is silent rather than voiced. Likewise, for Tashlhiyt fricatives, geminates differ from singletons essentially by a longer constriction duration. Importantly, Ridouane (2007) did not find acoustic or articulatory cues other than duration that reliably distinguishing geminates from singletons in Tashlhiyt. We ran acoustic measurements on the retained stimuli. As expected, the clearest cue to gemination is durational. The critical durations of the stimuli are summarized in Table 2. In all cases, the geminate's closure duration approaches or exceeds twice that of the singleton.

TABLE 2

Table 2. Mean durations (ms) of prevoicing, frication, and silent closure durations for the three contrast types (word-initial voiced stops and fricatives, and word-final stops), detailed by contrasts.

The duration differences shown in Table 2 were all significant at the p < 0.00001 level, according to two-tailed t-tests in which geminates and singletons were compared for each of the three types of contrast [word-initial voiced stops: t₍₂₂₎ = 20.05; word-initial fricatives: t₍₁₄₎ = 10.79; word-final /t/-/t:/: t₍₂₂₎ = 26.65]. They were accompanied by subtler differences, some of which reached significance on two-tailed t-tests. For instance, in the /t/ coda series, the longer closure for /t:/ than /t/ was partly compensated by a shorter onset consonant (88 < 111 ms), t₍₂₂₎ = 3.99, p < 0.001, and initial vowel (104 < 146 ms), t₍₂₂₎ = 8.68, p < 0.00001. In the voiced stop onset series, both the mean intensity and F0 of the voicing murmur were higher in singleton than geminate consonants [F0: 127 > 118 Hz, t₍₂₂₎ = 3.10, p < 0.01; intensity: 46.5 > 41.9 dB, t₍₂₂₎ = 3.24, p < 0.005], that is, they showed a time-intensity trade in production. We also measured the intensity of these voicing murmurs relative to the next vowel (/a/ or /i/), a measure which is more independent of overall speaking loudness: on this measure as well, singletons were found to have higher intensity voicing murmurs than geminates: −16.7 > −21.4 dB relative to the next vowel. The geminate and singleton word-initial fricatives did not differ in terms of mean intensity; they were about 6 dB higher than the voiced murmur in word-initial voiced stops (50.2 vs. 44.2 dB), t₍₃₈₎ = 4.44, p < 0.0005. Finally, there was a marginal trend for geminate fricatives to have a lower mean Harmonic-to-Noise ratio (HNR) than singleton fricatives (0.8 < 4.0 dB), t₍₁₄₎ = 1.91, p = 0.074, i.e., to be “noisier.” The same trend was found with the vowel preceding /t:/ compared to /t/ (9.7 < 12.7 dB), t₍₂₂₎ = 1.93, p = 0.063. That is, in some instances, gemination seemed to be associated with a lower harmonicity measure.

Each contrast was presented four times in each of the four possible AXB combinations (AAB, ABB, BBA, and BAA) so that the stimuli appeared equiprobably in each position within the AXB triplets. There were thus 112 trials for the seven contrasts under scrutiny. These trials were part of a larger design including 128 other trials on word-initial voiceless stop singleton-geminate Tashlhiyt contrasts such as ks-kss or tili-ttili. The perception of these contrasts by Tashlhiyt listeners were reported in Ridouane and Hallé (2010) and in a forthcoming chapter comparing production and perception data; the French listeners' data on these contrasts will be reported elsewhere. In the present report, we will treat the 128 trials such as tili–ttili as filler trials and the other 112 as test trials. The trials were presented in 16 blocks, with one trial for each of the seven test trials in each block. Each block thus contained the same distribution of trials in terms of contrast-types as the entire experimental session, allowing for a time-course analysis of subjects' responses. Both trials within blocks and blocks within the test session were randomized, with a different randomization for each subject. The 16-block test session was preceded by 10 training trials on five contrasts, none of which was used in the test trials: daR-taR, dar-tar, kijji-gijji, tid-ttid, and jutid-juttid. Note that only the last two contrasts were geminate-singleton contrasts with /t/-/t:/ in initial vs. medial position.

Procedure

Participants were tested individually in a quiet room and received the speech stimuli through professional quality circumaural headphones (Sennheiser HD 518). On each AXB trial, participants were presented with three stimuli and had to indicate whether the second item X was a better category match to the first or the third stimulus, by depressing the response key labeled “1” or “3.” The inter-stimulus (offset to onset), inter-trial, and inter-block intervals were set to 1, 4, and 9 s, respectively. Response times were measured from the end of the critical consonant in the X stimulus. The experiment was run using the DMDX software (Forster and Forster, 2003). Participants received feedback for the training trials (accuracy and response time) but not for the test trials.

Results

Correct Discrimination Response Rate

The Tashlhiyt participants performed near ceiling on all contrasts, as expected. By comparison, the French participants performed poorly. They performed the most poorly on the word-initial voiced stop contrasts (e.g., bi-bbi), less poorly on the word-initial voiceless fricative contrasts (e.g., fit-ffit), and best on the word-final /t/-/tt/ contrasts (fit-fitt and hat-hatt). Table 3 shows the accuracy data detailed by contrast. Figure 1 shows the corresponding d-prime data, pooled by contrast-type. We computed d-prime values from the response data in the AXB task following MacMillan and Creelman (2005): for each AXB trial, “A” responses (response key labeled “1”) were (arbitrarily) treated as hits when correct, and as false alarms when incorrect. An analysis of variance was run on the d-prime data, with Subject as the random factor, Language as a between-subject factor (French vs. Tashlhiyt), AXB trial Target (X = singleton vs. geminate), and Contrast type (the three types under scrutiny) as within-subject factors. Note that the AXB trial Pattern, or primacy/recency factor (primacy: X = A vs. recency: X = B), which has been examined in some studies using an AXB procedure (Best et al., 2001; Hallé and Best, 2007), cannot be analyzed in the d-prime data since each d-prime value is computed from accuracy rate on primacy trials and error rate on recency trials. We thus used mixed model log-odds regression analyses, with both subjects and items as random factors, on the raw binary data (correct/incorrect) to examine the Pattern factor, among others. It was not significant, for either the French or the Tashlhiyt subjects' data². We do not report the detail of these analyses because they yielded exactly the same patterns of results as obtained with the analysis of variance on the d-prime data. We now return to this analysis. Let us first discuss the structural factor Target. Target was significant overall, F_{(1, 33)} = 20.05, p < 0.0001, and strongly interacted with Language, F_{(1, 33)} = 16.62, p < 0.0005. Indeed, Target did not reach significance in the Tashlhiyt group, F_{(1, 22)} = 2.51, p = 0.13, whereas it was significant in the French group, F_{(1, 11)} = 19.82, p < 0.001, with a poorer performance overall when X in AXB was a singleton than geminate stimulus (d-primes: 1.27 < 1.97; corresponding discrimination rates: 70.1 < 80.9%). Therefore, discrimination was more difficult around singleton than geminate stimuli for French but not Tashlhiyt participants. If we reason in terms of prototype-induced “magnet effects” (Kuhl, 1991; Kuhl et al., 2008), this asymmetry suggests that, for French listeners, singleton consonants are more typical of the corresponding native stops and fricatives than are geminates. In contrast, singletons and geminates did not differ in typicality for Tashlhiyt listeners. We turn now to the main factors of interest, Language and Contrast.

TABLE 3

Table 3. Experiment 1: Discrimination rate data detailed by contrast (with standard deviations).

FIGURE 1

Figure 1. Experiment 1: French vs. Tashlhiyt participants' d-prime data for the three types of contrasts subsumed as bi–bbi, sir–ssir, and fit–fitt. The error bars represent standard errors.

The main factor Language was highly significant, F_{(1, 33)} = 98.69, p < 0.00001, reflecting the overall better performance of Tashlhiyt than French participants. The Contrast factor was also significant overall, F_{(2, 66)} = 13.53, p < 0.00001. The strong Language × Contrast interaction, F_{(2, 66)} = 29.59, p < 0.00001, suggested that the Contrast effect differed between Tashlhiyt and French participants. Indeed, Contrast was significant only for French, F_{(2, 22)} = 18.14, p < 0.00001, not Tashlhiyt listeners, F_{(2, 44)} = 1.81, p = 0.18. As can be seen in Table 3, performance was near ceiling for Tashlhiyt participants for all contrasts, whereas it varied substantially with contrast type for French participants: about 66 < 75 < 90% (d′: 1.02 < 1.60 < 2.55) for the word-initial voiced stop, word-initial fricative, and word-final voiceless stop contrasts, respectively. The corresponding pairwise differences in the d-prime data all were significant [initial voiced stop vs. initial fricative: F_{(1, 11)} = 6.90, p < 0.05; initial fricative vs. final voiceless stop: F_{(1, 11)} = 10.84, p < 0.01].

We may note that even the lowest French performance (66% correct discrimination rate and d′ = 1.02 on the word-initial voiced stop contrasts) was above chance (or null sensitivity) level as shown by t-test comparisons with the 50% chance level on the accuracy rate data or, equivalently, with d′ = 0 on the d-prime data. However, whereas the Tashlhiyt performance was, as anticipated, optimal and near ceiling level for all three types of contrasts, the French performance was below the Tashlhiyt performance for all three types of contrasts, as shown by paired comparisons between the French and Tashlhiyt d-prime data for each contrast type (voiced stop: 1.02 < 3.08; fricative: 1.60 < 2.95; word-final voiceless stop: 2.55 < 3.02; ps < 0.005).

Correct Discrimination Response Times

Figure 2 shows the RT data for correct responses. Note that RT values were measured from the end of the critical consonant in the second stimulus (X); if the third stimulus (B) had been used instead, RTs would be shorter by 1500 ms in average (based on inter-stimulus interval and average stimulus durations).

FIGURE 2

Figure 2. Experiment 1: French vs. Tashlhiyt participants' response time data for the three types of contrasts subsumed as bi–bbi, sir–ssir, and fit–fitt. The error bars represent standard errors.

The raw RT data was cleaned up by discarding RT values longer than 3.5 s (more than 2 s after stimulus B's reference time) or shorter than 1.5 s (before stimulus B's reference time). About 0.6% of the French RT data was discarded in this way and 0.1% of the Tashlhiyt RT data. An analysis of variance was run on the cleaned-up RT data, using the same factors as for the d-prime data, with the addition of the AXB trial Pattern (or primacy/recency) within-subject factor. This latter structural factor Pattern was significant overall, F_{(1, 33)} = 19.80, p < 0.0001, but did not interact significantly with Language, F_{(1, 33)} = 2.79, p = 0.104. Pattern was significant for both groups [French: F_{(1, 11)} = 14.13, p < 0.005; Tashlhiyt: F_{(1, 22)} = 7.31, p < 0.05], with shorter RTs for “recency” (X = B) than “primacy” (X = A) trials in both groups (French: 2133 < 2264 ms; Tashlhiyt: 1810 < 1873 ms). That is, X = A responses were generally more difficult than X = B responses. Note that such “recency” effects in AXB discrimination experiments seem to be specifically associated with the subjects' detection of nonlinguistic (as opposed to phonological or phonetic) differences (Crowder, 1971, 1973; Best et al., 2001: 786). Because no parallel recency effects were found in the accuracy data (as analyzed using mixed model log-odds regression: see Section Correct Discrimination Response Rate in Experiment 1), and because the contrasts at stake indeed are phonological for Tashlhiyt listeners, who exhibited clear recency effects for RTs, we must ascribe the recency effects in the RT data to a general bias favoring psychoacoustic detection of nonlinguistic differences. The structural factor Target was significant overall, F_{(1, 33)} = 110.59, p < 0.00001, and did not interact with Language, F < 1. Target was significant for both groups [French: F_{(1, 11)} = 26.76, p < 0.0005; Tashlhiyt: F_{(1, 22)} = 101.19, p < 0.00001], reflecting the same pattern: longer RTs for singleton than geminate X targets (French: 2282 > 2114 ms; Tashlhiyt: 1914 > 1769 ms). Therefore, still reasoning in terms of possible “magnet effects,” the RT data suggest that singleton items may be more prototypical than geminate items for both French and Tashlhiyt listeners.

Turning now to the main effects, Language was highly significant, F_{(1, 33)} = 13.36, p < 0.001, with Tashlhiyt listeners faster than French listeners by about 356 ms. The Tashlhiyt advantage held for all three types of contrast, ps < 0.01. However, as suggested by a significant Language x Contrast interaction, F_{(2, 66)} = 11.07, p < 0.0001, this advantage varied across contrasts: As can be seen in Figure 2, it was the smallest for the word-final voiceless stop contrasts. The Contrast factor was significant overall and for both groups [French: F_{(2, 22)} = 53.76, p < 0.00001; Tashlhiyt: F_{(2, 44)} = 8.25, p < 0.001]. For both groups, there was no significant RT difference between the word-initial voiced stop and voiceless fricative contrasts (shorthand: bi-bbi and sir-ssir, respectively), F < 1, and these bi-bbi and sir-ssir types of contrast yielded longer RTs than the word-final voiceless stop fit-fitt type of contrast (French: ps < 0.00001; Tashlhiyt: ps < 0.01). The significant Language x Contrast interaction (see above) also indicates that the RT advantage for fit-fitt over bi-bbi or sir-ssir was larger for French than Tashlhiyt subjects (256 vs. 91 ms).

To sum up, the RT data largely paralleled the accuracy or d-prime data. For French participants, the fit–fitt type contrasts were responded to faster than the other two types, confirming they are easier. For Tashlhiyt participants, accuracy was near ceiling level for all three types of contrasts (cf. Table 3), with some differences in response times, namely faster RTs for the fit–fitt type than the bi–bbi or sir-ssir types.

Discussion

The results of this discrimination experiment showed that, whereas Tashlhiyt listeners performed at near ceiling level on each of the three types of contrasts, French listeners encountered substantial difficulty with the bi–bbi and, to a lesser extent, with the sir–ssir type of contrast³. Based on both the accuracy or d-prime and the response time data, they encountered considerably less difficulty with the fit-fitt type of contrast, for which their performance approached the Tashlhiyt performance, although still significantly well below it. In other words, the data suggest a clear ordering of the Tashlhiyt contrasts in terms of their difficulty for French listeners: bi–bbi<sir–ssir<fit–fitt. This ordering is clearly at odds with the ordering we predicted based on the intensity of the acoustic substance of the consonants involved. However, before dismissing this simple prediction, possible confounds must be examined.

First, learning may have occurred during the experimental session, differently biasing French performance in the direction we found. For instance, French subjects' highest performance on the fit–fitt contrast type might be due to the higher incidence of word-final /t/s, singleton or geminate, than any word-initial consonant. There were indeed twice as many word-final /t/s as word-initial /b/s, /d/s, /g/s, /f/s, or /s/s. To test for any kind of learning effect possibly due to the somewhat unbalanced design we used in terms of critical consonants or number of items per contrast type, we conducted a time course analysis of the French discrimination data. Time-course analysis was facilitated by the experimental design into blocks of equal size, each containing the same distribution of trials in terms of contrast types, as explained above in Section Stimuli and Design in Experiment 1. The analysis suggests that some learning took place, resulting in a slight improvement especially for speed, as can be seen in Figure 3 (d-prime data) and Figure 4 (RT data), which show the time-course of performance during the experimental session (test phase) divided into four successive parts⁴. For sake of exhaustiveness, these figures show the performance of both French (A) and Tashlhiyt (B) subjects, and on both the contrasts targeted in this study and the word-initial voiceless stop contrasts, which we treated as fillers in the present study. The analyses of variance we ran, however, were restricted to the French data on non-filler stimuli, with Subject as a random factor, Contrast type (bi–bbi, sir–ssir, and fit–fitt types) and session Part (parts 1–4) as within-factors, d-prime or RT as the dependent variable. The Part factor was not significant for d-prime, F_{(3, 33)} = 1.39, p = 0.26, and marginally significant for RT, F_{(3, 33)} = 2.63, p = 0.066, indicating a trend toward acceleration of the responses. As can be seen in Figure 3A, evidence for learning in terms of d-prime was found only in the case of the bi-bbi contrasts in the last part of the session, F_{(1, 11)} = 5.01, p < 0.05. Importantly, the interaction between Contrast and Part was far from significant for both d-prime and RT, Fs < 1. That is, the differences among the three types of contrast reported in Section Results in Experiment 1 for the entire session held for each chronological subpart. Or, put another way, there was no sign of differential learning effects according to contrast type that could explain the differences across contrast types, in particular, the best performance for the fit–fitt contrast type. This suggests that listeners may tend to learn contrasts across the experimental session on the basis of entire syllables rather than single consonants: For example, hat–hatt and fit–fitt do not help each other and provide better learning of word-final /t/ than learning of, say, /b/, which only appears in bi–bbi.

FIGURE 3

Figure 3. (A,B) Experiment 1: (A) French and (B) Tashlhiyt participants' d-prime data for the three types of test contrasts (D, S, and T for the bi–bbi, sir–ssir, and fit–fitt contrast types, respectively), in four successive parts of the experimental session. The data on the filler contrasts such as ks-kks (noted U) are shown for sake of comparison. The error bars represent standard errors.

FIGURE 4

Figure 4. (A,B) Experiment 1: (A) French and (B) Tashlhiyt participants' RT data for the three types of test contrasts (D, S, and T for the bi–bbi, sir–ssir, and fit–fitt contrast types, respectively), in four successive parts of the experimental session. The data on the filler contrasts such as ks-kks (noted U) are shown for sake of comparison. The error bars represent standard errors.

Another possible confound is that, whereas the intensity of the critical acoustic substance was rather constant within the sir–ssir and fit-fitt types of contrast (sir–ssir: 49.7 vs. 50.7 dB mean intensities, |t| < 1; fit–fitt: acoustic silence for both singleton and geminate voiceless stop occlusions), it was clearly softer for geminates than singletons in the voiced occlusions of the bi–bbi type of contrast: 41.9 < 46.5 dB, t₍₂₂₎ = 3.24, p < 0.005. Thus, intensity and duration of the voiced occlusion tended to trade off in the acoustic production of the bi–bbi type of contrasts (cf. the well-known time-intensity trade-off). This intensity-duration trade-off might explain the particular difficulty encountered by French listeners on the bi–bbi type of contrasts as a general auditory phenomenon (Shinn, 2007). To test for this possibility, we ran French participants on a second discrimination experiment comprising the same seven contrasts used in Experiment 1, with the addition of a manipulated bi–bbi type of contrast, in which the intensity of the voiced occlusions were equalized between singleton and geminate stops. The purpose of this manipulation was to test the impact on French performance of the intensity-duration trade-off in voiced occlusions.

Experiment 2

This experiment was designed to replicate Experiment 1 with additional contrasts of the bi-bbi type, in which signal intensity during prevoicing was manipulated so that geminate and singleton voiced stops were equalized in terms of prevoicing loudness. This manipulation was intended to test for the possibility that the poor French performance on the voiced stop contrasts was due to the prevoicing intensity-duration trade-off found in the original stimuli. Only French participants were tested since Tashlhiyt listeners were already shown to perform near ceiling level.