Within-Speaker Perception and Production of Two Marginal Contrasts in Illinois English

Zhang, Jennifer; Graham, Lindsey; Barlaz, Marissa; Hualde, José Ignacio

doi:10.3389/fcomm.2022.844862

ORIGINAL RESEARCH article

Front. Commun., 21 June 2022

Sec. Psychology of Language

Volume 7 - 2022 | https://doi.org/10.3389/fcomm.2022.844862

This article is part of the Research Topic Fuzzy Boundaries: Ambiguity in Speech Production and Comprehension

Jennifer Zhang¹

Lindsey Graham¹

Marissa Barlaz¹

José Ignacio Hualde¹,²*

¹Department of Linguistics, University of Illinois at Urbana-Champaign, Urbana, IL, United States
²Department of Spanish and Portuguese, University of Illinois at Urbana-Champaign, Urbana, IL, United States

The notion of marginal contrasts and other gradient relations challenges the classification of phones as either contrastive phonemes or allophones of the same phoneme. The existence of “fuzzy” or “intermediate” contrasts has implications for language acquisition and sound change. In this research, we examine production and perception of two marginal contrasts: [ɑ-ɔ] (“cot-caught”), where two original phonemes are undergoing a merger, and [ʌi-aɪ] (“writer-rider”), where a single original phoneme has arguably split into two contrastive sounds, albeit in a limited manner. Participants born and raised in Illinois were asked to provide recordings of cot-caught and writer-rider pairs embedded in sentences, followed by the target word in isolation. They then completed ABX and two-alternative forced choice (2FC) perception tasks with stimuli produced by two native speakers from the Chicagoland area. Results showed that the [ʌi-aɪ] contrast, which has been defined as marginal in other work, is actually currently more phonetically and phonologically stable than [ɑ-ɔ] for the group of speakers that we have tested, with a more robust link between production and perception. The cot-caught merger appears to have progressed further, compared to what had previously been documented in the region. Our results and analysis suggest different sound change trajectories for phonological mergers, regarding the coupling of production and perception, as compared with phonemic splits.

Introduction

The words in a language are commonly analyzed in terms of unique phonological units, which by themselves are meaningless but combine according to the constraints of the language to bring about meaning (Hockett, 1958, 1960). This system of categorizing sounds assumes a specific set of phones for each language, with phones falling into one of two categories: phoneme (contrastive) or allophone (non-contrastive). Traditionally, two sounds are considered to be contrastive if, in at least one phonological environment, the choice of phone may result in lexical minimal pairs; the choice of phone cannot be predicted from the environment alone. Conversely, if the choice between two sounds can be predicted from their phonological environment, then the two sounds are allophones. Many phonological processes appear to ignore non-contrastive features, and contrast-based theories hold that the only features that can be phonologically active are those that serve to distinguish and contrast members of the underlying phonemic inventory (see Kiparsky, 1985; Hall, 2007; Dresher, 2009).

However, numerous researchers have pointed out the existence of distinctions between phones which cannot be easily categorized as either phonemic or allophonic (e.g., Goldsmith, 1995; Ladd, 2006, 2014; Nadeu and Renwick, 2016). Hall (2013) offers a comprehensive overview of these intermediate phonological relationships and provides a typology illustrating the many different ways in which contrasts can be marginal. In the literature, such relationships have previously been referred to as semi-phonemic (e.g., Bloomfield, 1939; Crowley, 1998), quasi-phonemic (e.g., Scobbie et al., 1999; Hualde, 2005), weak contrast (e.g., Hume and Johnson, 2001; Walker, 2005; Martin and Peperkamp, 2011), partial contrast (e.g., Hume and Johnson, 2003; Chitoran and Hualde, 2007; Kager, 2008), gradient phonemicity (e.g., Boulenger et al., 2011; Ferragne et al., 2011), and marginal contrast or marginal phoneme (e.g., Vennemann, 1971; Kiparsky, 2003; Edwards and Beckman, 2008; see also Hall, 2013; Renwick et al., 2016). Even in cases of phonological neutralization, where a contrast that is neutralized in a specific environment is still considered to be present elsewhere, some researchers have interpreted neutralization as an example of a “partial contrast,” intermediate between full contrast and full allophony (Hume and Johnson, 2003; Kager, 2008). There are also cases where the distribution of a contrast in the lexicon may not be as reliably or consistently employed as expected, although the sounds themselves may be clearly distinct phonetically (Renwick and Ladd, 2016).

The notion of marginal contrasts and other gradient relationships challenges the division of phones into strict phonemic categories. The existence of marginal contrasts has implications for models of speech perception and language acquisition (both first and additional language) that rely on learner identification of contrastive phonological units and also has implications for sound change, in that speakers can acquire a distinction that is not necessarily utilized to identify words in speech.

Additionally, a speaker's ability to perceive a marginal contrast may not be directly correlated with their ability to produce that contrast and vice versa. Studies of sound changes in progress have shown that perception and production often do not proceed symmetrically, with changes occurring earlier in perception than in production (Di Paolo and Faber, 1990; Herold, 1990; Harrington et al., 2012; Kleber et al., 2012; Kuang and Cui, 2018), although some evidence has been found for a production lead when the relevant cues for production and perception are misaligned (e.g., Coetzee et al., 2018). Listeners may also still be able to perceive a contrast that they no longer produce (Labov, 1994; Hay et al., 2013; Coetzee et al., 2018; Pinget et al., 2020). Perception and production may misalign in the actuation of a sound change because they may in fact be based on different targets or exemplars (Garrett and Johnson, 2013).

Speaker intuitions can also be a valuable resource for examining metalinguistic awareness of marginal contrasts. Previous research with Catalan (Nadeu and Renwick, 2016; Renwick and Nadeu, 2018) and Italian speakers (Renwick and Ladd, 2016), both populations with marginal mid vowel contrasts and commonly used metalinguistic terms for describing them (“closed” and “open” mid vowels), has found speakers to be relatively accurate judges of their own productions. However, the prevalence of mismatches between production and speaker intuition involving members of a mid vowel contrast pair, relative to mismatches between pairs of mid vowels and corner vowels [i, a, u], sets the marginal mid vowel contrast apart on some dimension of phonological closeness.

In this article, we are concerned with the interaction between perception and production in two cases of marginal contrast in Illinois American English: [ɑ-ɔ], as in cot vs. caught (Experiment 1), and [ʌi-aɪ], as in writer vs. rider (Experiment 2). These two cases of marginal contrast differ in their diachronic provenance. The former represents an ongoing merger of two phonemes. The latter, instead, is a case of phonemic split, as it has arguably resulted from the phonological recategorization of allophones as (quasi-)contrastive units. This phenomenon is often known as Canadian raising. In both cases, we ask to what extent the separation between the two categories in an individual speaker's productions predicts that speaker's behavior in perception, and how speakers' intuitions about contrastiveness relate to their own production and perception. Although there is a substantial literature on each of the two vowel phenomena we examine here, we are not aware of previous research that has compared the production-perception link in both a merger in progress and a split in progress for the same group of speakers.

Experiment 1: A Merger in Progress: Production and Perception of [ɑ-ɔ] (“Cot-Caught”)

Background and Research Question

One example of a contrast that could be considered marginal in some varieties of present-day American English is the [ɑ-ɔ] (“cot-caught”) low back vowel pair. The merger of these two phonemes was first attested in the US in the 1930s (Kurath, 1939) in parts of western Pennsylvania and eastern New England. Labov et al. (2006) later documented the distribution of the cot-caught merger, showing that the merger was highly advanced or completed in western Pennsylvania, and progressing in eastern New England and the western half of the United States. In contrast, the Inland North, the Mid-Atlantic, and the South were identified as regions that showed resistance to the low back merger. When the data for the Atlas of North American English (Labov et al., 2006) were collected, results of minimal pair perception tests for speakers in the Inland North (including Chicagoland, part of the area under study) showed no trace of a merger, with participants universally responding that the presented minimal pairs were different from one another. The maintenance of the /ɑ/ vs. /ɔ/ contrast was attributed to the fronting of /ɑ/, part of the Northern Cities Chain Shift (Labov, 1994; Clopper et al., 2005), making it rather distinct from /ɔ/. The low back merger was also found to be most advanced in syllables closed by nasal consonants and most conservative before velar /k/.

Even in varieties where /ɑ/ and /ɔ/ are clearly contrastive phonemes, the distribution of the two phones is not entirely free, and the presence or lack of a contrast is sometimes predictable from context. As Labov et al. (2006: 57) explain, historically, /ɑ/ descends from Middle English short /o/, with the addition of some /o/ words directly borrowed from French and some words where /a/ was rounded after /w/ (watch, want, wander, etc.). The resulting phone occurs before all but two consonants, /v/ and /ʒ/, in American English. In contrast, /ɔ/ has a more limited distribution. This vowel is, for the most part, a direct continuation of the Middle English diphthong /aw/ (which had a number of Old English and Old French sources). In addition, a number of words that had Middle English /o/ have been transferred to the /ɔ/ class, e.g., dog, long, loss (before /g/, /ŋ/, and voiceless fricatives, but without affecting all lexical items with these following contexts). Presently, for some speakers in Illinois, a contrast is attested in the phonological contexts shown in Table 1, where both phones occur (see also Labov et al., 2006: 57).

Table 1. Contrastive contexts in US varieties without merger of /ɑ/ and /ɔ/.

In the contexts in Table 2, on the other hand, the contrast appears to have been neutralized for all speakers in the Midwest dialect that we explore here. Labov et al. (2006: 57) describe a slightly different distribution, including a possible contrast before /g/, as in log vs. dog, that does not seem to exist in the geographical area under study. These authors do in fact report variation before /g/.

Table 2. Contexts of phonological neutralization in varieties without complete merger.

Even for contexts where a robust contrast has been reported, e.g., before /l/, the realization of the contrast may be less certain given the lack of representative words or given gaps in the lexicon. The merger between the two vowels appears to proceed in gradient fashion, occurring first before nasal consonants (Labov et al., 2006).

Based on informal observation, we suspect that the merger is currently more advanced in our target population (young speakers from northern and central Illinois) than it was some decades ago. We expect to find three or even four types of speakers: (a) speakers with a clear contrast in production and perception; (b) speakers who have merged the two phonemes in production and do not perceive them as different vowels; and (c), as in other cases of mergers in progress, speakers who have a marginal contrast in production but cannot reliably identify or discriminate the two historical phonemes; that is, a merger in perception may precede a merger in production (Labov, 1994, 2011). Finally, some recent research indicates that in some mergers in progress there are listeners who can still perceive a contrast that they no longer produce (Hay et al., 2013; Coetzee et al., 2018; Pinget et al., 2020); thus, we may also expect to find a group (d) of speakers who do not produce the contrast but can reliably perceive it. Depending on the types of speakers found, these groups may help elucidate potentially differing patterns regarding the progression of the cot-caught merger.

Methods

Participants

Thirty-six participants were recruited among the undergraduate student population at an Illinois university to participate in this study. Of these participants, 11 did not complete all tasks and were excluded from analysis. Since the focus of this study is variation within northern and central Illinois, an additional 5 participants were excluded as they reported being born in a different country or state. The remaining 20 participants (14 females, 6 males) all reported being born and raised in Illinois. Of these, 14 are from Chicago and its suburbs, 4 from Central Illinois, and 2 from Illinois near the St. Louis area. We decided not to exclude the two St. Louis-area speakers¹ because this area has been shown to participate in some of the vowel changes that are found in Chicago, such as the ones under study (Labov, 2007). For the place where each participant was raised, see Figure 1 (in Section Discussion). Their ages ranged from 18 to 25. Participants were volunteers or received extra credit for their participation in an undergraduate linguistics course. These subjects participated in one production task and two perception tasks.

Figure 1. Participant locations by production cluster for cot-caught (left) and writer-rider (right). Black circle: no contrast (low Pillai score); orange triangle: contrast (high Pillai score) (Google, n.d.).

Stimuli for Production Study

For the production task, the goal was to create balanced lists of 20 pairs for each contrast under study. For the cot-caught contrast, the stimuli consisted of 13 monosyllabic minimal pairs with an alveolar coda (e.g., caught vs. cot), 3 monosyllabic near-minimal pairs (e.g., laud vs. lot), and 4 monosyllabic non-minimal pairs to complete the set of 20. Stimuli for our Experiment 2 (20 pairs) were also presented together. Filler items consisted of 10 pairs distinguished by their codas (e.g., bet vs. bed) and 10 homophone pairs (e.g., flower vs. flour). This resulted in 60 total pairs for a total of 120 productions. The stimuli for production are shown in Table 3.

Table 3. Stimuli used in production and perception tasks.

Stimuli for Perception Study

For our perception study, our goal was to create balanced lists of 10 minimal pairs. The 10 minimal pairs (20 words) for the cot-caught contrast were all monosyllabic with coda consonants /t, d, n, k/. The stimuli used for the perception tasks are also shown in Table 3. Fillers in the perception tasks included 10 minimal pairs (20 words) distinguished by their codas (e.g., mat vs. mad, mate vs. made) and 4 homophone pairs (8 words) (e.g., metal vs. medal) as distractors. Tokens created for our Experiment 2 (9 minimal pairs, 18 words) on Canadian raising (see Experiment 2) also were presented together. Each word was presented two times, resulting in a total of 132 tokens.
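
The token total above can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration (the study's own analyses were run in R); the variable names are ours, and the word counts are the ones stated in the paragraph above.

```python
# Words per stimulus category in the perception tasks, as stated in the text.
WORDS_PER_CATEGORY = {
    "cot-caught": 20,         # 10 minimal pairs
    "writer-rider": 18,       # 9 minimal pairs (Experiment 2)
    "coda fillers": 20,       # 10 minimal pairs
    "homophone fillers": 8,   # 4 pairs
}
REPETITIONS = 2  # each word was presented two times

# Tokens per category and overall.
tokens_per_category = {
    name: n_words * REPETITIONS for name, n_words in WORDS_PER_CATEGORY.items()
}
total_tokens = sum(tokens_per_category.values())

print(tokens_per_category)
print(total_tokens)
```

The per-category counts are what give the 40 cot-caught tokens mentioned later in the ABX description.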

The stimuli were produced by two native speakers, one female (Speaker F) and one male (Speaker M) from the Chicagoland area. These two model speakers were recruited to produce the stimuli because they reported producing a contrast between both cot-caught words and writer-rider words, and they both grew up in Illinois, like the participants in our experiments. The stimuli were recorded in a soundproof booth, using a Marantz PDM 750 solid state recorder and an AKG C5C20 head-mounted microphone at a sampling rate of 44.1 kHz. Based on formant values, all target stimuli included in the perception task showed a difference in vowel quality between cot-words and caught-words, although this difference was not always of the same magnitude or produced in the same manner. Figure 2 shows average time-normalized formant trajectories for the stimuli produced by each of the two model speakers. Note that cot-words have higher values for both formants than caught-words, indicating a lower and less retracted articulation for [ɑ] than for [ɔ].

Figure 2. Formant contours for cot-caught stimuli. Time-normalized formant contours (F1 and F2) in Hz for the two speakers providing perception stimuli, separated by target phone. Formants for cot-words are shown in solid black lines and formants for caught-words in dashed orange lines.

Procedure

Participants first completed a background questionnaire to provide demographic information and confirm that they had been born and raised in Illinois.

Production

For the first experimental task, participants were asked to provide recordings of 120 target words (20 caught-cot pairs, 20 writer-rider pairs, and 20 filler pairs). Because of the COVID-19 pandemic situation, conducting the experiment in a phonetics laboratory was not feasible at that time. Instead, participants were asked to record themselves in a quiet room, using their phones² or laptops. The recording material was presented via PowerPoint slides, and a copy of the PowerPoint slides was shared with each participant. Each target word was embedded in a sentence (e.g., “The word ____ in English [means/refers to/is…]”) which was presented on one slide, followed by the same target word in isolation on the next slide. The tokens were presented in pseudo-random order such that each presented token was not followed by a member of a potential minimal pair. The recording session was divided into four blocks, and participants were asked to submit their recordings at the completion of each block. The first production of each block consisted of a filler sentence and word.

ABX

Following production, participants completed the first perception task: an ABX task administered online via Qualtrics. The stimuli consisted of target words in isolation, with an interstimulus interval of 500 ms. Participants heard a presented ABX series and were asked to select whether X was the same word as the first word they heard (A) or the second (B) by clicking on the number “1” or “2” on the screen. Upon making a selection, the next ABX series was automatically presented. Tokens produced by Speaker F were used for A and B of the ABX task, and tokens produced by Speaker M were used for X. The use of two speaker voices requires some level of abstraction by the participant, particularly as our speakers differ in sex. The stimuli were presented in pseudo-random order such that no two types of the same category (cot-caught, writer-rider, coda-distinguished filler, homophone filler) were presented sequentially; the presentation of one category of stimuli was always followed by a different category, e.g., rider-writer (rider-writer pair) followed by mist-missed (filler-pair) followed by cawed-cod (cot-caught pair), followed by mate-made (coda-distinguished pair). There was a total of 132 items presented, of which 40 tokens (20 pairs) were representative of the cot-caught contrast. Tokens created for our Experiment 2 on Canadian raising (see Experiment 2) were also presented together.
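
The ordering constraint described above (no stimulus category presented twice in a row) can be met by construction rather than by repeated shuffling. The following Python sketch is one possible way to do this and is not the procedure the authors report; the function names and the seed are hypothetical, and the per-category token counts are derived from the word counts given for the perception stimuli, each presented twice.

```python
import random

def feasible(counts, last):
    """True if the remaining items can still be ordered with no category
    repeated back to back, given that `last` was just presented."""
    n = sum(counts.values())
    for cat, k in counts.items():
        # The category just presented cannot come next, so it gets the
        # tighter bound floor(n/2); all others get ceil(n/2).
        limit = n // 2 if cat == last else (n + 1) // 2
        if k > limit:
            return False
    return True

def interleave(counts, seed=0):
    """Return a random category sequence with no two adjacent repeats."""
    rng = random.Random(seed)
    remaining = dict(counts)
    order, last = [], None
    while sum(remaining.values()) > 0:
        options = []
        for cat in remaining:
            if cat == last or remaining[cat] == 0:
                continue
            # Tentatively place this category and check the rest is solvable.
            remaining[cat] -= 1
            if feasible(remaining, cat):
                options.append(cat)
            remaining[cat] += 1
        if not options:
            raise ValueError("no valid ordering exists for these counts")
        last = rng.choice(options)
        remaining[last] -= 1
        order.append(last)
    return order

# Token counts per category (words x 2 presentations), from the text.
CATEGORY_TOKENS = {
    "cot-caught": 40,
    "writer-rider": 36,
    "coda filler": 40,
    "homophone filler": 16,
}
order = interleave(CATEGORY_TOKENS)
print(order[:8])
```

Checking feasibility before each choice avoids dead ends near the end of the list, which a plain shuffle-and-reject approach would hit almost every time with 132 items.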

Two-Alternative Forced Choice Identification (2FC)

Participants then completed the second perception task: a two-alternative forced choice (2FC) word identification task administered online via Qualtrics. The stimuli presented were the same as in the ABX task, but participants were instead asked to identify an auditorily presented word by clicking on one of two words presented on the screen. For example, they would hear the word cot and click on either the presented text <cot> or <caught>. Upon making a selection, the task automatically moved to the next 2FC item.

Exit Survey

Finally, participants completed an exit survey which probed their phonological intuitions of the contrast as spoken by themselves (e.g., “Do you think you pronounce cot and caught in the same way?”), their parents or guardians, and by their social circles. They were also asked to describe any differences in their pronunciation (a similar methodology was used in Renwick and Nadeu, 2018).

Acoustic Analysis of Production Data

The target words were first force-aligned with the Montreal Forced Aligner (McAuliffe et al., 2017), then corrected by hand in Praat (Boersma and Weenink, 2021). Waveform and spectrogram information was used for manual corrections. When the preceding consonant was a stop, the left boundary of the vowel was placed at the first upward zero crossing after the onset of glottal vibration, when formant structure was visible. When the preceding consonant was a fricative, the left vowel boundary was similarly placed at the first upward zero crossing when formant structure was visible, after the offset of high energy frication noise. When the preceding consonant was a sonorant, changes in formant structure and intensity were used to place segmental boundaries. The right boundary of the vowel was similarly determined by decreases in intensity and changes in formant structure. The vowels were manually assigned labels that would correspond to the presence, rather than merger, of a phonological contrast. Following segmentation, F1 and F2 values were automatically extracted at the 50% duration of the vowel.
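
To make the last step concrete, the sketch below picks the formant frame closest to the temporal midpoint of a segmented vowel. It is an illustration in Python under our own assumptions, not the authors' extraction script: the track format (a list of time, F1, F2 tuples), the function name, and the example values are all hypothetical.

```python
def formants_at_midpoint(track, vowel_start, vowel_end):
    """Return the (time, F1, F2) frame nearest the 50% point of the vowel.

    `track` is a list of (time_s, f1_hz, f2_hz) tuples; this layout is an
    assumption made for the example.
    """
    midpoint = vowel_start + 0.5 * (vowel_end - vowel_start)
    return min(track, key=lambda frame: abs(frame[0] - midpoint))

# Hypothetical formant track for one vowel token spanning 0.10 s to 0.20 s.
example_track = [
    (0.10, 650.0, 1100.0),
    (0.15, 700.0, 1200.0),
    (0.20, 690.0, 1250.0),
]
mid_frame = formants_at_midpoint(example_track, vowel_start=0.10, vowel_end=0.20)
print(mid_frame)
```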

Statistical Treatment

As a measure of distance between vowel distributions, we calculated Pillai scores for each vowel pair at the 50% duration of the vowel for each participant (Nycz and Hall-Lew, 2013; Jibson, 2021). A higher Pillai score, closer to 1, results from greater distance and less overlap between vowel pairs, indicating a stronger contrast. A lower Pillai score, closer to 0, results from greater overlap, indicating a weaker contrast or no contrast at all. The Pillai scores were then submitted to a k-means cluster analysis using the function kmeans from the stats package in R (R Core Team, 2019). The analysis was run for 2-4 groups. We decided to use cluster analysis as opposed to determining a threshold for the classification of participants as having one phoneme or two (as in, e.g., Labov et al., 2006), precisely because we want to allow for the possibility of having intermediate situations between merger and no merger, or between split and no split.
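
For readers unfamiliar with the Pillai score, the following Python sketch computes it for two vowel classes measured on F1 and F2. It assumes the standard definition of Pillai's trace, V = tr(H(H + E)⁻¹), with H the between-class and E the within-class sums-of-squares-and-cross-products matrices; for two groups this reduces to the closed form used in the code. The study's own scores were computed in R, so this is an illustration rather than the authors' script, and the function name and formant values are hypothetical.

```python
def pillai_two_groups(group_a, group_b):
    """Pillai's trace for two groups of (F1, F2) measurements.

    With two groups, H = c * d * d^T, where d is the difference between
    group means and c = n_a * n_b / (n_a + n_b), so
    V = c * d^T * T^{-1} * d, with T = H + E the total SSCP matrix.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)

    def mean(points):
        return (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))

    mean_a, mean_b, grand = mean(group_a), mean(group_b), mean(pooled)

    # Total SSCP matrix about the grand mean (2 x 2, symmetric).
    t11 = sum((x - grand[0]) ** 2 for x, _ in pooled)
    t22 = sum((y - grand[1]) ** 2 for _, y in pooled)
    t12 = sum((x - grand[0]) * (y - grand[1]) for x, y in pooled)
    det = t11 * t22 - t12 ** 2  # must be non-zero (points not collinear)

    c = n_a * n_b / (n_a + n_b)
    d1, d2 = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # d^T T^{-1} d written out for the 2 x 2 case.
    quad = (t22 * d1 * d1 - 2 * t12 * d1 * d2 + t11 * d2 * d2) / det
    return c * quad

# Hypothetical midpoint formants (Hz) for illustration only.
cot_like = [(700, 1200), (710, 1220), (690, 1190), (705, 1185)]
caught_like = [(f1 - 200, f2 - 400) for f1, f2 in cot_like]

merged_score = pillai_two_groups(cot_like, list(cot_like))
distinct_score = pillai_two_groups(cot_like, caught_like)
print(merged_score, distinct_score)
```

Identical distributions give a score of 0, and well-separated distributions approach 1, matching the interpretation given above.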

Since we are interested in determining the relation between production and perception, we ran correlations (cor.test from stats in R) between Pillai scores and perception accuracy results. Linear mixed effects regressions were run on accuracy rates and formant values with the function lmer in the package lme4 (Bates et al., 2015) and p-values were obtained with the emmeans package (Lenth, 2022). In addition, we considered the extent of participants' phonological intuitions concerning the existence of a contrast in their speech or lack thereof, and how this corresponded to their performance in our two perception tasks.

Results

Production Results for Experiment 1

Figure 3 displays average F1 and F2 values over normalized time for words belonging to the traditional /ɑ/ class (cot) and words belonging to the /ɔ/ class (caught). Each participant is shown on their own plot, and the plots are organized from lowest Pillai score (Participant 003) to highest (Participant 026). Speakers ranged from no discernible contrast in Participant 003 (Pillai score = 0.02) to a very clear contrast in Participant 026 (Pillai score = 0.89).

Figure 3. Formant contours for cot-caught by participant. Individual time-normalized formant contours (F1 and F2) in Hz, separated by target phone. Participants are organized from lowest to highest Pillai score at 50% vowel duration.

Figure 4 shows the vowel plots for Participants 003 (lowest Pillai score = 0.02) and 026 (highest Pillai score = 0.89), showing an example of the difference in degree of overlap for speakers with the merger and with the contrast. The vowel plot for Participant 003 shows a clear overlap between cot-type words and caught-type words. In comparison, the two types of vowels are clearly distinct for Participant 026.

Figure 4. Vowel plot for cot-caught productions with low vs. high Pillai scores. Vowel plot of individual tokens representative of the cot-caught contrast for Participant 003 (lowest Pillai score) and Participant 026 (highest Pillai score). Cot-words are graphed in black and caught-words in orange. Ovals indicate the 95% confidence interval (2 standard deviations) for each vowel.

Participants were clustered first based on production alone, using cluster analysis specified for 2 to 4 clusters. Based on the total within-cluster sum of squares, the best clustering resulted in 2 groups according to their productions, which we may think of as mergers and non-mergers. The merger group includes 13 participants with Pillai scores ranging from 0.02 to 0.30, and the non-merger group includes 7 participants with Pillai scores ranging from 0.35 to 0.89.
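
The clustering step can be illustrated with a minimal one-dimensional k-means that reports the total within-cluster sum of squares for 2 to 4 clusters. This is a sketch only: the study used R's kmeans, whose default algorithm and random starts differ from the simple Lloyd-style loop with evenly spaced starting centers shown here, and the scores below are invented for the example rather than taken from the participants.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd-style k-means on a list of numbers (requires k >= 2).

    Returns (labels, centers, total within-cluster sum of squares).
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centers evenly spaced across the observed range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, wss

# Hypothetical Pillai scores, not the study's data.
example_scores = [0.02, 0.04, 0.06, 0.80, 0.85, 0.90]

labels_2, centers_2, wss_2 = kmeans_1d(example_scores, k=2)
for k in (2, 3, 4):
    print(k, kmeans_1d(example_scores, k)[2])
```

Because the within-cluster sum of squares can only shrink as clusters are added, choosing the number of groups from it involves a judgment about where the improvement levels off; the sketch prints the values and leaves that judgment to the reader.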

Perception Results for Experiment 1

Participants were also independently clustered based on average perception accuracy between ABX and 2FC. Participants did not fall into the same clusters for production as for perception, so the two clustering analyses were visually combined to show inconsistencies between mergers in perception and production. The resulting groups are shown in Figure 5, along with the correlation between production of [ɑ-ɔ] and each perception task.

Figure 5. Correlations between Pillai scores at 50% vowel duration and ABX perception accuracy (left) or 2FC perception accuracy (right) for cot-caught. Participants are clustered based on perception and production independently, with the two clustering analyses visually combined to show 4 groups, where the first NO or YES of each label signifies the contrast in perception and the second NO or YES signifies a contrast in production. NO/NO (participant clustered in the group with no contrast based on Pillai scores for both production and perception) is shown in black. NO/YES (participant in no-contrast group in perception, but in contrast group in production) is in orange. YES/NO (participant in contrast group in perception, but in no-contrast group in production) is in blue and YES/YES (contrast in both production and perception) is in green.

As can be seen in Figure 5, participants fall into one of four possible groups according to Pillai-score-based clustering: (1) NO contrast in perception and NO contrast in production (black in Figure 5), (2) NO contrast in perception, YES contrast in production (orange), (3) YES contrast in perception, NO contrast in production (blue), and (4) YES contrast in both perception and production (green). For speakers with the contrast in both production and perception, formant values for [ɑ] and [ɔ] were significantly different for F1 (p < 0.05) and F2 (p < 0.001). Those without the contrast showed no significant differences among formant values. Perception accuracy rates between the NO/NO and YES/YES groups were significantly different for the 2FC task (p < 0.05) but not for ABX (p = 0.23).
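
Because the group labels put perception first and production second, it is easy to misread them. The short Python sketch below spells out the convention and the colors used in Figure 5; the function and dictionary names are ours, introduced only for illustration.

```python
def group_label(contrast_in_perception, contrast_in_production):
    """Build the PERCEPTION/PRODUCTION label used for the four groups."""
    def yes_no(flag):
        return "YES" if flag else "NO"
    return f"{yes_no(contrast_in_perception)}/{yes_no(contrast_in_production)}"

# Colors as described in the Figure 5 caption.
GROUP_COLORS = {
    "NO/NO": "black",
    "NO/YES": "orange",
    "YES/NO": "blue",
    "YES/YES": "green",
}

# Group 3 in the text: contrast perceived but not produced.
label = group_label(contrast_in_perception=True, contrast_in_production=False)
print(label, GROUP_COLORS[label])
```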

Participants 002 and 006 are examples of speakers in group 2, who showed a contrast in their production (Pillai scores = 0.35 and 0.47), but their perception accuracy was around chance (40-65%). In the opposite direction, Participants 011 and 031 are examples of speakers in group 3, those who could perceive the contrast with accuracy rates ranging from 70 to 80%, but whose productions had low Pillai scores (0.04 and 0.29) and were therefore considered to be merged in production. Group 3 in particular was not specifically hypothesized to exist, based on the assumption that vowel mergers in perception tend to precede merger in production, yet it comprises 35% of our participants. These findings do align, however, with recent research regarding perception and production inconsistencies (e.g., Hay et al., 2013; Coetzee et al., 2018; Pinget et al., 2020).

Overall, the correlations between perception and production were very weak (R² = 0.06 between production and ABX, R² = 0.09 between production and 2FC). Perception accuracy for individuals clustered as non-mergers was 73% (averaged between ABX and 2FC), whereas average accuracy for those clustered as mergers was 69%; perception accuracy was thus similar regardless of their clustered status as mergers or non-mergers.
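
To give a sense of scale, and assuming the reported R² values are squared Pearson coefficients, R² = 0.06 and R² = 0.09 correspond to |r| of roughly 0.24 and 0.30. The Python sketch below shows how such a coefficient is computed from paired production and perception scores; the study used cor.test in R, and the function name and numbers here are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-participant values, not the study's data.
pillai_scores = [0.05, 0.10, 0.30, 0.45, 0.80]
abx_accuracy = [0.70, 0.55, 0.75, 0.60, 0.80]

r = pearson_r(pillai_scores, abx_accuracy)
r_squared = r ** 2
print(r, r_squared)

# Converting the reported R-squared values back to |r|.
print(math.sqrt(0.06), math.sqrt(0.09))
```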

Exit Survey Results for Experiment 1

Based on their responses to the exit survey, participants who were clustered as non-mergers in both perception and production appeared to be metalinguistically aware of the contrast. For those who reported having a contrast (n = 12), 9 of them also showed the contrast in production. The other 3 participants who self-reported a distinction were instead clustered as showing a merger in their production. Fewer participants reported having merged productions (n = 8), and their productions were split between the merger group and the non-merger group. This data is visualized in the violin plots as seen in Figure 6, which show participants' self-reported distinctions compared with their Pillai scores at 50% vowel duration.

Figure 6. Pillai scores by self-reported contrast for cot-caught. Pillai scores for caught-cot by participant, separated by self-reported distinction of the contrast.

As for the correlation between participants' phonological intuitions and their performance in the perception tasks, participants who reported making a contrast between cot and caught outperformed participants who reported not making a distinction, as can be seen in Figure 7. This is an expected result. Accuracy was somewhat higher for both groups of participants in the forced-choice identification task.

Figure 7. Accuracy in perception by self-reported contrast for cot-caught. Accuracy in the two perception tasks (ABX and 2FC Identification), separated by self-reported distinction of caught-cot.

Furthermore, participants who reported making a contrast in production also described differences in vowel pronunciation on the exit survey, with a number of participants transcribing [ɑ] in cot as <ah> and [ɔ] in caught as <aw> (e.g., “cot is more ‘KAHt’ caught is more ‘CAWt’” and “In the word cot, the ‘o’ makes more of an ‘ah’ sound rather than the ‘aw’ sound found in caught”). Our results suggest that speakers who believe they have a contrast do in fact produce it. However, their perceptual discrimination abilities, based on clustering, do not always mirror the contrast as found in production, again based on clustering. The resulting four groups from our clustering analyses also show different patterns of perception and production as they relate to a merger in progress. Out of our 20 speakers only 12 reported having a phonological contrast in their own speech, and fewer than half of them (3) fell within the non-merger clusters in both production (high-Pillai-score cluster) and perception (average ABX and 2FC accuracy). The historical phonemic contrast between /ɑ/ and /ɔ/ now thus has the status of a marginal or fuzzy contrast for this group of speakers, with most speakers showing variable behavior in perception and production that is inconsistent with the existence of a robust contrast between two phonemes.