Correction by Focus: Cleft Constructions and the Cross-Linguistic Variation in Phonological Form

Greif, Markus; Skopeteas, Stavros

doi:10.3389/fpsyg.2021.648478

ORIGINAL RESEARCH article

Front. Psychol., 29 November 2021

Sec. Psychology of Language

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.648478

This article is part of the Research Topic Experimental Approaches to Pragmatics

Markus Greif¹

Stavros Skopeteas¹,²*

¹Linguistics Department, Bielefeld University, Bielefeld, Germany
²Linguistics Department, University of Göttingen, Göttingen, Germany

A challenging issue of cross-linguistic variation is that the same syntactic construction may appear in different arrays of contexts depending on language. For instance, cleft constructions appear with contrastive focus in English, but in a larger array of contexts in French. A part of the cross-linguistic variation may be due to prosodic differences, since prosodic possibilities determine the array of focus structures that can be mapped onto one and the same syntactic configuration. In the present study, we compare languages with flexible nuclear-accent placement (English, German) with languages that do not use this prosodic strategy (French, Mandarin Chinese). In a speech production experiment, we examine the prosodic realization of contrastive focus and identify prosodic reflexes of focus in all languages. The presence of different phonetic reflexes of focus suggests that, all else being equal, the same syntactic constructions should be possible in the same array of contexts. In an acceptability study with written questionnaires, we examined the felicity of cleft constructions in contexts licensing a focus within the cleft clause. This focus structure is orthogonal to the preferred focus structure of cleft constructions and can appear in cases of second-occurrence foci (in contexts of correction). The obtained judgments reveal a distinction between languages with flexible nuclear-accent placement (English, German) and languages with other types of reflexes of focus (French, Chinese): languages of the former type have an advantage in using cleft constructions with a focus within the cleft clause, which shows that the array of contexts of using clefts in English and German is not a proper subset of the array of contexts applying to the same constructions in French and Chinese. The obtained differences can be explained by the role of prosodic devices and corroborate the view that prosodic reflexes of focus have different semantic-pragmatic import: it is easier to establish a focus structure that is orthogonal to the syntax in a language with flexible nuclear-accent placement (English, German); this does not hold for prosodic correlates of focus that reinforce the articulation of prosodic constituents (French) or the articulation of lexical tones (Chinese).

Introduction

Discourse notions such as topic and focus are reflected in different grammatical layers, notably in syntax and prosody. The idea that these layers are complementary has been fruitfully used in order to account for the fact that similar syntactic constructions appear in different arrays of contexts depending on language. Vallduví and Engdahl (1996: 497) explain the differences in the use of syntactic movement in Catalan and English in terms of prosodic plasticity. ‘Plastic’ languages, such as English, shift the nuclear stress signaling that the focus is part of the stressed constituent; in ‘non-plastic’ languages, such as Catalan, the nuclear stress appears in a fixed position within the linearization (in case of Catalan, it is the rightmost constituent); syntactic operations are employed such that the focus appears in the position that bears the nuclear stress.¹ In the same vein, Samek-Lodovici (2005) accounts for the choice of alternative strategies to express focus in English, Italian, and Bantu languages by means of alternating rankings of constraints that sanction deviations from syntactic and prosodic principles. Zubizarreta (1998: 21–22) observes that languages differ with respect to the expression of prosodic prominence of focus. In English, German, and French, clause-initial non-contrastive foci are realized with prosodic prominence followed by deaccenting. In contrast to these languages, Spanish and Italian have a default prosodic prominence on the rightmost prosodic constituent that is not modulated by focus; in order to maintain this prosodic pattern, these languages employ deviations from the canonical word order such that non-contrastive foci surface rightmost in the clause. These approaches share the reasoning that syntactic movement is a last resort, employed for discourse functions that cannot be expressed by prosodic means in the language at issue. 
The distinction between two classes of languages may be oversimplified, as various instrumental phonetic studies on prosody show (see, e.g., effects of focus on the pitch range of tonal events in Chinese; Xu, 1999). Finally, it is cross-linguistically possible to increase the articulatory effort in order to draw the attention of the hearer to salient parts of the utterance (see effort code in Gussenhoven, 2004: 85–89). However, we know that the exact semantic-pragmatic value of similar prosodic devices can vary between languages (see Vander Klok et al., 2018 for differences in the prosodic means expressing variation in prominence between English and French). Thus, the core question is how different prosodic means of expressing prominence (e.g., nuclear-accent placement in English, pitch range expansion of tonal events in Chinese) can account for the possibility of using the same construction in different contexts depending on language.

Within this line of thought, the present study examines cleft constructions, which are informative for the general question at issue since these constructions are associated with a particular information structure.² In the typical instances of cleft constructions in English, the ‘pivot,’ that is the constituent in the matrix clause, is contrastively focused; this construction asserts that the proposition is true for the pivot to the exclusion of some alternatives that are relevant in discourse (see ‘cleft-focus principle,’ Rochemont, 1986: 133). The ‘cleft clause,’ that is the constituent that surfaces as a relative clause, contains the background information. Example (1) illustrates a context in which the contextual conditions for a felicitous use of the cleft construction are met. In this realization of the cleft, the nuclear stress is aligned with the pivot, as indicated by the small capitals.

(1) A: Did Mary buy the bicycle?

B: No, it’s JOHN that bought the bicycle.

Beyond cleft constructions with a focus in the pivot, as seen in (1), earlier research on English has shown that cleft constructions appear in a variety of contexts such that the focus domain of the utterance is (a part of) the cleft clause (e.g., ‘informative presupposition clefts’ in Prince, 1978; ‘topic-comment clefts’ in Hedberg, 1990, 2013; detailed classification in Delin, 1992; discussion of various classes of examples in Hartmann, 2015: 252–270). The information structure of these examples is reflected in prosody: the nuclear accent in informative-presupposition clefts is realized within the cleft clause (see discussion in Delin, 1992, 1995; Hedberg, 2013), while the pivot is not completely deaccented (Hartmann, 2015: 214).

In the present study, we examined a particular type of context that enforces a focus within the cleft clause, namely cases of correction, as introduced in (2). Assume a context containing a cleft construction such that the pivot of the cleft (John) is focused as in (2A). In this context, it is possible to use a cleft construction as in (2B), correcting a part of the utterance in (2A). Correction establishes a relation between an ‘antecedent statement,’ that is available in the discourse, and a ‘corrective statement,’ that is a denial of (a part of) the antecedent statement. The corrective statement contains a replacement that is interpreted as incompatible with the antecedent statement and which is contrastively focused (Steube, 2001; Van Leusen, 2004; Repp, 2010). An important aspect of correction is the structural parallelism between the corrective statement and the antecedent statement, which is an instruction to the addressee to identify the relevant statement in discourse (Van Leusen, 2004: 437; Clifton and Frazier, 2016). The effects of structural parallelism are shown in (2): assuming an antecedent statement that contains a cleft construction (for reasons that depend on the contextual conditions of A and are not crucial for our purposes), it is possible to utter a corrective statement as in B, that is structurally parallel to the antecedent claim and involves a contrastive focus within the cleft clause. This configuration deviates from the expectation that the pivot of a cleft construction is the main focus of the utterance.

(2) A: … It’s [JOHN]_FOC that bought the car.

B: No, it’s [John]_FOC2 that bought the [BICYCLE]_FOC1.

The corrective statement in (2B) contains a complex focus structure, involving a primary focus (FOC1) and a secondary focus (FOC2). The primary focus is the focus of the corrective assertion that is expressed by the nuclear accent. The focus on ‘bicycle’ excludes the alternative in the antecedent statement: ‘it’s John that bought the bicycle’ is contrasted to ‘it’s John that bought the car.’ Additionally, this utterance has a second-occurrence focus³, FOC₂, which is inherited from the context utterance. If the cleft construction in (2A) identifies ‘John’ in contrast to further relevant alternatives (e.g., ‘Peter’ or ‘George’), this information is presupposed by the corrective statement in (2B). The second-occurrence focus is expressed by the cleft construction in this case and may have some secondary prosodic prominence (Féry and Ishihara, 2006; Beaver et al., 2007; Howell, 2011; Büring, 2015; Baumann and Ishihara, 2016). The asserted and presupposed information of (2B) can be paraphrased as: ‘it’s John (in contrast to ‘Peter’ or ‘George’) that bought the bicycle (not the car).’

The cleft constructions in (1) and (2) share the interpretation that some contextually relevant alternatives to the pivot are excluded (which applies to further contextual instances of cleft constructions, as shown by Hartmann, 2015: 253). These constructions differ with respect to the partitioning of the utterance in asserted and presupposed information, which is expressed by the nuclear stress placement, as summarized in (3).

(3) Cleft constructions and focus structure

The pivot of a cleft construction excludes alternatives that are relevant in the context.

(a) If the nuclear stress falls within the pivot, the exclusion of alternatives is the asserted information (focus).

(b) If the nuclear stress falls within the cleft clause, the asserted information is in the cleft clause (focus), while the exclusion of alternatives is part of the presupposed information (second-occurrence focus).

The crucial issue is that the variation in the focus structure of cleft constructions requires the possibility of variable nuclear stress placement, as stated in (3). The predictions of (3) are straightforward for languages such as English and German that realize the nuclear stress by means of pitch accents. Our first question is how this contrast can be expressed in languages that do not rely on pitch accents for signaling focus, such as French and Chinese. In order to establish the corresponding prosodic means in these languages, we conducted a cross-linguistic study on speech production (comparing English, German, French, and Chinese), which is reported in Section 2. The results of this study show that reflexes of prosodic prominence appear in all examined languages, but these reflexes are different in nature.

With this background, we examined whether a cleft construction with a focus in the cleft clause is equally felicitous in these languages (Section 3). Judgments of contextual felicity revealed a typological distinction between languages with flexible nuclear-accent placement (English and German) and languages that do not rely on this strategy (French and Chinese). Hence, these findings are in line with the idea that various classes of prosodic events have distinct semantic-pragmatic import: precisely, using cleft constructions with a focus in the cleft clause has an advantage in languages in which nuclear-accent placement unambiguously identifies the intonational nucleus (English and German); see discussion in Section 4.

Prosodic Reflexes of Focus

Aims

The present experiment examines whether canonical and cleft constructions can be realized with different prosodic patterns depending on focus in typologically different languages: languages allowing for flexible placement of nuclear accents (English, German), and languages that do not employ this prosodic strategy (French, Chinese).

Method

Participants

Sixteen native speakers of each language participated in this study. They were informed that their participation was voluntary and that the data would be used in anonymized form for research purposes. Written consent (translated into the native language of the participants) was obtained; participants were paid for their contribution to the experiment. Sex was balanced in the samples in order to counterbalance the influence of sex on pitch: English (n = 16, female = 8, age range = 18–29, average = 22.1; collected in London), German (n = 16, female = 8, age range = 19–34, average = 23.4; collected in Bielefeld), French (n = 16, female = 8, age range = 18–44, average = 25.9; collected in Lyon), and Chinese (n = 16, female = 8, age range = 18–24, average = 20.8; collected in Beijing).

Factorial Design

The trials of this study presented short dialogical interactions. The instructor introduced a context, as in (4A). The participant produced a target utterance (4B) containing a corrective statement, whose antecedent was the last sentence of the context.

(4) A: Everyone brought something to the potluck today. Peter brought the bread.

B: No, [Layla]_F brought the bread today.

In order to assess the impact of contrastive focus on the prosodic realization of canonical and cleft constructions, we designed an experiment with the factors FOCUS and CONSTRUCTION of the target utterance; see (5). The factor FOCUS refers to the focus domain of the utterance, which depends on the relation of the target utterance to the last sentence of the context, and contains two levels: subject focus and object focus. The factor CONSTRUCTION relates to the syntactic construction of the target utterance: either ‘canonical constructions’ or ‘cleft constructions.’ The target utterance always has the same structure as the antecedent statement, maintaining the structural parallelism of correction as introduced in (2): canonical and cleft constructions in the target utterance always relate to canonical and cleft constructions, respectively, in the context utterance.

(5) Factorial design of the speech production study

(a) FOCUS: subject, CONSTRUCTION: canonical

A: Everyone brought something to the potluck today. Peter brought the bread.

B: No, [Layla]_F brought the bread today.

(b) FOCUS: subject, CONSTRUCTION: cleft

A: Everyone brought something to the potluck today. It’s Peter that brought the bread.

B: No, it’s [Layla]_F that brought the bread today.

(c) FOCUS: object, CONSTRUCTION: canonical

A: Everyone brought something to the potluck today. Layla brought the salad.

B: No, Layla brought the [bread]_F today.

(d) FOCUS: object, CONSTRUCTION: cleft

A: Everyone brought something to the potluck today. It’s Layla that brought the salad.

B: No, it’s Layla that brought the [bread]_F today.
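
The 2 × 2 crossing of FOCUS and CONSTRUCTION in (5) can be enumerated mechanically. The following Python sketch is purely illustrative: the names `FOCUS_LEVELS`, `CONSTRUCTION_LEVELS`, and `conditions` are hypothetical labels chosen here and are not part of the original study materials.

```python
from itertools import product

# Factor levels as named in the factorial design (5).
# The constant names are hypothetical, introduced for illustration only.
FOCUS_LEVELS = ("subject", "object")
CONSTRUCTION_LEVELS = ("canonical", "cleft")


def conditions():
    """Return the crossed FOCUS x CONSTRUCTION cells, labelled (a) to (d)."""
    cells = product(FOCUS_LEVELS, CONSTRUCTION_LEVELS)
    return [
        {"label": label, "focus": focus, "construction": construction}
        for label, (focus, construction) in zip("abcd", cells)
    ]


for cell in conditions():
    print(f"({cell['label']}) FOCUS: {cell['focus']}, "
          f"CONSTRUCTION: {cell['construction']}")
```

The order produced by `product` matches the order of the conditions in (5): subject focus first, and canonical before cleft within each focus level.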

Material

The experimental conditions were implemented in four items involving different lexicalizations of simple transitive clauses. All lexicalizations had the same syntactic constituents, the same number of syllables and the same word stress pattern (English, German) or tonal structure (Chinese); voiceless obstruents were avoided whenever possible in order to reduce missing values in the f_o measurements;⁴ see the full listing of the items in Supplementary Material, Section 2. The number of items is arguably low. Beyond limitations in developing lexicalizations with the present phonological requirements (same syllabic structure, word stress, tonal structure, avoidance of voiceless obstruents), the main motivation for this decision is to obtain minimal pairs of prosodic realizations of the same lexicalization and by the same speaker under different treatments. Hence, we created four different lexicalizations in order to obtain four repeated observations with each speaker. The drawback of the limited sample of items is that the findings cannot claim generalizability for the population of possible lexicalizations.

The objects were not final within the utterance, such that tonal events that are associated with object focus do not clash with the final lowering at the right edge of the utterance. Therefore, we used a clause-final temporal adverb in those languages in which the object would otherwise be the rightmost constituent (English and French). These items were recorded in all conditions with all participants, which renders a total of 4 items × 16 participants = 64 tokens per experimental condition (with four conditions: 256 utterances per language). Experimental items were mixed with fillers in a proportion of 1 (target) : 3 (fillers), whereby a part of the fillers (1:3) were items of a further experiment and the remaining fillers (2:3) were distractors. All trials (targets and fillers) were performed with the same instruction and had the same dialogical structure, as illustrated in (6).
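
The token counts above follow from simple arithmetic, which the following Python sketch spells out. The function name `design_counts` and its parameter names are hypothetical. The per-participant figures are derived here under the assumption that each participant produced every item in every condition exactly once and that the 1:3 target-to-filler proportion holds per participant; the text states only the totals per condition and per language.

```python
def design_counts(items=4, participants=16, conditions=4, fillers_per_target=3):
    """Derive token and trial counts from the design parameters in the text."""
    tokens_per_condition = items * participants               # 4 x 16 = 64
    targets_per_language = tokens_per_condition * conditions  # 64 x 4 = 256
    # Assumption: each participant sees every item in every condition once.
    targets_per_participant = items * conditions
    fillers_per_participant = targets_per_participant * fillers_per_target
    return {
        "tokens_per_condition": tokens_per_condition,
        "targets_per_language": targets_per_language,
        "targets_per_participant": targets_per_participant,
        "fillers_per_participant": fillers_per_participant,
        # One third of the fillers are items of a further experiment,
        # the remaining two thirds are distractors.
        "other_experiment_fillers": fillers_per_participant // 3,
        "distractor_fillers": 2 * fillers_per_participant // 3,
        "trials_per_participant": targets_per_participant
        + fillers_per_participant,
    }


print(design_counts())
```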

The same types of constructions (canonical constructions vs. cleft constructions) were examined in all languages at issue. German declarative main clauses have a verb-second order, as seen in (6a). Cleft constructions as in (6b) are possible in German but occur less frequently and in restricted contexts compared to English (Dufter, 2009: 168; Fischer, 2009: 90). Narrow focus is usually expressed by prosodic means and/or syntactic movement in German. It is possible to use German cleft constructions with a focus within the cleft clause (Fischer, 2009: 168; Hartmann, 2015: 271), as discussed in Section 1 for English (‘informative presupposition clefts’ in terms of Prince, 1978). Experimental results show that the exhaustive interpretation (i.e., the interpretation that the pivot is the only alternative for which the presupposition of the cleft clause holds true) is not part of the truth-conditional meaning of German clefts (Drenhaus et al., 2011), which differs from English clefts that are exhaustively interpreted (Kiss, 1998: 268; Destruel and De Veaugh-Geiss, 2018).

French c’est clefts, as in (7), occur in a larger array of contexts than English it-clefts. While English clefts are licensed by contrastive focus, French clefts also appear in answers to wh- questions (Skopeteas and Fanselow, 2010). Furthermore, French c’est clefts with a subject as pivot do not only occur when the subject is in narrow focus, but also whenever the subject is part of a larger focus domain (Lambrecht, 2001; corpus findings in Karssenberg and Lahousse, 2018). While English clefts come with an exhaustive interpretation, this is not necessarily the case for French clefts (Destruel and De Veaugh-Geiss, 2018).

In Chinese, the canonical order with finite verbs is SVO; see (8a) (see discussion in Huang et al., 2009: 199–202). The ‘bare shi’ construction in (8b) (with shi4 preceding the subject) is a cleft construction, typically expressing contrastive focus on the subject. As in French, the same construction also occurs in sentence focus (Cheng, 2008: 255; Paul and Whitman, 2008: 426; Von Prince, 2012: 342; Paul, 2015: 216; see discussion of the tonal properties in Section 2.4).

Procedure

Recordings took place in quiet rooms in the four places of data collection (London, Bielefeld, Lyon, Beijing). The data were recorded with an Olympus digital recorder (LS-13) with built-in microphones and saved in .wav files at a sampling frequency of 44.1 kHz. The participants were presented with the material in a PowerPoint presentation. Each trial was presented on two slides: on the first slide, they read a context-target pair as in (5) and were instructed to look carefully at the dialogue and to memorize the target sentence. On the second slide, only the context was presented, while a native speaker/instructor performed it orally (instructors were advised to perform the context sentences as natural contributions in a dialogue and to avoid a non-expressive style like repeating sentences from a list). The participants were instructed to perform the memorized target utterance in a way that naturally fits the context (the purpose of this manipulation was to avoid effects of read speech). The participants were allowed to repeat the trial if they thought that their performance was not natural enough (without further guidance by the instructor).

Data Analysis

The recordings were processed in Praat (Boersma and Weenink, 2020). The data set contained 64 utterances per condition/language; a few tokens had to be removed due to speech disfluencies or errors (two tokens in German and five tokens in Chinese). TextGrid objects were created for the valid data, with intervals corresponding to the syllables of the target utterances. All sound files and TextGrid objects are available at Zenodo (Greif and Skopeteas, 2021).

A Praat script written by the authors extracted the timing of the onset and the offset of each syllable, as well as the mean f_o of five equal time bins per syllable. The extracted measurements were processed in R (R Core Team, 2020). The f_o values in Hz were converted into semitones with a reference value of 50 Hz, with the formula f_{o (semitones)} = 12 × log₂(f_{o (Hz)} / 50) (Nolan, 2003; Grice et al., 2007; Wang and Xu, 2011).
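
The two processing steps described above (binning and semitone conversion) can be sketched in Python as follows. This is not the authors' Praat or R code, which is not reproduced in the text; the function names `bin_means` and `hz_to_semitones`, the `(time, f_o)` sample format, and the use of `None` for unvoiced frames are assumptions made for illustration.

```python
import math


def hz_to_semitones(f_hz, ref_hz=50.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (50 Hz)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def bin_means(samples, onset, offset, n_bins=5):
    """Mean f_o (Hz) in n_bins equal time bins between onset and offset.

    samples: iterable of (time_s, f_hz) pairs; f_hz is None for unvoiced
    frames (assumed format). Bins without voiced samples yield None.
    """
    width = (offset - onset) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, f in samples:
        if f is None or not (onset <= t <= offset):
            continue
        # A sample exactly at the offset is assigned to the last bin.
        idx = min(int((t - onset) / width), n_bins - 1)
        sums[idx] += f
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy samples for one syllable lasting from 0.0 s to 0.5 s (invented values).
toy = [(0.05, 100.0), (0.15, 110.0), (0.25, None), (0.35, 120.0), (0.45, 130.0)]
means_hz = bin_means(toy, onset=0.0, offset=0.5)
means_st = [None if m is None else hz_to_semitones(m) for m in means_hz]
print(means_hz)
print(means_st)
```

With a 50 Hz reference, a value of 100 Hz corresponds to 12 semitones, since each doubling of the frequency adds one octave.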

The f_o values in semitones were averaged per experimental condition in order to detect the impact of the factors at issue on the f_o excursion in visualizations. Statistical evaluation was conducted on the non-averaged data.
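
The averaging used for the visualizations can be sketched as a simple group-by operation. The row format (keys `focus`, `construction`, `syllable`, `bin`, `f0_st`) and the function name `condition_means` are hypothetical; the values in the example are invented and are not data from the study.

```python
from collections import defaultdict


def condition_means(rows):
    """Average semitone values per (focus, construction, syllable, bin)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["f0_st"] is None:  # skip missing f_o measurements
            continue
        key = (row["focus"], row["construction"], row["syllable"], row["bin"])
        sums[key] += row["f0_st"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Invented toy rows: two speakers, same condition, same syllable and bin.
rows = [
    {"focus": "subject", "construction": "cleft", "syllable": 1, "bin": 1,
     "f0_st": 20.0},
    {"focus": "subject", "construction": "cleft", "syllable": 1, "bin": 1,
     "f0_st": 22.0},
    {"focus": "subject", "construction": "cleft", "syllable": 1, "bin": 2,
     "f0_st": None},
]
print(condition_means(rows))
```

Such averages serve only to plot contours; the statistical models operate on the individual measurements.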

Linear mixed-effects models were fitted on the (semitone-transformed) f_o measurements in each area of interest (subject or object, see details in Section 2.3) separately (using package lme4 in R; Bates et al., 2015). We examined f_o excursions as time series, with the f_o mean of time bins as dependent variable. The fixed effects were the experimental factors FOCUS (level 0 = object; level 1 = subject) and CONSTRUCTION (level 0 = canonical; level 1 = cleft), and the continuous variable of TIME (levels: 1–5), whose levels refer to the corresponding time bin within the syllable. Including TIME in the model offers the possibility to examine the impact of the fixed effects on the f_o excursion as a function of time: the interaction effects with TIME reflect the impact of the corresponding fixed factor on the f_o slope within the area of interest (Barr, 2008). Starting with a random-effects structure with intercepts for PARTICIPANTS and ITEMS as well as by-PARTICIPANTS and by-ITEMS random slopes of FOCUS and CONSTRUCTION, we identified the maximal random-effects structure that converges in all languages for the analyses in a certain area of interest.⁶ Keeping the maximal converging random-effects structure constant (as suggested by Barr et al., 2013), we reduced the fixed-effects structure (FOCUS × CONSTRUCTION × TIME) with a backward-elimination procedure of non-significant effects (performed automatically by the function step of the package lmerTest in R; Kuznetsova et al., 2017). The fixed effects that were not nested in a higher interaction were additionally tested with likelihood ratio tests (Bates et al., 2015: 35); for the significance of fixed effects that were nested in higher interactions, we can only rely on the t-values (ratio of the estimate to its standard error).

Predictions

The experimental material contains two areas of interest: the f_o excursion of the subject and f_o excursion of the object; in languages with stress, either lexical (German, English) or postlexical (French), the area of interest is the corresponding stressed syllable. In the area of the subject, we expect a contrast between nuclear accents (if the subject is focused) and prenuclear accents (if the focus falls on the object); in the area of the object, we expect a contrast between nuclear accents (if the object is focused) and deaccenting (if the focus falls on the subject). In Chinese, we expect that the f_o excursion of non-focused constituents will be tonally compressed compared to the f_o excursion of focused constituents (in either area). The type of accent depends on language and will be introduced with the presentation of the results in Section 2.4. In all cases, the expected contrasts imply a difference in the f_o slope, while the direction of the difference is language-specific (it depends on the prosodic events at issue).

The predictions of this study will be examined by testing for an interaction of the fixed factors with the variable of TIME within the areas of interest (i.e., the syllables in which phonological considerations predict reflexes of focus). Effects of TIME are evidence for a difference in the f_o slope, reflecting tonal events aligned with the area of interest (Grabe et al., 2007; Isaacs and Watson, 2010). Hence, an interaction FOCUS × TIME or an interaction CONSTRUCTION × TIME indicates that the corresponding fixed factor has an impact on the change of f_o within the area of interest. Effects that are independent of the time variable, such as a main effect of FOCUS, are evidence for a difference of the f_o level (see Barr, 2008 concerning the relevance of ‘rate effects’ in time series).
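
The distinction between slope effects and level effects can be illustrated with a deliberately simplified sketch. It fits an ordinary least-squares line per FOCUS level instead of the mixed-effects models used in the study, so it ignores random effects entirely. Under 0/1 coding of FOCUS, the difference between the two slopes corresponds to the FOCUS × TIME coefficient, and the difference between the two intercepts corresponds to the FOCUS coefficient (the level difference at TIME = 0). The function names and the numbers are invented for illustration and are not data from the study.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def focus_effects(time_bins, f0_level0, f0_level1):
    """Compare the lines fitted to FOCUS level 0 and FOCUS level 1."""
    intercept0, slope0 = ols_line(time_bins, f0_level0)
    intercept1, slope1 = ols_line(time_bins, f0_level1)
    return {
        "focus": intercept1 - intercept0,   # level difference at TIME = 0
        "focus_by_time": slope1 - slope0,   # difference in f_o slope
    }


# Invented semitone values for the five time bins of one syllable.
bins = [1, 2, 3, 4, 5]
object_focus = [20.0, 20.5, 21.0, 21.5, 22.0]   # FOCUS level 0: shallow rise
subject_focus = [20.0, 21.0, 22.0, 23.0, 24.0]  # FOCUS level 1: steeper rise
print(focus_effects(bins, object_focus, subject_focus))
```

In this toy case the steeper rise under subject focus surfaces as a positive slope difference, which is the kind of pattern a FOCUS × TIME interaction is meant to capture.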

With this background, the major question in cross-linguistic perspective is whether FOCUS × TIME effects appear in all languages. The distinction between plastic (English, German) and non-plastic (French, Chinese) languages predicts that the effects of FOCUS will appear only in the former language type. However, earlier studies have shown that various phonetic reflexes of focus, such as a pitch range expansion or reflexes of demarcation of focused constituents, are found in non-plastic languages as well (see Xu, 1999; Chen and Gussenhoven, 2008 on Chinese; German and D’Imperio, 2010; Delais-Roussarie et al., 2015 on French), which predicts an effect on the f_o slope in all languages.

An interaction CONSTRUCTION × TIME may appear if certain constructions are associated with prosodic events that are independent of focus. Precisely, cleft constructions differ with respect to prosodic phrasing, such that the cleft clause forms an intonation phrase on its own (Féry, 2013: 699 on French); edge tones that delimit intonation phrases may appear around the boundary between the pivot and the cleft clause.

A threefold interaction FOCUS × CONSTRUCTION × TIME indicates that the effect of FOCUS on the f_o slope is modulated by CONSTRUCTION. Since cleft constructions with a focus in the cleft clause bear a second-occurrence focus as seen in (2), subject constituents may not be completely deaccented, which predicts a threefold interaction within the area of interest of the subject. In cross-linguistic perspective, effects of second-occurrence focus entail effects of focus. That is, a threefold interaction may appear in a subset of the languages that have a FOCUS × TIME interaction. Our predictions are summarized in (9).

(9) Predicted effects on the f_o slope

(a) FOCUS × TIME: focus influences the f_o slope (language-specific effects).

(b) CONSTRUCTION × TIME: canonical and cleft constructions differ with respect to p-phrasing.

(c) FOCUS × CONSTRUCTION × TIME: second-occurrence focus in cleft constructions predicts that the effect of focus on the f_o slope will be modulated by construction.

Results

The f_o excursions in Figure 1 illustrate the basic contrast between early and late foci in British English. Annotations indicate the tonal events that are relevant for our discussion on the prosodic reflexes of focus, assuming the ToBI conventions (Veilleux et al., 2006). When the subject is focused (Figure 1A), it is realized with a bitonal accent L + H*, which stands for a substantial rising pitch movement that reaches a high target within the stressed syllable; this realization is characteristic of contrastive foci in English (Ladd, 2008: 96; Watson et al., 2008; Gotzner, 2015: 130–136). The realization of a subject preceding the focus in Figure 1B also has a rising f_o excursion, starting from a low target within the stressed syllable and rising toward a high target that may be reached after the stress (L* + H).⁷ The prosodic realization of the objects differs between the two figures. When the object is focused, it is realized with a rising contour (Figure 1B), similar to the focused subject in Figure 1A. When the object follows the focus, it is deaccented (Figure 1A), which means that it does not contain any significant prosodic events (Ladd, 2008: 231–236) and ends up with a final low target as expected for declaratives, which is phonologically represented by the sequence of a phrase tone (L−) and a boundary tone (L%).

FIGURE 1

Figure 1. Illustrative examples of (A) subject and (B) object focus in British English.

The average f_o excursions of British English (Figure 2)⁸ show a major distinction between early focus (on the subject, blue line) and late focus (on the object, red line), which applies to canonical and cleft constructions. The f_o rise in the stressed syllable of focused subjects (gray cell) has a greater slope with focus on the subject (blue line) than with focus on the object (red line). The realization of the objects shows a rising contour when the object is focused (red line) and is deaccented when the object is given (blue line). These properties apply to canonical and cleft constructions, which means that prosodic marking of focus is not compensated by marking the focus in syntax (see the same effect for Canadian English in Arnhold, 2021).

FIGURE 2

Figure 2. Average f_o measurements in British English (time normalization based on five equal intervals per syllable; vertical lines: word edges; gray cells: areas of interest, stressed syllable of subject and object).

The German data show a similar pattern in canonical and cleft constructions (Figure 3). Focused subjects (blue lines) are realized with an f_o excursion rising up to a H target that is close to the right edge of the stressed syllable, reflecting the fact that German has a bi-tonal accent L + H* for contrastive assertions (Grice et al., 2005: 65, 71; see Alter et al., 2001 on contrast). Non-focused subjects (red lines) optionally have prenuclear accents, reaching an f_o maximum after the right edge of the stress, which reflects the fact that the H-target of prenuclear accents (L* + H) may follow the stressed syllable (Féry and Kügler, 2008; Baumann and Riester, 2013: 20; Féry, 2017: 154). The impact of focus on object constituents is similar: a rise within the stressed syllable (L + H*) when the object is focused (red lines) vs. deaccented objects with a flat contour when the object is given (blue lines).

FIGURE 3

Figure 3. Average f_o measurements in German (time normalization based on five equal intervals per syllable; vertical lines: word edges; gray cells: areas of interest, stressed syllable of subject and object).

In French, the rightmost full (i.e., non-schwa) syllable is characterized by metrical prominence, which is reflected in lengthening and tonal activity; metrical prominence is assigned postlexically in French, which means that it is not determined by the lexicon (see summary in Post, 2000: 8–9; Féry, 2014). In terms of the French ToBI (Delais-Roussarie et al., 2015), the last syllable of the accentual phrase is associated with a high tonal target (H−), while the last accentual phrase ends up with a low target (L−L%); see Figure 4. French accentual phrases may start with a rise within the initial syllable (German and D’Imperio, 2010; Delais-Roussarie et al., 2015). Since these events are associated with edge syllables, we code them as edge tones associated with the left edge of an accentual phrase (−L + H) (following Féry, 2014). Initial rises are reported to appear more often with contrastively focused constituents (see German and D’Imperio, 2010; Delais-Roussarie et al., 2015), especially in contexts of correction (Vander Klok et al., 2018); however, the function of these events is controversial, since they may be used to draw the attention of the hearer to non-focused constituents and there are also empirical studies disputing their correlation with contrastive focus (Cole et al., 2019: 130). The data in Figure 4 illustrate this contrast: focused subjects may be realized with an initial rise (Figure 4A), such that the high target is aligned with the right edge of the first syllable; non-focused subjects are realized with a (lower scaled) high edge tone aligned with the right edge of the accentual phrase (Figure 4B). The initial rise can also appear with focused objects (Figure 4B), while objects are not accented when following the focused subject (Figure 4A).

FIGURE 4

Figure 4. Prosodic realization of (A) subject and (B) object focus in French.

The averages per experimental condition (Figure 5) confirm that the introduced phenomena depend on information structure. The average f_o excursion of focused subjects (blue lines) targets an earlier local maximum than the corresponding excursion of non-focused subjects (red lines). Focused objects (red lines) also show an initial rise in contrast to non-focused objects (blue lines). Our data show that tonal events following the nucleus are not necessarily erased in French (Di Cristo and Jankowski, 1999: 1567; Jun and Fougeron, 2000: 230; Féry, 2014):⁹ prosodic words in the postfocal domain display the same type of f_o excursion as their focused counterparts, but with a compressed pitch range.

FIGURE 5

Figure 5. Average f_o measurements in French (time normalization based on five equal intervals per syllable; vertical lines: word edges; gray cells: areas of interest, stressed syllable of subject and object).

Mandarin Chinese displays a phonological contrast between four lexical tones (T1: high level; T2: rise; T3: fall-rise; T4: fall). The target words in our material contain the simple contour tones T2 and T4 that are comparable since they consist of two tonal targets (i.e., T2: LH, T4: HL). All items have the tonal sequence T2-T2 (rise-rise) for subjects and T2-T4 (rise-fall) for objects; see (8) and Supplementary Material, Section 2.1.4. The choice of T2/T4 was just determined by convenience for the selection of appropriate lexical material and maintained constant across items. Word stress is not applicable to Chinese. Even if some studies report a preference for initial prominence in compounds (Duanmu, 2007: 135, 142), both syllables are areas of interest for our study (see Figure 6), since there is no reason to expect reflexes of focus only in the initial syllable. Focus is reported to be reflected in an expansion of the pitch range of lexical tones, with a greater effect on f_o maxima than f_o minima (Xu, 1999: 69; Greif, 2012: 38) as well as by a general increase of the distinctness of tonal targets, which resembles hyperarticulation effects of focus on vowel quality (Chen and Gussenhoven, 2008: 744). This kind of hyperarticulation is also seen in our data: the T2–T2 sequence in the subject is realized with two distinct rising excursions when the subject is focused, but this contour is leveled out into a single rise when the subject is out of focus. A similar contrast applies to the object constituents. The T2–T4 sequence results in a hat contour (LHL), whose peak is reached beyond the offset of the first syllable (Xu and Wang, 2001: 331): this hat contour appears with a reduced pitch range when the object follows the focus, which is evidence for postfocal tonal compression. The asymmetry between prenuclear and postnuclear tonal compression is similar to the asymmetry between prenuclear and postnuclear deaccenting in Germanic languages (Chen, 2010: 520). 
While pitch compression is radical in the postnuclear domain, prenuclear tones only show slight differences in terms of pitch range (see lexical tones of subjects under object focus).

FIGURE 6

Figure 6. Average f_o measurements in Chinese (time normalization based on five equal intervals per syllable; vertical lines: word edges; gray cells: areas of interest, subject and object).

Linear mixed-effects models with the factors FOCUS, CONSTRUCTION, and TIME were fitted on the f_o measurements within the stressed syllables (for objects and subjects separately; see details in 2.2.5). In Chinese, we analyzed the first and the second syllable separately, both to maintain the same degrees of freedom in all analyses and because we cannot reduce the analysis to a single syllable based on assumptions about word stress.
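The figure captions mention time normalization over five equal intervals per syllable, and the tables report f_o in semitones. A minimal sketch of these two preprocessing steps is given below. The function names, the 100 Hz reference frequency, and the choice of averaging all samples within a bin are illustrative assumptions and not a description of the procedure actually used in the study.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert f0 in Hz to semitones relative to ref_hz.

    The 100 Hz default is an assumed reference, not taken from the study.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def time_normalize(samples, syllables, n_intervals=5):
    """Average f0 within n_intervals equal-duration bins per syllable.

    samples:   list of (time_s, f0) pairs
    syllables: list of (start_s, end_s) pairs, one per syllable
    Returns one list of bin means per syllable; empty bins yield None.
    """
    normalized = []
    for start, end in syllables:
        width = (end - start) / n_intervals
        bins = [[] for _ in range(n_intervals)]
        for t, f0 in samples:
            if start <= t < end:
                # Clamp the index to guard against rounding at the right edge.
                idx = min(int((t - start) / width), n_intervals - 1)
                bins[idx].append(f0)
        normalized.append([sum(b) / len(b) if b else None for b in bins])
    return normalized
```

Because every syllable is reduced to the same number of points, contours from syllables of different durations can be averaged across items and speakers, which is what the averaged plots require.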

The maximal random-effects structure that converges in all analyses for subjects contains random intercepts for PARTICIPANTS and ITEMS and a by-PARTICIPANTS random slope of CONSTRUCTION. The models of maximal fit for the f_o measurements in the stressed syllable of the subject are listed in Table 1. German is the only language with a significant three-way interaction (CONSTRUCTION × FOCUS × TIME), indicating that the effect of FOCUS on the f_o slope is modulated by CONSTRUCTION, such that the difference between focused and non-focused subjects is greater in canonical clauses (therefore the interaction effect is negative); compare blue and red lines in the area of subjects in Figure 3. In all languages, we obtain a significant FOCUS × TIME interaction, whose direction is language specific: it is positive with rising accents (English, German, Chinese/syllable 1) and negative with falling accents (French). In either case, this effect means that the f_o change is more rapid when the subject is focused. The models of maximal fit in English and Chinese (syllable 1) contain a negative interaction CONSTRUCTION × TIME, indicating that the f_o change is slower in cleft than in canonical constructions.
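For orientation, the full model implied by this description can be written out as follows. This is a sketch under assumptions: the symbols are ours, F, C, and T stand for FOCUS, CONSTRUCTION, and TIME, indices p, i, and t stand for participant, item, and time point, and the coding of the factors is not specified here. The models reported in the tables are models of best fit, so they need not retain every term of the full specification.

```latex
% Full specification (assumed notation); reported models may drop terms.
\begin{aligned}
f_{0,pit} ={} & \beta_0 + \beta_1 F_{pi} + \beta_2 C_{pi} + \beta_3 T_{t} \\
              & + \beta_4 F_{pi} C_{pi} + \beta_5 F_{pi} T_{t} + \beta_6 C_{pi} T_{t}
                + \beta_7 F_{pi} C_{pi} T_{t} \\
              & + u_{0p} + u_{1p} C_{pi} + w_{0i} + \varepsilon_{pit}
\end{aligned}
```

Here u_{0p} and w_{0i} are the random intercepts for participants and items, and u_{1p} is the by-participant random slope of CONSTRUCTION. The coefficient \beta_5 corresponds to the FOCUS × TIME interaction and \beta_7 to the three-way interaction discussed in the text.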

TABLE 1

Table 1. Linear fixed-effects models of best fit on the f_o measurements (semitones): subject.

The f_o measurements in the object constituent reveal similar results in all languages (Table 2). There is a clear interaction effect FOCUS × TIME, which is negative in English, German, and Chinese/syllable 1, since the baseline of object focus is a rise in these languages, while the same syllables in the postfocal domain (subject focus) are rather flat or slightly falling. The corresponding FOCUS × TIME interaction effects are positive in French and in Chinese/syllable 2, in which case the f_o excursion of the object focus is falling. There is no evidence that the difference between canonical vs. cleft constructions (CONSTRUCTION × TIME) plays a role.
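The sign of a FOCUS × TIME interaction can be read as a difference between two f_o slopes. The sketch below illustrates this with ordinary least squares on made-up contours. It assumes treatment coding with object focus as the reference level, which appears consistent with the description above but is not confirmed by it, and it ignores the random effects of the actual models. All numbers and function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def interaction_estimate(time, f0_reference, f0_other):
    """Slope in the non-reference condition minus slope in the reference one.

    With a treatment-coded binary factor and separate intercepts, this
    difference equals the OLS coefficient of the factor-by-time interaction.
    """
    return ols_slope(time, f0_other) - ols_slope(time, f0_reference)


# Hypothetical semitone values over five normalized time points.
time = [1, 2, 3, 4, 5]
object_focus = [0.0, 1.0, 2.0, 3.0, 4.0]  # rising accent on the focused object
subject_focus = [2.0, 2.0, 2.0, 2.0, 2.0]  # flat postfocal object

estimate = interaction_estimate(time, object_focus, subject_focus)
```

Under these assumptions, a rise in the reference condition combined with a flat postfocal contour gives a negative estimate, and a fall in the reference condition combined with a flatter postfocal contour gives a positive one.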

TABLE 2

Table 2. Linear fixed-effects models of best fit on the f_o measurements (semitones): object.

Discussion

The results of the present study reveal that all examined languages show prosodic reflexes of focus, either through the prosodic prominence of the focused constituent or through leveling out the prosodic events of the postfocal domain.

All languages have a significant FOCUS × TIME interaction within the subject area (Table 1), whose properties vary depending on the language-specific tonal events. In German and English, this effect is positive, reflecting the use of rising accents for marking foci in these languages (Grice et al., 2005: 65, 71; Ladd, 2008: 96). A similar effect is found in the first syllable of the subject in Chinese, reflecting a more rapid rise of rising tones (T2) under focus. Our findings are in line with previous results on pitch range expansion of lexical tones under focus, especially applying to the rising tone (T2) (Xu, 1999; Wang and Xu, 2011; Greif, 2012: 75; Ouyang and Kaiser, 2015: 65). In particular, the average contours in Figure 6 show an increase of distinctness between subsequent rises within focus, which is in line with the view that tonal realizations are hyperarticulated under focus (Chen and Gussenhoven, 2008: 744). In French, contrastive focus on the subject frequently induces initial rises in the focused constituent resulting in a falling contour within the last syllable (German and D’Imperio, 2010). Hence, focus has an effect on f_o excursions in all languages in our sample, as summarized in (10).

(10) Prosodic prominence of focus

Evidence for prosodic prominence of foci is found in all languages for both subject and object foci and both canonical and cleft constructions. The nature of the obtained effects depends on the specific properties of the languages at issue.

(a) In English and German the focused constituent bears the nuclear accent, which contains a high peak within the stressed syllable; the effects on the f_o slope come from the contrast of the nuclear accents with prenuclear accents (area of interest: subject) or with deaccented domains (area of interest: object).

(b) In French and Chinese, the obtained effects come from phenomena increasing the saliency of prosodic entities: initial rises in French are a general strategy for demarcating prosodic constituents that appear more often with foci; in Chinese, focus is reflected in the hyperarticulation of the tonal targets of phonological events that are independent of focus (lexical tones).

Postnuclear prosodic events are leveled out, which gives rise to a significant FOCUS × TIME interaction in all languages (Table 2). Postnuclear leveling encompasses two types of phenomena, namely deaccenting and tonal compression. In German and English, the postfocal domain is deaccented: the average excursions of postfocal objects reveal a falling contour without any significant prosodic events, contrasting sharply with the corresponding contour of accented constituents. This finding is in line with previous findings in English (Liberman and Pierrehumbert, 1984; Ladd, 2008: 231–236) and German (Féry and Kügler, 2008; Baumann and Riester, 2013: 20; Féry, 2017: 154). The postfocal excursions in French and Chinese have the same prosodic pattern as the corresponding conditions in focus, realized with a reduced pitch range, which is evidence for tonal compression. In French, tonal compression applies to edge tones: the rising contours encompassing prosodic words are visible in focus or out of focus, with a difference in pitch range, which confirms the view that the reflexes of prosodic phrasing on intonation are still visible in the postfocal domain (Di Cristo and Jankowski, 1999: 1567; Jun and Fougeron, 2000: 230; Féry, 2014). In Chinese, tonal compression applies to lexical tones: the hat contour (T2-T4) is realized with reduced pitch range when the object follows the focus, as already reported in instrumental phonetic studies (Xu, 1999: 69; Chen, 2010; Greif, 2012: 82–88, 110–116). This result is not generalizable to all tone languages but confirms the view that Mandarin Chinese belongs to the subclass of tonal languages that have postfocal tonal compression (Xu et al., 2012). Our conclusions are summarized in (11).

(11) Postfocal tonal leveling

The postfocal domain is prosodically leveled out in all languages:

(a) English and German: the postfocal material is deaccented;

(b) French and Chinese: the available tonal events (edge tones in French, lexical tones in Chinese) are visible after the focus but tonally compressed.
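The distinction in (11) between deaccenting and tonal compression can be made concrete with two simple descriptive measures: the ratio of pitch ranges and the similarity of contour shapes. The sketch below is only an illustration. The measures, the function names, and the contour values are our assumptions and do not reproduce the statistical analysis reported above.

```python
import math


def pitch_range(contour_st):
    """Pitch range of a contour in semitones (maximum minus minimum)."""
    return max(contour_st) - min(contour_st)


def range_ratio(focused_st, postfocal_st):
    """Postfocal pitch range relative to the focused pitch range."""
    return pitch_range(postfocal_st) / pitch_range(focused_st)


def shape_correlation(a, b):
    """Pearson correlation between two equally long contours."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical hat contour (LHL) in semitones, in focus and after the focus.
focused_hat = [0.0, 4.0, 0.0]
postfocal_hat = [1.0, 2.0, 1.0]

ratio = range_ratio(focused_hat, postfocal_hat)
similarity = shape_correlation(focused_hat, postfocal_hat)
```

On this reading, tonal compression corresponds to a range ratio below 1 together with a high shape correlation (same contour, smaller range), whereas deaccenting would show up as a postfocal contour that no longer resembles its accented counterpart.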

The effects of second-occurrence focus are only confirmed by a significant CONSTRUCTION × FOCUS × TIME interaction in German. This result is in line with previous studies on second-occurrence focus in non-final contexts, in particular Féry and Ishihara (2006) on German. We refrain from any strong statement about a difference between languages with respect to second-occurrence foci: prenuclear accents are optional in general and a prosodic marking of second-occurrence focus is not mandatory in these constructions, since it is already expressed through the cleft construction. Nevertheless, the fact that the only language for which we obtained evidence for prosodic reflexes of second-occurrence focus is German is in line with the view that signaling second-occurrence focus entails signaling focus. Languages with a contrast between accent types for the expression of focus are more likely to employ this contrast for second-occurrence foci as well.

Finally, the prosodic devices that can be used for signaling focus are equally used in canonical and cleft constructions. The interaction effects of CONSTRUCTION × TIME in the subject region in English and Chinese are accounted for by specific properties of the constructions at issue. In both languages, cleft constructions show a tonal event that is immediately left-adjacent to the first syllable: in English it is a pitch accent aligned with the pronoun it (see Figure 2), while in Chinese it is the falling tone (tone 4) on the copula shi (see Figure 6). The reflex of these accentual events on the immediately adjacent high target is that the f_o rise starts later and from a higher pitch level in these constructions, which results in the significant interaction effect in these languages. Hence, this effect relates to language-specific properties of the material and is not informative about a difference between canonical and cleft constructions in terms of the prediction in (9b). An interaction effect of FOCUS × TIME (across constructions) is available in all languages, both in the analyses of subjects (Table 1) and in the analyses of objects (Table 2). We conclude from these facts that all languages have the potential to realize different prosodic structures depending on focus with canonical and cleft constructions.

Contextual Felicity of Syntactic Constructions

Aims

The aim of the present experiment is to test whether the contextual felicity of cleft constructions with a contrastive focus in the cleft clause depends on prosodic typology. For this purpose, we collected judgments of the appropriateness of target utterances in certain contexts by means of written questionnaires. The typological distinction between plastic and non-plastic languages (based on the flexibility of nuclear-accent placement) predicts an advantage for languages such as German and English. However, our study on speech production revealed that focus is associated with various reflexes of prosodic prominence in all examined languages (Section 2.4).