The efficacy of the type of instruction on second language pronunciation acquisition

Alghazo, Sharif; Jarrah, Marwan; Al Salem, Mohd Nour

doi:10.3389/feduc.2023.1182285

ORIGINAL RESEARCH article

Front. Educ., 18 May 2023

Sec. Language, Culture and Diversity

Volume 8 - 2023 | https://doi.org/10.3389/feduc.2023.1182285

Sharif Alghazo¹,²*

Marwan Jarrah¹

Mohd Nour Al Salem¹

¹Department of English Language and Literature, The University of Jordan, Amman, Jordan
²Department of Foreign Languages, University of Sharjah, Sharjah, United Arab Emirates

This study investigates the efficacy of the type of instruction (i.e., perception-based vs. production-based) on second language (L2) pronunciation acquisition in an English as a foreign language (EFL) context. To achieve this objective, 60 tertiary-level Jordanian learners of English were recruited and put into two groups (30 learners in each group). Group A received 6 weeks of perception-based instruction on both segmental and suprasegmental aspects of English pronunciation, and Group B received production-based instruction over the same period and on the same aspects of pronunciation. Progress in L2 pronunciation was assessed at three time points (i.e., week 1, week 6, and week 14). Pre-, post- and delayed post-tests were run to achieve the study’s objective. A statistical analysis was conducted to analyse the data. The results show that both groups demonstrated a significant improvement in L2 pronunciation accuracy; in particular, Group A which received perception-based instruction demonstrated higher gains in segmental, syllabic, and prosodic aspects while Group B which received production-based instruction demonstrated more improvement in both global (i.e., comprehensibility) and temporal (i.e., fluency) aspects of pronunciation. However, both groups demonstrated similar gains on the delayed post-test. The findings provide implications for L2 pronunciation learners and teachers on the impact of the type of instruction on the addressed aspects of pronunciation.

Introduction

Pronunciation is an important sub-system of language, without which successful communication is not achieved (Levis, 2018). Indeed, mastery of other sub-systems (vocabulary and grammar) of language does not determine a speaker’s overall proficiency in the language because pronunciation is a basic aspect of both grammar and vocabulary (Nation and Newton, 2009). That is, mastering second language (L2) pronunciation allows language learners to develop new words and structures. Nation and Newton (2009) argue that knowing the pronunciation of words and phrases helps learners to store them in the long-term memory. Thus, achieving a high level of L2 pronunciation is an essential component of linguistic and communicative competence. Derwing (2018) argues that “[s]ome individuals, despite a great deal of exposure to their L2, and ample opportunities to interact, still exhibited aspects of pronunciation that made them difficult to understand” (p. 322). Therefore, the development of comprehensible and intelligible speech is a fundamental aim of pronunciation instruction (henceforth PI). Although early research reported that some aspects of L2 pronunciation are unteachable (Taylor, 1993), more recent enquiries showed promising results of the efficacy of PI (Lee et al., 2015; Gordon, 2021).

While there is ample evidence in mainstream literature in support of the effectiveness of PI (see, for example, Lee et al., 2015 for a meta-analysis of research in this regard), the relationship between the type of instruction and L2 pronunciation gains is yet to be evidenced. This is attributed to the fact that theories of second language acquisition do not clearly distinguish between the different types of instruction (Norris and Ortega, 2001). In addition, research on L2 PI has provided conflicting data on the efficacy of instruction on the ultimate attainment of L2 learners. Lee et al. (2015) argue that existing research on the efficacy of PI shows “variance in observed effects” (p. 345). In addition, Thomson and Derwing (2015) reviewed existing studies and concluded that there are many explanations for the conflicting results including individual difference factors (e.g., motivation and aptitude), age, and type of instruction, to mention just a few. Lee et al. (2015) also argue that research in L2 PI has been extensive “across many learners and contexts (e.g., various target languages and proficiency levels), pedagogical approaches (with vs. without feedback), linguistic features (e.g., segmentals vs. suprasegmentals), and outcome types (i.e., constrained vs. guided vs. open-ended)” (p. 345). In addition, existing research on L2 pronunciation by Jordanian Arabic (JA) speakers has been concerned with testing the perception and production of segmental and suprasegmental features (see, for example, Abu Guba et al., 2023), or with student perceptions of PI (Alghazo, 2015). However, the type of instruction as a determining variable and a convincing explanation for results of studies in PI research has rarely been explored. This study fills this gap in the literature and ventures to explore the relative effect of the type of instruction on English pronunciation gains among Jordanian Arabic-speaking learners of English at the university level. 
Thus, the study seeks answers to the following research questions:

1. How does the type of instruction (perception-based vs. production-based) affect the English pronunciation outcomes of Jordanian university students?

2. What aspects of English pronunciation are affected by each type of instruction?

We subscribe to Shintani et al.’s (2013, p. 298) comparison between the two types of instruction, which reads as follows:

The essential difference between [comprehension-based instruction] CBI and [production-based instruction] PBI rests on whether production is or is not required and, therefore, how learners are expected to process the … features that are the target of the instruction. CBI does not require production of the target features.¹

In addition, we follow Lee et al.’s (2020) definitions of perception-based and production-based instruction, according to which perception-based instruction aims “at increasing the participants’ identification or discrimination abilities” while production-based instruction aims at “eliciting the correct articulation of the target features while making use of corrective feedback” (p. 1). In PI, perception refers to “learners’ ability to identify L2 phonemic and suprasegmental components in the input and to discern contrasts between various L2 phonemes, as well as differences between L2 and [first language] L1 phonemes,” while production refers to “the ability to articulate the sounds of the L2” (Loewen, 2020, pp. 150–151).

Literature review

Compared to other systems of language (i.e., vocabulary and grammar), pronunciation received less attention from researchers and scholars until the 2010s.² Thomson and Derwing (2015) report that “the field is growing rapidly” (p. 339) but assert that “there is considerable variability in reporting standards that limits replicability. We note very modest scaffolding using existing research” (p. 339). This situation implies that more investigations into the standards which guide our teaching of L2 pronunciation are needed and that a strong bond among theory, research, and practice is to be established in order to set the standards of quality teaching and research in the area of L2 pronunciation. A notable example of the lack of standards in pronunciation teaching and research is a lack of solid evidence into the efficacy of instruction on the ultimate gains of L2 learners. In fact, research in L2 pronunciation has provided much evidence of the value and efficacy of instruction in the development of both accuracy and fluency measures (see Saito and Plonsky, 2019). However, as Thomson and Derwing (2015) argue, this body of research has provided “mixed results” (p. 326) and left a good number of questions about the efficacy of PI unanswered. One of the most important questions is that of the effect of the type of instruction on L2 pronunciation gains.

Among the early studies which provided evidence of the overall positive impact of instruction on pronunciation development is that by Derwing et al. (1998), who asked 37 native speakers of English to judge the pronunciation accuracy of 13 English as a second language (ESL) students in a speaking program at two time points (Week 1 and Week 12 of instruction). The results of their study demonstrated that instruction did result in improvement and that the utterances produced by the student participants after intervention were assessed to be more intelligible than those produced by the same students in Week 1 of instruction. The efficacy of PI was also reported in Barrera-Pardo’s (2004) review of 25 studies, which concluded that 23 studies showed a positive effect of PI. Moreover, Couper (2003, 2006) noted that PI led to an increase in learner awareness and accuracy in L2 pronunciation, a result also supported by his finding that PI led to fewer pronunciation errors. In a more recent study, Thomson and Derwing (2015) conducted a narrative review of studies on PI and concluded that “[e]ighty-two percent of the studies reported significant improvement” (p. 332). In another similar investigation, Lee et al. (2015) reviewed 86 studies on PI and found that the majority manifested positive impacts of PI; in particular, they reported that seven studies showed that PI led to improvement in comprehensibility and intelligibility, “the gold standard,” as Thomson and Derwing (2015, p. 332) call it. This latter finding is important in research on PI because, as Derwing (2018) argues, “[t]he value of PI, surely, should be determined by the extent to which it can improve communication” (p. 320), rather than nativelike ability (cf. Thomson and Derwing, 2015).

As for studies which provided counterevidence to the efficacy of PI, the literature cites that of Macdonald et al. (1994), who examined the efficacy of PI using the accentedness measure. Pre-, post-, and delayed post-tests produced speech samples which were assessed by listeners after 10–30 min of instruction. The results showed that “no single intervention was beneficial to all the learners who experienced it” (p. 95). In another study, Fullana and Mora (2009) investigated the performance of Catalan/Spanish learners of English in the perception and production of voicing contrasts in English and found that the amount of instruction did not predict pronunciation improvement. Surprisingly, most studies which provided counterevidence to the efficacy of PI were technology-based. Thomson and Derwing (2015) show that the studies which showed no/small effects of instruction are those which “provided PI using technology, whether entirely or in part” (p. 361); they attribute this finding to “the lack of adaptability and perceptual accuracy in computers compared to human teachers, and perhaps consequently their ability to provide appropriate feedback” (p. 361). For example, Kissling (2013) conducted a study to explore the effect of explicit PI for a group of English-speaking learners of Spanish, focusing on eight problematic sounds and utilising computer-aided modules. The results showed that both groups had the same gains on the post-test.

As a matter of fact, the efficacy of the type of instruction (perception-based and production-based) is a contentious issue in the field of instructed second language acquisition. Loewen (2020, p. 151) argues that there are two positions to explain the link between perception and production in L2 PI: The first position speculates that “perception and production are independent of each other,” and that instruction can target each independently, while the second holds that “perception and production are more closely related,” and that instruction which targets either one affects the other. In addition, second language acquisition researchers agree on the precedence of perception in L2 development and note that change in perception leads to change in production (see Saito, 2013). Loewen (2020) also asserts that L2 learners should receive perception-based instruction before we ask them to produce speech sounds. Sakai and Moorman (2018) conducted a meta-analysis to show the link between perception training and production abilities and concluded that “the two modalities are connected, insomuch as training the perception of L2 sounds can induce positive change in the productive mode” (p. 187). However, they “caution researchers to not equate the connection of the two modalities in long-term linguistic development to real-time neurological processing” (p. 187) because the relationship between perception training and production measures was found to be insignificant.

If we turn to studies which explored the effect of the type of instruction on the development of L2 pronunciation, we find very few studies in mainstream literature. Of particular importance is that by Lee et al. (2020), who examined the effect of the type of PI (perception-based vs. production-based) on pronunciation acquisition among 115 Japanese university students of English. PI lasted for 2 weeks, and improvement was assessed using pre-, post-, and delayed post-tests. The results showed an overall improvement in pronunciation accuracy but varied performance across groups. The major finding of the study is that “perception-based training may be the more effective training method across both segmental and suprasegmental features” (p. 1). This last result is surprising and warrants more studies in different contexts. Based on the foregoing, we notice that the findings of existing studies are incomplete and far from conclusive. Therefore, the present study aims to provide further testing of the effect of the type of PI on five aspects of L2 pronunciation: segmental, syllabic (epenthesis), prosodic (stress placement), global (comprehensibility³), and temporal (fluency⁴) aspects. It should be mentioned here that we embrace Segalowitz’s (2010) characterisation of the types of fluency in L2 pronunciation, which is based on psycholinguistic measures, and contend that the targeted type in this study is ‘perceived fluency’, which refers to judgements “made about speakers based on impressions drawn from their speech samples” (p. 48). This type of fluency is often measured by using a Likert scale with two extremes: extremely fluent to extremely dysfluent (Derwing, 2017).

Methods and procedures

Participants

The participants were 64 university students who were enrolled in an English language course. However, four students missed the delayed post-test, and consequently their data were excluded. The participants were majoring in Applied English or English language and literature. The participants had a low-intermediate to intermediate level of proficiency (based on the results of the pre-test).⁵ Their ages ranged between 18 and 21 years. They were all native speakers of JA and had never been to an English-speaking country. Their exposure to English is limited to class time where instruction is mostly conducted in English. The participants were randomly divided into two groups (32 students in each, based on the initial count).

Setting

The study was conducted in an EFL context: a provisional governmental university in Jordan. In this context, exposure to English outside the confines of the classroom is rare, and usually limited to decontextualised phrases and structures. Communication outside the classroom occurs in JA. It should be mentioned that JA is a dialect of the Arabic language which is characterised by some phonological, lexical, and grammatical features that differ from those in Modern Standard Arabic (MSA).

Procedures

The study is interventionist. The principal researcher (who is also the teacher) is a specialist in English pronunciation. He explained to the students in the two sections the aims of the study and requested their written consent to participate in it. At Week 1 of the intervention, the researcher conducted a pre-test which included five sections to cover the five aspects of L2 pronunciation targeted in the study. At Week 6, the same test was run by the researcher to investigate the gains (if any) after instruction had occurred for 6 weeks. At Week 14, a delayed post-test was also conducted to validate the outcomes and make sure that the gains are in the long-term memory. The results of all tests were given to a statistician to run the appropriate statistical analyses for the experiment. The students were instructed using a British English syllabus and were thus requested to use it in the experiment. Each group received PI for approximately 2.5 h a week from the principal researcher which totaled 15 h of instruction. Both groups received PI on segmental (using minimal pairs) and suprasegmental (using phrases and dialogues) phonology focusing on English and Arabic differences and highlighting potential problems because of interference. For example, the students were apprised of the segmental inventories of the two languages, were taught about the syllable structure of the two languages and were instructed on how to place stress and prominence for fluency and comprehensibility purposes. Potential teaching techniques were utilized by the instructor researcher to ensure optimal understanding and performance. Those included body gestures, hand clapping, and communicative activities. If performance was erroneous, the instructor would give corrective feedback in the form of recast and repetition.

Instruments

Two types of tasks were included as outcome measures: controlled and free production tasks.⁶ This was necessary to ensure that the targeted aspects and the participants’ conscious knowledge of these aspects are carefully and explicitly tested and to identify gains (if any) in global and temporal aspects of L2 pronunciation (both comprehensibility and fluency) which can only be measured if spontaneous productions are elicited from the speakers. In the free production task, the participants were given pictures and a number of words and were asked to describe them using as many words of those in the controlled tasks as needed to fully describe the picture. The controlled tasks included 10 items in each. The first task tested the segmental aspect, the second tested the syllabic aspect, the third the prosodic aspect, the fourth the global aspect, and the fifth the temporal aspect. To ensure understanding of the test, the instructions were explained in the participants’ L1. The same instrument was used in the post-test and delayed post-test. The students were recorded in a quiet room to minimize any effect on the raters’ perception of their pronunciation. Each student was recorded individually, and each recording took almost 3 min.

Assessment

In order to assess the recordings of the participants’ performance on the five tests, an expert rater with a specialty in English pronunciation was recruited. The rater is a fluent non-native speaker of English with expertise in British English pronunciation. He had (at the time of data collection) 7 years of teaching experience at the tertiary level. The rater assessed each utterance using a 10-point Likert scale where a score of 1 meant that the utterance was inaccurate and a score of 10 meant that it was perfectly accurate. The reason for selecting this scale is that, as Southwood and Flege (1999) found, “[a] seven-point scale, although frequently used, may not be sensitive enough for all listeners to discriminate among sentences …. An 11- or 9-point scale might improve listener sensitivity when scaling degree of perceived foreign accent” (p. 346). The rater listened to and assessed all the recordings.

Piloting

In order to validate the tests, a pilot experiment was conducted, with four students sitting for pre-, post-, and delayed post-tests. The results were analysed. The aim of the pilot study was to ensure that there are no problematic aspects for the participants and that the tasks cover all targeted aspects. As a result of the pilot experiment, some words were replaced by others, and more hints were provided in the picture-description task.

Results

This section presents the results of the analysis which provide answers to the research questions. It should be recalled that the first question asked about the effects of the type of instruction (perception-based vs. production-based) on the English pronunciation outcomes of Jordanian university students. To answer this question, we sought to find any statistically significant differences at the significance level (α = 0.05) between the averages of the gain scores of the two study groups in the immediate post-test that are due to the teaching method: perception-based vs. production-based. To do so, the means and standard deviations of the gain scores of the two groups from the pre- to the immediate post-tests were calculated. The results are shown in Table 1 below.

TABLE 1

Table 1. The means and standard deviations of the scores of the two groups.
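The gain scores summarised in Table 1 can be illustrated with a minimal Python sketch. It assumes that a gain score is simply a learner’s post-test score minus his or her pre-test score; the function names and the four pairs of ratings are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def gain_scores(pre, post):
    """Return per-learner gain scores (post-test minus pre-test)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def describe(scores):
    """Return the mean and the sample standard deviation of the scores."""
    return mean(scores), stdev(scores)


# Hypothetical ratings for four learners (NOT data from the study).
pre_scores = [3.0, 4.0, 2.0, 5.0]
post_scores = [8.0, 9.0, 9.0, 10.0]

gains = gain_scores(pre_scores, post_scores)
gain_mean, gain_sd = describe(gains)
print(gains, gain_mean, gain_sd)
```

Applying the same two steps to each group separately would yield the pair of means and standard deviations that a table of this kind reports.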

The results in Table 1 show that there are apparent differences in the means of the gain scores between the two groups in the post-tests. The mean of the gain scores of the members of the first group, taught using the perception-based method, was 6.10, whereas the mean of the gain scores of the members of the second group, taught using the production-based method, was 6.19. To detect the statistical significance of these differences, a one-way analysis of covariance (ANCOVA) was used, as shown in Table 2 below.

TABLE 2

Table 2. Results of the one-way analysis of covariance (ANCOVA).
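For readers unfamiliar with the procedure, the following sketch shows how the F statistic of a one-way ANCOVA with a single covariate can be computed from adjusted sums of squares. It is an illustration under stated assumptions, not the analysis run for this study: the covariate is assumed to be the pre-test score (the text does not name it), the function name `one_way_ancova` and the six data points are hypothetical, and the p-value is left out because it requires the F distribution from a statistics library.

```python
def _ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _sp(xs, ys):
    """Sum of cross-products of deviations from the two means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def one_way_ancova(groups):
    """Return (F, df_between, df_error) for the covariate-adjusted group effect.

    groups: list of (covariate_values, outcome_values) pairs, one per group.
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n_total, k = len(all_y), len(groups)

    # Pooled within-group sums of squares and cross-products.
    ssx_w = sum(_ss(xs) for xs, _ in groups)
    ssy_w = sum(_ss(ys) for _, ys in groups)
    sp_w = sum(_sp(xs, ys) for xs, ys in groups)

    # Outcome variation left after removing what the covariate explains.
    adj_ss_total = _ss(all_y) - _sp(all_x, all_y) ** 2 / _ss(all_x)
    adj_ss_within = ssy_w - sp_w ** 2 / ssx_w
    adj_ss_between = adj_ss_total - adj_ss_within

    df_between, df_error = k - 1, n_total - k - 1
    f_value = (adj_ss_between / df_between) / (adj_ss_within / df_error)
    return f_value, df_between, df_error


# Hypothetical (pre-test, outcome) data for two groups of three learners.
group_a = ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
group_b = ([1.0, 2.0, 3.0], [4.0, 6.0, 5.0])
f_value, df_between, df_error = one_way_ancova([group_a, group_b])
print(f_value, df_between, df_error)
```

The resulting F value is then compared with the F distribution on (df_between, df_error) degrees of freedom to obtain the significance value reported in a table such as Table 2.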

The results in Table 2 show that there are no statistically significant differences at the significance level (α = 0.05) in the averages of the gain scores of the two groups in the post-test that are due to the teaching method (perception-based vs. production-based). The F value is 59.067 with a statistical significance of 0.057, which exceeds α = 0.05 and is therefore not statistically significant. This analysis shows that although there were significant gains in L2 pronunciation for the members of both groups in the immediate post-tests, there was no significant difference between the gains of Group A and those of Group B. This indicates the important role of PI in the ultimate attainment of L2 learners regardless of the teaching method.

As for the second research question, which asked about the aspects of English pronunciation that are affected by each type of instruction, we ran five tests to measure five important aspects of L2 pronunciation: segmental, syllabic (epenthesis), prosodic (stress placement), global (comprehensibility), and temporal (fluency). In order to find out whether there are differences in the sub-tests, the means and standard deviations of the scores of the two groups of the study were calculated in the pre- and immediate post-tests, as shown in Table 3 below.

TABLE 3

Table 3. The averages and standard deviations of the scores of the two groups in the sub-tests.

Table 3 shows that there are apparent differences in the averages of the scores of the two groups in the sub-tests of the pre- and post-test, according to the teaching method. To detect the statistical significance of these differences, a multivariate analysis of covariance (MANCOVA) was used, as shown in Table 4 below.

TABLE 4

Table 4. Results of the multivariate analysis of covariance (MANCOVA).

The results in Table 4 indicate that there are statistically significant differences at the significance level (α = 0.05) between the scores of the two groups in the sub-tests of the immediate post-tests according to the teaching method. The Hotelling’s value was 8.258 with a significance of 0.000 (i.e., p < 0.001). The F value for the first test was 52.502 with a statistical significance of 0.000, which is a statistically significant value. The Eta Square is 0.498, meaning that 49.8% of the variance in the scores of the two groups in the first test is due to the teaching method. The F value for the second test was 32.131 with a statistical significance of 0.000, which is a statistically significant value. The Eta Square is 0.377, meaning that 37.7% of the variance in the scores of the two groups in the second test is due to the teaching method. The F value for the third test was 42.284 with a statistical significance of 0.000, which is a statistically significant value. The Eta Square is 0.444, meaning that 44.4% of the variance in the scores of the two groups in the third test is due to the teaching method, too. The F value for the fourth test was 185.135 with a statistical significance of 0.000, which is a statistically significant value. The Eta Square is 0.777, meaning that 77.7% of the variance in the scores of the two groups in the fourth test is also due to the teaching method. The F value for the fifth test was 87.320 with a statistical significance of 0.000, which is a statistically significant value. The Eta Square is 0.622, meaning that 62.2% of the variance in the scores of the two groups in the fifth test is due to the teaching method.
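The link between each F value and its Eta Square can be made explicit with the standard identity for partial eta squared, η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the five F values given above, read with decimal points. The degrees of freedom are an assumption, since the text does not report them: 1 for the two-group effect and 53 for error, which is what 60 learners, two groups, and five pre-test covariates would leave. Under that assumption the identity returns the five Eta Square values stated above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values for the five sub-tests as given in the text (read as decimals).
reported_f = [52.502, 32.131, 42.284, 185.135, 87.320]

# ASSUMPTION: the text does not report degrees of freedom; 1 and 53 are
# hypothetical values (two groups; 60 learners minus 2 groups minus 5 covariates).
DF_EFFECT, DF_ERROR = 1, 53

effect_sizes = [
    round(partial_eta_squared(f, DF_EFFECT, DF_ERROR), 3) for f in reported_f
]
print(effect_sizes)
```

Because the identity depends only on F and the degrees of freedom, it offers a quick consistency check on effect sizes reported alongside F tests.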

In order to find out which group the differences favor, the estimated marginal means were calculated for the scores of the first group and the second group on the five sub-tests, as shown in Table 5 below.

TABLE 5

Table 5. The estimated marginal means for the scores of the two groups.
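As a simplified illustration of what an estimated marginal mean is, the sketch below adjusts each group’s raw mean for a single covariate using the pooled within-group regression slope: adjusted mean = group mean − slope × (group covariate mean − grand covariate mean). This is a one-covariate simplification rather than the full MANCOVA model used in the study; the function name, the group labels, and the data are hypothetical.

```python
def adjusted_means(groups):
    """Return covariate-adjusted (estimated marginal) means per group.

    groups: dict mapping a group name to (covariate_values, outcome_values).
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    grand_x = sum(all_x) / len(all_x)

    # Pooled within-group regression slope of the outcome on the covariate.
    sp_within, ssx_within = 0.0, 0.0
    for xs, ys in groups.values():
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sp_within += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        ssx_within += sum((x - mx) ** 2 for x in xs)
    slope = sp_within / ssx_within

    # Shift each raw mean to where it would be at the grand covariate mean.
    return {
        name: sum(ys) / len(ys) - slope * (sum(xs) / len(xs) - grand_x)
        for name, (xs, ys) in groups.items()
    }


# Hypothetical (pre-test, post-test) scores; NOT data from the study.
example = {
    "perception": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    "production": ([3.0, 4.0, 5.0], [5.0, 7.0, 9.0]),
}
marginal = adjusted_means(example)
print(marginal)
```

In this invented example the production group has the higher raw mean (7 vs. 4) but the lower adjusted mean (5 vs. 6) once its higher starting point is taken into account, which is why adjusted rather than raw means are compared after an analysis of covariance.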

It is evident from Table 5 that the estimated marginal means in the first, second and third sub-tests for the first group, which was taught using the perception-based method, are higher than those of the second group, which was taught using the production-based method. Thus, the differences in these tests are in favor of the first group. The estimated marginal means in the fourth and fifth sub-tests for the first group are lower than those of the second group. Thus, the differences in these tests are in favor of the second group.

As for the ultimate attainment of the participants in the delayed post-tests, we sought to find differences between the scores of the immediate post-test and the scores of the delayed post-test (at week 14). This allowed us to find whether there is any statistically significant difference at the significance level (α = 0.05) between the averages of the members of each group in the post- and delayed post-test. To answer this question, the means and standard deviations of the first group, which was taught by the perception-based method, were calculated in the sub-tests of the post- and delayed post-test and compared using a paired-samples t-test. Table 6 below shows the results.

TABLE 6

Table 6. Results of the T-test for the paired samples of the first group.
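The paired-samples t-test behind a table of this kind can be sketched as follows: the statistic is the mean of the per-learner differences divided by its standard error, with n − 1 degrees of freedom. The function name and the four pairs of scores are hypothetical; obtaining the exact p-value would additionally require the t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("both test occasions need one score per learner")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical post-test and delayed post-test scores for four learners.
post_test = [6.0, 7.0, 8.0, 7.0]
delayed_post_test = [7.0, 7.0, 10.0, 8.0]

t_value, df = paired_t(post_test, delayed_post_test)
print(t_value, df)
```

With these invented scores the statistic is about 2.45 on 3 degrees of freedom, below the two-tailed critical value of 3.182 at α = 0.05, so the difference between the two test occasions would not count as significant.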

The results in Table 6 indicate that there are no statistically significant differences at the significance level (α = 0.05) between the scores of the first experimental group, which was taught by the perception-based method, in the sub-tests of the post- and delayed post-test. If we turn to the differences between the scores of the immediate post-test and the delayed post-test for the second group of students, who were taught using the production-based method, we find a similar result to that in relation to the perception-based method. To that end, we sought to find any statistically significant differences at the significance level (α = 0.05) between the averages of the members of the second group in the immediate post- and delayed post-tests. We therefore calculated the means and standard deviations of the scores of the second group and compared them using a paired-samples t-test. Table 7 below shows the results.

TABLE 7

Table 7. Results of the T-test for the paired samples of the second group.

The results in Table 7 above indicate that there are no statistically significant differences at the significance level (α = 0.05) between the scores of the second experimental group, which was taught by the production-based method, in the fourth sub-test of the post- and delayed post-tests. There were statistically significant differences at the same level between the scores of this group in the rest of the sub-tests. In the first, second, third and fifth sub-tests, the post-test means were lower than the delayed post-test means. This reflects the lasting effect of the teaching method after ceasing to apply it.

Discussion

This study sought to explore the efficacy of the type of instruction on the acquisition of English pronunciation by Jordanian Arabic-speaking learners of English at the university level. It also ventured to test the effect of the type of PI on five aspects of L2 pronunciation: segmental, syllabic (epenthesis), prosodic (stress placement), global (comprehensibility), and temporal (fluency) aspects. In this section, we discuss the results obtained from the analysis from a theoretical point of view. In doing so, we respond to Thomson and Derwing’s (2015) call that researchers should not only ask the question of “what the consequences of instruction are, but … [also] why” (pp. 334–335, emphasis in original). In addition, the results are discussed in the light of research on the effectiveness of PI on the ultimate outcomes of L2 pronunciation acquisition. This necessitates an inclusion of the theorisations in the field of second language acquisition and those related to cognitive linguistics. VanPatten (2010) argues that theories of second language acquisition help us to understand how learning takes place and thus to teach effectively. Couper (2015) argues that cognitive phonology is one of the most relevant theories for PI because “pronunciation is a cognitive skill that can be learned using our general learning faculties” and because “pronunciation learning is situationally embedded involving a complex interplay of social and cognitive variables” (p. 420). Pennington and Rogerson-Revell (2019) contend that adults learn an L2 “by means of general cognitive capabilities” (p. 59) but show that L2 learners make more use of explicit learning mechanisms than of the implicit ones on which children rely, a situation which makes L2 learning a difficult endeavour. Long (2015, p. 41) argues that adult L2 learners have “a somewhat weaker capacity for implicit learning, due particularly to age-related declines in the efficiency of instance learning” (emphasis in original).

Indeed, pronunciation is said to be the most challenging aspect of the target language to acquire. Fraser (2010) argues that pronunciation is “the most difficult of the language skills … [although it is] the one they most aspire to master” (p. 358). This difficulty is attributed to many factors. First, adult L2 learners are disadvantaged, compared to children, in L2 pronunciation acquisition because the innate ability to develop language, articulated by Chomsky (1965) in his theory of language acquisition, is argued to disappear or diminish over time (Singleton, 1989). Moreover, as Chomsky and Halle (1968, p. 3) note, pronunciation is not only about phonological rules; there are “many other factors as well – factors such as memory restrictions, inattention, distraction, nonlinguistic knowledge and beliefs, and so on.” That is, many factors influence the ultimate attainment of L2 learners in pronunciation (see Saito, 2019 for the role of aptitude; Kralova et al., 2017 for the role of anxiety; Flege, 2003 for the role of the first language; Lybeck, 2002 for the role of identity; and Hyltenstam and Abrahamsson, 2003 and Flege et al., 1995 for the role of age; among other factors). While some scholars such as Brown (2014) hold that the complexity and slipperiness lie in the learning process, others such as Ellis (2001, p. 37) claim that “the complexity is in the language, not the learning process.” Before we embark on a theoretical explanation of the results, this shows that PI is not the only determinant of acquisition and that our conclusions should be modest because “the impact of [those] ‘other factors’ on performance is much greater for the L2 learner” (Couper, 2015, p. 418). However, PI helps L2 learners to advance in their quest to develop accuracy and fluency.

The first important result to discuss here is that both groups (perception-based and production-based) demonstrated large gains in L2 pronunciation accuracy, and that the improvement remained in the delayed post-tests. This was evident in the fewer errors which the student participants made in the post- and delayed post-tests (see below for specific aspects). This finding is important as it provides solid evidence that PI is effective and plays a role in the ultimate attainment of L2 learners (cf. Purcell and Suter, 1980; Pennington and Richards, 1986). PI was found effective not only for segmental aspects but also for the more global aspects of comprehensibility and fluency. Recent research on L2 pronunciation acquisition has shown a noticeable influence of instruction on the development of pronunciation (Saito, 2011; Derwing et al., 2014; Lee et al., 2015; Thomson and Derwing, 2015; Trofimovich et al., 2017). Kennedy et al. (2014) tested the role of PI in the improvement of L2 French pronunciation and found significant improvements in learners’ production in segmental and some suprasegmental aspects and in fluency. Lee et al. (2015) and Saito and Plonsky (2019) also showed a positive role of PI in the development of L2 pronunciation. Saito (2015) argues that with instruction “some learners are able to attain high-level L2 performance” (p. 741).

The study also demonstrated that PI has a long-term effect on the acquisition of English pronunciation by Arab learners of English, as evidenced in the results of the delayed post-tests. The results showed that the efficacy of perception-based PI was evident in the scores of the delayed post-test for all five aspects: segmental, syllabic, prosodic, global (comprehensibility), and temporal (fluency). However, the analysis demonstrated that production-based PI did not result in noticeable delayed global (comprehensibility) outcomes. This finding is important because perception-based PI showed a more lasting effect on all aspects of L2 pronunciation, a finding which gives support to explicit instruction based on explanation and modelling. This result is consistent with Lee et al.’s (2020) statement that “perception-based instruction would lead to greater improvement than production-based ones” (p. 8). The finding is also important because, as Pennington and Rogerson-Revell (2019, p. 69) note, “perception leads production to a considerable degree.” As a matter of fact, L2 production is strongly linked to L2 perception. Thus, Pennington and Rogerson-Revell (2019, p. 71) go on to argue that “improvement in L2 pronunciation requires modifying templates for perception and for production.” That is, production-based PI alone, without the perception-based instruction that leads to learner awareness, might not be so effective in the long run. Indeed, learner awareness was found to be an influential factor in the ultimate attainment of L2 pronunciation.

To explain this finding from a theoretical point of view, we refer to Best and Tyler’s (2007) assertion that perceptual L2 learning is accomplished early in the experience of learning an L2 and diminishes over time: “very little perceptual benefit seems to accrue from additional experience past the initial period for late learners” (p. 21). Considering that segmental, syllabic and prosodic aspects of L2 pronunciation are in most cases the first aspects taught to L2 learners, we can see why perception-based PI was more effective in the teaching of those aspects of L2 pronunciation, and less effective in teaching more global and temporal areas. We also explain the result with reference to Schmidt’s (2001, pp. 3–4) noticing hypothesis, which emphasises the value of noticing and awareness in learning; the hypothesis stipulates that “the concept of attention is necessary in order to understand virtually every aspect of SLA [second language acquisition]” and that “SLA is largely driven by what learners pay attention to and notice in target language input and what they understand the significance of noticed input to be.” Housen and Pierrard (2005) argue that attention controls and leads to awareness. For example, Kennedy et al. (2014) tested the role of learner awareness in the achievement of L2 learners in pronunciation, found that there is a strong link between learner awareness and L2 pronunciation accuracy and fluency, and concluded that “awareness can be viewed as a learner’s orientation toward language and language learning” and that “language awareness may be akin to many individual difference factors” (p. 91). Moreover, the result may be explicated in the light of second language acquisition theories, particularly the interlanguage theory, which stipulates that the efficacy of PI may not be so apparent while the individual is still in the process of developing the L2 system (i.e., the restructuring process). As Yule and Macdonald (1995, p. 438) put it:

[T]he changes in direction observed in many cases across the three points in time should also make us cautious about jumping to premature decisions about the effectiveness (or not) of our instructional procedures. Indications of immediate improvement can disappear after a few days and signs of immediate deterioration can, in the same time span, be noticeably reversed.

More importantly, the study also tested the efficacy of PI using two types of instruction (perception-based and production-based) on five important aspects of English pronunciation. The results showed that perception-based PI significantly affects the gains in the segmental, syllabic, and prosodic aspects of English pronunciation and that production-based PI plays a greater role in the improvement of global (i.e., comprehensibility) and temporal (i.e., fluency) aspects of pronunciation. Flege (1996, p. 11) notes that segmental phonology (i.e., sounds) is better learned by activating perceptual abilities first and that learners “establish central perceptual representations for a range of physically different phones (‘sounds’) which signal differences in meaning.” Flege (1996) also argues that this is only the first step as learners also develop “motoric routines for outputting sounds in speech production” (p. 11). In his review of the implications of cognitive linguistics, Mompean (2014) argues that “language, including phonology, is the outcome of properties of cognition” (p. 357) and that perception is one of the three important cognitive abilities needed to learn an L2.

Indeed, existing research reveals the value of PI (regardless of the type of instruction) in L2 pronunciation improvement, as noted earlier. For example, Trofimovich et al. (2017) found that focused PI resulted in “significantly better ratings for accent, comprehensibility, and fluency” (p. 42) and concluded that PI “has an impact beyond specific aspects of L2 speech, contributing to listeners’ global judgments of L2 French speech” (p. 43). However, few studies have scrutinised the role the type of PI plays in the development of pronunciation aspects. Our results demonstrate that perception-based PI affected the segmental, syllabic, and prosodic aspects of L2 pronunciation. We attribute this to the fact that such aspects play a lesser role in the overall assessment of speech production and that they require more auditory and articulatory attention than the global aspects. Zhang et al. (2009) argue that “[w]hat makes phonetic learning complicated is the fact that speech perception involves brain regions for acoustic-phonetic as well as auditory–articulatory mappings” (p. 237). Thus, perception-based PI, which is focused on the development of perceptual abilities, was found more effective in developing segmental and syllabic aspects of L2 pronunciation. This does not imply that perception-based PI is of less value but shows that it is a necessary part of instruction because the development of any aspect of pronunciation leads to overall development in the production of the target language. For example, Field (2005) showed that word stress errors result in less intelligibility. Zhang et al. (2009) also show that much of L2 pronunciation learning is neurally governed, alluding to the complementary role of perception-based PI in L2 pronunciation development.
In addition, research shows that although L2 learners usually perceive aspects of L2 pronunciation more accurately than they produce them, “there are a few studies that indicate that learners are able to produce L2 sounds that they have difficulty perceiving correctly” and that “how L2 sounds are perceived does not systematically translate into how those sounds are produced by the learner” (Hummel, 2014, p. 145).

Conclusion and implications

This study has tested the efficacy of the type of PI (perception-based vs. production-based) on the acquisition of English pronunciation by Jordanian Arabic-speaking learners of English. The study has provided evidence that PI is effective in developing most aspects of L2 pronunciation (e.g., segmental, syllabic, prosodic, global and temporal) and that each type of PI is more beneficial than the other for some aspects. This is reminiscent of Saito’s (2012) assertion that “instruction is effective not only for improving specific segmental and suprasegmental aspects of L2 sounds … but also for enhancing listeners’ overall judgment of comprehensibility” (p. 849). However, our findings are not conclusive because, as noted above, a multitude of factors play a role in the development of L2 pronunciation and because no single type of PI is effective for all learners who experience it. Macdonald et al. (1994) argue that “the wide range of different individual reactions should serve as a reminder that the individual learner may represent a more powerful variable than does the instructional setting in the acquisition of pronunciation” (pp. 95–96).

The findings of this study provide important implications for L2 learners and teachers who find it a challenge to develop the pronunciation sub-system of language. As noted above, teachers of English pronunciation have, for a long time, thought that many aspects of English pronunciation are unteachable (see Taylor, 1993; Burns, 2006). This study adds to existing studies on the effectiveness of PI and provides evidence of its efficacy in developing various aspects of L2 pronunciation. Thus, we need to include more training in our teacher education programs and to raise awareness among pronunciation teachers of the value of PI. In addition, theorists should devise a theory of pronunciation to guide teachers and researchers in their quest to address aspects of L2 pronunciation. As Foote and Trofimovich (2017, p. 75) argue, “one of the most acute problems, which has persisted despite an increase in studies targeting pronunciation, is a lack of theory to guide pronunciation research.”

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Author contributions

SA conducted the study, analysed the data and wrote the discussion. MJ wrote the introduction and literature review. MA wrote the conclusion and checked references. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. Scholars use different terms to refer to the same concept. Loewen (2020) uses, in addition to the terms used above, input-based vs. output-based instruction (p. 112).

2. Thomson and Derwing (2015) believe that scholars attribute this to the provision of “Communicative Language Teaching, in which a focus on meaning was prioritized over form-focused instruction, under the assumption that pronunciation would improve through exposure” (p. 326). The reader is also referred to Huensch (2019).

3. Comprehensibility is defined as “[d]egree of effort required to understand accented speech, usually measured using scalar responses from listeners” (Derwing, 2018, p. 321).

4. Fluency is defined as “[t]he flow, fluidity, or smoothness of speech (entails factors such as speech rate, mean length of run between pauses (usually measured in syllables), hesitation forms etc.), often measured using scalar responses, but temporal measures are common as well (e.g., syll/s)” (Derwing, 2018, p. 321).

5. Although the proficiency level of students might have an influence on the results of intervention, it was not considered as a variable in this study.

6. These types of tasks target the two types of knowledge in L2 pronunciation acquisition: explicit and implicit pronunciation knowledge, respectively. A note on the terminology is in order. While DeKeyser (2007) calls them implicit and explicit knowledge or declarative and procedural knowledge, Saito and Plonsky (2019) use controlled knowledge and spontaneous knowledge.

References

Abu Guba, M. N., Mashaqba, B., Jarbou, S., and Al-Haj Eid, O. (2023). Production of vowel reduction by Jordanian–Arabic speakers of English: an acoustic study. Poznan Stud Contemp Linguist 59, 1–25. doi: 10.1515/psicl-2022-2011

Alghazo, S. M. (2015). The role of curriculum design and teaching materials in pronunciation learning. Res Lang 13, 316–333. doi: 10.1515/rela-2015-0028

Barrera-Pardo, D. (2004). Can pronunciation be taught? A review of research and implications for teaching. Revista Alicantina de Estudios Ingleses 17, 31–38. doi: 10.14198/raei.2004.17.03

Best, C. T., and Tyler, M. D. (2007). “Nonnative and second-language speech perception: commonalities and complementarities” in Second Language Speech Learning: The Role of Language Experience in Perception and Production. eds. M. J. Munro and O.-S. Bohn (Amsterdam: John Benjamins), 13–34.