No clear benefit of transcutaneous auricular vagus nerve stimulation for non-native speech sound learning

Honda, Claire T.; Bhutani, Neha; Clayards, Meghan; Baum, Shari

doi:10.3389/flang.2024.1403080

ORIGINAL RESEARCH article

Front. Lang. Sci., 25 July 2024

Sec. Neurobiology of Language

Volume 3 - 2024 | https://doi.org/10.3389/flang.2024.1403080


Claire T. Honda¹,²*

Neha Bhutani³

Meghan Clayards²,⁴,⁵

Shari Baum²,⁴

¹Integrated Program in Neuroscience, McGill University, Montreal, QC, Canada
²Centre for Research on Brain, Language and Music, Montreal, QC, Canada
³Revai, Montreal, QC, Canada
⁴School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada
⁵Department of Linguistics, McGill University, Montreal, QC, Canada

Introduction: Learning to understand and speak a new language can be challenging and discouraging for adults. One potential tool for improving learning is transcutaneous auricular vagus nerve stimulation (taVNS), which modulates perception, memory, and attention systems. It has recently been reported that taVNS can improve English speakers' ability to perceive unfamiliar Mandarin tones. The current project explored the potential benefits of taVNS for language learning beyond tone perception.

Methods: We studied adults' ability to perceive and produce unfamiliar speech sounds as well as any potential change in language learning motivation from pre- to post-training. Forty-five native English speakers were divided into three groups and were trained to perceive German sounds: one group received stimulation during easier-to-learn sounds (vowels), one group received stimulation during harder-to-learn sounds (fricatives), and a control group received no stimulation.

Results and discussion: We did not find evidence that taVNS improved perception or production of the German sounds, but there was evidence that it did improve some aspects of motivation. Specifically, the group that received taVNS during easier sounds showed a significant decrease in feelings of tension/pressure about language learning, while the other groups did not. Overall, the present study does not find that taVNS holds benefits for the acquisition of new speech sounds; however, the field is nascent, and so the potential applications of taVNS for language learning remain to be clarified.

1 Introduction

Many adult learners struggle to attain native-like performance across various measures of linguistic aptitude (e.g., Munro and Mann, 2005; Abrahamsson and Hyltenstam, 2009). Indeed, language learning outcomes tend to worsen as age of acquisition increases (Johnson and Newport, 1989; Pulvermüller and Schumann, 1994; Weber-Fox and Neville, 1996; Kang and Guion, 2006; Abrahamsson and Hyltenstam, 2009; White et al., 2013). One aspect of language acquisition that poses particular challenges for adult learners is the perception and production of new speech sounds (Iverson et al., 2003; Díaz et al., 2012). While infants implicitly learn to differentiate language sounds through unsupervised exposure (Maye et al., 2002), adults can benefit from explicit instruction in the form of supervised phonetic training paradigms (McCandliss et al., 2002; Iverson et al., 2005); yet even after training, many adults still show relatively poor differentiation of new speech sounds in perception and production (Strange and Dittmann, 1984; Hanulíková et al., 2012). The maturation of the brain has been argued to be a contributing factor in these age-related learning differences (see Stowe and Sabourin, 2005, for a review). One potential means of increasing the plasticity of the brain and improving learning in adulthood is through the use of neurostimulation techniques such as vagus nerve stimulation (VNS).

The vagus nerve is the longest cranial nerve in the body, reaching from the medulla down to the colon and innervating multiple organs along the way (Yuan and Silberstein, 2016a). The nerve's afferent fibers send sensory input to the vagal nuclei, which then pass the information along to various brain regions implicated in memory, perception, arousal, and affect, including the locus coeruleus, raphe nucleus, amygdala, thalamus, hippocampus, and nucleus accumbens (Sawchenko, 1983; Berthoud and Neuhuber, 2000; Frangos et al., 2015; Yuan and Silberstein, 2016a). By modifying the activity of the vagus nerve through stimulation, it is therefore possible to alter the activity of multiple brain areas and impact their associated functions (Frangos et al., 2015; Yuan and Silberstein, 2016b)—an approach that has advantages over other methods that only modulate localized neural activity (e.g., transcranial magnetic stimulation, transcranial direct current stimulation, or direct chemical stimulation; Bandler, 1969; Hallett, 2000; Thair et al., 2017). Indeed, in both animals and humans, VNS via an implanted electrode can improve memory (Clark et al., 1995, 1999; Ghacibeh et al., 2006; Sun et al., 2017), likely by modulating synaptic plasticity in the hippocampus (Zuo et al., 2007). Similarly, VNS has been shown to increase arousal and alertness as quantified by behavioral and neural measures (Rizzo et al., 2003; Collins et al., 2021). Such increases in arousal appear to be the result of enhanced excitatory activity in the locus coeruleus and other subcortical structures, leading to widespread activation throughout the cortex (Collins et al., 2021). In addition, VNS can improve positive affect by promoting the release of serotonin and noradrenaline from the raphe nucleus and locus coeruleus (Elger et al., 2000; Austelle et al., 2022). 
The positive effects of VNS on mood are further attested to by its approved use as a treatment for major depressive disorder (Austelle et al., 2022). In terms of modulating perception, rodent and human studies have also demonstrated that VNS can be paired with auditory stimuli to induce lasting, stimulus-specific plasticity in the auditory cortex (Shetake et al., 2012; De Ridder et al., 2014; Engineer et al., 2015; Lai and David, 2021).

Although VNS shows great promise as a neuromodulatory technique, it involves surgically implanting electrodes in the neck, rendering it invasive and inaccessible to the majority of the population. More recently, transcutaneous auricular vagus nerve stimulation (taVNS), a non-invasive counterpart to VNS, has been introduced as a similarly effective means of modulating neural activity (Frangos et al., 2015; Van Leusden et al., 2015; Yakunina et al., 2017). The auricular branch of the vagus nerve passes just under the skin of the outer ear (cymba concha, cavum concha, external acoustic meatus, and tragus), and so taVNS can be administered in a straightforward and accessible way by placing electrodes against the ear (Frangos et al., 2015; Yakunina et al., 2017; Badran et al., 2018; Butt et al., 2020).

The brain regions affected by taVNS are similar to those affected by VNS; these include the locus coeruleus, raphe nucleus, amygdala, insula, thalamus, hippocampus, and nucleus accumbens (Frangos et al., 2015; Yakunina et al., 2017; Badran et al., 2018). As with VNS, taVNS has been found to modulate human perception and cognition. For example, studies have shown that taVNS can improve memory (Jacobs et al., 2015; Sun et al., 2021; Thakkar et al., 2023), arousal (Sharon et al., 2021; Chen et al., 2023), mood (Ferstl et al., 2022), tinnitus symptoms (Shim et al., 2015), and interoception (Villani et al., 2019), as well as decrease reaction times (Chen et al., 2021).

Learning a new language depends crucially on the ability to attend to and remember newly learned information. Thus, taVNS may hold promise for accelerating language learning due to its effects on arousal and memory. Indeed, preliminary evidence suggests that taVNS can enhance memory for spoken and written word lists under some conditions (Giraudier et al., 2020; Kaan et al., 2021), and can enhance reading skills in adults learning a new orthography (Thakkar et al., 2020). The potential for taVNS to aid language learning is also implied by work showing that arousal state modulates phonetic perception (Schuerman et al., 2022). Beyond its effects on arousal and memory, taVNS may enhance language learning by increasing auditory plasticity in the adult brain. As previously mentioned, invasive VNS can lead to stimulus-specific plasticity in the auditory cortex (Shetake et al., 2012; De Ridder et al., 2014; Engineer et al., 2015); and studies of event-related potentials have found that delivering taVNS during auditory perception tasks can enhance auditory preattentive processing (N1 amplitude; Rufener et al., 2023), selective attention (P3 amplitude; Rufener et al., 2018), and lexico-semantic encoding (N400 amplitude; Phillips et al., 2021), as well as decrease auditory processing time (P3 latency; Rufener et al., 2018). Perhaps taVNS therefore induces complementary benefits to arousal, memory, and auditory processing, which work in concert to facilitate the acquisition of a new language.

Important evidence that taVNS may accelerate language learning in adulthood comes from recent work by Llanos et al. (2020). The authors demonstrated that taVNS, in conjunction with perceptual training, can enhance native English speakers' ability to label certain Mandarin tones (Llanos et al., 2020). They divided participants into three groups: one that received stimulation during easier-to-perceive tones, one that received stimulation during harder-to-perceive tones, and a control group that received no stimulation. They found that taVNS specifically enhanced the labeling of easier-to-perceive tones for the group that was stimulated during those tones; this is likely because taVNS can increase arousal, and arousal specifically enhances memory for perceptually salient (i.e., easier-to-perceive) stimuli (Mather and Sutherland, 2011; Mather et al., 2016; Llanos et al., 2020). Nonetheless, a follow-up study by the same research group did not find any overall differences in tone learning performance between stimulated groups and a control group (McHaney et al., 2023). Exploratory analyses did reveal that participants' tone labeling accuracy increased faster for trials during which taVNS was administered compared to trials without taVNS, and that this effect was most pronounced when taVNS amplitude was low (McHaney et al., 2023). However, these effect sizes were small and there was no main effect of stimulation, so taVNS did not improve overall accuracy (McHaney et al., 2023). Around the same time, Pandža et al. (2020) investigated native English speakers' ability to associate meaning with Mandarin pseudowords that differed in lexical tone. They found that taVNS during Mandarin lexical tone training led to enhanced performance on a subsequent meaning recognition task (Pandža et al., 2020). 
Additionally, they found that taVNS before training was associated with greater decreases in reaction time on the recognition task and with more accurate performance on a recall task; however, these findings did not hold when taVNS was administered during (rather than before) training. Similar mixed results were obtained by Phillips et al. (2021): taVNS before or during Mandarin lexical tone training did not improve performance on a word learning task, but taVNS before training sped up reaction times on a recognition task, and taVNS during training led to improved recognition of mismatch trials on the recognition task. As such, taVNS appears to have some potential for enhancing lexical tone learning, but effects are not consistently found; the extent of its efficacy and whether it generalizes beyond the learning of lexical tones remain unclear. Furthermore, while taVNS has shown some benefits for the perception of unfamiliar language sounds, it is not yet known whether those benefits extend to the production of unfamiliar language sounds.

Apart from facilitating language learning itself, taVNS may improve the subjective learning experience through its effects on mood and motivation. Calloway et al. (2020) found that participants who received taVNS prior to being trained on a language learning task showed greater reductions in negative affect from pre- to post-training compared to control participants who did not receive stimulation. Improvements in mood could subsequently impact motivation and learning outcomes, given that mood is a factor affecting learners' perceptions of success and failure during language learning (Williams et al., 2004). taVNS also seems to play a role in motivation, having been shown to increase adults' motivation to obtain food rewards (Neuser et al., 2020). Yet, to our knowledge, the potential impacts of taVNS on language learning motivation have not been investigated to date.

The current project had the broad aim of determining whether taVNS can enhance the learning of unfamiliar non-tonal speech sound contrasts. More specifically, we had three objectives: to determine whether taVNS during non-native speech perception training (1) enhances the perception of the trained sounds, (2) enhances the subsequent production of the same speech sounds, and (3) enhances motivation associated with language learning. The first objective was addressed by running a conceptual replication of Llanos et al. (2020) using unfamiliar phonemic contrasts rather than lexical tones. In doing so, we hoped to clarify and extend the previous equivocal findings on taVNS and language learning. To this end, 45 native English speakers were trained on a perceptual labeling task for German front rounded vowels and fricatives. During training, 15 of the participants received stimulation paired with the vowels (“easier” phonemic contrast), 15 received stimulation paired with the fricatives (“harder” phonemic contrast), and 15 received no stimulation (control group). The second and third objectives involved examining the potential impacts of taVNS on elements of language learning that have thus far remained unexplored. To accomplish this, participants completed a German speech production task and a motivation questionnaire pre- and post-training. We anticipated that taVNS would enhance perception and production of the unfamiliar contrasts relative to the control group, and that its effects would be greatest for vowel learning since the vowel contrast is more perceptually salient than the consonant one. We also predicted that participants in the taVNS groups would show greater increases in language learning motivation from pre- to post-training compared to controls.

2 Materials and methods

2.1 Participants

Forty-five adults (33 females, 10 males, two preferring not to answer) were recruited from the Montreal area. This number was chosen based on the similar work of Llanos et al. (2020), who recruited 36 participants; they had 12 participants per group, and we decided to obtain 15 per group in order to try to replicate and extend their results. All participants identified as monolingual English speakers and were unfamiliar with German. Participants were aged 18–35 (mean: 23.0) with normal hearing thresholds in both ears as determined by an audiometric screening, and with no history of literacy, language, or cognitive impairments. People with medical implants, with metal braces, or who were pregnant were excluded for safety reasons. At the beginning of the experiment, participants were randomly assigned to one of three groups: taVNS-vowel (N = 15), taVNS-fricative (N = 15), and Control (N = 15). Participants signed an informed consent form and received monetary compensation ($40). The duration of the entire study was ~1.5 h. The research protocol was approved by the Institutional Review Board of the Faculty of Medicine and Health Sciences of McGill University.

2.2 Tasks

2.2.1 Demographic information

Participants completed a questionnaire about demographics, language history and proficiency, and musical experience, since these factors could influence speech processing. The questionnaire was adapted from the Language History Questionnaire (LHQ 2.0; Li et al., 2013) and the Montreal Music History Questionnaire (MMHQ; Coffey et al., 2011). One-way ANOVAs confirmed that the extent of second language (L2) experience and musical experience did not differ significantly across groups. A summary of these ANOVAs and of demographic information for each group can be found in Supplementary Table 1.

2.2.2 Non-native speech perception training

To address our first objective, participants were trained to distinguish a German consonant contrast (palatal vs. postalveolar fricative; ç vs. ʃ) and a German vowel contrast (tense vs. lax high front rounded vowel; yː vs. ʏ), which are known to be perceptually challenging sounds for native English speakers (Mayr and Escudero, 2010). English speakers tend to perceive both German ç and ʃ as English ʃ (Moulton, 1962), whereas they tend to perceive German yː and ʏ as English uː and ʊ, respectively (Strange et al., 2004; Mayr and Escudero, 2010). In line with Best's Perceptual Assimilation Model, which predicts that non-native sounds will be better discriminated when they are assimilated to two different native categories than when they are assimilated to a single native category (Best, 1994), previous work has shown that native English speakers perceive the German vowel contrast more accurately than the fricative one (Honda et al., 2024). The 10 German minimal pairs used in the training task are displayed in Table 1. To construct the stimuli, four native German speakers were recorded (two males, two females). The resulting sound files were edited to leave 20 ms before and after each production, and maximum amplitudes were normalized across speakers using GoldWave version 6.15 (GoldWave Inc, 2015). Each speaker produced each minimal pair once, resulting in a total of 80 speech stimuli (4 speakers × 2 contrasts × 10 words).


Table 1. German minimal pairs used in the non-native speech perception training task.

The training procedure was based on that of Llanos et al. (2020), who used a forced-choice task to present stimuli in six training blocks and one generalization block. During training, half of our German stimuli (N = 40, from two speakers—one male, one female) were presented in six blocks, with each stimulus being presented once per block. On each trial, participants heard a stimulus and indicated which phoneme it contained by choosing between two options via mouse click (side of the screen counterbalanced across participants). The palatal vs. postalveolar fricatives were represented by the symbols “ç”/“sh” and the tense vs. lax vowels were represented by the symbols “üː”/“ü” to facilitate learning without needing to teach participants the International Phonetic Alphabet. Visual feedback (“Correct”/“Incorrect”) was provided immediately after each trial. Feedback lasted 1,000 ms, and there were 500 ms between the end of feedback and the onset of the following stimulus. After the six training blocks, participants completed a Generalization block during which they labeled the other half of the stimuli (N = 40, from the other two speakers). There was no feedback or stimulation during the Generalization block. To avoid physical interference with the stimulation electrodes placed on the left ear, audio was delivered monaurally through the right ear with an insert earphone. The Training and Generalization blocks were programmed and presented using E-Prime 3.0.
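
The trial counts implied by this design can be checked with a short sketch. The speaker labels, the word indices, and the choice of which speakers serve in training are placeholders for illustration; the actual items are those listed in Table 1.

```python
from itertools import product

# Placeholder labels; the real words are listed in Table 1 and are not reproduced here.
SPEAKERS = ["M1", "F1", "M2", "F2"]   # two male and two female speakers
CONTRASTS = ["fricative", "vowel"]
WORDS_PER_CONTRAST = 10               # matches "4 speakers x 2 contrasts x 10 words"

stimuli = [
    {"speaker": s, "contrast": c, "word": w}
    for s, c, w in product(SPEAKERS, CONTRASTS, range(WORDS_PER_CONTRAST))
]

# Assumed split: one male and one female speaker for training, the other two for generalization.
TRAINING_SPEAKERS = {"M1", "F1"}
training = [x for x in stimuli if x["speaker"] in TRAINING_SPEAKERS]
generalization = [x for x in stimuli if x["speaker"] not in TRAINING_SPEAKERS]

N_BLOCKS = 6
n_training_trials = N_BLOCKS * len(training)  # each training stimulus appears once per block

print(len(stimuli), len(training), len(generalization), n_training_trials)
```

Under these assumptions the counts come out as 80 stimuli in total, 40 per phase, and 240 training trials.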

2.2.3 Electrical stimulation procedure

Transcutaneous stimulation of the vagus nerve occurred during the perception training task. Replicating the procedure used in Llanos et al. (2020), stimulation was delivered through the cymba concha and cavum concha of the outer ear, at a level below each participant's perceptual threshold (as described further below). The participant's left ear was first cleaned with an alcohol swab. Next, silicone putty was molded to the shape of the participant's ear. The molded putty had an indentation across the middle caused by the crus of the helix, demarcating the cymba concha and cavum concha on either side. Two Ag-AgCl disc electrodes were then embedded in the putty in the center of the areas corresponding to the cymba concha (cathode) and cavum concha (anode) and covered with conductive gel. Finally, the mold was pressed into place in the ear. The same experimenter performed the electrode setup for all participants to ensure maximal consistency in the procedure. Electrical stimulation was generated with a BIOPAC STMISOLA Constant Current Isolated Linear Stimulator. Consistent electrode contact was ensured by monitoring the stimulator's “Protect” light, which turns on only when contact is lost (this occurred on occasion during electrode setup and calibration, in which case the setup steps were repeated, but did not occur during training). Stimulation waveforms consisted of 14 biphasic square-wave pulses (150 μs pulse width), delivered at a rate of 25 Hz and with an amplitude no higher than 3 mA for safety reasons. The pulse train began at the onset of the auditory stimulus and continued for 560 ms. These stimulation parameters were selected based on Llanos et al. (2020), who found significant taVNS effects on speech sound learning using the same pulse width, frequency, and amplitude specifications, and they also closely resemble the parameters used in other work (e.g., Engineer et al., 2015; McHaney et al., 2023).
Pulses were generated using Matlab (Mathworks, 2017) and transmitted to the stimulator via a Measurement Computing USB-1208HS DAQ card.
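
The relation between pulse count, pulse rate, and train duration can be verified with a few lines of arithmetic. This is only a timing sketch, not the waveform-generation code used in the study; the variable names are illustrative.

```python
PULSE_RATE_HZ = 25   # pulses per second
N_PULSES = 14        # pulses per train

period_ms = 1000 / PULSE_RATE_HZ                       # time between successive pulse onsets
onsets_ms = [i * period_ms for i in range(N_PULSES)]   # onsets relative to stimulus onset
train_duration_ms = N_PULSES * period_ms               # 14 periods of 40 ms each

print(period_ms, onsets_ms[-1], train_duration_ms)
```

Fourteen pulses spaced 40 ms apart span 560 ms, which agrees with the stated train duration.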

Before the non-native speech perception training, each participant's perceptual threshold for the taVNS was identified through a calibration procedure. During calibration, individual pulse trains were delivered with the same parameters described above, starting at 0.1 mA and increasing in steps of 0.2 mA until the participant indicated feeling the stimulation. Amplitude was then decreased in steps of 0.1 mA until the participant no longer felt the stimulation. Each participant's threshold was recorded as the amplitude at which they could reliably begin to feel the stimulation across at least two repetitions of this procedure. During training, stimulation was delivered with a pulse amplitude 0.2 mA below the participant's perceptual threshold. There were no significant differences in pulse amplitude between the two groups that received stimulation (taVNS-vowel: M = 0.61 mA, SD = 0.35 mA; taVNS-fricative: M = 0.67 mA, SD = 0.45 mA; two-sample t-test: t₂₆ = −0.41, p = 0.69). The control group underwent the same setup and threshold determination procedures so that all participants were blind to the condition to which they were assigned.
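
As a rough check, the reported amplitude comparison can be approximately reproduced from the summary statistics alone. The sketch below assumes 15 participants per stimulated group and a Welch (unequal-variance) t-test; the text says only "two-sample t-test", so the choice of test is an assumption, and the inputs are the rounded means and SDs given above.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# taVNS-vowel vs. taVNS-fricative pulse amplitudes (mA), rounded values from the text.
t, df = welch_t(0.61, 0.35, 15, 0.67, 0.45, 15)
print(round(t, 2), round(df, 1))
```

With these inputs the statistic is about −0.41 on roughly 26 degrees of freedom, in line with the reported values.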

2.2.4 Non-native speech production task

To address our second objective, participants completed a non-native speech production task before and after the speech perception training. We wanted to test how well the perceptual training transferred to participants' general ability to produce the trained sounds accurately, and not simply their ability to imitate words they had been trained on. We therefore used a consonant-vowel-consonant (CVC) syllable production task and compared pre- and post-training productions. Each trial of the task involved a familiarization component followed by a production component. Participants were first familiarized with a non-native speech sound by hearing it presented in an isolated syllable (/çə/ and /ʃə/ for the fricatives, /yːl/ and /ʏl/ for the vowels). Speech sounds were produced by a third male native German speaker, different from the ones used in training. In order to reduce transfer of learning between the production task and the subsequent perceptual training task, we tried to maximize differences between the stimuli by using isolated syllables and a different voice (Bradlow et al., 1997; Baker and Trofimovich, 2006). The auditory presentation of the non-native speech sounds was accompanied by visual presentation of letters on a screen so that participants learned to associate each sound with a simplified version of its corresponding orthography (palatal vs. postalveolar fricatives represented by “ç”/“sh” and tense vs. lax vowels represented by “üː”/“ü” as described in the training task). In this way, the production task also served to familiarize participants with the phonemes that would subsequently be trained. On each trial, after being familiarized with the native exemplar of the speech sound, participants were prompted to produce the sound within a CVC syllable.
There were six different CVCs to produce for each sound (6 × 4 sounds = 24 productions total), and CVCs were written using English orthography combined with the letters that had been associated with the German sounds (ç, sh, üː, ü). Trials were blocked so that all six trials for a given sound appeared together, and block order was randomized. Table 2 displays the full list of CVCs used. Participants' productions were recorded using a headset microphone (Logitech, Switzerland). For each participant, all 24 productions were recorded before and after the training phase. The production task was programmed and presented using E-Prime 3.0.


Table 2. German CVCs used in the non-native speech production task.

Recordings of participants' productions were subsequently presented to three native German speakers. The native speakers completed ratings of the recordings at home on Gorilla Experiment Builder (www.gorilla.sc; Anwyl-Irvine et al., 2020), using their own headphones. Productions were presented in three sessions of ~1 h each; all three sessions were completed within 1 week, with a minimum break of 1 h between sessions to avoid fatigue. Within each session, productions from one third of participants (five per group) were randomly mixed, and productions from pre- and post-training were randomly mixed. For each production, the native speakers indicated which sound they heard via mouse click [2-alternative forced choice (2AFC) between the two sounds that make up the contrast]. In addition, the three native speakers rated the quality of each production using a 7-point Likert scale (1 = poor, 7 = native-like). These quality ratings provided a more fine-grained measure of pronunciation ability. While acoustic analyses could also have been used to rate participants' productions, native speaker ratings were chosen because they provide a global accuracy measure that accounts for a variety of acoustic and articulatory dimensions which would be difficult to examine in isolation.

2.2.5 Language learning motivation

To address our third objective, we measured participants' motivation to learn foreign languages using a modified version of the Intrinsic Motivation Inventory (IMI; McAuley et al., 1989). Four items from three subscales (Interest/Enjoyment, Perceived Competence, Tension/Pressure) were used (following Saito, 2021), as shown in Table 3. Participants indicated how true the items were for them on a 7-point Likert scale (1 = not at all true, 7 = very true). Higher scores represent higher motivation for Interest/Enjoyment and Perceived Competence, while lower scores represent higher motivation for Tension/Pressure. The IMI was administered pre- and post-training.


Table 3. Intrinsic Motivation Inventory (IMI), modified for the current study.

3 Analyses and results

The raw data and all code used to process and analyze them are publicly available on the OSF (https://osf.io/fdsaz/?view_only=d4f9ea6ef9804606b4972b3488981dc3). Details and output of all analyses described below can be found in the R Markdown document on the same OSF page. For all regression models reported in this experiment, the maximal random effects structure was used (Barr et al., 2013) unless the model failed to converge, in which case random effects were removed one at a time until model convergence was achieved. Note that in cases where our analyses followed previous work by Llanos et al. (2020) and McHaney et al. (2023), our maximal model is reported in the main text, but an additional model with exactly the same structure as in the previous work was also fit for the sake of replication. The additional models' output can be found in the R Markdown document. All analyses were carried out in R (R Development Core Team, 2020): mixed-effects logistic models were fit using the package lme4 (Bates et al., 2015), linear mixed-effects models were fit using the package lmerTest (Kuznetsova et al., 2017), and ordinal mixed-effects models were fit using the package ordinal (Christensen, 2022).

3.1 Effects of stimulation on perception

3.1.1 Accuracy improvement during training

3.1.1.1 Analyses

To assess the potential effects of taVNS on the learning of non-native sounds, a mixed-effects logistic regression model was fit similarly to that found in Llanos et al. (2020). The dependent variable was trial-level responses (correct/incorrect) for each participant during the training blocks. Fixed effects consisted of group (taVNS-vowel, taVNS-fricative, and Control = reference level), trial number (1–240; centered and divided by 2 SD), and contrast (fricative = 0.5, vowel = −0.5), along with all two- and three-way interactions among those three variables. For this analysis, the maximal model that converged included by-subject random intercepts and by-subject random slopes of trial number, contrast, and the interaction between trial number and contrast, without correlations between random effects. The group-by-trial interaction revealed whether the taVNS groups showed greater improvement over the course of training compared to the control group.
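
The predictor coding described above can be illustrated with a minimal sketch. The function and variable names are hypothetical, and the use of the sample standard deviation is an assumption; the authors' actual code is in the R Markdown document on the OSF page.

```python
import statistics

def rescale_2sd(values):
    """Center a numeric predictor and divide it by two standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (an assumption of this sketch)
    return [(v - mean) / (2 * sd) for v in values]

# Trial number runs from 1 to 240 across the six training blocks.
trial_number = list(range(1, 241))
trial_scaled = rescale_2sd(trial_number)

# Contrast coding for the two speech sound contrasts.
CONTRAST_CODE = {"fricative": 0.5, "vowel": -0.5}

print(round(statistics.fmean(trial_scaled), 6), round(statistics.stdev(trial_scaled), 6))
```

After rescaling, the predictor has a mean of zero and a standard deviation of 0.5, which puts its coefficient on a scale comparable to that of the ±0.5 contrast code. In lme4 syntax, a model with this structure might be written as `correct ~ group * trial * contrast + (1 + trial * contrast || subject)`, where the variable names are hypothetical and the double bar requests uncorrelated random effects.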

Following McHaney et al. (2023), another mixed-effects logistic regression model was fit to determine whether participants' performance during the training blocks depended on the amplitude of taVNS received or the type of trial (stimulated vs. unstimulated). The dependent variable was trial-level accuracy for participants in the two stimulation groups. There were fixed effects of trial type (stimulated = 0.5, unstimulated = −0.5), trial number, amplitude (centered and divided by 2 SD), and all two- and three-way interactions among these variables. Random effects consisted of by-subject and by-stimulus random intercepts, by-subject random slopes of trial type and trial number, and by-stimulus random slopes of trial number and amplitude, without correlations between random effects. A linear mixed-effects regression model was also fit predicting the retention of correct stimulus-response associations across blocks (as done in Llanos et al., 2020), to determine whether stimulation improved retention over time. The dependent variable for this model was the percentage of trials correctly labeled on both the current block and the previous block, starting at block 2. Fixed effects consisted of group, block (2–6; block 2 = reference level), and contrast, as well as all two- and three-way interactions among them. Random effects consisted of by-subject random intercepts and by-subject random slopes of contrast, without correlations between random effects.
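
The retention measure can be sketched as follows for a single participant. The data structure and the toy responses are invented for illustration, and taking the denominator to be all stimuli shared by the two blocks is an assumption.

```python
def retention_by_block(correct):
    """Percentage of stimuli labeled correctly on both the current and the previous block.

    correct: dict mapping block number -> dict mapping stimulus id -> bool (correct or not).
    Returns a dict keyed by block number, starting at the second block.
    """
    blocks = sorted(correct)
    retention = {}
    for prev, cur in zip(blocks, blocks[1:]):
        shared = correct[cur].keys() & correct[prev].keys()
        both = sum(1 for s in shared if correct[cur][s] and correct[prev][s])
        retention[cur] = 100 * both / len(shared)
    return retention

# Toy data: four stimuli over three blocks (made-up responses).
toy = {
    1: {"a": True, "b": False, "c": True, "d": False},
    2: {"a": True, "b": True, "c": False, "d": False},
    3: {"a": True, "b": True, "c": True, "d": False},
}
retention = retention_by_block(toy)
print(retention)
```

In the toy data only stimulus "a" is correct on both blocks 1 and 2 (25%), while "a" and "b" are correct on both blocks 2 and 3 (50%).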

3.1.1.2 Results

Figure 1 displays accuracy over the course of training, at the individual and group level. We can see that accuracy improved for all three groups and for both speech sounds over the course of training. For the model predicting trial-level accuracy during training, there was a significant effect of trial number, indicating that the control group improved their performance over time across both contrasts ($\hat{\beta}$ = 0.619, p < 0.001); we can conclude that training resulted in learning even when no stimulation was administered. No other significant effects were found; see Table 4 for a full model summary. Importantly, there were no significant group-by-trial number interactions; this indicates that although all three groups showed improved performance over time, the stimulation groups did not show greater improvement compared to the control group.

Figure 1. Accuracy over time on the training task (from the first to the last block), for individuals and groups. Thin translucent lines represent individual participants' data, whereas thick solid lines represent aggregate group data. Vertical lines denote 95% confidence intervals for group accuracy on each block. Note the high levels of individual variability. Overall, accuracy improved over the course of training for all groups, and there were no significant group differences.

Table 4. Summary of the mixed-effects logistic regression model predicting accuracy on the training task.

For the model including stimulation amplitude and trial type as fixed effects, there was again only a significant effect of trial number ( $\hat{β}$ = 0.684, p < 0.001); performance on the perception task improved during training regardless of the stimulation condition, and these improvements in performance did not depend on stimulation amplitude. The lack of a significant main effect of trial type (stimulated vs. unstimulated) also indicates that stimulation failed to improve training performance. See Supplementary Table 2 for full model output. Supplementary Figure 1 shows the similar accuracy trajectory over time for stimulated and unstimulated trials. For the model predicting retention of correct stimulus-response associations, stimulation did not significantly predict retention rates. The only significant predictor was block (p = 0.011 for block 3 retention compared to block 2 retention, p < 0.005 for the other three blocks compared to block 2 retention), indicating better retention as training progressed (i.e., learning). See Supplementary Table 3 for model output and Supplementary Figure 2 for each group's retention rates across blocks. Overall, these analyses converge on the conclusion that performance on the training task improved over time but was not affected by stimulation.

3.1.2 Reaction times during training

3.1.2.1 Analyses

Given that taVNS has in some cases been shown to decrease reaction times (RTs; Pandža et al., 2020; Chen et al., 2021; Phillips et al., 2021) and increase post-error slowing (PES; Sellaro et al., 2015), analyses were run to determine the potential effects of stimulation on RTs and PES during training. For the RT analysis, trials on which participants responded incorrectly were removed (19% of trials) along with trials where RTs were < 200 ms or > 2.5 SD above the participant's mean (a further 2% of trials), following Giannakopoulou et al. (2017). The distribution of raw RT values was positively skewed, so RTs were log transformed. A mixed-effects linear regression model was fit with the resulting cleaned and transformed RTs as the dependent variable. Fixed effects consisted of group, trial number, contrast, and all two- and three-way interactions between them. Random effects consisted of by-stimulus and by-subject random intercepts, as well as by-subject random slopes of trial number, contrast, and the interaction between trial number and contrast, without correlations between random effects. To calculate PES values, the RT on the trial preceding an error was subtracted from the RT on the trial following that error, so that positive values indicate slowing after an error. A linear mixed-effects regression model was then fit predicting PES values, with group as a fixed effect and by-stimulus random intercepts as the maximal random effects structure promoting convergence and non-singularity.
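
The RT cleaning steps and the PES calculation can be sketched as follows. This is a minimal Python illustration rather than the analysis code used in the study. The function names and the trial format (a list of dictionaries with `rt` in ms and `correct`) are hypothetical, the participant's mean and SD are assumed to be computed over correct trials only, and PES is taken here as the RT after an error minus the RT before it, so that positive values indicate slowing.

```python
import math
from statistics import mean, stdev


def clean_log_rts(trials, min_rt=200.0, sd_cutoff=2.5):
    """Return log-transformed RTs for one participant after trimming.

    Drops incorrect trials, then RTs below min_rt (ms) and RTs more than
    sd_cutoff SDs above the participant's mean. Assumption: the mean and
    SD are computed over correct trials only.
    """
    rts = [t["rt"] for t in trials if t["correct"]]
    if len(rts) < 2:  # the SD is undefined for fewer than two trials
        return [math.log(rt) for rt in rts if rt >= min_rt]
    upper = mean(rts) + sd_cutoff * stdev(rts)
    return [math.log(rt) for rt in rts if min_rt <= rt <= upper]


def post_error_slowing(trials):
    """Return one PES value (ms) per error that has a trial on both sides.

    Simplification: the neighbouring trials are not required to be correct.
    """
    pes = []
    for i in range(1, len(trials) - 1):
        if not trials[i]["correct"]:
            pes.append(trials[i + 1]["rt"] - trials[i - 1]["rt"])
    return pes


# Hypothetical trial sequence for one participant.
example = [
    {"rt": 500.0, "correct": True},
    {"rt": 620.0, "correct": False},
    {"rt": 580.0, "correct": True},
    {"rt": 150.0, "correct": True},
    {"rt": 510.0, "correct": True},
]
print(clean_log_rts(example))
print(post_error_slowing(example))
```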

3.1.2.2 Results

For the RT model, the only significant predictor was trial number ( $\hat{β}$ = −233.656, p < 0.001), indicating that participants in the control group became faster at responding to both contrasts as the task progressed. The lack of group-by-trial number interaction indicates that stimulation did not affect this decrease in reaction times. See Supplementary Table 4 for model output and Supplementary Figure 3 for each group's RTs plotted against trial number. For the PES model, no significant predictors were found; stimulation did not increase post-error slowing. See Supplementary Table 5 for model output and Supplementary Figure 4 for PES values per group.

3.1.3 Accuracy during generalization block

3.1.3.1 Analyses

As in Llanos et al. (2020), a mixed-effects logistic regression model was fit with the dependent variable being trial-level accuracy in the generalization block and in block 1 of training. Fixed effects were group, block (generalization, block 1 = reference level), and contrast, along with all two- and three-way interactions between them. Random effects consisted of by-stimulus and by-subject random intercepts, along with by-subject random slopes of block, contrast, and the interaction between block and contrast. The group-by-block interaction enabled us to determine whether stimulation groups showed better generalization of their learning after accounting for baseline performance during block 1 of training.

3.1.3.2 Results

Accuracy during block 1 and the generalization block is displayed in Figure 2. Table 5 shows the output of the model predicting accuracy during block 1 and the generalization block. There were no significant effects of group in block 1, demonstrating that groups did not differ in baseline performance across contrasts at the beginning of training. Block was the only significant predictor ( $\hat{β}$ = 1.022, p < 0.001), indicating that performance was better during the generalization block than during block 1 of training—in other words, participants successfully learned and generalized their learning to new voices, regardless of group.

Figure 2. Accuracy on block 1 of training and on the generalization block, for individuals and groups. Thin translucent lines represent individual participants' data, and thick solid lines represent aggregate group data. Vertical lines denote 95% confidence intervals for group accuracy on each block. Note the high levels of individual variability. Overall, accuracy improved from block 1 to the generalization block, and there were no significant group differences.

Table 5. Summary of the mixed-effects logistic regression model predicting accuracy on block 1 and the generalization block.

3.2 Effects of stimulation on production

3.2.1 Analyses

Interrater reliability measures were obtained for the native German speakers' ratings of the production data. Fleiss' Kappa was calculated for the 2AFC ratings and the intraclass correlation coefficient (ICC; two-way random-effects model) was calculated for the Likert ratings (Gisev et al., 2013; Koo and Li, 2016). These calculations revealed acceptable reliability for both rating types (Fleiss' Kappa = 0.384; ICC = 0.594) based on established guidelines (“fair” to “moderate” according to Landis and Koch, 1977).
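
For readers unfamiliar with Fleiss' Kappa, the statistic can be computed from a table of category counts per item, as in the minimal Python sketch below. The function names and the example counts are hypothetical and are not the study's data. The verbal labels follow the bands given by Landis and Koch (1977).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters. The result is
    undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement among rater pairs, per item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch_label(kappa):
    """Verbal label for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2AFC ratings: four productions, three raters, two categories.
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3), landis_koch_label(kappa))
```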

For the 2AFC ratings, a mixed-effects logistic regression model was fit with trial-level accuracy as the dependent variable. Fixed effects were group, time (post, pre = reference level), contrast, and all two- and three-way interactions among them, as well as rater (first rater = reference level). Random effects consisted of by-subject random intercepts and by-subject random slopes of time, contrast, and the interaction between time and contrast, without correlations between random effects. The group-by-time interaction revealed whether stimulation groups showed greater increases in production accuracy from pre to post compared to the control group.

For the Likert ratings, an ordinal mixed-effects regression model was fit. Fixed effects were the same as for the model predicting 2AFC ratings. Random effects consisted of by-subject random intercepts and by-subject random slopes of time, contrast, and the interaction between time and contrast. The group-by-time interaction revealed whether native speakers' ratings of the productions increased more from pre to post for the stimulation groups compared to the control group.

3.2.2 Results

Figure 3 shows the differences in ratings of productions from pre- to post-training, for the 2AFC ratings (Figure 3A) and the Likert ratings (Figure 3B). For the model predicting 2AFC ratings, rater was a significant predictor, with raters 2 and 3 tending to rate pre-training productions from participants in the control group more accurately compared to rater 1 ( $\hat{β}$ = 0.162, p = 0.014 for rater 2; $\hat{β}$ = 0.215, p = 0.001 for rater 3). Table 6 displays the full output of the model. For the model predicting Likert ratings, rater was again a significant predictor, with rater 3 giving higher ratings than rater 1 ( $\hat{β}$ = 0.785, p < 0.001) for pre-training productions from control group participants. Contrast was also a significant predictor, with rater 1 giving higher ratings to fricatives than to vowels ( $\hat{β}$ = 0.612, p < 0.001). This effect was likely driven by high ratings for the fricative ʃ since it also occurs in English. Finally, time was also a significant predictor ( $\hat{β}$ = 0.230, p = 0.005) and did not interact with other predictors, revealing that ratings of production quality increased from pre- to post-training regardless of group. Full model output can be found in Table 7. Across both models, the lack of group effects or of group-by-time interactions suggests that the administration of taVNS during perceptual training did not specifically improve the subsequent production of the trained sounds.

Figure 3. Native German speakers' ratings of participants' productions pre- and post-training. (A) Accuracy on a 2-alternative forced choice task. (B) Ratings on a 7-point Likert scale (1 = poor, 7 = native-like). Thin translucent lines represent individual participants' data, whereas thick solid lines represent aggregate group data. Vertical lines denote 95% confidence intervals. There was a significant increase in Likert ratings but not in accuracy judgments of productions from pre- to post-training, and there were no significant differences between groups.

Table 6. Summary of the mixed-effects logistic regression model predicting accuracy of native speakers' ratings of participants' productions.

Table 7. Summary of the ordinal mixed-effects regression model predicting native speakers' Likert ratings of participants' productions.

3.3 Effects of stimulation on motivation

3.3.1 Analyses

To determine the potential effects of taVNS on language learning motivation, ordinal mixed-effects models were fit as in Saito (2021). Fixed effects consisted of group, time (post, pre = reference level), and the interaction between the two, while random effects consisted of by-item random intercepts, by-subject random intercepts, and by-subject random slopes of time. The group-by-time interaction revealed whether stimulation groups showed greater motivation increases from pre to post compared to the control group. A first model was fit predicting responses to all items of the IMI. Three follow-up models were then fit predicting responses to each subscale: one model had an identical structure to the overall model, while the other two had only by-subject random intercepts to avoid singular fits. These models allowed us to investigate whether different aspects of motivation might be differentially affected by stimulation. Given that this follow-up analysis involved fitting more than one model to the same dataset, Bonferroni corrections were performed on the resulting p-values by dividing alpha by the number of comparisons being made (0.05/3 = 0.017).
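
The Bonferroni adjustment described above amounts to comparing each uncorrected p-value against alpha divided by the number of follow-up models. A minimal Python sketch, with hypothetical function names and illustrative p-values, is:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def is_significant(p, alpha=0.05, n_tests=1):
    """True if an uncorrected p-value falls below the corrected threshold."""
    return p < bonferroni_alpha(alpha, n_tests)


# Three subscale models, as in the follow-up analysis.
threshold = bonferroni_alpha(0.05, 3)
print(round(threshold, 3))

# Illustrative p-values: one above and one below the corrected threshold.
print(is_significant(0.029, n_tests=3))
print(is_significant(0.001, n_tests=3))
```

Equivalently, each p-value could be multiplied by the number of comparisons and compared against the original alpha of 0.05; the two formulations lead to the same decisions.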

3.3.2 Results

Individual- and group-level motivation scores on each subscale pre- and post-training are displayed in Figure 4. As was the case for the perception and production measures, considerable individual variability in motivation scores was found. There were no significant predictors for the model predicting responses across all subscales (Supplementary Table 6). For the model predicting responses on the Tension/Pressure subscale, there was a group-by-time interaction such that the taVNS-vowel group showed greater increases in scores (indicating reduced feelings of tension and pressure) compared to the control group ( $\hat{β}$ = 1.042, p = 0.029). Although this interaction did not survive Bonferroni correction (corrected α = 0.017), post-hoc pairwise comparisons between each group's scores pre- and post-training revealed that the taVNS-vowel group did show a significant difference from pre to post ( $\hat{β}$ = −0.846, p < 0.001), which was not the case for the other groups. Stimulation during easier-to-learn sounds may therefore have decreased participants' feelings of tension and pressure associated with language learning. Output of the model and of the post-hoc comparisons can be found in Table 8.

Figure 4. Motivation scores on each subscale from pre- to post-training, for individuals and groups. Higher scores indicate increased feelings of enjoyment and competence and decreased feelings of pressure. Thin translucent lines represent individual participants' data, whereas thick solid lines represent aggregate group data. Vertical lines denote 95% confidence intervals. Note again the high levels of individual variability. There were no differences between groups except on the pressure subscale, where the taVNS-vowel group showed a significant increase in scores while the other two groups did not.

Table 8. Summary of the ordinal mixed-effects regression model and post-hoc pairwise comparisons predicting participants' motivation ratings on the pressure/tension subscale.

4 Discussion

The goal of the present study was to investigate whether the administration of taVNS during non-native speech perception training could enhance (1) participants' perception of the trained sounds, (2) their subsequent production of those same sounds, and (3) their language learning motivation. Native English speakers underwent training to perceive unfamiliar German vowels and fricatives, and were randomly assigned to one of three groups: taVNS-vowel (stimulation during easier-to-perceive sounds), taVNS-fricative (stimulation during harder-to-perceive sounds), or Control (no stimulation). Participants completed a German speech production task and a language learning motivation questionnaire before and after training. Contrary to our expectations, we did not find clear benefits of taVNS for perception, production, or overall motivation. However, taVNS during training may have specifically improved language learning motivation related to pressure/tension. In particular, the taVNS-vowel group showed reduced feelings of pressure from pre- to post-training, indicating that stimulation during easier-to-perceive sounds may alleviate certain negative feelings associated with learning a new language in adulthood. The fact that taVNS was found to affect feelings of pressure but not the other motivation subscales measured (enjoyment and competence) is in line with previous work showing that taVNS prior to language learning tasks decreased negative affect but did not increase positive affect (Calloway et al., 2020). Nevertheless, as elaborated upon below, the potential benefits of taVNS for language learning and motivation remain largely uncertain; further research is called for to clarify and extend these findings.

4.1 Lack of evidence for taVNS-related improvement in non-native phonetic perception

While we had hypothesized that taVNS might improve the perception of non-native phonemic contrasts, there are several potential reasons why this did not end up being the case. Given that music experience has been shown to predict the success of non-native sound learning (e.g., Slevc and Miyake, 2006; Perfors and Ong, 2012), one potential concern is that differences in our groups' musical backgrounds could have influenced our results. While an ANOVA revealed no significant differences between groups as mentioned earlier, the taVNS-fricative group had the fewest years of music training and the control group had the most (see Supplementary Table 1). To verify whether these numeric differences could be playing a role in the efficacy of stimulation, a mixed-effects logistic regression model was fit. The model had the same structure as the aforementioned model predicting accuracy during training, but included years of music training (centered and divided by 2 SD) as an additional fixed effect. As with the model that did not include music training, trial number was the strongest predictor, indicating learning over time ( $\hat{β}$ = 5.195, p < 0.001); and music training was a significant predictor ( $\hat{β}$ = 2.376, p = 0.018), pointing to the benefits of music experience for speech sound learning (see R Markdown document on the OSF page for full model output). However, no significant main effects of group or group-by-trial interactions were found; even after accounting for music training, taVNS did not improve overall accuracy or accelerate learning across trials.

An additional concern is that participants generally showed quite accurate performance on the training task, even from the beginning (see Figure 1). It is possible that stimulation-related effects were not observed because there was limited room for improvement in performance. To partly alleviate this concern, an additional mixed-effects logistic regression model was fit with the same structure as the main model predicting accuracy during training, but with block 1 accuracy (centered and divided by 2 SD) as an additional fixed effect and with block 1 trials excluded from the dependent variable. As with the other training models, trial number was the strongest predictor ( $\hat{β}$ = 3.939, p < 0.001), and there were no significant main effects of group or group-by-trial interactions (see R Markdown document). Thus, although many participants had limited room for improvement during training, it does not appear that any potential effects of stimulation depended on initial accuracy.

Having addressed these concerns through additional analyses, we may turn to other possible explanations for our null findings. Previous work with taVNS and speech sound learning has focussed only on tonal contrasts (Llanos et al., 2020; Pandža et al., 2020; Phillips et al., 2021; McHaney et al., 2023). It is possible that taVNS does not benefit the learning of phonemic contrasts in the same way that it may benefit the learning of tonal contrasts. Evidence suggests that during the processing of novel phonemic contrasts, learners show neural activity in the same regions of the left hemisphere that are activated by native phonemic contrasts (Golestani and Zatorre, 2004). In contrast, during the processing of novel tonal contrasts, learners show neural activity in regions of the right hemisphere that are commonly associated with non-linguistic pitch processing (Hsieh et al., 2001). Perhaps our results differ from those of previous taVNS and speech sound learning studies due to differences in how novel tonal vs. phonemic contrasts are processed. Furthermore, it is likely that native English speakers show different perceptual assimilation patterns for the phonemic contrasts used in the current study compared to the tonal contrasts used in previous work (Strange et al., 2004; So and Best, 2014); differences in assimilation could affect the discriminability of the contrasts (Best and Tyler, 2007), leading to differing effects of taVNS on the acquisition of tonal vs. phonemic contrasts. Relatedly, in the current work it was only feasible to test participants with one language background on their acquisition of one subset of non-native phonemic contrasts. 
Future research could test participants with a variety of backgrounds and demographic characteristics on their acquisition of different speech sounds in order to determine whether any potential taVNS effects may be moderated by factors such as participant language experience, age, or the relationship between the native and non-native languages. Another interesting avenue for future work would be to examine how taVNS affects measures of linguistic and non-linguistic memory and attention, both in the auditory and visual domains (e.g., pairing taVNS with reading or sign language training as well as with other non-linguistic cognitive tasks), in order to disentangle the specificity of stimulation-related effects and determine the optimal situations for taVNS use. It will be important for researchers to carefully design the methodology of future experiments with a view to detecting and differentiating the particular mechanism(s) of taVNS-related improvement that are expected to be at play (e.g., enhancements to auditory plasticity vs. arousal vs. memory).

It should also be noted that although the present study was conceived as a conceptual replication of previous work on tonal contrast learning (Llanos et al., 2020), some differences in study design may account in part for discrepancies between our findings and those of prior studies. For instance, in Llanos et al. (2020), stimulation began 300 ms prior to stimulus onset, and stimuli consisted of single syllables in which the trained speech sound reliably occurred in the same position. In the present work, mono- and disyllabic stimuli were used in which the trained speech sound could occur on either the first or the second syllable; this variability in the position of the trained sound may have reduced the salience of our non-native contrasts. Differences in stimulation timing and in stimulus salience may therefore in part explain some of the disparities between our findings and those of Llanos et al. (2020). Nonetheless, participants' overall high accuracy on our training task (as mentioned above) suggests that the non-native contrasts were salient enough to be effectively acquired. Beyond this, our training task provided participants with two response options, whereas Llanos et al. (2020) provided participants with four response options. As such, participants were more likely to respond correctly by chance in our study, which may have reduced our ability to detect taVNS-related improvements in perception because accuracy scores during training were less variable. Note, however, that Figures 1, 2 show widespread variability in performance all the same. Finally, in the current work and in Llanos et al. (2020), a single training session was administered; in contrast, in Pandža et al. (2020) and Phillips et al. (2021), two training sessions were administered on separate days. Perhaps taVNS is more effective when paired with speech sound training that spans more than a single session and that includes an opportunity for memory consolidation in between. 
This being said, given that even the 2-day training paradigms used in previous work bear limited resemblance to true language learning which occurs over long timescales, longitudinal work is needed to assess the effects of taVNS on more naturalistic learning.

It is plausible that taVNS does hold potential for improving speech sound learning, but that the optimal stimulation parameters for language learning have not yet been identified. Administration of taVNS entails the selection of various parameters including stimulation amplitude, pulse width, frequency, duration, and timing relative to stimulus presentation. Other work with taVNS has employed a wide range of parameters. For example, stimulation amplitude has sometimes been much higher than in the present study (>4 mA; Jacobs et al., 2015; Liu et al., 2018; Kaan et al., 2021); frequency has ranged from 5 Hz (Thakkar et al., 2020) to 300 Hz (Phillips et al., 2021); stimulation has sometimes been delivered before (Calloway et al., 2020), during (Llanos et al., 2020; McHaney et al., 2023), or after (Clark et al., 1999) a learning task; stimulation has been administered in short targeted bursts (< 1 s duration; De Ridder et al., 2014; Engineer et al., 2015; Llanos et al., 2020; McHaney et al., 2023) as well as continuously over an extended time period (>10 min duration; Ventura-Bort et al., 2018; Calloway et al., 2020; Kaan et al., 2021); and stimulation electrodes have been placed on the cymba concha and cavum concha (Llanos et al., 2020) or on the outer ear canal (Phillips et al., 2021). While we selected parameters following previous work on taVNS and language learning (Llanos et al., 2020), the field remains nascent and more research is needed to identify the most appropriate taVNS parameters for different desired outcomes. Future work could systematically compare the effects of different taVNS parameters on language learning in order to clarify whether stimulation has benefits and whether certain parameters may be more effective than others.

Another possibility is that taVNS may not be a reliable means of improving the learning of new speech sound contrasts, even tonal ones. The studies that have examined taVNS and tonal contrast learning to date have shown mixed outcomes, as reviewed in the introduction. The present study aimed to conceptually replicate the work of Llanos et al. (2020), who reported faster rates of Mandarin tone learning for participants receiving taVNS compared to controls. However, the same research group recently published a partial replication of their own work in which they did not find that tone learning rates differed significantly by experimental group (McHaney et al., 2023). Their additional exploratory analyses revealed only modest effects whereby taVNS at lower amplitudes initially increased learning rates during the training task, but without increasing overall accuracy on the task (McHaney et al., 2023). The other existing work on taVNS and tone learning comes from a research group that has administered taVNS under two different conditions (before vs. during Mandarin lexical tone training) and has tested a variety of outcome measures including accuracy and reaction times on word learning, lexical recognition, and recall tasks (Pandža et al., 2020; Phillips et al., 2021). The researchers only found stimulation-related benefits for a few of the many possible combinations of conditions and outcome measures. For example, Pandža et al. (2020) found that taVNS during (but not before) training was significantly associated with greater accuracy (but not decreased reaction times) on certain trials of the recognition task (but not the recall task). In a similar vein, Phillips et al. (2021) found that taVNS before (but not during) training was significantly associated with decreased reaction times (but not increased accuracy) on the recognition task (but not the word learning task). 
If taVNS were an effective method for improving tone learning, its effects might be expected to emerge in a more robust and uniform way across conditions and studies. These results also suggest that future work should carefully consider and compare different experimental conditions and outcome measures, since effects may depend on the nature of the stimulation and the tasks being administered.

Although we did not find specific benefits of taVNS for non-native speech sound perception, it is worth noting that our analyses converged on the conclusion that participants did in fact improve their perception of the non-native sounds over the course of training, regardless of group. While learning to perceive unfamiliar phonemic contrasts in adulthood is often difficult, it is clear that supervised training paradigms such as this one can facilitate the learning process, as has also been found in previous work (e.g., Bradlow et al., 1997; Iverson et al., 2005; Giannakopoulou et al., 2017; Reetzke et al., 2018). Regardless of the effectiveness of taVNS, it will be fruitful for future research to continue to investigate the optimal paradigms for training non-native perception in adults.

4.2 Lack of evidence for taVNS-related improvement in non-native phonetic production

As with non-native perception, no group effects were observed for our non-native production measures; taVNS did not specifically improve production from pre- to post-training. This outcome is perhaps not surprising given that no other studies to date have investigated taVNS and non-native production, and that the previous findings about taVNS and non-native perception have been mixed. In the present study, as in prior work, stimulation was delivered during perceptual training. Accordingly, any stimulation-related effects would be anticipated to emerge most notably for perceptual outcome measures; considering that no such effects on perception were found, it is unsurprising that no effects on production were found either.

As discussed above, the perceptual training task resulted in overall improvement in non-native perception regardless of experimental group. On the other hand, improvement in non-native production was less clear. Native speaker ratings of participants' productions did not improve pre- to post-training for the ratings involving a forced choice between the two sounds making up a contrast. For the more fine-grained measure where productions were rated on a seven-point scale from “poor” to “native-like”, statistically significant improvement was found, though the size of the effect was not large (see Figure 3B). Our findings are in line with Sakai and Moorman's (2018) recent meta-analysis of the effects of perceptual training on non-native production. The authors found that perceptual training resulted in medium-sized effects on perception outcomes and in small effects on production outcomes (Sakai and Moorman, 2018). As such, non-native perception and production are understood to be linked, but perceptual training does not necessarily lead to reliable or significant improvements in production. It should also be noted that our production task differed in format from the perception training, which may account in part for the lack of strong improvement in production post-training. The training task involved listening to non-native words (Table 1), whereas the production task involved hearing and seeing isolated non-native phonemes as exemplars and then producing CVCs (Table 2). These differences arose because the production task was designed so that participants would not directly repeat or imitate the exemplar and so that the stimuli would be feasible to produce for inexperienced learners. In future studies with participants who have greater non-native language experience, perception and production tasks could be made more similar in order to specifically examine the effects of perception training on production. 
For instance, the same words could be used as stimuli during the perceptual training task and the production task (e.g., Brosseau-Lapré et al., 2013). Since our task did not measure spontaneous speech production, additional work will also be needed to determine the relationship between our training paradigm and more naturalistic production measures.

4.3 Evidence for taVNS-related improvement in language learning motivation

When looking across all items of our motivation questionnaire, we did not find effects of taVNS on language learning motivation. However, when focussing on the items belonging to the tension/pressure subscale of the questionnaire, we did find an effect: from pre- to post-training, the taVNS-vowel group showed a significant decrease in feelings of tension and pressure associated with language learning, which was not the case for the other two groups. Recall that the taVNS-vowel group received stimulation during easier-to-perceive (vowel) sounds. Llanos et al. (2020) also found taVNS effects specifically for the group stimulated during easier-to-perceive non-native sounds—this group showed enhanced learning over the course of non-native perception training. The authors argued that this finding emerged because taVNS increases arousal, and such modulation of arousal can specifically enhance memory consolidation for more perceptually salient stimuli (Llanos et al., 2020). While we did not find enhanced learning for the taVNS-vowel group, the fact that the group's language learning motivation increased could similarly relate to the perceptual saliency of the stimuli. Perhaps participants naturally felt more capable and relaxed when responding to the vowel trials on the training task because the vowel contrast was more perceptually salient, and so the administration of taVNS during those trials served as a reinforcement signal that modulated neural activity related to affect and reward, in turn leading to decreased feelings of tension post-training.

Prior work supports a role for taVNS in decreasing feelings of tension associated with language learning. There is preliminary evidence that taVNS may improve fear extinction after a fear conditioning task (Burger et al., 2017) and reduce spontaneous negative thoughts after a worry induction task (Burger et al., 2019). The technique may therefore have the potential to lessen participants' overall feelings of fear and of worry. taVNS has additionally been found to increase participants' confidence in their ability to perform a task successfully (Villani et al., 2019). In the particular context of language learning, administration of taVNS prior to a second language learning task has in some cases been demonstrated to reduce negative affect and anxiety (Calloway et al., 2020). All of these findings point to taVNS as a possible means of reducing the stress and tension that can be felt by adults during the language learning process.

At the neural level, these positive effects make sense given that taVNS has been shown to modulate the activity of various brain regions and networks involved in affect and motivation, including the locus coeruleus, raphe nucleus, and limbic system (Frangos et al., 2015; Yakunina et al., 2017; Badran et al., 2018). In clinical contexts, VNS is known to promote the release of serotonin from the raphe nucleus, leading to improved mood (Austelle et al., 2022). The reductions in feelings of pressure and tension observed in our taVNS-vowel group may be attributable in part to such changes in neural activity and in neurotransmitter release. Note, however, that this explanation remains speculative; it was beyond the scope of the current study to measure neural activity patterns or neurotransmitter levels. Future studies could consider including such additional measures to untangle the potential mechanisms whereby taVNS increases language learning motivation. It is also not entirely clear why taVNS would selectively reduce feelings of pressure/tension without affecting the other subscales of motivation measured here (namely, interest/enjoyment and perceived competence). More research is needed to clarify the generalizability of our findings.

In conjunction with language aptitude, motivation is known to be an important factor in predicting language learning outcomes (e.g., Gardner, 2000; Dörnyei, 2001; Saito et al., 2018). Individuals who are motivated (for example, who are willing to expend effort in learning a language, who want to achieve a high level of competence in the language, and who have favorable attitudes toward the learning situation) tend to have greater non-native language achievement (Masgoret and Gardner, 2003). A meta-analysis found that, across different ages and learning environments, the correlation between motivation and second language achievement ranges from around 0.29 to 0.39 depending on the particular measure of achievement (Masgoret and Gardner, 2003). This effect size is considered small to medium based on Plonsky and Oswald's (2014) conventions for second language research, or medium to large based on Gignac and Szodorai's (2016) conventions for individual differences research. As such, the role of motivation in language acquisition is non-negligible, and if taVNS truly does impact motivation then this could have important repercussions for adult learners who are struggling to acquire a new language. Future work with taVNS could examine the construct of language learning motivation in greater detail, employing more extensive measures of motivation associated with conceptual frameworks such as the L2 motivational self system (Dörnyei, 2009) and the socio-educational model of second language acquisition (Gardner, 2000).
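To make the comparison of effect-size conventions above concrete, the following minimal Python sketch labels the two ends of the reported correlation range under each convention. The benchmark values in the code are assumptions recalled from the cited conventions (0.25/0.40/0.60 for Plonsky and Oswald, 2014; 0.10/0.20/0.30 for Gignac and Szodorai, 2016); they are not stated in this article and should be checked against the original papers. The function name `label_correlation` and the rule of taking the highest benchmark that the correlation meets or exceeds are likewise illustrative choices, not part of either convention.

```python
# Benchmark correlations (lower bound, label). ASSUMPTION: these values are
# recalled from the cited conventions and are not given in this article.
PLONSKY_OSWALD_2014 = [(0.25, "small"), (0.40, "medium"), (0.60, "large")]
GIGNAC_SZODORAI_2016 = [(0.10, "small"), (0.20, "medium"), (0.30, "large")]


def label_correlation(r, benchmarks):
    """Return the label of the highest benchmark that |r| meets or exceeds."""
    label = "below small"
    for threshold, name in benchmarks:
        if abs(r) >= threshold:
            label = name
    return label


# The two ends of the range reported by Masgoret and Gardner (2003).
for r in (0.29, 0.39):
    print(
        r,
        label_correlation(r, PLONSKY_OSWALD_2014),
        label_correlation(r, GIGNAC_SZODORAI_2016),
    )
```

Under this simple thresholding rule, 0.39 still counts as "small" on the second language benchmarks because it falls just short of the assumed 0.40 cutoff, which is compatible with reading the range as small to medium. On the individual differences benchmarks the range spans "medium" to "large".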

5 Conclusion

In sum, we examined the potential effects of taVNS on the perception and production of non-native phonemic contrasts and on language learning motivation. taVNS had previously shown positive (but inconsistent) effects on the perception of non-native tonal contrasts (Llanos et al., 2020; Pandža et al., 2020; Phillips et al., 2021; McHaney et al., 2023), and so we sought to determine whether such effects might extend to the perception of phonemic (non-tonal) contrasts. To our knowledge, this is also the first time that taVNS has been investigated in relation to non-native phonetic production or language learning motivation. Overall, no clear effects of taVNS on non-native perception or production emerged. Our results hint at a potential benefit of taVNS for language learning motivation: in particular, stimulation during the learning of easier-to-perceive sounds may decrease feelings of tension and pressure associated with language learning. Nevertheless, taVNS did not increase overall motivation across the three subscales of our motivation questionnaire or across our two stimulated groups, so its efficacy is not clear. On the whole, while taVNS is a promising technique with a multitude of potential applications, from treatment of epilepsy (Liu et al., 2018) to relief of tinnitus symptoms (Shim et al., 2015), its usefulness in the context of language learning remains to be determined. Research with taVNS is only beginning to emerge, and the stimulation parameters and outcome measures used in previous work have been heterogeneous; going forward, it will be important to systematically compare a variety of stimulation conditions and language acquisition outcomes in an effort to more conclusively determine any possible uses of taVNS in language learning contexts. Improving both language acquisition and motivation is an especially important endeavor given the large number of adults who are now learning new languages in our diverse and globalized world.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Institutional Review Board of the Faculty of Medicine and Health Sciences of McGill University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

CH: Conceptualization, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft. NB: Conceptualization, Methodology, Supervision, Writing – review & editing. MC: Conceptualization, Methodology, Supervision, Writing – review & editing. SB: Conceptualization, Methodology, Supervision, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by a grant awarded by the Natural Sciences and Engineering Research Council of Canada (NSERC) to SB, a grant awarded by the Social Sciences and Humanities Research Council (SSHRC) to MC, and an NSERC Postgraduate Scholarship-Doctoral (PGS D) grant along with a Mitacs Accelerate Award awarded to CH.

Conflict of interest

NB is the co-founder of Revai, which is working on developing taVNS for cognitive enhancement. While she assisted with designing the study as outlined under the Author contributions section, she was not involved in analyzing or visualizing the data.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/flang.2024.1403080/full#supplementary-material

References

Abrahamsson, N., and Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: listener perception versus linguistic scrutiny. Lang. Learn. 59, 249–306. doi: 10.1111/j.1467-9922.2009.00507.x
