Task-Specific Modulation of Human Auditory Evoked Response in a Delayed-Match-To-Sample Task

Rong, Feng; Holroyd, Tom; Husain, Fatima T.; Contreras-Vidal, Jose; Horwitz, Barry

doi:10.3389/fpsyg.2011.00085

ORIGINAL RESEARCH article

Front. Psychol., 09 May 2011

Sec. Auditory Cognitive Neuroscience

volume 2 - 2011 | https://doi.org/10.3389/fpsyg.2011.00085

Feng Rong¹,²

Tom Holroyd³

Fatima T. Husain¹

Jose L. Contreras-Vidal²

Barry Horwitz¹*

¹ Brain Imaging and Modeling Section, National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, MD, USA
² Graduate Programs in Neuroscience and Cognitive Science, Department of Kinesiology, University of Maryland, College Park, MD, USA
³ MEG Core facility, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA

In this study, we focus our investigation on task-specific cognitive modulation of early cortical auditory processing in human cerebral cortex. During the experiments, we acquired whole-head magnetoencephalography data while participants were performing an auditory delayed-match-to-sample (DMS) task and associated control tasks. Using a spatial filtering beamformer technique to simultaneously estimate multiple source activities inside the human brain, we observed a significant DMS-specific suppression of the auditory evoked response to the second stimulus in a sound pair, with the center of the effect being located in the vicinity of the left auditory cortex. For the right auditory cortex, a task-invariant suppression effect was observed in both DMS and control tasks. Furthermore, analysis of coherence revealed a beta band (12∼20 Hz) DMS-specific enhanced functional interaction between the sources in left auditory cortex and those in left inferior frontal gyrus, which has been shown to be involved in short-term memory processing during the delay period of the DMS task. Our findings support the view that early evoked cortical responses to incoming acoustic stimuli can be modulated by task-specific cognitive functions by means of frontal–temporal functional interactions.

Introduction

Modulation of auditory cortical responses evoked by acoustic stimuli has been widely observed in both animal and human research. Studies using anesthetized or awake animals have shown modulation effects induced by acoustic context (Condon and Weinberger, 1991; Ulanovsky et al., 2003; Bartlett and Wang, 2005), attention (Fritz et al., 2003), behavioral state (Gottlieb et al., 1989; Fritz et al., 2005), and self-initiated vocalization (Eliades and Wang, 2003). A broad spectrum of excitatory and/or inhibitory modulation effects have been observed in studies with different focuses and different experimental manipulations. In humans, such a broad spectrum of modulation effects has also been reported to occur in a number of evoked cortical responses, including the M100/N1 response in magnetoencephalographic (MEG) and electroencephalographic (EEG) studies (Hillyard et al., 1973; Stanny and Elfner, 1980; Näätänen, 1990; Woldorff et al., 1993; Jääskeläinen et al., 2004; Ahveninen et al., 2006; Sabri et al., 2006; Manuel et al., 2010).

As one of the early MEG/EEG evoked cortical responses with a latency of around 100 ms after stimulus onset, the M100/N1 is believed to be correlated with the detection of changes in the acoustic environment (Näätänen and Picton, 1987; Hari, 1990). Modulation of this transient response has shown both enhancement and suppression effects in previous experiments. Passive listening (PSL) tasks showed adapted M100/N1 response to repetitively presented stimuli (May et al., 1999; May and Tiitinen, 2004). In behavioral paradigms requiring active manipulation of attention to a task-related auditory domain, such as dichotic listening (Hillyard et al., 1973; Woldorff et al., 1993; Brancucci et al., 2004) and selective attention tasks (Fujiwara et al., 1998), enhancement of the M100/N1 response to the attended and relative suppression of the responses to the unattended stimuli/features (Sabri et al., 2006) have been observed. Self-initiated tones (Schafer and Marcus, 1973; Martikainen et al., 2005) or speech sounds (Houde et al., 2002) have displayed exclusively suppressive effects. Active task performance paradigms requiring memory processing, such as discrimination (Melara et al., 2005) and working memory tasks (Lu et al., 1992; May and Tiitinen, 2004; Luo et al., 2005), have shown a mixture of modulation effects – increases, decreases, or both have been observed. Hypotheses concerning the mechanistic interpretation of these findings include forward masking (Wehr and Zador, 2005), “repetitive suppression” (Näätänen et al., 2001; Ulanovsky et al., 2003), and feedback modulation from downstream neural populations (Miller and Cohen, 2001; Friston, 2005). 
Forward masking and repetitive suppression hypotheses emphasize the intrinsic automatic adaptation to repeated stimulus presentations (for a review, see Grill-Spector et al., 2006 and the comment in Baldeweg, 2006) and insensitivity to different cognitive and behavioral conditions, whereas feedback modulation can arise from functional interactions between multiple regions involved in specific cognitive functions.

In this study we use MEG and analysis of the current sources inside the brain to investigate the modulation of evoked responses in human auditory cortex during performance of a delayed-match-to-sample (DMS) task (see Abbreviation Table in Appendix for a list of all major abbreviations we use). The DMS task was compared with two control tasks: PSL and simple counting (CNT). Performing the DMS task involves formation, maintenance, and manipulation of the short-term memory (STM) of the first sound (S1) in a pair of acoustic stimuli during a silent delay period (Gottlieb et al., 1989; Zatorre and Samson, 1991; Lu et al., 1992; Pasternak and Greenlee, 2005), as well as decision-making and motor responses based on the comparison to the perceived second stimulus (S2; Postle et al., 1999). By contrast, the PSL task does not require the active maintenance of the STM trace, although participants still need to pay attention and listen to the sounds; the CNT task requires participants to maintain the numeric memory of the presence of the sounds, but not the memory of their acoustic features, which is required during performance of the DMS task. We hypothesized that a task-specific modulation of the auditory evoked responses (AER) to S2, possibly related to maintenance/retrieval of the STM of S1 and anticipation of the upcoming S2, would be observed in the DMS task.

In addition, it has been suggested that during cognitive task performance, anterior–posterior oscillations in a broad spectrum of frequency bands are involved in memory processing (Klimesch, 1999; Lutzenberger et al., 2002; Palva and Palva, 2007). By measuring coherence between the cortical current sources in frequency bands from delta to gamma, we investigate the DMS-specific functional interactions between cortical regions to explore the involvement of these top-down neural mechanisms in the DMS-specific modulation of human auditory cortex.

Materials and Methods

Participants

Healthy right-handed adults (n = 12; age, 23–35 years; six females) with normal or corrected-to-normal vision and normal hearing participated in the experiments. For each participant, MEG and structural MRI data were acquired in separate scans. Informed consent was obtained from each participant before each scan. The consent forms were approved by the NIDCD-NINDS IRB (protocol NIH 92-DC-0178) and the University of Maryland, College Park IRB (IRB#01566).

Tasks and Stimuli

Each MEG scan had nine recording sessions. Six of them were task sessions with three types of task conditions: PSL, counting (CNT), and a delayed-match-to-sample (DMS) task. Each task had two sessions with two different types of stimuli for each. The stimuli (Figure 1A) were pure tones (Tone) and tonal contours (TC). Each stimulus is an acoustic sound with duration of 350 ms. Each Tone has one frequency component. Each TC consists of two 125 ms up or down frequency modulated (FM) sweeps interspersed by a 100-ms tone. We kept the tasks in the order of PSL → CNT → DMS to avoid a potential CNT or DMS task performance influence on PSL. The order of Tone or TC sessions with same task type was randomly assigned and counter-balanced among participants. Each recording session had 100 trials. Each trial (Figure 1B) was 3.7 s in duration, started with a 500-ms silent period (baseline), followed by a pair of stimuli (S1 and S2, respectively) with a 1-s silent period (delay) between S1 and S2, and a 1.5-s inter-trial interval (ITI) after offset of S2. The ITI period also served as the response time (RT) in the DMS task. Within each recording session, match (exactly identical S1 and S2) and non-match (different S1 and S2) trials were randomly mixed and counter-balanced. The sound stimuli were presented to a participant at a fixed level between 65 and 75 dBA, which was determined by testing the participant before the MEG scan to make sure the participant could hear the sounds clearly and comfortably. Each session began with a visual instruction presented on a screen that informed the participant about the task condition, response requirement, and type of stimuli. The instruction also informed the participants of our requirement of fixating on a cross mark at the center of the screen during each trial. 
In the PSL sessions, participants were instructed to relax, stay still, and listen to the sounds without any response; in the CNT sessions, participants were instructed to count the number of sounds and report how many they had heard at the end of each corresponding session; in the DMS sessions, participants were instructed to compare the two sounds in each trial, and press the left button with the left thumb for a match and press the right button with the right thumb for a non-match. The button box was held in both hands in all sessions. Therefore, each experimental condition is a combination of task type (PSL, CNT, or DMS), sound type (Tone or TC), and S1/S2 matching type (match or non-match). In addition to the task sessions, each participant had two DMS training sessions and one click-counting session. The DMS training sessions were before the DMS tasks; each had 40 trials with either Tones or TC to familiarize the participant with the task. In the click-counting session, which was used to determine the peak latency and time window of the M100 response, we played 50 ms 1 kHz clicks and instructed the participant to count the number of the sounds.

FIGURE 1

Figure 1. Stimuli and tasks. (A) The spectrogram of representative stimuli. The gray scale represents the power spectral density (dB/Hz) of the sound stimuli. (B) The timeline of each trial for the passive listening, counting, and delayed-match-to-sample tasks. S1 and S2 denote the time window of the stimuli presentation. The inter-trial-interval (ITI)/response period is 1.5 s.

Data Acquisition

Participants lay in a supine position during the MEG scans. MEG signals were recorded with a CTF Omega2000 275-channel whole-head MEG System (CTF Systems, Inc., Coquitlam, Canada) placed in a magnetically shielded room (Vacuumschmelze, Germany) inside the MEG Laboratory of the National Institute of Mental Health (Bethesda, MD, USA). The ongoing MEG signals were sampled at 600 Hz, filtered with a 150-Hz low pass analog filter, balanced with third gradient coils for noise reduction, and then stored for off-line analysis. Temporal events, such as stimuli onsets and button presses (DMS sessions only) in each trial, were on-line marked. In a separate scan, we acquired the anatomical map of the same participant’s brain with a T1-weighted protocol (MPRAGE; 24 cm × 24 cm FOV; 128 axial slices; 1 mm × 1 mm × 1.2 mm voxel size), using a 3-Tesla Signa MR scanner (General Electric, Waukesha, WI, USA). For the purpose of spatial alignment between the MEG sensors and the anatomical structures, three fiducial points (one nasion and two preauricular) were marked for each participant. On these points, head coils were fixed during the MEG scanning and Vitamin E capsules were attached during the MRI scanning to mark their locations. In addition, we localized the head coils at the beginning and the end of each MEG recording session to detect head motions. When head movements exceeded 0.5 cm during a session, the whole session was discarded and the subject was rescanned.
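The head-motion criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: coil positions are taken to be (x, y, z) coordinates in centimeters from the localizations at the start and end of a session, and movement is measured as the largest displacement of any single coil. The function names and the displacement measure are hypothetical and are not taken from the acquisition software.

```python
import math

HEAD_MOTION_LIMIT_CM = 0.5  # threshold stated in the text


def max_coil_displacement(start_coils, end_coils):
    """Largest Euclidean displacement (cm) of any coil between two localizations."""
    return max(math.dist(a, b) for a, b in zip(start_coils, end_coils))


def session_is_usable(start_coils, end_coils, limit_cm=HEAD_MOTION_LIMIT_CM):
    """True when the assumed movement measure does not exceed the limit."""
    return max_coil_displacement(start_coils, end_coils) <= limit_cm


# Hypothetical coil positions (nasion, left and right preauricular), in cm.
start = [(0.0, 10.0, 0.0), (-7.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
end_small = [(0.0, 10.0, 0.3), (-7.0, 0.0, 0.0), (7.0, 0.4, 0.0)]
end_large = [(0.0, 10.0, 0.0), (-7.0, 0.0, 0.6), (7.0, 0.0, 0.0)]

usable_small = session_is_usable(start, end_small)
usable_large = session_is_usable(start, end_large)
```

In this sketch the first session would be kept and the second would be discarded and repeated.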

Data Analysis

Preprocessing

With the stored raw MEG signal, we took four preprocessing steps to reduce noise and artifact contamination: (1) remove the DC offset based on the whole trial trend; (2) remove the power line noise plus harmonics with notch filters centered at 60, 120, 180, and 240 Hz (fourth order paired band elimination filters with width = 8 Hz); (3) remove the low-frequency fluctuations with a high-pass filter (stop frequency = 0.5 Hz); and (4) remove artifacts (EKG, EOG, and motion related signals) using an automatic clustering method based on independent component analysis (ICA; Rong and Contreras-Vidal, 2006). MEG signals from three subjects (one male, two females) were removed from further analysis due to incomplete experiments or excessive artifact contamination. The noise-reduced and artifacts-cleaned datasets of the remaining nine subjects (four females) were then partitioned on a single-trial basis for further analysis. For each task trial, a 3.7-s epoch time-locked to the onset of S1 was extracted (Figure 1B). The epoch includes a 0.5-s baseline period at the beginning, followed by the first sound stimulus (S1, 0.35 s), the delay period (1 s), the second sound stimulus (S2, 0.35 s), and the response period/ITI (1.5 s). For each of the click-counting trials, the epoch was 1.05 s time-locked to the stimulus onset with a 0.5-s baseline.
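As a worked check of the epoch layout, the sketch below converts the stated segment durations into sample indices at the 600 Hz sampling rate. The indexing convention (counting from the start of the epoch, with exclusive stop indices) and all names are assumptions made for illustration.

```python
SAMPLING_RATE_HZ = 600  # MEG sampling rate stated in Data Acquisition

# Segment durations in seconds, in trial order, as given in the text.
SEGMENTS = [
    ("baseline", 0.5),
    ("S1", 0.35),
    ("delay", 1.0),
    ("S2", 0.35),
    ("ITI", 1.5),
]


def segment_sample_ranges(segments=SEGMENTS, fs=SAMPLING_RATE_HZ):
    """Map each segment name to (start, stop) sample indices, stop exclusive."""
    ranges = {}
    start = 0
    for name, duration_s in segments:
        n_samples = round(duration_s * fs)
        ranges[name] = (start, start + n_samples)
        start += n_samples
    return ranges


ranges = segment_sample_ranges()
epoch_length = ranges["ITI"][1]  # total samples in one 3.7-s epoch
```

The segments sum to 2220 samples, which matches 3.7 s at 600 Hz.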

Quantification and analysis of auditory evoked response and modulation effect

In this study, we were particularly interested in task-related modulation of the M100. The M100 response is usually seen as a deflection in the epochs of the averaged field strength with its peak at ∼100 ms after sound stimulus onset (Figure A1A in Appendix: Sensor Space Analysis of the Modulation Effect). At the peak latency, it usually shows a bilateral dipole-like contour pattern of the magnetic field with a “source” and a “sink” located at fronto-temporal and parieto-temporal regions (Figure A1B in Appendix: Sensor Space Analysis of the Modulation Effect). We used data from the click-counting session, which is independent of the task sessions, to determine a subset of representative sensors for M100 analysis in each participant. By examining the averaged epochs from the click-counting session, 20 sensors (10 per hemisphere) surrounding the centers of the “sources” and “sinks” of the peak M100 contour were selected as the representative sensors for the participant (cf. Luo et al., 2005). Based upon the signal from these representative sensors, we calculated the root mean square (RMS) of the averaged magnetic field time course, identified the peak RMS value at ∼100 ms after stimulus onset, and defined the time point as the peak latency of the AER to each stimulus in each experimental condition. In addition, a 50-ms time window centered at the peak latency was defined as the window of AER. Therefore, we obtained one peak latency and one corresponding AER window for each stimulus under each experimental condition. Analysis of the sensor space data is presented in Section “Sensor Space Analysis of the Modulation Effect” in Appendix.
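The peak-latency and AER-window definitions can be illustrated with a short sketch. It assumes the averaged field of each representative sensor is available as a list of samples, and it searches for the RMS peak between 50 and 150 ms after stimulus onset; that search range, the function names, and the synthetic data are assumptions, since the text only states that the peak lies at about 100 ms.

```python
import math


def rms_across_sensors(sensor_traces):
    """RMS over sensors at each time sample (traces are equal-length lists)."""
    n_sensors = len(sensor_traces)
    return [math.sqrt(sum(v * v for v in column) / n_sensors)
            for column in zip(*sensor_traces)]


def peak_latency_and_window(sensor_traces, fs, onset_index,
                            search_ms=(50.0, 150.0), window_ms=50.0):
    """Return (peak latency in ms after onset, (start, stop) of the AER window).

    The window covers half of window_ms on each side of the peak sample;
    stop is exclusive.
    """
    rms = rms_across_sensors(sensor_traces)
    lo = onset_index + round(search_ms[0] * fs / 1000.0)
    hi = onset_index + round(search_ms[1] * fs / 1000.0)
    peak_index = max(range(lo, hi + 1), key=lambda i: rms[i])
    half = round(window_ms * fs / 2000.0)
    latency_ms = (peak_index - onset_index) * 1000.0 / fs
    return latency_ms, (peak_index - half, peak_index + half + 1)


# Synthetic example (not study data): two sensors with opposite polarity and a
# bump peaking 60 samples (100 ms at 600 Hz) after a stimulus onset at sample 300.
fs = 600
onset = 300
bump = [math.exp(-((i - 360) / 10.0) ** 2) for i in range(600)]
traces = [bump, [-v for v in bump]]
latency_ms, aer_window = peak_latency_and_window(traces, fs, onset)
```

At 600 Hz the returned window spans 31 samples, which is the closest symmetric approximation to 50 ms around a single peak sample.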

In addition to determination of peak latencies and AER windows in sensor space, we estimated the multiple source activities distributed across the brain using the all-sensor MEG epochs. The sources were imaged with an event-related beamformer algorithm based on the linearly constrained minimum variance (LCMV) method (Van Veen et al., 1997), for which the forward source–sensor relationship was modeled by a multiple local-sphere head model (Huang et al., 1999). Each model was a 20 cm × 20 cm × 17 cm spatial grid composed of 5 mm × 5 mm × 5 mm cubic voxels covering the participant’s head. The integrated intracellular synaptic current of the neuronal population inside each voxel was estimated by a source dipole whose origin was located at the center of the cube. Each source dipole’s activity was quantified by a measure of normalized power (“neural activity indices” – NAI). Using this imaging method, we took the following steps to quantify the AER and modulation effects in each source: (1) we computed a time course of NAI values for each source on a single-trial basis; (2) AER to S1 and S2 in each trial were quantified as integrated NAI in the AER window of the corresponding experimental condition, then normalized to baseline by subtracting the averaged NAI during the baseline period; (3) the modulation effect of each experimental condition was measured as a modulation index (MI) value calculated from the normalized AER values:

$$\mathrm{MI} = \frac{\mathrm{AER1} - \mathrm{AER2}}{\mathrm{AER1} + \mathrm{AER2}} \qquad (1)$$

where AER1 and AER2 represent the normalized quantification of AER to S1 and S2, respectively. The MI values range from −1 to 1, where the positive values indicate decreased evoked response to S2 as compared to evoked response to S1, and the negative values indicate the opposite effect. Hence, if the mean MI value from one condition is significantly greater than zero, a significant suppressive modulation effect is inferred.
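A minimal numerical sketch of the modulation index follows. It assumes MI is the normalized difference (AER1 − AER2)/(AER1 + AER2), which is the form consistent with the stated range and sign convention; the function name and example values are hypothetical. The −1 to 1 range holds when both normalized responses are non-negative.

```python
def modulation_index(aer1, aer2):
    """Normalized difference between responses to S1 and S2 (assumed form of Eq. 1)."""
    total = aer1 + aer2
    if total == 0:
        raise ValueError("MI is undefined when AER1 + AER2 is zero")
    return (aer1 - aer2) / total


# Hypothetical normalized AER values, not data from the study.
mi_suppressed = modulation_index(3.0, 1.0)  # response to S2 smaller than to S1
mi_enhanced = modulation_index(1.0, 3.0)    # response to S2 larger than to S1
mi_unchanged = modulation_index(2.0, 2.0)   # equal responses
```

A positive value corresponds to suppression of the response to S2, a negative value to enhancement, and zero to no change.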

With the quantified AER and MI values, we took two independent approaches to test the hypothesis that the modulation of the AER in the DMS tasks is significantly different from the effect in the control tasks. One approach applied within-participant analysis by using paired t-tests to compare the normalized AER to S1 and S2 for each experimental condition. The sources that showed a significant difference (FDR corrected p < 0.05) were then taken as sources demonstrating within-participant significant modulation of the evoked responses for the corresponding condition. With the resulting probability images, the sources in bilateral temporal cortices with maximal absolute t values in the DMS tasks were selected as the representative sources for further statistical analysis. For each representative source, a MI value was computed using Eq. 1 for each experimental condition. With the MI values from all participants, we tested the hypothesis statistically by applying repeated measures ANOVA with three factors: task (PSL, CNT, DMS), sound type (Tone, TC), and trial type (match, non-match), which was followed by post hoc comparison between the mean MI values of single experimental conditions using the Tukey–Kramer method. We used SAS v9.1 (SAS Institute Inc., Cary, NC, USA) for statistical analyses of this approach.
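The within-participant comparison rests on a paired t statistic, which can be sketched with the standard library alone. This illustration computes only the statistic and its degrees of freedom; the p value and the FDR correction are not shown, and the example numbers are hypothetical rather than taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t(values_a, values_b):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical normalized AER values for five trials of one source.
aer_s1 = [5.0, 6.0, 7.0, 8.0, 9.0]
aer_s2 = [4.0, 4.0, 5.0, 7.0, 5.0]
t_value, dof = paired_t(aer_s1, aer_s2)
```

A positive t value here means that the response to S1 exceeded the response to S2 on average, which is the direction of a suppressive modulation.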

In addition to assessment of the modulation effects by selecting a single source to represent the auditory cortical cluster showing a significant difference, we employed another approach to visualize the spatial extent of the DMS-specific modulation effect by analysis of all sources. With this approach, we computed a MI image including all sources for each experimental condition, and used a two-way three-dimensional ANOVA (type 4 3dANOVA3) provided by AFNI (Analysis of Functional NeuroImages; Cox, 1996; NIMH, Bethesda, MD, USA; see http://afni.nimh.nih.gov/) to analyze the group-level modulation effect across all sources. The variance analysis was performed with two factors: task (PSL and DMS) and sound type (Tone and TC). To correct for statistical comparison of multiple sources, Monte Carlo simulation with estimation of the between-source spatial correlation (Forman et al., 1995) was used to determine the criteria (the threshold cluster size and uncorrected probability value for each source within the cluster) of statistical significance (corrected p < 0.05).

Analysis of functional interactions between brain regions

We restricted our analysis of interregional functional interactions to the coherence between the representative sources and all other sources in the brain. For each participant, we selected the representative source that demonstrated the DMS-specific modulation effect as a reference, and computed the coherence values using the dynamic imaging of coherent sources (DICS) method (Gross et al., 2001). The coherence values were computed on a single-frequency basis over a broad frequency range from 2 to 50 Hz, with a step size of 2 Hz. We then averaged the coherence values in frequency bands of delta (2∼4 Hz), theta (4∼8 Hz), alpha (8∼12 Hz), beta (12∼20 Hz), high beta (20∼30 Hz), and gamma (30∼50 Hz). For each frequency band, the modulation related changes of the functional interactions were quantified as the ratio of coherence change (RCC) values, which were computed as normalized differences between the coherence values obtained from the late delay period (0.5∼1 s after offset of S1, which is a 500-ms window before onset of S2) and the coherence values obtained from the baseline period (the 500 ms window before onset of S1):

$$\mathrm{RCC} = \frac{\mathrm{Ldelay} - \mathrm{baseline}}{\mathrm{Ldelay} + \mathrm{baseline}} \qquad (2)$$

where Ldelay and baseline represent the coherence values during the late delay and baseline periods, respectively. The RCC value ranges from −1 to 1, where positive RCC values represent increased late delay period coherence as compared to baseline period. We then used the two-way three-dimensional ANOVA method described in the previous section to analyze the RCC values to test our hypothesis that during the delay period, frontal brain regions related to cognitive functions recruited for performance of the DMS task would show increased functional interaction with the temporal sources that have shown DMS-specific modulation of the evoked responses. The factors included task (PSL and DMS) and sound type (Tone and TC). Monte Carlo simulation was also used to estimate the criteria of statistical significance for both ANOVA and contrast between experimental conditions. Only the clusters showing significant task or task × sound type effect, and significant difference in contrast between PSL and DMS conditions, were considered as clusters demonstrating DMS-specific functional interaction with the reference sources. Threshold statistics for each individual source are F_1,8 > 14.64 for ANOVA and t > 3.826 (df = 8) for simple contrasts, corresponding to uncorrected p < 0.005.
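The band averaging and the RCC computation can be sketched as follows. Two points are assumptions rather than statements from the text: RCC is taken to be (Ldelay − baseline)/(Ldelay + baseline), which matches the stated range and sign convention, and because adjacent bands share an edge frequency, each band is assumed to include its lower edge and exclude its upper edge, except gamma, which includes 50 Hz. The coherence values are hypothetical.

```python
# Band edges in Hz as listed in the text; the edge-handling rule is an assumption.
BANDS = {
    "delta": (2, 4),
    "theta": (4, 8),
    "alpha": (8, 12),
    "beta": (12, 20),
    "high_beta": (20, 30),
    "gamma": (30, 50),
}
FREQS_HZ = list(range(2, 51, 2))  # 2 to 50 Hz in 2-Hz steps
TOP_EDGE_HZ = max(high for _, high in BANDS.values())


def band_average(coherence_by_freq, band):
    """Mean coherence over the frequencies assigned to one band."""
    lo, hi = BANDS[band]
    picked = [c for f, c in coherence_by_freq.items()
              if lo <= f < hi or (f == hi == TOP_EDGE_HZ)]
    return sum(picked) / len(picked)


def rcc(late_delay, baseline):
    """Ratio of coherence change (assumed normalized-difference form)."""
    total = late_delay + baseline
    if total == 0:
        raise ValueError("RCC is undefined when both coherence values are zero")
    return (late_delay - baseline) / total


# Hypothetical coherence spectra: a delay-period increase confined to 12-18 Hz.
baseline_coh = {f: 0.2 for f in FREQS_HZ}
delay_coh = {f: (0.3 if 12 <= f <= 18 else 0.2) for f in FREQS_HZ}

rcc_by_band = {band: rcc(band_average(delay_coh, band),
                         band_average(baseline_coh, band))
               for band in BANDS}
```

With these synthetic spectra only the beta band yields a positive RCC; every other band gives zero.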

Results

Behavioral Results

In the counting task, all participants recalled the number of sounds they heard with counting error within ±2 in each session. In the DMS task, all participants showed accuracy above 84%. A significant sound type × trial type interaction was observed (two-way ANOVA, F_1,8 = 12.9, p = 0.007), which could be accounted for by the lower performance level on the TC non-match trials (TC_N, 91.1 ± 0.95%, mean ± SEM) than on the other three conditions (Tone_M: 99.8 ± 0.95%; Tone_N: 98.7 ± 0.95%; TC_M: 98.9 ± 0.95%). RT in each trial was measured as the time elapsed from the onset of S2 to the button press in the DMS task. Analysis of variance revealed a significant sound type effect (Figure 2) on RT (one-way ANOVA, F_1,8 = 6.1, p = 0.039), where the RT for TC stimuli (812 ± 36.4 ms, mean ± SEM) was significantly longer than the RT to Tones (754 ± 36.3 ms). No significant effect of trial type or sound type × trial type interaction was observed. Our observation of longer RT for TC is consistent with the results in an fMRI study using the same set of stimuli (Husain et al., 2004).

FIGURE 2

Figure 2. Response times (RT) for DMS task (n = 9). For each trial, response time (mean ± SD) was calculated as the duration elapsed from the onset of S2 to the time the participant pressed the button.

DMS-Specific Suppression of the Left Auditory Evoked Responses

Figure 3A provides an example of the within-participant comparisons between the AERs to S1 and S2 under the three experimental conditions. The data are from the matched trials using TC stimuli for participant #4. Overlaid on a standard anatomical atlas (Talairach and Tournoux, 1988), the three probability maps highlight the clusters of the left hemisphere sources that demonstrate significant differences between the evoked responses to S1 and S2 in the PSL, CNT, and DMS tasks, respectively. The blobs with bright colors indicate the spatial locations of the clusters. In each task condition, the probability map displays multiple clusters of sources with significant difference between AER to S1 and S2: the cluster in the superior temporal region (where auditory cortex is located) shows up in all three tasks and has more voxels for the DMS task than for the control tasks, indicating a more extensive suppressive modulation effect during performance of the DMS task than during the control tasks. In contrast, the anterior cluster, which also shows up in all three conditions, contains fewer sources for the DMS task than for the control tasks, indicating a weaker modulation effect for the frontal sources in the DMS task. Unlike the above two clusters, the posterior cluster appears only in the CNT and DMS tasks. In this cluster the sign of the modulation effect is opposite (a greater response to S2 than to S1), which suggests enhancement of the evoked responses to S2 rather than suppression for these current sources. Though most within-participant analyses display more than one left hemisphere cluster showing significantly different AERs to S1 and S2 among the experimental conditions, only the left temporal cluster showed consistent patterns of task-specific modulation effects. The number and spatial location of the voxels in this cluster differ among participants.

FIGURE 3

Figure 3. Task-specific modulation of the left auditory cortex. (A) Probability maps obtained from the within-participant paired t-test on left hemisphere evoked responses to S1 and S2. Each image displays the clusters of sources in left hemisphere that showed significant contrast (uncorrected p < 10⁻⁸). The color codes represent the negative logarithmic values of the probabilities. The images are mapped over an axial slice (z = 6) of the Talairach anatomical atlas (Talairach and Tournoux, 1988). The analysis was performed on datasets from participant #4. Experimental conditions are the match condition of PSL, CNT, and DMS tasks with TC stimuli, respectively. The location of the representative source in left auditory cortex for this participant is marked by a “*”. (B) Mean time courses of the left representative source activities for all experimental conditions. The waveforms are time-locked to the onset of S1/S2. Mean ± SD Talairach coordinates above the waveforms denote the location of the representative sources. The dashed-line box highlights the AER window centered at the peak latency ∼100 ms after stimulus onset. (C) Mean MI values computed from the source activity of the left representative sources averaged across all participants. Error bars represent the standard error of the mean (SEM). (D) Mean ± SD MI values computed from the left representative sources for each individual participant. The data are averaged across single trials during performance of the PSL and DMS tasks with TC stimuli.

In addition to the within-participant analysis, group analysis of the left representative sources demonstrated a DMS-specific suppressed AER to S2, as displayed by the grand mean activity waveforms of the left representative sources averaged across all participants (Figure 3B). The locations of these representative sources (Talairach coordinates: [−52 ± 9.3, −24 ± 7.8, 8 ± 4.7], mean ± SD) are within the vicinity of the left primary auditory cortex (Heschl’s gyrus) and adjacent planum temporale region (Hall et al., 2003), consistent with the distribution of the superior temporal sources for M100 responses that have been described in previous studies (Hari, 1990; Herdman et al., 2003). Variance analysis of the MI values from the representative sources confirmed this finding. It demonstrated a significant task effect (one-way ANOVA, F_2,16 = 9.64, p = 0.0018). No other main factor or interaction effects were observed. For each experimental condition, the mean MI values for DMS_Tone (t = 4.48, df = 8, p = 0.002) and DMS_TC (t = 7.80, df = 8, p < 0.0001) demonstrated significant suppressive modulations of the AER to S2 as compared to the AER to S1, whereas none of the mean MI values from the control tasks was significantly different from zero (Figure 3C). Post hoc comparisons of MI values between experimental conditions revealed that the mean MI value of DMS_TC was significantly greater than both PSL_TC (p < 0.01, Tukey–Kramer method) and CNT_TC (p < 0.01), which indicates a greater suppression of the left auditory AER to S2 during performance of the DMS task with TC stimuli than during the control tasks. We did not observe any significant difference between the mean MI values with Tone stimuli. Furthermore, the significantly greater mean MI value for DMS_TC than for DMS_Tone (p < 0.05, Tukey–Kramer method) suggests a greater suppression of the AER to TCs than to Tones during performance of the DMS task.
Examination of individual data showed consistent task-specific modulation patterns in the left auditory cortex – seven out of nine participants display greater MI values for DMS_TC than PSL_TC condition (Figure 3D).

Modulation of the AER in right auditory cortex showed patterns different from the modulation effects in the left auditory cortex. As an example, Figure 4A illustrates the cluster(s) of sources in the right hemisphere of participant #4 that showed a significant difference between evoked responses to S1 and S2. The data are from the tasks with TC stimuli. In contrast to the left hemisphere, the cluster in the right temporal region displays a similar modulation pattern across all three tasks for this participant. The locations of the right representative sources are roughly mirror-symmetric to those of the left representative sources (Talairach coordinates: [57 ± 6.5, −24 ± 6.1, 9 ± 7.9], mean ± SD), with the center coordinates falling in the vicinity of the right auditory cortex. While the spatial location of the representative sources in each hemisphere demonstrated a rough symmetry, the averaged activity waveforms from the right representative sources displayed a pattern different from what was seen on the left side: suppression of the AERs to S2 was observed in all three tasks, although for the Tone stimuli, the CNT and DMS tasks showed a reduced suppressive modulation effect (Figure 4B). Group analysis shows no significant difference in the mean MI values across all three tasks (Figure 4C; one-way ANOVA, F_2,16 = 2.44, p = 0.12). Individual MI values from the right representative sources also displayed smaller differences between the DMS_TC and PSL_TC conditions than those demonstrated by the left representative auditory sources (Figure 4D).

FIGURE 4

Figure 4. Task-invariant modulation of the right auditory cortex. (A) Probability maps obtained from the within-participant paired t-test on right hemisphere evoked responses to S1 and S2. Data are also from participant #4. The color code is the same as in Figure 3A. Images are overlaid on the anatomical axial slices of z = 13. As for the left hemisphere, the representative source in right auditory cortex of this participant is marked by a “*”. (B) Mean time courses of the right representative source activities for all experimental conditions. The waveforms are time-locked to the onset of S1/S2. Mean ± SD Talairach coordinates above the waveforms denote the location of the representative sources. (C) Mean MI values of the right representative sources; the error bars denote SEM. (D) Mean ± SD MI values computed from the right representative sources for each individual participant. The data are averaged across single trials during performance of the PSL and DMS tasks with TC stimuli. Order of participants is the same as in Figure 3D.

Statistical analysis of the MI values across all sources gave results in the left temporal region that were consistent with the modulation effects demonstrated by the analysis of the representative sources: a cluster of sources in left auditory cortex showed significant suppression of the AER to S2 in the DMS tasks as compared to the PSL conditions (Figure 5A). This cluster extended from left superior temporal gyrus (STG) (BA22) to left insula (BA13). In addition to the left temporal cluster, two other clusters also displayed a greater suppressive modulation effect during performance of the DMS task than during the PSL conditions: one was located in the left orbital frontal region (Figure 5B) and the other in the premotor area of the right middle frontal cortex (Figure 5C). These additional clusters suggest involvement of the corresponding regions in the network dynamics specifically correlated with performing the auditory DMS tasks.

FIGURE 5

Figure 5. Grand analysis of MI values across all sources. 3dANOVA analysis of the MI values from all sources reveals three clusters of sources displaying a significant task effect for the modulation of the evoked responses to S2. Monte Carlo simulation is used to determine the criterion of significance. Each subplot displays axial, sagittal, and coronal views of the clusters. Color codes of each image represent the F-values obtained from the 3dANOVA analysis. (A) The cluster in left auditory cortex, which includes the sources in both superior temporal gyrus (BA41/22) and insula (BA13). (B) The cluster in left medial frontal gyrus (BA10). (C) The cluster in right middle frontal gyrus (BA6).
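The idea of using Monte Carlo simulation to set a cluster-size criterion, as mentioned in the caption above, can be sketched in Python as follows. This is an illustration under simplifying assumptions and not the procedure used in the study: the null volumes here contain spatially independent voxels (real source maps are spatially smooth), and the grid size, voxel-wise threshold, and function names are all hypothetical.

```python
import math
import random

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def max_cluster_size(mask):
    """Size of the largest face-connected cluster of True voxels."""
    nx, ny, nz = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    largest = 0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if not mask[x][y][z] or (x, y, z) in seen:
                    continue
                # Flood fill from this voxel to measure its cluster.
                stack = [(x, y, z)]
                seen.add((x, y, z))
                size = 0
                while stack:
                    cx, cy, cz = stack.pop()
                    size += 1
                    for dx, dy, dz in NEIGHBORS:
                        p = (cx + dx, cy + dy, cz + dz)
                        inside = 0 <= p[0] < nx and 0 <= p[1] < ny and 0 <= p[2] < nz
                        if inside and mask[p[0]][p[1]][p[2]] and p not in seen:
                            seen.add(p)
                            stack.append(p)
                largest = max(largest, size)
    return largest


def cluster_size_threshold(shape, voxel_p, alpha, n_sim, seed=0):
    """Smallest cluster size exceeded by chance in at most alpha of null maps."""
    rng = random.Random(seed)
    nx, ny, nz = shape
    maxima = []
    for _sim in range(n_sim):
        # Null map: each voxel passes the voxel-wise threshold with prob. voxel_p.
        mask = [[[rng.random() < voxel_p for _z in range(nz)]
                 for _y in range(ny)] for _x in range(nx)]
        maxima.append(max_cluster_size(mask))
    maxima.sort()
    idx = min(max(math.ceil((1 - alpha) * n_sim) - 1, 0), n_sim - 1)
    return maxima[idx] + 1


print(cluster_size_threshold((10, 10, 10), voxel_p=0.05, alpha=0.05, n_sim=200))
```

Clusters at least as large as the returned size occur in no more than a fraction `alpha` of the simulated null maps, which is the sense in which a cluster-size threshold controls the family-wise error rate.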

DMS-Specific Enhancement of Temporal-Frontal Functional Interactions

Analysis of the modulation effect in cortical source activities demonstrated a DMS-specific suppressive modulation of the AER in response to S2 in the left auditory cortex. We asked whether there were correlated task-specific changes in functional interaction between the left auditory cortex and other brain regions. Among the frequency bands from delta to gamma that were covered by the analysis of RCC values, a single cluster of sources showed stronger functional interaction during the DMS task than the PSL task in the beta band (12∼20 Hz). The cluster had 176 voxels extending from BA 44 to BA 46 in the left inferior frontal gyrus (IFG; Figure 6A). Analysis of the RCC values demonstrated a significant task effect (FWE corrected p < 0.05, with threshold cluster size of 21), and post hoc comparison (FWE corrected p < 0.05, with threshold cluster size of 101) showed a significant difference between the PSL_TC and DMS_TC conditions (Figure 6B). Increased RCC values in the DMS tasks suggest enhanced functional interaction between the frontal cluster and the left temporal cortical sources during the late delay period of the DMS task, as compared to the PSL conditions. Examination of the coherence values at each frequency showed greater late delay vs. baseline differences in the beta band for the DMS tasks with TC stimuli, with the greatest difference found at 18 Hz (Figure 6C). Furthermore, delay period activity of the sources in the frontal cluster showed a greater magnitude in DMS tasks than in the PSL conditions (Figure 6D), indicating its involvement in DMS-specific memory processing.

FIGURE 6

Figure 6. Delayed-match-to-sample-specific functional interaction. The cluster of sources in left inferior frontal gyrus displaying DMS-specific functional interaction with left auditory cortical representative sources in the beta band (12∼20 Hz). (A) Axial, sagittal, and coronal views of the cluster. (B) Mean ± SD of the RCC value in each experimental condition averaged across all sources in the cluster. (C) Mean ± SD coherence values between the sources in the cluster and the left representative sources during baseline and late delay period for each calculated frequency. The mean coherence values from PSL_TC conditions are depicted as diamonds (blue for baseline and red for late delay). The mean coherence values from DMS_TC conditions are depicted as triangles. (D) Mean source activity averaged across the left inferior frontal sources in the cluster and across all participants. The waveforms are time-locked to onset of S1. Durations of stimulus presentation are labeled by black bars under the x-axis.
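The comparison of late-delay and baseline coherence summarized above can be illustrated with a minimal Python sketch. It computes magnitude-squared coherence across trials at a single frequency and then subtracts the baseline value from the delay-period value. The subtraction is an assumed form of baseline correction chosen for illustration; the exact definition of the RCC value is given in the Methods and may differ. All function names, the sampling rate, and the signals are hypothetical and synthetic.

```python
import cmath
import math
import random


def dft_bin(signal, freq_hz, fs_hz):
    """Complex Fourier coefficient of one trial at a single frequency."""
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        for n, x in enumerate(signal)
    )


def coherence(trials_a, trials_b, freq_hz, fs_hz):
    """Magnitude-squared coherence across trials, between 0 and 1."""
    cross, power_a, power_b = 0j, 0.0, 0.0
    for a, b in zip(trials_a, trials_b):
        fa, fb = dft_bin(a, freq_hz, fs_hz), dft_bin(b, freq_hz, fs_hz)
        cross += fa * fb.conjugate()   # cross-spectrum, summed over trials
        power_a += abs(fa) ** 2
        power_b += abs(fb) ** 2
    return abs(cross) ** 2 / (power_a * power_b)


def baseline_corrected_coherence(delay, baseline, freq_hz, fs_hz):
    """Delay-period coherence minus baseline coherence (assumed form)."""
    return coherence(*delay, freq_hz, fs_hz) - coherence(*baseline, freq_hz, fs_hz)


def synthetic_pair(rng, n_trials, n_samples, fs_hz, shared_hz=None):
    """Two noisy sources that optionally share a rhythm with a fixed phase lag."""
    trials_a, trials_b = [], []
    for _trial in range(n_trials):
        phase = rng.uniform(0, 2 * math.pi)  # random phase per trial
        a, b = [], []
        for n in range(n_samples):
            t = n / fs_hz
            shared_a = math.sin(2 * math.pi * shared_hz * t + phase) if shared_hz else 0.0
            shared_b = math.sin(2 * math.pi * shared_hz * t + phase + 0.5) if shared_hz else 0.0
            a.append(shared_a + rng.gauss(0, 0.5))
            b.append(shared_b + rng.gauss(0, 0.5))
        trials_a.append(a)
        trials_b.append(b)
    return trials_a, trials_b


rng = random.Random(0)
fs_hz = 200.0  # assumed sampling rate for the synthetic data
delay = synthetic_pair(rng, 40, 200, fs_hz, shared_hz=18.0)
baseline = synthetic_pair(rng, 40, 200, fs_hz)
print(baseline_corrected_coherence(delay, baseline, 18.0, fs_hz))
```

Because coherence depends on a consistent phase relation across trials rather than on amplitude alone, the synthetic delay-period signals, which share an 18 Hz rhythm at a fixed lag, yield a value near 1, while the independent baseline signals yield a value near 0.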

Discussion

Summary of Results

The current experiment investigated the task-specific modulation of human auditory cortex during performance of an auditory DMS task, which specifically emphasized the maintenance of STM during the delay period and decision-making/motor response based on comparison between the STM trace and perception of the acoustic stimuli (Posner, 1967). In comparison to the control tasks, the observed DMS-specific modulation effect involved a suppression of the AER with latency around 100 ms. The auditory current sources showing this effect were lateralized to the left hemisphere. The cluster of the significant sources covered the region extending from primary to association auditory cortices (Figures 3A and 5A) with the center sources located in the STG. Furthermore, this effect was greater in the DMS task for sounds with multiple frequency components (TC) than for sounds with only one frequency component (pure tones), indicating a close relationship between STM load and the observed modulation effect. Along with the observed modulation effect, enhanced functional interactions between left auditory reference sources and sources in left inferior frontal regions were observed during the late delay period of the DMS task in the beta band (12∼20 Hz), suggesting involvement of a DMS-specific frontal–temporal interaction in the observed modulation effect.

These results provide experimental evidence in humans that supports the hypothesis of task-specific top-down modulation of auditory information processing during the DMS task (for a recent review, see Scheich et al., 2007). With measurements and analyses of the temporally sensitive MEG signals, our findings reveal two important aspects of this modulation effect: (a) left lateralization of the observed DMS-specific suppression of the transient early cortical AERs, and (b) a close relationship with STM processing, as revealed by significantly stronger modulation of the AER to TC stimuli than to Tones, and greater beta-band functional interaction between the left auditory cortical sources and the left IFG.

Task-Specific Cognitive Modulation of Auditory Evoked Responses

Measured by MEG/EEG, with peak latency around 100 ms after stimulus onset, the M100/N1 response is believed to be involved in detection of changes in the acoustic environment, and can be influenced by both upstream and downstream auditory subcortical/cortical regions (Näätänen and Picton, 1987; Hari, 1990). Suppression of this response has been observed by passive listening to repetitively presented stimuli (Näätänen and Picton, 1987) and by active auditory perception during task performance (Hillyard et al., 1973; Woldorff et al., 1993; Luo et al., 2005; Martikainen et al., 2005). Recent studies have correlated the modulation effect with improved performance in healthy adults (SanMiguel et al., 2008; Lijffijt et al., 2009; Alain et al., 2010; Navarro Cebrian and Janata, 2010), and dampened or diminished modulation with behavioral deficits in schizophrenia patients (Heinks-Maldonado et al., 2007; Lukhanina et al., 2009; Dale et al., 2010).

To account for these observations, a broad spectrum of interpretations from pre-attentive habituation (Tiitinen et al., 1994) to cognition-related top-down modulation (Fritz et al., 2007; Scheich et al., 2007) has been proposed. With supportive experimental results mainly obtained from mismatch negativity (MMN) studies (Näätänen, 1990), the habituation hypothesis postulates that stimulus-specific adaptation to repetitively presented sounds suppresses the evoked response to an upcoming stimulus, given that the upcoming one has similar salient features. This hypothesis suggests that hierarchical, gradual, and implicit procedures of memory establishment and a pre-attentive intrinsic adaptation mechanism underlie the observed suppressive effect. Consequently, this view implies that the suppression should not differ between PSL and active task performance.

In contrast, active performance of cognitive tasks also displays suppression of the AER without reliance on repetitively presenting identical sounds/features. Examples include relative suppression of M100 responses to unattended stimuli (Hillyard et al., 1973; Woldorff et al., 1993; Poghosyan and Ioannides, 2008; Atiani et al., 2009), features (Ahveninen et al., 2006; Kaiser et al., 2009), and modality (Oatman, 1976; Alho et al., 1994; Eimer et al., 2004) in selective attention tasks, suppression of M100/N1 responses to self-initiated tones (Schafer and Marcus, 1973; Martikainen et al., 2005) or speech sounds (Houde et al., 2002), and suppression of the M100/N1 response to the second sound of the pair in behavioral paradigms employing the DMS task with a broad spectrum of sound stimuli, from simple sounds such as tones and TC (Lu et al., 1992) to complex speech sounds such as vowels and consonant vowel syllables (Luo et al., 2005; Lijffijt et al., 2009). It is believed that the prediction of the afferent sensory signal by the top-down attentive, motor, or memory-related efference signals is involved in the observed inhibitory modulation effect (Blakemore et al., 1998; Heinks-Maldonado et al., 2006; Fritz et al., 2007). This evidence suggests that an explicit, active, and task-specific mechanism underlies the observed suppression effects: the cognitive task-specific neural processing selectively modulates the sensory-evoked responses.

In this study, we focused our research on the task-specific AER suppression during performance of an auditory DMS task and hypothesized that the DMS-related cognitive functions play active roles in the observed modulation effect. By comparison with control tasks such as PSL and counting, the results revealed both task-specific and non-specific suppressive modulation effects on the early cortical AER. In the right auditory cortex, a similar suppressive modulation of the AER across the tasks agrees with the habituation hypothesis. In the left auditory cortex, by controlling the habituation effect with identical timelines for each trial (a sound pair separated by a 1-s silent delay period) and the attention effect by instructing subjects to listen to the sounds during both control and DMS conditions, we have demonstrated a suppressive AER modulation effect specifically correlated to performance of a DMS task that involved overt STM maintenance and manipulation. Furthermore, the relatively greater suppression effect in the DMS task than in the counting task not only strengthens the task-specificity of this effect, but also suggests that this effect is specifically related to the STM processing of the acoustic features of the sound stimuli, given that performing the counting task also required participants to hold a simpler format (numbering) of the STM trace of each sound stimulus (Nieder and Dehaene, 2009).

Hemisphere and Stimulus Specificity of the Observed Modulation Effect

In addition to the task-specificity, we observed left lateralization and selectivity to TC stimuli of this modulation effect. Task-specific hemispheric asymmetry has also been shown in previous MEG studies using other task paradigms (Poeppel et al., 1996; Chait et al., 2004). Furthermore, a recent fMRI study demonstrated that BOLD activation related to working memory of FM tones was lateralized to the left auditory cortex (Brechmann et al., 2007), which overlapped with the location of the significant sources observed in our study. For interpretation of the lateralization phenomenon, both hemispheric functional specificity (Zatorre and Belin, 2001; Grimm et al., 2006) and temporal scale sensitivity (Poeppel et al., 2004) have been proposed. Because we did not design this experiment to investigate the functional asymmetry of auditory information processing between the two hemispheres, further investigation is needed to explore these hypotheses.

Functional Interactions between Brain Regions Underlie the Task-Specific Modulation

In our results, the DMS-specific suppression affected the AER to the second stimulus in a sound pair (Figure 3B), suggesting that the neural dynamics during the delay period and the first 100 ms of S2 presentation are most likely behind this modulation effect. Therefore, we focused on the late delay period to investigate the DMS-specific functional interactions between the left auditory cortex sources and other sources in the brain. 3dANOVA analysis on the baseline-corrected coherence during the late delay period (the RCC values) revealed a cluster of sources in left inferior frontal cortex that showed significantly enhanced functional interaction with the left auditory reference sources during the late delay period (Figures 6A,B). The effect was observed in the beta band (12∼20 Hz) and peaked at 18 Hz (Figure 6C). Moreover, the sources in the frontal cluster displayed greater delay period activity in the DMS tasks than during PSL (Figure 6D), indicating involvement of this region in memory processing during the DMS task performance.

Correlation of left inferior frontal activity and auditory memory processing has been shown in a wide variety of studies: positron emission tomography (PET; Jonides et al., 1998) and functional MRI (Husain et al., 2004, 2006) studies have shown increased left inferior frontal oxidative metabolism in DMS tasks. Recent event-related fMRI data found involvement of this region in all stages, from coding through maintenance to the response period (Strand et al., 2008). MEG studies found increased frontal activity during the delay period and the following response phase in DMS tasks (Luo et al., 2005; Grimault et al., 2009; Kaiser et al., 2009). Patients with brain disorders such as schizophrenia (Stevens et al., 1998; Menon et al., 2001) and dyslexia (Dufor et al., 2007) showed decreased left inferior frontal activity in working memory tasks. In addition to functions related to auditory memory processing, studies have also correlated this region with other cognitive functions including response selection (Binder et al., 2004), phonological processing in speech (Hickok and Poeppel, 2007), lexical processing in music (Peretz et al., 2009), and attention control (Ross et al., 2010).

The functional role(s) of beta-band interaction between left IFG and left auditory cortex remains poorly understood. Some may argue that our observation of the DMS-specific functional interaction in the beta band is due to bottom–up afferent input to a memory center in left IFG, not top-down modulation of left auditory cortex as we have hypothesized. However, data from patient studies support our proposal that left IFG can modulate the AER by means of frontal–temporal functional interactions. For example, frontal lobe patients show an increase in AER magnitude that correlates with the degree of behavioral deficiency during performance of auditory DMS tasks (Knight et al., 1999), and schizophrenia patients display decreased beta-band frontal–temporal coherence along with deficient gating of the N1 response (Rosburg et al., 2009). Furthermore, recent visual-motor studies suggested that enhanced beta-band coherence between frontal and visual cortices plays a functional role in cognitive tasks that require top-down anticipation of upcoming sensory events (Buschman and Miller, 2007; for a review, see Engel and Fries, 2010). Though there is little direct evidence from healthy participants in previous studies, our results support the left IFG’s role in task-specific top-down suppression of the AER.

Conclusion

The current study used an auditory DMS task to investigate the task-specificity of top-down modulation in human auditory cortex and the neural mechanisms underlying the observed modulation effects. Besides the demonstration of a DMS-specific suppressive modulation of the early phase AER, we also observed increased functional interaction between the modulated auditory cortex and left IFG, in which the frontal sources showed increased activity in DMS tasks, indicating their involvement during STM processing. Our results suggest that a task-specific interactive network including both auditory and frontal cortical regions is necessary for successful performance of the auditory DMS task, where the frontal regions can exert influences on the early phase of auditory cortical processing. The latency of these influences could be as early as tens of milliseconds after stimulus onset. Therefore, the findings from this and previous studies lead us to propose that cortical responses to auditory stimuli are affected by task-specific networks in which processing of the relevant information can be enhanced and retained, and processing of the irrelevant information can be suppressed through task-specific frontal–temporal functional interactions.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This work was supported by the NIDCD Intramural Research Program. We thank Dr. Hung Thai-Van for help with data collection, Dr. Gang Chen for help with data analysis, and Drs Jane Clark, Todd Troyer, and Jonathan Simon for their thoughtful comments on the work.

References

Ahveninen, J., Jääskeläinen, I. P., Raij, T., Bonmassar, G., Devore, S., Hämäläinen, M., Levänen, S., Lin, F., Sams, M., Shinn-Cunningham, B. G., Witzel, T., and Belliveau, J. W. (2006). Task-modulated “what” and “where” pathways in human auditory cortex. Proc. Natl. Acad. Sci. U.S.A. 103, 14608–14613.