- 1 School of Foreign Languages, Southeast University, Nanjing, China
- 2 College of Chinese Language and Literature, Qufu Normal University, Qufu, China
- 3 School of Music, Qufu Normal University, Rizhao, China
- 4 Department of Chinese Language and Literature, Yonsei University, Seoul, South Korea
- 5 School of Psychology, Qufu Normal University, Qufu, China
Two event-related potential (ERP) studies investigated how verbs and nouns are processed under different music priming conditions, in order to reveal whether the embodied motion concept can be stimulated and evoked across categories. Study 1 (Tasks 1 and 2) tested the processing of verbs (action vs. state verbs) primed by two music types, with tempo changes (accelerating vs. decelerating music) and without tempo changes (fast vs. slow music), while Study 2 (Tasks 3 and 4) tested the processing of nouns (animate vs. inanimate nouns) under the same priming conditions as Study 1. During the experiments, participants listened to a piece of music and then judged whether an ensuing word (verb or noun) was semantically congruent with the motion concept conveyed by the music. The results show that when primed by music with tempo changes, state verbs and inanimate nouns elicited larger N400 amplitudes than action verbs and animate nouns in the anterior and anterior-to-central regions, respectively, whereas when primed by music without tempo changes, action verbs unexpectedly elicited larger N400 amplitudes than state verbs and the two categories of nouns showed no N400 difference. The interactions between music and words were significant only in Tasks 1, 2, and 3. Taken together, the results demonstrate that, first, music with and without tempo changes primes verbs and nouns in different fashions; second, action verbs and animate nouns are easier to process than state verbs and inanimate nouns when primed by music with tempo changes, owing to the motion concept shared across categories; third, bodily experience differentiates music from words in coding (encoding and decoding) fashion, but the motion concept conveyed by the two categories can be subtly extracted on a metaphorical basis, as indexed by the N400 component. Our studies reveal that music tempos can prime different word classes, favoring the notion that the embodied motion concept exists across domains and adding evidence to the hypothesis that music and language share the neural mechanism of meaning processing.
Introduction
Music and language, two important communication systems for human beings, are comparable in multiple dimensions (e.g., acoustic features, emotions, and meanings) but mainly in two aspects, syntax and semantics. Syntactically, music and language have similar hierarchical configurations whereby discrete structural elements are combined into sequences (Patel, 2008, p. 241). For instance, a section in music is composed of motifs and phrases while a sentence in a language is composed of words and phrases in a hierarchical fashion. This syntactic comparability has been demonstrated in several neuropsychological and neuroimaging studies (e.g., Jentschke et al., 2014; Chiang et al., 2018). Semantically, both music and language are used to convey information that can be interpreted and comprehended by others, despite the fact that "the meaning evoked by music is far less specific than meaning evoked by language" (Slevc and Patel, 2011). In Koelsch et al.'s (2004) event-related potential (ERP) study, which used different types of contexts to prime the processing of words, an N400 was elicited by nouns preceded by either semantically unrelated musical excerpts or semantically unrelated sentences. Since then, numerous studies have converged to reveal the psychological reality of musical meaning, similar to linguistic meaning as indexed by the N400, in both music-priming-words and words/sentences-priming-music conditions (e.g., Steinbeis and Koelsch, 2008; Daltrozzo and Schön, 2009a,b; Koelsch, 2011). In other words, meanings are encoded differently by music and language, but their semantics partly overlap, at least from the perspective of meaning processing.
Musical meaning comes in many kinds. According to Koelsch (2011), musical meaning can arise from extra-musical sign qualities, intra-musical structural relations, musicogenic effects, the establishment of a unified coherent sense out of "lower-level" units, or musical discourse, which together are generalized into three fundamentally different classes of meaning: extra-musical meaning, intra-musical meaning, and musicogenic meaning. Relevant to our study is extra-musical meaning, which "emerges from the interpretation of musical information with reference to the extra-musical world" and is specified in three dimensions: iconic, indexical, and symbolic musical meaning (Koelsch, 2011). The meaning of musical motion (i.e., the motion concept) pertains to iconic musical meaning, used to imitate the sounds and qualities of objects or the qualities of abstract concepts, in line with Eitan and Granot's (2006) finding that music can evoke a sense of motion in a listener and Patel's (2008, p. 331) observation that music can evoke semantic concepts. In light of cognitive linguistics, our understanding of musical motion is completely metaphoric, grounded in our bodily experience of physical motion (Johnson and Larson, 2003). That is, the motion concept in music can be metaphorically understood via embodiment, in effect a cross-domain mapping from physical motion to musical motion involving our participation (Sloboda, 1998; Todd, 1999; Larson, 2002, 2004; Johnson and Larson, 2003; Eitan and Granot, 2006; Hedger et al., 2013). On the basis of embodiment, the motion concept is encoded not only by music but also by words in language, for words are the basic categories used to represent an entity, actions, or their relevant features (Wolter et al., 2015). Verbs and nouns, the two major word classes in language, are generally used to represent dynamic and static objects, respectively (Shao and Liu, 2001), yet nouns can communicate dynamic characteristics in some ways (as illustrated below). As a result, it is possible to use music to prime verbs and nouns on the basis of the shared motion concept.
In Mandarin Chinese, verbs have two types of meaning, static and dynamic (Dai et al., 1995), related to state verbs and action verbs, respectively. Based on our bodily experience, action verbs and state verbs represent two subcategories that differ in their inherent motion features. State verbs are often perceived to encode a physical state or property and are hence called low-motion verbs, while action verbs, often called high-motion verbs, carry more motion information than state verbs (Muraki et al., 2020). This view is basically consistent with Grossman et al.'s (2002) finding that motion-related verbs involve more sensorimotor experience than state verbs, accordingly yielding relatively easier processing of action verbs. For nouns, animacy is a good indicator of their related motion information. According to Weckerly and Kutas (1999), animate nouns, as ideal actors in sentences, have a strong possibility of performing actions and consequently contain more motion information than inanimate nouns. Experiments show that infants under 1 year of age can clearly distinguish animacy based on motion cues carrying dynamic or static information (Pauen and Träuble, 2009; Träuble et al., 2014). All these studies suggest that verbs and nouns can convey the motion concept, but the concept is encoded differently not only between the two word classes but also between their subcategories.
The motion concept is mapped onto music and words differently. In music, tempo, as an expression of extra-musical meaning, is used to convey the motion concept metaphorically (Todd, 1999; Johnson and Larson, 2003; Eitan and Granot, 2006; Zhou et al., 2015), involving physical motion and motion imagery. For instance, accelerating and decelerating music tempos can elicit images of increasingly and decreasingly speeded motion, respectively (Eitan and Granot, 2006; Savaki and Raos, 2019). In the view of Hedger et al. (2013), relative to statically fast and slow music, accelerating and decelerating music can better prime pictures in motion and at rest, respectively. This suggests that music with tempo changes (accelerating, decelerating) can convey the motion concept better than music without tempo changes (statically fast and slow music), for tempo changes within a single piece, whether acceleration or deceleration, may be more apparent and predictable (Hedger et al., 2013). Unlike the explicit mapping of the motion concept onto music tempo, the motion concept is encoded implicitly by verbs and nouns in language. That is, speaking or hearing a verb or a noun evokes the motion concept covertly and subconsciously. Given this difference, using music tempo to prime verbs and nouns appears more promising than the other way around.
That music and words can both convey the motion concept is well explained by embodiment theories and the theory of embodied semantics. As one of the classical embodiment theories, the perceptual symbols hypothesis (Barsalou, 1999) holds that symbols are the residues of perceptual experience stored as patterns in the brain for activation. In light of this hypothesis, motion experience is activated and simulated as individuals process the motion concept shared by music and words. In music, the acoustic properties of music tempos mimic the properties of physical motion. Similar to the perception of music, the motion concept, which is abstract and implicit in language, is metaphorically encoded in lexical meaning (Hauk et al., 2004; Wolter et al., 2015). To illustrate, the word 奔跑 "benpao/run" is first understood semantically, and then its motion attributes, as part of its lexical meaning, are decoded metaphorically. The motion concept of words is thus related to embodiment, our sensorimotor experience (Barsalou, 1999). Likewise, the theory of embodied semantics claims that the comprehension of lexical meaning is based on our bodily experience and can activate the brain regions responsible for perception, emotion, and action (de Vega et al., 2008; Horchak et al., 2014). This claim has been supported by an fMRI study showing that sensorimotor experiential traces are activated while processing words referring to an action; e.g., "kicking" activates the premotor cortex just as when an actual kicking movement is performed (Hauk et al., 2004).
As stated above, the N400 has proven to be an index of semantic incongruity related to extra-musical meaning and linguistic meaning. Yet to date, the literature using music to prime words' meaning or motion concept has been confined to three studies: Koelsch et al. (2004), Hedger et al. (2013), and Zhou et al. (2015). Hedger et al. (2013) conducted a behavioral study showing that accelerating and decelerating music primed the motion concept better than fast and slow music, and that incongruent motion relations between tempos and pictures took longer to judge than congruent ones. In Koelsch et al.'s (2004) ERP study, music excerpts proved as valid as sentences in facilitating semantically congruent words, as revealed by the N400 in both music and sentence conditions. Similarly, Zhou et al. (2015) used music excerpts to prime semantically congruent and incongruent pictures in a set of ERP experiments, finding that incongruent pairs elicited a larger N400 than congruent pairs over the anterior and central regions, further justifying the role of the N400 in revealing the motion concept conveyed by music. The three studies converge to show that music can convey the motion concept and other meanings related to words or pictures. Nevertheless, scrutiny of their experiments uncovers some gaps to be filled: for one thing, word types as target stimuli were not rigidly manipulated (e.g., concrete nouns were mixed with abstract nouns, and mono-category words with multi-category words in Koelsch et al., 2004), and such heterogeneity of stimuli may fail to reveal the anticipated priming effect; for another, the motion-concept-based (in)congruency was established between music and pictures, as in Hedger et al. (2013) and Zhou et al. (2015), leaving open the question of whether music can exert influence on verbal stimuli like verbs and nouns, so that the motion concept as a putative shared meaning can be evoked across more domains.
Against this background, the current study, following Hedger et al. (2013) and Zhou et al. (2015), uses two ERP experiments (Studies 1 and 2) to explore (1) whether music tempos can prime Chinese verbs and nouns, (2) how congruency between music and words is established on the basis of their shared motion concept, and (3) how the four classes of verbs and nouns differ in processing under music priming conditions. Specifically, Study 1 tested how two types of music (with vs. without tempo changes) affect the processing of two subclasses of verbs (action vs. state verbs), while Study 2 tested how the same two types of music affect the processing of two subclasses of nouns (animate vs. inanimate nouns). Based on previous research, we make the following predictions:
First, music with tempo changes should facilitate the processing of verbs and nouns better than music without tempo changes, yielding a reduced N400 to the words;
Second, on the basis of the shared motion concept, incongruent music-word pairs should elicit larger N400 amplitudes to the words than congruent pairs;
Third, action verbs and animate nouns should be easier to process than state verbs and inanimate nouns, respectively, owing to individuals' sensorimotor experience with the richer motion information in these two subclasses of words.
Study 1
As a rule, verbs are typically used to represent actions, and music is characterized by tempo changes, a form of motion representation. Based on this conceptual similarity, the first study investigated whether and how music could activate the processing of verbs via a cross-modality concept priming paradigm. Two types of music (with or without tempo changes) were selected to prime two types of Chinese verbs (action verbs or state verbs). The experiment comprised two tasks in a within-subjects design: Task 1 used music with tempo changes (accelerating vs. decelerating music) and Task 2 used music without tempo changes (fast vs. slow music) to prime the verbs (action vs. state verbs).
Given the above music and verb types, four priming conditions were designed for Task 1, as shown in Figure 1: accelerating music–action verb and decelerating music–state verb pairs constituted the congruent relation, whereas decelerating music–action verb and accelerating music–state verb pairs constituted the incongruent relation. Task 2 was similar to Task 1, the only difference being that the prime stimuli were music without tempo changes (fast or slow music). Specifically, fast music–action verb and slow music–state verb pairs constituted the congruent relation, whereas slow music–action verb and fast music–state verb pairs constituted the incongruent relation. This design followed Hedger et al.'s (2013) finding that accelerating and decelerating music better primed pictures in motion and at rest, respectively; in our experiment, action verbs and state verbs replaced the pictures in motion and at rest to build the congruent and incongruent stimulus pairs.
Figure 1. Design of the cross-modality priming paradigm in Task 1 of Study 1. Music tempos were used as primes, and verbs were used as targets.
Methods
Participants
A total of 40 students of Qufu Normal University (age: M = 20.1 years, SD = 1.5, range 18–24; gender: 36 women, 4 men) were recruited as paid volunteers. All participants were native Chinese speakers, right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971), with normal hearing, normal or corrected-to-normal vision, and no history of psychiatric or neurological disease. None had musical experience (none had received professional musical training or played any instruments). All participants signed written informed consent before the experiment, which was approved by the Ethics Committee of Qufu Normal University. The data of two participants were discarded due to excessive drift artifacts during the experiment, so data from 38 participants (age: M = 20.03 years, SD = 1.46; gender: 34 women, 4 men) entered the final analysis.
Stimuli Construction
The priming stimuli were music motifs. In Task 1, 20 music motifs were created as priming stimuli, subdivided into 10 accelerating and 10 decelerating motifs, each presented twice. In Task 2, the priming music was replaced with motifs without tempo changes (fast and slow music). All the music motifs were created with a MIDI controller and an audio editor (version 3.2.9; 44 kHz, 16-bit resolution), with an average duration of 10 s. The stimuli consisted of two oscillating notes alternated to form a rhythm (see Hedger et al., 2013 for similar manipulations). Participants listened to the music via Sony MDR-XB55AP headphones before making the (in)congruency judgment between the music and the word displayed on the computer screen. In Task 1, accelerating music (beginning at 120 BPM and ending at 600 BPM) represented music with strong motion information, while decelerating music (from 600 to 120 BPM) represented music with static motion information. In Task 2, fast motifs (at a constant 600 BPM) and slow motifs (at a constant 120 BPM) represented music with strong motion and static motion information, respectively. Additionally, 20 music motifs with irregular tempo changes were selected as fillers for each task.
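The interpolation used for the tempo ramps is not reported; as a minimal sketch, assuming a linear ramp from the starting to the ending BPM, note-onset grids for the four motif types could be generated as follows (all names are illustrative):

```python
import numpy as np

def onset_times(bpm_start, bpm_end, duration=10.0):
    """Note-onset times (s) for a motif whose tempo ramps linearly
    from bpm_start to bpm_end over `duration` seconds (assumption)."""
    onsets, t = [], 0.0
    while t < duration:
        onsets.append(t)
        # instantaneous tempo at time t, linearly interpolated
        bpm = bpm_start + (bpm_end - bpm_start) * (t / duration)
        t += 60.0 / bpm  # inter-onset interval at the current tempo
    return np.array(onsets)

accelerating = onset_times(120, 600)  # onsets densify over the motif
decelerating = onset_times(600, 120)  # onsets thin out over the motif
fast         = onset_times(600, 600)  # constant fast tempo
slow         = onset_times(120, 120)  # constant slow tempo
```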
The target stimuli were Chinese verbs, comprising 20 action verbs and 20 state verbs per task, i.e., 40 targets per task. No verb in Task 1 re-occurred in Task 2. Target words were high-frequency items taken from the CCL corpus (Center for Chinese Linguistics, Peking University). Following Hu et al. (1989), the action verbs were selected by these criteria: (1) the verbs signal action rather than state; (2) the action is autonomous; (3) the action is concrete rather than virtual or abstract; (4) the action doer (agent) has strong executive ability and individual motivation. In accordance with the definitions of state verbs by different scholars (Yuan, 1998; Chen, 2002; Ma, 2005), 40 state verbs were selected by these criteria: (1) the verbs are non-autonomous; (2) the verbs signal a sustainable state or property rather than action; (3) the verbs are not used perfectively; (4) the verbs are non-bodily and involve no spatial displacement. The stroke numbers of the words were balanced (action verbs: M = 17.3, SD = 4.04; state verbs: M = 18.7, SD = 4.071), an independent t-test showing no systematic difference [t(78) = −1.544, p = 0.127]. A total of 20 conjunctions were selected as fillers for each task.
Normalization of Materials
To obtain the experimental stimuli, we normed the materials before the experiment by testing the motion attributes and imageability of the verbs and the relatedness between music and verbs.
In the motion-attribute and imageability tests, action verbs and state verbs were selected to represent high-motion and low-motion words, respectively, all with high imageability. Before the experiment, 100 raters who did not take part in the formal experiment rated 60 action verbs and 60 state verbs on seven-point scales, ranging from −3 (very low motion) to +3 (very high motion) for the motion-attribute test and from −3 (very low imageability) to +3 (very high imageability) for the imageability test. Finally, 40 action verbs and 40 state verbs were selected as experimental materials. The mean motion rating of the action verbs was 2.165 (SD = 0.213) and that of the state verbs −2.105 (SD = 0.214). In terms of imageability, the mean scores of action verbs and state verbs were 2.045 (SD = 0.281) and 2.016 (SD = 0.199), respectively. An independent-samples t-test showed that the two classes of verbs differed significantly in motion information [t(78) = 89.483, p < 0.01] but not in imageability [t(78) = 0.523, p = 0.603].
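For illustration, a minimal sketch of this norming check in Python; the raw ratings are not published, so the per-word scores below are simulated from the reported means and SDs purely to show the procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated per-word mean motion ratings (hypothetical stand-ins)
action = rng.normal(2.165, 0.213, 40)   # action verbs, high motion
state  = rng.normal(-2.105, 0.214, 40)  # state verbs, low motion

t, p = stats.ttest_ind(action, state)   # independent-samples t, df = 78
print(f"t(78) = {t:.3f}, p = {p:.3g}")
```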
To examine whether the music was related to the verbs with respect to the motion concept, we tested their congruency relations on a seven-point scale from −3 (strongly unrelated) to +3 (strongly related), with 0 signaling uncertainty. Only excerpts with a mean rating above +1 or below −1 were selected as experimental materials. Results showed that the mean scores of the congruent groups in Tasks 1 and 2 were 2.011 (SD = 0.246) and 1.876 (SD = 0.246), respectively, and those of the incongruent groups were −1.966 (SD = 0.271) and −1.734 (SD = 0.182), respectively, suggesting that people can reliably establish the congruency between music and verbs perceptually. Moreover, t-tests revealed that the congruent and incongruent pairs differed significantly in both Task 1 [t(38) = 48.664, p < 0.01] and Task 2 [t(38) = 52.819, p < 0.01]. According to the rating results, 40 pairs per task (10 per condition) were chosen as final experimental materials. Items and conditions were counterbalanced within each task. All materials were presented in random order, implemented in E-prime 3.0 (Psychology Software Tools, Inc.).
Procedures
Participants were seated in a comfortable chair approximately one meter in front of the computer screen in a soundproof room. In each trial, participants heard a piece of music via the earphones they were wearing, followed by a verb at the center of the screen. Their task was to judge as quickly as possible whether the verb was congruent with the music motif in terms of motion information (pressing "F" for congruent or "J" for incongruent); the procedure is shown in Figure 2. Each trial began with a black fixation cross (1,500 ms) at the center of the screen against a white background. The cross then turned red while the priming music played for 10 s. Afterward, a second black fixation cross appeared for 250 ms, followed by the target verb, which remained on screen until the participant made the congruency judgment. No response time limit was set. Participants were asked to avoid blinking as much as possible except during the 1,500 ms black fixation cross between trials. Trials were presented in random order, each occurring once. The entire experiment lasted ~14 min.
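The experiment was actually run in E-Prime 3.0; as a rough re-sketch of the trial timing in PsychoPy (stimulus file names and window settings are hypothetical):

```python
from psychopy import visual, sound, core, event

win = visual.Window(color="white", fullscr=True)
fix = visual.TextStim(win, text="+", color="black")

def run_trial(music_file, target_verb):
    # 1. Black fixation cross, 1,500 ms
    fix.color = "black"; fix.draw(); win.flip(); core.wait(1.5)
    # 2. Cross turns red while the 10 s music prime plays
    fix.color = "red"; fix.draw(); win.flip()
    sound.Sound(music_file).play(); core.wait(10.0)
    # 3. Second black fixation cross, 250 ms
    fix.color = "black"; fix.draw(); win.flip(); core.wait(0.25)
    # 4. Target verb until response: "f" = congruent, "j" = incongruent
    visual.TextStim(win, text=target_verb, color="black").draw(); win.flip()
    clock = core.Clock()
    key, rt = event.waitKeys(keyList=["f", "j"], timeStamped=clock)[0]
    return key, rt  # response and reaction time (s)
```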
EEG Recording and Preprocessing
The EEG was recorded with AC amplifiers from 64 scalp locations of the International 10–20 system. Electrodes placed on the left and right mastoids served as the reference, and the electrode between Fz and Cz served as the ground. To monitor eye movements and blinks, horizontal electrodes were placed at the outer canthus of each eye and vertical electrodes above and below the left eye. Electrode impedances were kept below 5 kΩ. EEG data were digitized at 1,000 Hz and band-pass filtered between 0.1 and 30 Hz. Trials with artifacts (eye movements, head movements) exceeding ±200 μV at any channel were excluded. Raw EEG data were preprocessed with a NeuroScan SynAmps2 8050 system (Compumedics Neuroscan) and Curry 8 software (Compumedics Neuroscan). Further processing was carried out in EEGLAB 14.1.1 (Delorme and Makeig, 2004) under MATLAB 2013b (MathWorks).
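The authors used Curry 8 and EEGLAB; an equivalent preprocessing sketch in MNE-Python, where the file name and mastoid channel labels are assumptions:

```python
import mne

# Hypothetical Curry recording; "M1"/"M2" are assumed mastoid labels
raw = mne.io.read_raw_curry("sub01_task1.cdt", preload=True)
raw.set_eeg_reference(ref_channels=["M1", "M2"])  # linked-mastoid reference
raw.filter(l_freq=0.1, h_freq=30.0)               # 0.1-30 Hz band-pass
```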
ERP Data Analysis
All data were analyzed by computing the mean amplitudes under each condition for each task. ERP epochs were segmented from 200 ms before to 800 ms after target onset, with the 200 ms pre-stimulus interval used for baseline correction. Based on previous studies of the N400, a time window of 300–500 ms after target onset was chosen for statistical analysis.
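Continuing the MNE-Python sketch, epoching and extraction of the 300–500 ms mean amplitudes might look like this (event codes are assumptions; MNE's rejection criterion is peak-to-peak, only an approximation of the ±200 μV threshold):

```python
events = mne.find_events(raw)  # assumes triggers on a stim channel
epochs = mne.Epochs(raw, events, event_id={"action": 1, "state": 2},
                    tmin=-0.2, tmax=0.8, baseline=(-0.2, 0.0),
                    reject=dict(eeg=200e-6),  # approximates +/-200 uV cut
                    preload=True)

# Per-channel mean amplitude in the 300-500 ms N400 window, per condition
n400 = {cond: epochs[cond].average().crop(0.3, 0.5).data.mean(axis=1)
        for cond in ("action", "state")}
```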
Preliminary analysis at the congruency level showed no processing advantage for congruent or incongruent pairs, suggesting that the relations between music and words are more complex. Given that different verbs have different attributes and processing patterns (e.g., Amsel and Cree, 2013; Muraki et al., 2020: non-bodily state abstract verbs can elicit a larger N400 than concrete verbs), we followed Hedger et al. (2013) and reanalyzed the data with music tempo and verb type as conditions so as to observe the processing of each word type. The results show that the motion concept indexed by the N400 is revealed by the different types of music and words but not by their congruency levels.
To test the distribution of the effects, nine regions of interest (ROIs) were selected, as detailed in Table 1. ERPs were analyzed with repeated-measures ANOVAs testing the effects of music, verb, hemisphere, and region. The nine ROIs were divided into lateral and midline electrode regions, whose mean amplitudes were computed separately. For the lateral electrodes, music (accelerating vs. decelerating in Task 1; fast vs. slow in Task 2), verb (action vs. state), region (anterior, central, posterior), and hemisphere (left, right) served as within-subjects factors. For the midline electrodes, music, verb, and region served as within-subjects factors.
In the general linear model analyses, multiple comparisons were Bonferroni-corrected. All p-values reported below were adjusted with the Greenhouse–Geisser correction when the numerator degrees of freedom exceeded 1. Eta squared (η2) is reported as the effect size for the ANOVAs (Olejnik and Algina, 2003).
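As an illustrative sketch of this analysis in Python with the pingouin package (the long-format table and its column names are assumptions, and pingouin's rm_anova handles at most two within-subject factors, so this collapses the full design to the music and verb factors):

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Simulated long-format data standing in for the real per-cell mean
# 300-500 ms amplitudes: one row per participant x music x verb cell
rng = np.random.default_rng(1)
cells = [(s, m, v) for s in range(38)
         for m in ("accelerating", "decelerating")
         for v in ("action", "state")]
df_long = pd.DataFrame(cells, columns=["participant", "music", "verb"])
df_long["amp"] = rng.normal(0, 1, len(df_long))

aov = pg.rm_anova(data=df_long, dv="amp", within=["music", "verb"],
                  subject="participant", correction=True)
print(aov)  # includes F, uncorrected and GG-corrected p-values

# Bonferroni-corrected follow-up comparisons
post = pg.pairwise_tests(data=df_long, dv="amp", within=["music", "verb"],
                         subject="participant", padjust="bonf")
```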
Results
Behavioral Results
Behavioral data were analyzed in RStudio (R 3.5.1; R Development Core Team, 2012). Because poor performance would influence the results, trials with RTs shorter than 500 ms or more than 3 SD above the overall mean were deleted; as a result, 9.2% and 7.68% of the collected data were discarded in Tasks 1 and 2, respectively. Accuracy (ACC) and reaction time (RT) were recorded automatically by the computer. In Tasks 1 and 2, participants responded with mean accuracies of 91.55% (SD = 0.258) and 92.45% (SD = 0.25), respectively, indicating that they followed the instructions and attended to the stimuli carefully. The mean reaction times in Tasks 1 and 2 were 1,136.75 ms (SD = 339.5) and 1,137.75 ms (SD = 36), respectively. A main effect of verb type was found in both tasks (Task 1: β = 0.112, SE = 0.041, df = 70.275, t = 2.73, p < 0.01; Task 2: β = 0.155, SE = 0.036, df = 39.450, t = 4.361, p < 0.01): participants responded faster to action verbs than to state verbs.
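The trimming rule is straightforward to express; a pandas equivalent of the R step (the "rt" column name is hypothetical):

```python
import pandas as pd

def trim_rts(trials: pd.DataFrame, rt_col: str = "rt", floor: float = 500.0):
    """Drop trials with RT < 500 ms or > 3 SD above the grand mean."""
    ceiling = trials[rt_col].mean() + 3 * trials[rt_col].std()
    kept = trials[(trials[rt_col] >= floor) & (trials[rt_col] <= ceiling)]
    print(f"discarded {1 - len(kept) / len(trials):.2%} of trials")
    return kept
```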
Electrophysiological Results
The grand-averaged ERP waveforms elicited by action verbs and state verbs in the different music conditions are shown in Figure 3 (Task 1) and Figure 4 (Task 2), respectively.
Figure 3. Grand-averaged ERPs elicited by action and state verbs in the music-with-tempo-changes priming condition (Task 1).
Figure 4. Grand-averaged ERPs elicited by action and state verbs in the music-without-tempo-changes priming condition (Task 2).
In Task 1, the ERP waves to action and state verbs diverged at ~250 ms after target onset, peaked at ~400 ms, and lasted another 100 ms (until 500 ms) (Figure 3). Statistical analysis showed a significant main effect of verb (lateral electrodes: F = 7.557, p = 0.009, η2 = 0.17), with state verbs eliciting larger N400 amplitudes than action verbs regardless of music condition. In addition, there was a significant interaction between music and verb type (middle electrodes: F = 10.593, p = 0.002, η2 = 0.223). Simple-effect tests on verb showed that state verbs evoked enhanced N400 amplitudes relative to action verbs in both the accelerating music condition (lateral electrodes: F = 9.309, p = 0.004, η2 = 0.201) and the decelerating music condition (middle electrodes: F = 10.591, p = 0.002, η2 = 0.223). A simple-effect test on music revealed that action verbs induced larger amplitudes in the accelerating than in the decelerating music condition over the anterior region (middle electrodes: F = 5.108, p = 0.03, η2 = 0.121) (Figure 5). A significant interaction was observed between verb and hemisphere (F = 4.132, p = 0.049, η2 = 0.100), but no significant interactions were found between music and hemisphere (F = 0.16, p = 0.692, η2 = 0.004), music and region (lateral: F = 2.495, p = 0.119, η2 = 0.063; middle: F = 1.347, p = 0.252, η2 = 0.036), or verb and region (lateral: F = 2.178, p = 0.142, η2 = 0.056; middle: F = 2.349, p = 0.132, η2 = 0.060). Moreover, there were no three-way interactions among music, verb, and hemisphere (F = 2.455, p = 0.126, η2 = 0.062) or among music, verb, and region (lateral: F = 2.518, p = 0.112, η2 = 0.064; middle: F = 2.506, p = 0.117, η2 = 0.063).
To sum up, the electrophysiological results of Task 1 showed that state verbs elicited a larger N400 than action verbs over the anterior regions across the analyzed electrodes, independent of music type. Moreover, action verbs were more sensitive to music type than state verbs, eliciting a larger amplitude in the accelerating than in the decelerating music condition.
In Task 2, the results were quite different. Statistical analysis showed that the main effect of verb type was not significant at either lateral (F = 0.018, p = 0.893, η2 = 0.000) or middle electrodes (F = 0.045, p = 0.833, η2 = 0.001), indicating that action and state verbs were processed similarly when primed by music without tempo changes (fast or slow music). However, there was a significant interaction between music type and verb at the midline (F = 7.985, p = 0.008, η2 = 0.177) and lateral electrode regions (F = 9.22, p = 0.004, η2 = 0.199). Simple-effect tests on verb indicated that action verbs elicited larger N400 amplitudes than state verbs in both the fast music condition (lateral electrodes: F = 6.422, p = 0.016, η2 = 0.148) and the slow music condition (lateral electrodes: F = 5.887, p = 0.02, η2 = 0.137; middle electrodes: F = 5.682, p = 0.022, η2 = 0.133). There were no significant interactions between music and electrode position or between verb and electrode position, either for lateral (music × hemisphere: F = 0.101, p = 0.752, η2 = 0.003; verb × hemisphere: F = 1.650, p = 0.687, η2 = 0.004; music × region: F = 0.237, p = 0.643, η2 = 0.006; verb × region: F = 0.749, p = 0.418, η2 = 0.02) or middle electrodes (music × region: F = 0.749, p = 0.418, η2 = 0.02; verb × region: F = 0.072, p = 0.889, η2 = 0.002). No three-way interactions among music, verb, and hemisphere (lateral: F = 3.712, p = 0.062, η2 = 0.091) or among music, verb, and region (lateral: F = 0.323, p = 0.594, η2 = 0.009; middle: F = 0.415, p = 0.662, η2 = 0.011) were found.
In brief, music without tempo changes also affected verb processing, but contrary to Task 1, action verbs induced a larger N400 than state verbs in both music conditions.
Discussion
Study 1 investigated how Chinese verbs are processed in different music priming conditions, given that verbs and music share the motion concept. The ERP data revealed that the effects differed across music tempo types: the larger N400 was elicited by state verbs in Task 1 but by action verbs in Task 2, implying that music with tempo changes facilitated the processing of action verbs whereas music without tempo changes inhibited it. That is, music with tempo changes was more closely related to the motion concept and consequently triggered the cognition of action verbs. This result partly confirms the first prediction that music with tempo changes facilitates the processing of verbs.
Unexpectedly, the motion congruency effect on the N400 was not found in either task. Across music conditions, action verbs primed by music with strong motion information (accelerating music in Task 1, fast music in Task 2) induced larger ERP amplitudes, contrary to our second prediction that congruent pairs (accelerating music–action verbs; fast music–action verbs) should be easier to process. This result is somewhat comparable to the word-repetition situation, in which the ERP response to a word becomes stronger when it is primed by itself than by a different but related word (Michael, 1985). We may thus assume that it is the repetition of the motion concept shared by music and action verbs that increased the ERP amplitudes for action verbs. Another possibility is that establishing congruency between music and verbs, while similar to the music–picture manipulation in Hedger et al. (2013), is more difficult because words are inherently more complex than pictures: the motion concept is encoded directly in pictures but only indirectly in words.
The results also demonstrate that the processing patterns of action and state verbs vary with music type. The attenuated N400 amplitudes for action verbs suggest that their processing is easier, compatible with the third prediction. In addition, the significant difference between music conditions observed only for action verbs suggests that the subtypes of music with tempo changes affect action verbs but not state verbs; that is, action verbs seem more sensitive to music conditions. In our study, state verbs elicited enhanced N400 amplitudes, echoing Muraki et al.'s (2020) finding that non-bodily state verbs elicited larger N400 amplitudes than concrete verbs in a syntactic classification task. The motion information in action verbs appears to be simulated more easily than that in state verbs, owing to the richer motion information action verbs carry.
In brief, the different effects of music tempo types on different verbs support the idea that people tend to understand the motion concept in music and language analogically, via acoustic features and lexical meaning. In this process, bodily motion experience is presumably activated as soon as music and verbs with high motion information are presented. Yet this experience does not produce identical activation for music and words, because the two are different categories and hence encode the motion concept distinctly.
Study 2
Verbs and nouns are two important word classes in Chinese. As revealed in Study 1, verbs have motion attributes; in folk cognition, verbs are assumed to carry more dynamic information while the majority of nouns carry more static information. Neurophysiological studies show that verbs and nouns are processed differently (e.g., Pulvermüller et al., 1999). In addition, nouns (like verbs) can be subdivided into many subcategories, among which is the dichotomy of animate versus inanimate nouns in terms of the feature (±ANIMACY). In linguistics, it is generally agreed that animate nouns are more associated with action than inanimate nouns, for the former signify objects in motion whereas the latter generally signal entities at rest (Weckerly and Kutas, 1999). On this account, Study 2 compared how animate and inanimate nouns were processed in different music priming conditions, using the same music pieces as Study 1.
Methods
Like Study 1, Study 2 adopted the cross-modality concept priming paradigm, using music to prime nouns. The experiment comprised two tasks: Task 3 tested nouns primed by music with tempo changes (accelerating; decelerating) and Task 4 tested nouns primed by music without tempo changes (fast; slow). The targets were animate and inanimate nouns, half of each. The priming and target stimuli constituted four conditions: two congruent pairs (accelerating music–animate nouns, decelerating music–inanimate nouns) and two incongruent pairs (accelerating music–inanimate nouns, decelerating music–animate nouns). The whole experiment lasted around 14 min.
Participants
The participants were those who had taken part in Study 1. The data of one participant were discarded due to an EEG equipment malfunction, and another participant's data were excluded because she failed to follow the experiment instructions. As a result, data from 38 participants were retained for analysis. All participants signed the consent form prior to the experiment and were paid at the end of it.
Stimuli Construction
The priming stimuli of Study 2 were identical to those of Study 1, while the target stimuli were nouns instead of verbs. The priming stimuli in Task 3 (accelerating and decelerating music) and Task 4 (fast and slow music) repeated the music motifs of Tasks 1 and 2, respectively, with 10 motifs per condition. Forty animate nouns and 40 inanimate nouns selected from the CCL corpus served as target stimuli across the two tasks, with 20 of each per task; no noun was repeated across tasks. All the nouns were concrete and of high frequency, and emotional words were excluded. The animate nouns were characterized by agentivity and a high probability of motion or action. For example, the noun 豹子 "baozi/leopard" can perform an action (e.g., running), so the abstract motion concept is encoded in this concrete entity. By contrast, the inanimate nouns were characterized by null agentivity and high staticity, such as 沙发 "shafa/sofa" and 雕像 "diaoxiang/statue", which often serve as undergoers receiving actions.
Stimuli Pre-tests
As in Study 1, to norm the motion concept of the nouns and their relatedness to music tempos, we ran pre-tests on seven-point scales, following the same steps as in Study 1.
In the motion-attribute test, the mean score of the animate nouns was 1.952 (SD = 0.248) and that of the inanimate nouns was −2.114 (SD = 0.245), suggesting that the participants could clearly distinguish animate nouns carrying the motion concept from inanimate nouns lacking it. An independent-samples t-test showed that the two classes of nouns differed significantly in motion information [t(78) = 73.734, p < 0.01].
In the imageability test, the mean scores of the animate and inanimate nouns were 2.252 (SD = 0.241) and 2.197 (SD = 0.198), respectively. An independent t-test showed no significant difference in imageability [t(78) = 1.114, p = 0.269].
In the relatedness test, four conditions (pairs) and two relatedness relations (congruent or incongruent) were formed across Tasks 3 and 4. Congruent pairs carried congruent motion information: accelerating (or fast) music–animate noun pairs and decelerating (or slow) music–inanimate noun pairs. Conversely, incongruent pairs carried incongruent motion information: accelerating (or fast) music–inanimate noun pairs and decelerating (or slow) music–animate noun pairs. Results showed that the mean scores of the congruent pairs in Tasks 3 and 4 were 1.967 (SD = 0.226) and 2.010 (SD = 0.248), respectively, whereas those of the incongruent pairs were −1.880 (SD = 0.201) and −1.708 (SD = 0.191), respectively, suggesting that most adults can perceptually (i.e., via auditory and visual embodiment) establish the relations between music and nouns. Further comparison showed that the related pairs differed markedly from the unrelated pairs in both Task 3 [t(38) = 56.939, p < 0.01] and Task 4 [t(38) = 53.101, p < 0.01]. The stroke numbers of the two classes of nouns were matched (animate nouns: M = 17.700, SD = 4.767; inanimate nouns: M = 16.875, SD = 4.598), an independent t-test showing no significant difference [t(78) = 0.788, p = 0.433].
According to the rating results, 40 pairs per task (10 per condition) were chosen as the formal stimuli. Items and conditions were counterbalanced within each task, controlled via E-prime 3.0.
Procedures
The procedure was identical to Study 1.
EEG Recordings and Preprocessing
The recording methodologies and the data preprocessing were identical to those in Study 1.
ERP Data Analysis
The data analysis of Study 2 followed the same procedure as in Study 1.
Results
Behavioral Results
Following the same criteria as Study 1, we deleted trials with RTs shorter than 500 ms or more than 3 SD above the overall mean; as a result, 3.82% and 4% of the collected data were discarded in Tasks 3 and 4, respectively. Behavioral results showed mean accuracies of 94.44% (SD = 0.211) and 93.8% (SD = 0.181) for the two tasks, respectively, indicating that participants followed the instructions and attended to the stimuli carefully. The mean reaction times of Tasks 3 and 4 were 1,019.75 ms (SD = 344.25) and 1,051.25 ms (SD = 322.25), respectively. A significant main effect of noun was observed in both tasks, with faster responses to animate than to inanimate nouns (Task 3: β = 0.101, SE = 0.027, df = 28.043, t = 3.726, p = 0.001; Task 4: β = 0.119, SE = 0.023, df = 31.83, t = 5.108, p < 0.01), suggesting that animate nouns are easier to process than inanimate nouns.
Electrophysiological Results
The grand-averaged ERP waveforms elicited by animate and inanimate nouns in the different music conditions are shown in Figure 6 (Task 3) and Figure 7 (Task 4).
Figure 6. Grand-averaged ERPs elicited by animate and inanimate nouns in the music-with-tempo-changes priming condition (Task 3).
Figure 7. Grand-averaged ERPs elicited by animate and inanimate nouns in the music-without-tempo-changes priming condition (Task 4).
The results of Task 3 were comparable to those of Task 1 in Study 1. Visual inspection showed larger negative amplitudes for inanimate nouns, starting at ~250 ms, peaking at ~400 ms, and lasting until 500 ms (Figure 6). A 2 music (Task 3: accelerating, decelerating; Task 4: fast, slow) × 2 noun (animate, inanimate) × 2 hemisphere (left, right) × 3 region (anterior, central, posterior) repeated-measures ANOVA was conducted on the ERPs.
The analysis showed a significant main effect of noun only at lateral electrodes (F = 17.413, p < 0.001, η2 = 0.32): compared with animate nouns, inanimate nouns incurred increased N400 amplitudes over the anterior and central regions (Figure 8), similar to Zhou et al.'s (2015) study, in which the target stimuli were non-verbal pictures. We also observed a main effect of music at the middle electrode regions (F = 16.843, p < 0.001, η2 = 0.313). No interaction was found between music and noun (lateral: F = 0.934, p = 0.340, η2 = 0.025; middle: F = 1.97, p = 0.659, η2 = 0.005). To further explore the influence of the music conditions on animate and inanimate nouns, we conducted simple-effect tests, which showed that inanimate nouns induced a greater N400 than animate nouns in both the accelerating music condition (lateral: F = 13.011, p = 0.001, η2 = 0.26) and the decelerating music condition (middle: F = 5.024, p = 0.031, η2 = 0.12). A simple-effect test of music on nouns further revealed that only animate nouns triggered a larger N400 in the accelerating than in the decelerating music condition (lateral: F = 5.574, p = 0.024, η2 = 0.131), while inanimate nouns showed no conspicuous processing difference between the two priming conditions. In addition, there were significant interactions between noun and region (F = 9.931, p = 0.002, η2 = 0.212) and between noun and hemisphere (F = 7.572, p = 0.009, η2 = 0.17) at lateral electrodes. No marked three-way interactions among music, noun, and hemisphere (lateral: F = 2.656, p = 0.112, η2 = 0.067) or among music, noun, and region (lateral: F = 0.138, p = 0.738, η2 = 0.004; middle: F = 0.634, p = 0.451, η2 = 0.017) emerged.
In Task 4, there were no main effects of noun type or music type at either lateral (noun: F = 0.002, p = 0.966, η2 = 0.000; music: F = 0.032, p = 0.858, η2 = 0.001) or middle electrodes (noun: F = 0.054, p = 0.818, η2 = 0.001; music: F = 0.026, p = 0.874, η2 = 0.001). No significant interaction was found between music without tempo changes and noun (lateral: F = 0.213, p = 0.647, η2 = 0.006; middle: F = 0.725, p = 0.400, η2 = 0.019). These results suggest that the N400 amplitudes induced by the two types of nouns did not differ significantly, nor did those induced by the two types of music. There were no significant interactions between noun and electrode position (lateral: noun × region: F = 0.158, p = 0.763, η2 = 0.004; noun × hemisphere: F = 0.09, p = 0.926, η2 = 0.000; middle: noun × region: F = 0.839, p = 0.395, η2 = 0.022) or between music and electrode position (lateral: music × region: F = 2.608, p = 0.112, η2 = 0.066; music × hemisphere: F = 0.892, p = 0.351, η2 = 0.024; middle: music × region: F = 0.713, p = 0.426, η2 = 0.018). However, there were significant three-way interactions among music, noun, and hemisphere (lateral: F = 10.663, p = 0.002, η2 = 0.224) and among music, noun, and region (middle: F = 0.479, p = 0.031, η2 = 0.112).
Discussion
This study investigated the processing of nouns primed by different music tempo types. As expected, the results of Tasks 3 and 4 showed that music with tempo changes influenced the processing of nouns while music without tempo changes did not, compatible with Hedger et al.'s (2013) study. This demonstrates that tempo changes in music are more closely related to the motion concept and consequently trigger the cognition of nouns; participants could thus metaphorically establish motion-concept relations between this type of music and nouns on the basis of embodiment. In Task 4, by contrast, music without tempo changes did not differentiate animate from inanimate nouns, as the N400 amplitudes induced by the two noun types were not strikingly different. This pattern supports the view that sensorimotor experience was activated by music with tempo changes and by nouns in some way, indexed by the N400, consistent with our first prediction. Furthermore, it extends the finding of Task 1: music with tempo changes influences not only action verbs but also animate nouns. Consequently, we infer that participants can hardly map relations between music without tempo changes and nouns (or verbs), as indicated by the absence of N400 differences, and that motion information may not be an integral component of this music type, at least not one foregrounded in the processing of nouns.
Roughly as in Study 1, the processing advantage of congruent pairs failed to appear. Specifically, the N400 elicited by animate nouns was larger in the accelerating than in the decelerating music condition, inconsistent with our second prediction. In other words, music with tempo changes had a similar effect on action verbs in Task 1 and on animate nouns in Task 3. In terms of the word-repetition effect (Michael, 1985) mentioned in Study 1, we assume that it was the re-occurrence of motion information, shared by accelerating music and animate nouns, that increased the N400 amplitudes of animate nouns.
The electrophysiological results showed that the greater N400 was induced by inanimate nouns (in Task 3), indicating greater difficulty in processing inanimate than animate nouns. Furthermore, in Task 3, music with tempo changes affected only animate nouns but not inanimate nouns, implying that the motion-concept correlation between music with tempo changes and animate nouns is easier to establish. Accordingly, sensorimotor experience may be harder to activate during the processing of inanimate nouns, or inanimate nouns may not be conceptually related to the motion concept at all, favoring the third prediction.
Above all, Study 2 shows that embodied experience can be activated by music with tempo changes and animate nouns, indexed by the N400. Animate nouns are easier to process than inanimate nouns owing to the richer motion information they carry, favoring our third prediction. No motion congruency effect on the N400 emerged between music and language, disfavoring the second prediction.
General Discussion
The two ERP studies aimed to investigate how verbs and nouns are processed under two music priming conditions by means of a cross-modality priming paradigm (music listening followed by visual word reading). The results show that music types influence different classes of words in distinct ways, that action verbs and animate nouns contrast with state verbs and inanimate nouns in processing patterns, and that the motion concept, as a high-level meaning, can be well established between music tempos and words on an embodied, metaphorical basis, indexed by the N400. The processing of words varies with music tempos, determined not by the congruency between music and words but by the properties of the music tempos and the words.
Tempo-Change Music Affecting Action Verbs and Animate Nouns
The experimental results show that music with tempo changes facilitates the processing of action verbs and animate nouns, as revealed by the attenuated ERP amplitudes to these words, consistent with our first prediction. Yet when the priming stimuli are music without tempo changes, verbs show the opposite pattern while nouns are unaffected. The different N400 patterns between the tasks (Task 1 vs. 2; Task 3 vs. 4) presumably result from the attributes of the music tempos, suggesting that the motion concept is encoded to different degrees in both music and words: more motion information in music with tempo changes, action verbs, and animate nouns; less in music without tempo changes, state verbs, and inanimate nouns. According to Hedger et al. (2013), music tempos are closely related to different properties of speed representation. In accelerating and decelerating music, the notes become more or less dense within a single excerpt, so that listeners can perceive and anticipate the speed changes. That is, music with tempo changes has strong energy (Eitan and Granot, 2006) and hence carries strong motion implications. In statically fast and slow motifs, by contrast, the tempo does not change over time; "fast" and "slow" can be labeled only by comparing one excerpt with another. Consequently, compared with statically fast and slow music, accelerating and decelerating music gives people cues to identify dynamic changes and anticipate motion changes.
In parallel with music with tempo changes, action verbs and animate nouns are encoded with more motion information than state verbs and inanimate nouns, giving rise to the reduced N400 effect for these words. By contrast, music without tempo changes is conceptually associated with state rather than motion, and its lack of motion fails to trigger the motion concept in words, resulting in no N400 amplitude differences between verbs and nouns or their subcategories. This may also explain why people cannot readily establish relations between music without tempo changes and words (verbs and nouns), as elaborated below.
Motion Concept-Based Congruency Between Music and Words
In the experiments, participants' responses were not influenced by the compatibility between music (with or without tempo changes) and words (verbs and nouns). The unexpected results that congruent pairs elicited an enhanced N400 in Tasks 1, 2, and 3, while congruent and incongruent pairs showed no amplitude difference in Task 4, suggest that accelerating and fast music inhibited the processing of action verbs and animate nouns, seemingly contradicting Zhou et al.'s (2015) finding that incongruency between music and pictures enhanced the N400. We offer several possible explanations below.
The first concerns the greater difficulty of extracting the motion concept from verbs and nouns than from non-verbal stimuli (e.g., pictures). In Hedger et al.'s (2013) study, accelerating and decelerating music better primed pictures in motion and at rest, respectively. Compared with pictures, words make it harder to extract the motion concept, for its decoding involves a more complex process. In general, visual pictures engage a low-level visual perception system (Kamio and Toichi, 2000; Ostarek and Huettig, 2017), whereas language processing is an advanced activity of human cognition. According to the spreading activation model (Collins and Quillian, 1970), hearing a word may activate related entries in the mental lexicon. We assume that decoding the motion concept of a word, one aspect of its lexical entry, may be influenced by other "competitors" such as literal and figurative meanings. Compared with language, an image depicted in a scene is more salient and thus restricts the alternatives and "competitors" that may be activated in the mental lexicon (Altmann and Kamide, 1999) during comprehension. By contrast, the motion concept in language is not directly encoded and is extracted or decoded slowly during linguistic comprehension. As a result, establishing congruency seems more difficult between music and words than between music and pictures.
The second explanation relates to the word-repetition effect: repeated words have been found to display more negative-going ERP amplitudes than non-repeated but related words (Michael, 1985). In our experiments, both the words (action verbs and animate nouns) and the music (accelerating and fast music) have high motion energy and attributes. Probably, it is the repetition of motion information by action verbs and animate nouns that evoked the larger N400. If so, the repetition effect is not confined to concrete words of the same form but extends to abstract notions like the motion concept implied in words.
The third reason may be the relatively small number of materials used in our experiments. Although our studies adopted the same number of musical and word materials as Hedger et al. (2013), the trials are actually fewer than in many purely linguistic ERP experiments, which may partly explain the discrepancy between the linguistic priming paradigm (within the same domain) and the music-language priming paradigm (across domains).
Processing Patterns Differentiating Between Word Types
The two studies flag the different processing patterns of the four types of words (action verbs, state verbs, animate nouns, and inanimate nouns). When tempo-change music serves as the prime, action verbs and animate nouns show easier processing than state verbs and inanimate nouns, respectively, in both accelerating and decelerating music conditions, adding evidence to the view that the motion concept is metaphorically established across categories via our bodily experience. From the encoding perspective, action verbs and animate nouns are associated with more motion information, while state verbs and inanimate nouns are typically deemed low-motion words (Muraki et al., 2020). State verbs and inanimate nouns induce larger N400 amplitudes when primed by music with tempo changes but do not interact with the music subtypes, suggesting that they convey less motion information than action verbs and animate nouns. In light of embodiment theories, semantic processing is wholly reliant on sensorimotor experience (Lakoff and Johnson, 1999, p. 151; Glenberg and Kaschak, 2002), that is, non-technically, on our bodily experience; likewise, comprehension of the motion concept in words cannot be separated from embodied experience. According to Grossman et al. (2002), motion-verb processing involves some degree of sensorimotor representation, whereas the sensorimotor systems are not involved in abstract verb representation. Therefore, action verbs are more prominent in motion information than state verbs, which may explain the easier processing of action verbs relative to state verbs in our studies. That is, verbs with more motion information (e.g., action verbs) can be better understood with the help of prior bodily experience. The processing difficulty of inanimate nouns in our motion-related task extends previous findings that inanimate nouns in subject position induce larger N400 amplitudes because, relative to animate nouns, they are not ideal actors to perform actions (Dahl, 2008; Philipp et al., 2008). This result is consistent with Philipp et al.'s (2008) claim that animate nouns, as typical agents, have the ability to perform actions whereas inanimate nouns often serve as undergoers, and with Muraki et al.'s (2020) finding that abstract non-bodily state verbs elicited greater N400 amplitudes (at frontocentral electrodes) than abstract mental state verbs and concrete verbs in a syntactic classification task.
In no-tempo-change music conditions, only verbs show differences between their subcategories, while the processing of nouns does not vary between fast and slow music conditions, suggesting that this kind of music distinguishes verbs from nouns. Cognitively, verbs and nouns represent dynamic and static characteristics in image schemas, respectively (Li, 2007), and take on different neural bases. From the processing perspective, verbs are related to the motor cortex and the frontal lobe (near the motor cortex), whereas nouns are associated with the visual cortex and the temporo-occipital area (near the visual center) (Pulvermüller et al., 1999). That is to say, verbs involve more motion information than nouns in processing. In our studies, verbs are more sensitive than nouns to music without tempo changes, probably owing to the sensorimotor experience that has been activated. A similar difference also finds expression in the loci involved in processing the two word classes: nouns are mainly distributed over the anterior and central regions, whereas verbs activate only the anterior regions, indicating that verb processing is more frontally distributed. Among the subcategories, action verbs and animate nouns induced larger amplitudes in both fast and slow music conditions, suggesting that music without tempo changes inhibited these kinds of words. Besides, the N400 amplitudes of animate and inanimate nouns were not affected by music without tempo changes. These reversed results demonstrate that music without tempo changes fails to facilitate the processing of verbs and nouns, probably owing to the different roles of the music tempo types, as illustrated in section Motion Concept-Based Congruency Between Music and Words.
N400 Indexing Motion Concept by Music and Words
In general, N400 is an electrophysiological index of semantic violation in both language (e.g., Kutas and Hillyard, 1980; Kutas and Federmeier, 2011) and music (e.g., Koelsch et al., 2004). Moreover, the N400 effects remain largely constant across language conditions (in which target words are preceded by a sequence of words or sentences) and music conditions (in which target words are preceded by musical excerpts) (Koelsch, 2011). Music differs from words in meaning representation, but the two share the same conceptual meaning, the motion concept, which relies on sensorimotor experience. In the present study, the motion concept as extra-musical meaning in music was perceived auditorily, whereas the metaphorical meaning in words was understood visually. The result that action verbs and animate nouns can well represent the motion concept under tempo-change music conditions demonstrates that the motion concept is semantic in nature, and the resulting N400 is taken as the electrophysiological indicator of the motion shared by music and words. Our studies thus support the hypothesis that music and language share the neural mechanism (N400) of meaning processing.
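As a point of reference for readers outside ERP research, the following is a minimal sketch of how an N400 index of this kind is commonly quantified, assuming epoched EEG stored as a NumPy array. The sampling rate, epoch span, channel count, and the 300–500 ms window are illustrative assumptions, not our exact analysis pipeline.

```python
import numpy as np

# Hypothetical epoched EEG: (trials, channels, samples) at 500 Hz,
# each epoch spanning -200..800 ms around word onset. All shapes and
# the 300-500 ms window below are illustrative assumptions.
srate, t_min = 500, -0.2
rng = np.random.default_rng(0)
congruent = rng.standard_normal((40, 32, 500))    # placeholder voltages (uV)
incongruent = rng.standard_normal((40, 32, 500))

def mean_amplitude(epochs, start=0.3, end=0.5):
    """Mean voltage in a latency window, averaged over trials and samples."""
    i0 = int(round((start - t_min) * srate))
    i1 = int(round((end - t_min) * srate))
    return epochs[:, :, i0:i1].mean(axis=(0, 2))  # one value per channel

# An N400 effect shows up as a more negative-going mean amplitude in
# one condition, e.g., state verbs vs. action verbs at anterior sites.
n400_effect = mean_amplitude(incongruent) - mean_amplitude(congruent)
print(n400_effect.shape)  # (32,)
```

The sign of such a difference wave and its scalp distribution (here, one value per channel) are what license interpreting the effect as an N400.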
According to the perceptual symbols hypothesis (Barsalou, 1999), perceptual symbols are extracted from experience (e.g., basic percepts such as color, emotion, figure, orientation, and motion) and stored in memory, where they form simulators. During simulation, a basic functional conceptual system can be constructed, representing categorization, propositions, abstract concepts, and the like. We therefore assume that the perceptual symbols involving perceived motion in music and words can be activated precisely because the corresponding sensorimotor experience comes into play in understanding the motion concept implicated in words. On this account, action verbs and animate nouns more easily trigger sensorimotor experience in the brain, easing their processing in the motion relatedness task. Music, a motion-based art that unfolds over time, surely contains certain motion information. Hence, we may infer that action verbs and animate nouns (relative to state verbs and inanimate nouns) are easier to process in the music priming condition because of the greater amount of shared motion information (revealed by the semantic index N400), for the sensorimotor cortex may already be activated as soon as participants hear the music. Our results, along with other related studies (Stanfield and Zwaan, 2001; Zwaan et al., 2002; Wang and Zhao, 2020), support the perceptual symbol theory that perceptual symbols of referents are activated during language comprehension (Barsalou, 1999).
Motion Concept Modulated by Embodiment
In our experiments, the motion concept is taken as the reference point to examine how different music types affect the processing of verbs and nouns. This experimental design rests on the belief that music and words encode a shared motion concept (though in different manners), a high-level meaning across modalities. The results, as indicated by the N400 effect, justified this belief. We argue that the different processing patterns associated with the four word classes and the two music types are attributable to the diverse degrees of embodiment involved in the two domains (music and words).
As demonstrated above, music with tempo changes is connected with more musical motion and music without tempo changes with less, whereas action verbs and animate nouns are judged to carry more motion information than state verbs and inanimate nouns. As a consequence, a decreased N400 was observed for action verbs and animate nouns under the priming condition of music with tempo changes, but verbs showed a reversed pattern and nouns showed no obvious N400 difference under the priming condition of music without tempo changes. In everyday life, physical motion is easily and frequently embodied and metaphorically encoded as a motion concept in both music and words. In light of our results, it appears safe to infer that the more embodied motion a word involves, the smaller the evoked N400 amplitude. This inference is indirectly favored by a few studies showing, for example, that the motion concept in music creates an expectation about target pictures even when it is irrelevant to the task (Hedger et al., 2013; Zhou et al., 2015). The motion concept as a high-level meaning exists across domains, indexed electrophysiologically by the N400 effect, in accordance with Zhou et al.'s (2015) study. Just as with pictures, the motion concept of verbs and nouns can be well comprehended metaphorically by virtue of individuals' sensorimotor experience.
As noted earlier, embodiment theories hold that semantic processing is wholly reliant on sensorimotor experience (Lakoff and Johnson, 1999, p. 151; Glenberg and Kaschak, 2002), and the comprehension of the motion concept in words likewise cannot be separated from embodied experience. Given that our participants performed a motion relatedness task, processing words in our studies meant not decoding literal meaning but interpreting the motion cues of words. Our results are consistent with former studies in which perceptual information encoded by language (such as orientation, motion features, location, and shape of objects) was understood through embodied experience (e.g., Stanfield and Zwaan, 2001; Zwaan et al., 2002; Wang and Zhao, 2020).
Overall, our results indicate that the motion concept is shared by music and words on an embodied basis and that this sharing is indexed by the N400 component, adding further evidence to the view that music and language share the neural mechanism of meaning processing (Koelsch et al., 2004; Koelsch, 2011; Jiang, 2016).
Conclusion
Music and language are two domains human beings use to exchange intentions and meanings despite their key differences in specificity, compositionality, and communicativeness (Slevc and Patel, 2011). The current study demonstrates that music and language are conceptually or semantically correlated with the general motion concept via embodiment. Our results indicate that the processing of verbs and nouns is influenced by music tempo types and that N400 indexes the motion concept shared by music and words. On the one hand, action verbs and animate nouns are easier to process than state verbs and inanimate nouns, respectively, owing to the richer motion information they carry in the motion-related tasks. On the other hand, the four types of words can be distinguished perceptually when primed by different music tempos. Moreover, these findings demonstrate that the motion concept exists regardless of the modalities and domains of the priming and target stimuli, favoring embodiment theory in the comprehension of musical and linguistic meaning.
In conclusion, our experiments extend previous studies by showing that music tempos can conceptually prime words in language, adding evidence to the hypothesis that music and language share the neural mechanism of meaning processing. The challenge for future research is to explore the neural mechanisms of other aspects of meaning assumed to be shared by music and language.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics Statement
The studies involving human participants were reviewed and approved by Biomedical Ethics Committee of Qufu Normal University. The patients/participants provided their written informed consent to participate in this study.
Author Contributions
TZ, YL, and HL conceived the study. YL and TW performed the experiments. YL collated and analyzed the data. YL drafted the first manuscript, which TZ and SZ revised. All authors edited the final version of the manuscript and have approved it for publication.
Funding
This work was supported by the Fundamental Research Funds for the Central Universities under the No. 2242022R10100.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
Warmest thanks to all the teachers and students in the School of Translation Studies at the University. Also, genuine thanks should be given to all the participants who took part in the ERP experiments and those who helped us conduct a series of questionnaires.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.888226/full#supplementary-material
Footnotes
1. ^In Koelsch (2011), extra-musical meaning is defined as the interpretation of musical information in relation to the external world, whereby musical patterns or acoustic properties resemble emotional states, sounds of objects, qualities of entities, abstract concepts, and the like. For example, a musical excerpt may sound like "a bird", and accelerating tempos may sound like a "running" movement.
References
Altmann, G. T. M., and Kamide, Y. (1999). Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition 73, 247–264. doi: 10.1016/S0010-0277(99)00059-1
Amsel, B. D., and Cree, G. S. (2013). Semantic richness, concreteness, and object domain: an electrophysiological study. Can. J. Exp. Psychol. 67, 117–129. doi: 10.1037/a0029807
Barsalou, L. W. (1999). Perceptual symbol systems. Behav. Brain Sci. 22, 577–660. doi: 10.1017/S0140525X99002149
Chen, C.-L. (2002). The Syntactic and Semantic Study on Modal Verbs in Chinese. Shanghai: Xuelin Press.
Chiang, J. N., Rosenberg, M. H., Bufford, C. A., Stephens, D., Lysy, A., and Monti, M. M. (2018). The language of music: common neural codes for structured sequences in music and natural language. Brain Lang. 185, 30–37. doi: 10.1016/j.bandl.2018.07.003
Collins, A. M., and Quillian, M. R. (1970). Facilitating retrieval from semantic memory: the effect of repeating part of an inference. Acta Psychol. 33, 304–314. doi: 10.1016/0001-6918(70)90142-3
Dahl, Ö. (2008). Animacy and egophoricity: grammar, ontology and phylogeny. Lingua 118, 141–150. doi: 10.1016/j.lingua.2007.02.008
Daltrozzo, J., and Schön, D. (2009a). Conceptual processing in music as revealed by N400 effects on words and musical targets. J. Cogn. Neurosci. 21, 1882–1892. doi: 10.1162/jocn.2009.21113
Daltrozzo, J., and Schön, D. (2009b). Is conceptual processing in music automatic? An electrophysiological approach. Brain Res. 1270, 88–94. doi: 10.1016/j.brainres.2009.03.019
de Vega, M., Glenberg, A. M., and Graesser, A. C. (2008). Symbols and Embodiment: Debates on Meaning and Cognition. New York, NY: Oxford University Press.
Delorme, A., and Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009
Eitan, Z., and Granot, R. Y. (2006). How music moves: musical parameters and listeners' images of motion. Music Percept. 23, 221–247. doi: 10.1525/mp.2006.23.3.221
Glenberg, A. M., and Kaschak, M. P. (2002). Grounding language in action. Psychonom. Bull. Rev. 9, 558–565. doi: 10.3758/BF03196313
Grossman, M., Koenig, P., DeVita, C., Glosser, G., Alsop, D., Detre, J., et al. (2002). Neural representation of verb meaning: an fMRI study. Hum. Brain Mapp. 15, 124–134. doi: 10.1002/hbm.10117
Hauk, O., Johnsrude, I., and Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron 41, 301–307. doi: 10.1016/S0896-6273(03)00838-9
Hedger, S. C., Nusbaum, H. C., and Hoeckner, B. (2013). Conveying movement in music and prosody. PLoS ONE 8, e76744. doi: 10.1371/journal.pone.0076744
Horchak, O. V., Giger, J. C., Cabral, M., and Pochwatko, G. (2014). From demonstration to theory in embodied language comprehension: a review. Cogn. Syst. Res. 29–30, 66–85. doi: 10.1016/j.cogsys.2013.09.002
Hu, Z. -L., Zhu, Y. -S., and Zhang, D. -L. (1989). A Survey of Systemic-Functional Grammar. Changsha: Hunan Education Press.
Jentschke, S., Friederici, A. D., and Koelsch, S. (2014). Neural correlates of music-syntactic processing in two-year old children. Dev. Cogn. Neurosci. 9, 200–208. doi: 10.1016/j.dcn.2014.04.005
Johnson, M. L., and Larson, S. (2003). “Something in the way she moves”: metaphors of musical motion. Metaphor Symbol 18, 63–84. doi: 10.1207/S15327868MS1802_1
Kamio, Y., and Toichi, M. (2000). Dual access to semantics in autism: Is pictorial access superior to verbal access? J. Child Psychol. Psychiatry 41, 859–867. doi: 10.1111/1469-7610.00673
Koelsch, S. (2011). Towards a neural basis of processing musical semantics. Phys. Life Rev. 8, 89–105. doi: 10.1016/j.plrev.2011.04.004
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., and Friederici, A. D. (2004). Music, language and meaning: brain signatures of semantic processing. Nat. Neurosci. 7, 302–307. doi: 10.1038/nn1197
Kutas, M., and Federmeier, K. D. (2011). Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu. Rev. Psychol. 62, 621–647. doi: 10.1146/annurev.psych.093008.131123
Kutas, M., and Hillyard, S. A. (1980). Reading senseless sentences: brain potentials reflect semantic incongruity. Science 207, 203–205. doi: 10.1126/science.7350657
Lakoff, G., and Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York, NY: Basic Books.
Larson, S. (2002). Musical forces, melodic expectation, and jazz melody. Music Percept. 19, 351–385. doi: 10.1525/mp.2002.19.3.351
Larson, S. (2004). Musical forces and melodic expectations: comparing computer models and experimental results. Music Percept. 21, 457–498. doi: 10.1525/mp.2004.21.4.457
Li, F.-Y. (2007). On image schema theory. J. Sichuan Int. Stud. Univ. 23, 6. doi: 10.3969/j.issn.1674-6414.2007.01.017
Rugg, M. D. (1985). The effects of semantic priming and word repetition on event-related potentials. Psychophysiology 22, 642–647.
Muraki, E. J., Cortese, F., Protzner, A. B., and Pexman, P. M. (2020). Heterogeneity in abstract verbs: an ERP study. Brain Lang. 211, 104863. doi: 10.1016/j.bandl.2020.104863
Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113. doi: 10.1016/0028-3932(71)90067-4
Olejnik, S., and Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychol. Methods 8, 434–447. doi: 10.1037/1082-989X.8.4.434
Ostarek, M., and Huettig, F. (2017). Spoken words can make the invisible visible – testing the involvement of low-level visual representations in spoken word processing. J. Exp. Psychol. Hum. Percept. Perform. 43, 499–508. doi: 10.1037/xhp0000313
Pauen, S., and Träuble, B. (2009). How 7-month-olds interpret ambiguous motion events: category-specific reasoning in infancy. Cogn. Psychol. 59, 275–295. doi: 10.1016/j.cogpsych.2009.06.001
Philipp, M., Bornkessel-Schlesewsky, I., Bisang, W., and Schlesewsky, M. (2008). The role of animacy in the real time comprehension of Mandarin Chinese: evidence from auditory event-related brain potentials. Brain Lang. 105, 112–133. doi: 10.1016/j.bandl.2007.09.005
Pulvermüller, F., Lutzenberger, W., and Preissl, H. (1999). Nouns and verbs in the intact brain: evidence from event-related potentials and high-frequency cortical responses. Cereb. Cortex 9, 497–506. doi: 10.1093/cercor/9.5.497
Savaki, H. E., and Raos, V. (2019). Action perception and motor imagery: mental practice of action. Prog. Neurobiol. 175, 107–125. doi: 10.1016/j.pneurobio.2019.01.007
Shao, J.-M., and Liu, Y. (2001). On nouns' dynamicity and the testing approaches. Chin. Lang. Learn. 6, 2–7. doi: 10.3969/j.issn.1003-7365.2001.06.001
Slevc, L. R., and Patel, A. D. (2011). Meaning in music and language: three key differences. Comment on "Towards a neural basis of processing musical semantics" by Stefan Koelsch. Phys. Life Rev. 8, 110–111. doi: 10.1016/j.plrev.2011.05.003
Sloboda, J. (1998). Does music mean anything? Musicae Sci. 2, 19–31. doi: 10.1177/102986499800200102
Stanfield, R. A., and Zwaan, R. A. (2001). The effect of implied orientation derived from verbal context on picture recognition. Psychol. Sci. 12, 153–156. doi: 10.1111/1467-9280.00326
Steinbeis, N., and Koelsch, S. (2008). Comparing the processing of music and language meaning using EEG and fMRI provides evidence for similar and distinct neural representations. PLoS ONE 3, e2226. doi: 10.1371/journal.pone.0002226
Todd, N. P. M. (1999). Motion in music: a neurobiological perspective. Music Percept. 17, 115–126. doi: 10.2307/40285814
Träuble, B., Pauen, S., and Poulin-Dubois, D. (2014). Speed and direction changes induce the perception of animacy in 7-month-old infants. Front. Psychol. 5, 1141. doi: 10.3389/fpsyg.2014.01141
Wang, H. L., and Zhao, Y. H. (2020). Embodiment effect of implicit shape information in L2 sentence comprehension. J. Tianjin For. Stud. Univ. 27, 70–79. doi: 10.3969/j.issn.1008-665X.2020.05.007
Weckerly, J., and Kutas, M. (1999). An electrophysiological analysis of animacy effects in the processing of object relative sentences. Psychophysiology 36, 559–570. doi: 10.1111/1469-8986.3650559
Wolter, S., Dudschig, C., de la Vega, I., and Kaup, B. (2015). Musical metaphors: evidence for a spatial grounding of non-literal sentences describing auditory events. Acta Psychol. 156, 126–135. doi: 10.1016/j.actpsy.2014.09.006
Yuan, M.-J. (1998). Further notes on the classification of non-volitional verbs. Stud. Chin. Language. 4, 24–30.
Zhou, L. S., Jiang, C. M., Wu, Y. Y., and Yang, Y. F. (2015). Conveying the concept of movement in music: an event-related brain potential study. Neuropsychologia 77, 128–136. doi: 10.1016/j.neuropsychologia.2015.07.029
Keywords: verbs, nouns, music, motion concept, embodiment
Citation: Zhou T, Li Y, Liu H, Zhou S and Wang T (2022) N400 Indexing the Motion Concept Shared by Music and Words. Front. Psychol. 13:888226. doi: 10.3389/fpsyg.2022.888226
Received: 02 March 2022; Accepted: 18 May 2022;
Published: 17 June 2022.
Edited by:
Wanjin Meng, National Institute for Education Sciences, China
Reviewed by:
Linshu Zhou, Shanghai Normal University, China
Zude Zhu, Jiangsu Normal University, China
Copyright © 2022 Zhou, Li, Liu, Zhou and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Yulu Li, m15066332357@163.com; Tongquan Zhou, zhoutongquan@126.com
†These authors have contributed equally to this work and share first authorship