- 1Research Center for Advanced Science and Technology (Cognitive Science), The University of Tokyo, Meguro-ku, Tokyo, Japan
- 2Yamaguchi University, Yamaguchi-shi, Yamaguchi, Japan
Information received from different sensory modalities profoundly influences human perception. For example, changes in the auditory flutter rate induce changes in the apparent flicker rate of a flashing light (Shipley, 1964). In the present study, we investigated whether auditory information would affect the perceived offset position of a moving object. In Experiment 1, a visual object moved toward the center of the computer screen and disappeared abruptly. A transient auditory signal was presented at different times relative to the moment when the object disappeared. The results showed that if the auditory signal was presented before the abrupt offset of the moving object, the perceived final position was shifted backward, implying that the perceived visual offset position was affected by the transient auditory information. In Experiment 2, we presented the transient auditory signal to either the left or the right ear. The results showed that the perceived visual offset shifted backward more strongly when the auditory signal was presented to the same side from which the moving object originated. In Experiment 3, we found that the perceived timing of the visual offset was not affected by the spatial relation between the auditory signal and the visual offset. The present results are interpreted as indicating that an auditory signal may influence the offset position of a moving object through both spatial and temporal processes.
Introduction
Tracking the trajectory and localizing the position of a moving visual object are essential abilities for carrying out many tasks in everyday life. Studies have demonstrated that the perceived or remembered position of a moving object is consistently biased in the forward direction of motion. This forward bias is referred to as representational momentum (RM), which can be observed for both implied and continuous motion. Studies of RM have also demonstrated that the final perceived position of a moving object is mislocalized in the forward direction of motion (Freyd and Finke, 1984; Hubbard and Bharucha, 1988). RM could result from the mental representation of the object’s motion persisting for a brief period after its abrupt offset (Teramoto et al., 2010).
The perceptual system receives information through different sensory modalities, and the inputs from these modalities interact in various ways. In this study, we were interested in whether the perceived position of a visual motion offset would be influenced by a transient auditory signal.
Several previous studies have investigated how visual motion perception is modulated by a transient auditory signal. In the flash-lag effect, the perceived position of a moving object appears to be ahead of a physically aligned flash (e.g., Nijhawan, 1994; Watanabe and Yokoi, 2006, 2007, 2008; Maus and Nijhawan, 2009). This phenomenon seems to result from the visual representation of moving objects being spatially shifted forward to counteract neural processing delays in the perceived position. Vroomen and de Gelder (2004) showed that the magnitude of the flash-lag effect is reduced when a transient auditory signal is presented before or simultaneously with the flash. In addition, Heron et al. (2004) demonstrated that the location of a horizontally moving object that changes its direction against a vertical virtual surface is perceptually displaced forward with respect to the direction of previous motion when a sound is presented after the actual bounce event, and that the perceived bounce position is shifted in the direction opposite to previous motion when a sound is presented before the actual bounce. Fendrich and Corballis (2001) asked participants to report the position of a rotating flash when an audible click was heard. The flash was seen earlier when it was preceded by the click and later when it was followed by the click.
These studies indicate the possibility that, when judging the offset position of a moving visual object, our perceptual system may not rely exclusively on visual information, but may also utilize information from other modalities. However, this explanation is not completely consistent with the modality precision hypothesis, which suggests that the modality with the highest precision with regard to the required task tends to be dominant in multimodal interactions (Shipley, 1964; Welch and Warren, 1980, 1986; Welch et al., 1986; Spence and Squire, 2003). The modality precision hypothesis would suggest that, when judging the offset position of a moving visual object, the perceived visual offset would be processed exclusively by the visual system rather than by also utilizing information from other modalities (e.g., audition). Nevertheless, we hypothesized that, in a situation allowing a transient auditory stimulus to be associated with a visual motion offset, the auditory stimulus would influence the perceived final position of the moving object.
Recently, Teramoto et al. (2010) found that the magnitude of RM is influenced by a continuous sound accompanying a moving visual object. They showed that RM is enhanced when the sound terminates after the offset of the visual object, but reduced when the sound terminates before the visual offset. However, their results also indicated that transient auditory signals presented at the onset and around the offset of the visual motion had no effect on the perceived offset position of the visual object. On the basis of these observations, they suggested that a sustained sound during visual motion is necessary for the audiovisual integration to have an effect. However, based on studies indicating that visual motion perception can be modulated by a transient auditory signal (Fendrich and Corballis, 2001; Heron et al., 2004; Vroomen and de Gelder, 2004), it is still possible that the visual offset position could be influenced when a transient sound is presented temporally proximal to the offset of the visual stimulus, without an auditory signal having been presented at the motion onset. Additionally, Teramoto et al. (2010) measured RM with a probe-judgment task, whereas a mouse-pointing task is typically used with continuous motion targets (Hubbard, 2005). In light of this, we decided to measure the perceived visual offset position with a mouse-pointing task in the present study.
Multisensory interactions are also affected by the characteristics of the stimuli in different modalities. For example, a single visual flash can be perceived as multiple flashes if accompanied by multiple auditory stimuli (the sound-induced illusory flash). Discontinuous stimuli in one modality seem to alter the perception of continuous stimuli in another modality. This indicates that multisensory interaction is at least partly affected by stimulus characteristics: continuous versus discontinuous (Shams et al., 2002). Additionally, Courtney et al. (2007) reported that one flash presented near the visual fixation point induces an illusory flash in the periphery. Courtney et al. suggested that the effect of stimulus discontinuity/continuity may also be valid for unisensory stimuli.
The multisensory effect of a transient stimulus is not confined to perceptual alternation between competing incompatible interpretations when the perceptual system is confronted with ambiguous stimuli; it can also be observed when there are no competing incompatible interpretations. Attentional repulsion is the perceived displacement of a vernier stimulus in the direction opposite to a brief peripheral visual cue. Arnott and Goodale (2006) demonstrated that the repulsion effect could be induced by presenting lateralized sounds as peripheral cues, showing that auditory spatial information can displace the perceived positions of static visual stimuli. This finding indicates the possibility that the location of a sound may affect retinotopic coding. Recently, Teramoto et al. (2012) presented results of a study of visual apparent motion in conjunction with a sound delivered alternately from two loudspeakers aligned horizontally or vertically. Participants reported that the direction of visual apparent motion was consistent with the direction of the sound alternation, or that the auditory stimulus influenced the path of the apparent motion. The researchers suggested that auditory spatial information could also modulate the perception of a moving visual object, especially in the peripheral visual field.
Audiovisual interaction is enhanced when visual and auditory signals are presented in close spatial proximity. For example, observers are more likely to report that visual and auditory stimuli are presented simultaneously when they originate from the same spatial position than when they originate from different positions (Zampini et al., 2005). When observers are asked to determine the direction of auditory apparent motion while trying to ignore unrelated visual motion, they perform worse when the auditory motion is in the direction opposite to the visual apparent motion. This audiovisual dynamic capture effect is larger when the auditory and visual stimuli are presented from close spatial locations (Soto-Faraco et al., 2002; Meyer et al., 2005; Spence, 2007).
On the basis of these findings, we hypothesized that auditory information can affect the perceived visual motion offset in the peripheral visual field, and that this effect would be enhanced when the visual and auditory stimuli are presented to the same hemifield. Because studies have indicated that an auditory transient can alter apparent motion perception (e.g., Heron et al., 2004), we examined whether a transient auditory signal would affect the perceived offset position of a moving visual object and, if so, whether spatial contingency between the auditory signal and the visual object would enhance the auditory modulation. To achieve this goal, we presented a transient sound around the time of visual motion offset and asked participants to report the perceived offset position of the visual stimulus (Experiment 1). In addition, we tested whether the auditory spatial information would influence the effect of the auditory stimulus on the perceived visual offset position (Experiment 2). After affirmative results were obtained in both experiments, we examined whether the auditory effects were caused by distortion in the perceived timing of the offset of the moving visual object (Experiment 3).
Experiment 1
In Experiment 1, we examined the possibility that the timing of a transient auditory signal would affect the perceived offset position of a moving visual object. Such an effect would demonstrate that a continuous auditory stimulus during visual motion is not necessary to alter the perceived visual offset position. We conducted Experiments 1A and 1B. The visual target appeared in the left visual field and moved rightward (Experiment 1A) or appeared in the right visual field and moved leftward (Experiment 1B), and then disappeared around the center of the display. A transient auditory signal was presented around the motion offset of the visual target. We treated the two motion direction conditions as a between-subjects variable to reduce the task load for each participant.
Method
Participants
Sixteen paid volunteers participated in Experiment 1A (10 males, 6 females) and another 16 in Experiment 1B (11 males, 5 females). Their ages ranged from 20 to 34 years (mean = 25.1) in Experiment 1A and from 19 to 28 years (mean = 21.7) in Experiment 1B. All were right-handed by self-report. All participants had normal or corrected-to-normal vision and audition and were naïve as to the purpose of this study.
Apparatus and stimuli
Participants observed the visual stimuli on a 23′′ CRT monitor at a viewing distance of 60 cm. The monitor’s refresh rate was 100 Hz. The visual and auditory stimuli were presented using the MATLAB operating environment and the Psychtoolbox extensions (Brainard, 1997; Pelli, 1997). The background was divided horizontally into two parts (Figure 1). The upper part was gray (40° × 10.5°, 7.85 cd/m2) and the lower part was black (40° × 19.5°, 0.03 cd/m2). A white fixation cross (1° × 1°, 61.27 cd/m2) was presented at the center of the lower background.
The visual stimulus was a black disk (1° in diameter) that appeared at the bottom of the gray background, 15° to the left (Experiment 1A) or right (Experiment 1B) of the midpoint. The disk moved from left to right (Experiment 1A) or from right to left (Experiment 1B) at a constant speed of 15°/s. The disk disappeared when its center was at the midpoint or at a position randomly jittered from the midpoint by ±0.3°. The auditory stimulus was a 10-ms, 1000-Hz pure tone without onset or offset intensity ramps, presented via headphones to both ears. Note that previous research has shown that a 10-ms sound can produce audiovisual interaction effects (e.g., Fujisaki et al., 2004; Ono and Kitazawa, 2011). The sound pressure level was approximately 60–65 dB. The sound was presented 120, 80, or 40 ms before the visual motion offset, simultaneously with the visual offset (0 ms), or 40, 80, or 120 ms after the visual offset. As a control condition, we included trials in which the sound was absent.
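The stimuli were programmed in MATLAB with the Psychtoolbox extensions. Purely as an illustration of the auditory stimulus described above, the following Python sketch builds a 10-ms, 1000-Hz pure tone with no onset or offset ramps; the sample rate and relative amplitude are assumptions, since the paper specifies only the frequency, the duration, and an approximate 60–65 dB presentation level.

```python
import numpy as np

# Minimal sketch (not the authors' MATLAB/Psychtoolbox code): a 10-ms,
# 1000-Hz pure tone with abrupt onset and offset (no intensity ramps).
SAMPLE_RATE = 44100      # Hz; assumed, not reported in the paper
TONE_FREQ = 1000.0       # Hz, as in the experiment
TONE_DUR = 0.010         # s (10 ms), as in the experiment
AMPLITUDE = 0.5          # relative level; the actual 60-65 dB SPL would be
                         # fixed by calibrating the headphone output

n_samples = int(round(TONE_DUR * SAMPLE_RATE))          # 441 samples at 44.1 kHz
t = np.arange(n_samples) / SAMPLE_RATE
tone = AMPLITUDE * np.sin(2 * np.pi * TONE_FREQ * t)    # no ramps: starts/stops abruptly

# Identical signal to both ears (diotic presentation over headphones).
stereo_tone = np.column_stack([tone, tone])
```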
Procedure
Participants started each trial by pressing the space key. The black disk appeared and remained stationary at its initial position for 500 ms. Participants were asked to observe the disk while keeping their eyes on the fixation cross. After the initial stationary period, the black disk moved at a constant speed of 15°/s for 1000 ms and then disappeared around the midpoint of the display. A mouse cursor appeared 1° above the fixation cross 200 ms after the disappearance of the visual target. The participants were instructed to move the mouse cursor to the target’s perceived offset position and click the mouse button.
Participants performed 10 practice trials to familiarize themselves with the position judgment task. Then, they performed 10 trials in each combination of conditions for a total of 240 trials (8 sound conditions × 3 visual offset positions × 10 trials). Trials of all conditions were randomly ordered.
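As an illustration of this design, the following Python sketch builds and shuffles a trial list of the same size (8 sound conditions, including the silent control, × 3 offset jitters × 10 repetitions = 240 trials). The dictionary keys and the use of Python are assumptions for illustration only; the original experiment was run in MATLAB.

```python
import random
from itertools import product

# Illustrative Experiment 1 trial list; variable names are hypothetical.
SOUND_SOAS_MS = [-120, -80, -40, 0, 40, 80, 120, None]  # None = silent control
OFFSET_JITTERS_DEG = [-0.3, 0.0, 0.3]                    # jitter of the disappearance point
REPETITIONS = 10

trials = [{"sound_soa_ms": soa, "offset_jitter_deg": jitter}
          for soa, jitter in product(SOUND_SOAS_MS, OFFSET_JITTERS_DEG)
          for _ in range(REPETITIONS)]

random.shuffle(trials)            # all conditions randomly interleaved
assert len(trials) == 240         # 8 sound conditions x 3 offset positions x 10 trials
```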
Statistical analysis
The data were submitted to a two-way mixed-design analysis of variance (ANOVA) followed by post hoc comparisons with the Bonferroni correction with the alpha level set at 0.05.
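As a rough sketch of this analysis, the following Python code runs a two-way mixed-design ANOVA (between-subjects factor: visual field of the start position; within-subject factor: timing of the auditory signal) on synthetic placeholder data. The use of the pingouin package, the column names, and the random data are assumptions for illustration only and do not reproduce the authors' analysis or data.

```python
import numpy as np
import pandas as pd
import pingouin as pg  # assumed to be available; any mixed-ANOVA routine would serve

# Synthetic placeholder data: one mean deviation (deg) per participant and
# sound-timing condition, in two between-subjects groups of 16 participants.
rng = np.random.default_rng(0)
timings = ["-120", "-80", "-40", "0", "+40", "+80", "+120", "silent"]
rows = []
for group in ("left_field", "right_field"):
    for p in range(16):
        for timing in timings:
            rows.append({"participant": f"{group}_{p:02d}",
                         "start_field": group,
                         "sound_timing": timing,
                         "deviation_deg": rng.normal(0.0, 0.2)})
df = pd.DataFrame(rows)

aov = pg.mixed_anova(data=df, dv="deviation_deg", within="sound_timing",
                     between="start_field", subject="participant")
print(aov[["Source", "F", "p-unc"]])
```

Bonferroni-corrected follow-up comparisons can then be carried out with one-sample or paired t-tests whose alpha level is divided by the number of comparisons, as sketched after the comparisons reported below.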
Results and Discussion
We calculated the average deviation of the perceived visual offset position from the physical visual offset point for each sound condition. Figure 2 shows the combined results of Experiments 1A and 1B. The horizontal axis represents the different sound conditions. The vertical axis represents the perceived deviation from the actual physical visual offset position. A negative value in the deviation from visual offset (Y-axis) means that the perceived visual offset position was behind the actual visual offset position.
Figure 2. Results of Experiments 1A and 1B. The horizontal axis represents the experimental conditions for different presentation timings of the auditory signal. The vertical axis represents the perceived deviation from the physical visual offset position in degrees of visual angle. A negative value in the deviation from offset (Y-axis) means that the perceived visual offset position was behind the actual visual offset position. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each condition. Data points marked with an asterisk (*) indicate perceived positions that differ significantly from 0.
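The within-participants error bars in Figures 2–4 follow Cousineau (2005). As a minimal sketch of that normalization (it is an assumption that no additional correction, such as Morey's, was applied), the following Python function removes between-participant variability before computing the per-condition SEM.

```python
import numpy as np

def cousineau_within_sem(scores):
    """Within-participants SEMs (Cousineau, 2005) for condition means.

    scores: 2-D array of shape (n_participants, n_conditions), e.g. each
    participant's mean deviation (deg) in each sound-timing condition.
    """
    scores = np.asarray(scores, dtype=float)
    # Subtract each participant's own mean and add back the grand mean,
    # removing between-participant variability from the error bars.
    normalized = scores - scores.mean(axis=1, keepdims=True) + scores.mean()
    # SEM of the normalized scores, computed separately for each condition.
    return normalized.std(axis=0, ddof=1) / np.sqrt(scores.shape[0])
```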
We performed a two-way mixed-design ANOVA, in which the visual field of the start position was the between-subjects factor and the timing of the auditory signal was the within-subject factor. The main effect of the visual field of the start position was not significant [F(1,30) = 0.499, p = 0.485]. The main effect of the timing of the auditory signal was significant [F(7,210) = 36.261, p < 0.001]. There was no significant interaction between the visual field of the start position and the timing of the auditory signal [F(7,210) = 0.48, p = 0.849]. Overall, these results suggest that the earlier the auditory signal was presented, the farther backward the perceived visual offset position was shifted.
Then, we compared the cell means of the perceived visual offset position against zero to test whether there was a significant displacement from the actual position in each condition (Table 1). The adjusted alpha level was 0.006 (0.05/8) when comparing the cell means against zero. In Experiment 1A, only the −120, −80, and −40 ms conditions significantly differed from zero [t(15) = 5.69, t(15) = 5.88, and t(15) = 5.89, respectively; all p < 0.006]. In Experiment 1B, the −120 and −80 ms conditions differed significantly from zero [t(15) = 6.57 and t(15) = 5.27, respectively; p < 0.006]. Thus, we confirmed that when the auditory signal was presented before the physical offset of the visual stimulus, the visual offset position tended to be perceived as behind the actual physical visual offset position. Conversely, no significant displacement was found in the 0, 40, 80, and 120 ms conditions, implying that the auditory signal did not produce an effect when presented after or at the moment of the visual motion offset.
We also compared the cell mean of each condition in which an auditory signal was presented to the cell mean of the silent condition. The adjusted alpha level was 0.007 (0.05/7). In Experiment 1A, the perceived visual offset positions in the −120, −80, and −40 ms conditions differed from that in the silent condition [t(15) = 4.46, t(15) = 4.23, and t(15) = 3.34, respectively; all p < 0.007]. In Experiment 1B, the perceived visual offset positions in the −120, −80, and −40 ms conditions differed from that in the silent condition [t(15) = 5.24, t(15) = 5.01, and t(15) = 3.30, respectively; all p < 0.007]. The silent condition did not differ from the conditions in which the auditory signal was presented after the physical visual offset in either Experiment 1A [t(15) < 1.11, p > 0.05] or 1B [t(15) < 1.05, p > 0.05].
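The two sets of comparisons above (each condition against zero, and each sound condition against the silent condition) amount to Bonferroni-corrected one-sample and paired t-tests. A minimal Python sketch of that procedure is shown below; the random placeholder data, the variable names, and the group size of 16 participants are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
conditions = ["-120", "-80", "-40", "0", "+40", "+80", "+120", "silent"]
# Placeholder per-participant mean deviations (deg), 16 participants per condition.
deviations = {c: rng.normal(0.0, 0.2, size=16) for c in conditions}

# One-sample t-tests against zero, Bonferroni-corrected over 8 comparisons.
alpha_vs_zero = 0.05 / 8
for c in conditions:
    t, p = stats.ttest_1samp(deviations[c], 0.0)
    print(f"{c} vs 0: t(15) = {t:.2f}, significant: {p < alpha_vs_zero}")

# Paired t-tests of each sound condition against the silent condition,
# Bonferroni-corrected over 7 comparisons.
alpha_vs_silent = 0.05 / 7
for c in conditions[:-1]:
    t, p = stats.ttest_rel(deviations[c], deviations["silent"])
    print(f"{c} vs silent: t(15) = {t:.2f}, significant: {p < alpha_vs_silent}")
```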
The lack of RM in the present experiments is notable, but similar findings have been reported in several previous studies in which observers were instructed to maintain fixation. Previous research has also indicated that fixation decreases RM for targets with smooth and continuous motion (Kerzel, 2000). It is possible that we did not observe RM in Experiment 1 because we used visual stimuli with smooth and continuous motion. However, RM has also been observed for targets with implied motion and for frozen-action photographs that do not elicit eye movements (Kerzel, 2003; Hubbard, 2005, 2006). Although we emphasized to participants the importance of maintaining fixation on the cross, we did not record eye movements. To examine whether eye movements might have played a major role in the present experiment, we performed a supplementary experiment using the same stimuli as in Experiment 1A, in which participants (N = 5) were free to move their eyes. The results showed the same pattern as Experiment 1A [F(7,28) = 8.028, p < 0.001]: we observed a tendency toward greater backward displacement when the sound was presented earlier. Therefore, the lack of RM in the present study cannot be explained completely by the instruction to maintain fixation. The lack of RM might be due partially to the shorter delay between the target offset and the appearance of the mouse cursor. Kerzel et al. (2001) showed that RM was larger with a longer delay between the target and the probe. The delay was 200 ms in our study, whereas it was 500 ms in Teramoto et al.’s study.
In the present study, it is more likely that the perceived timing of visual motion offset was attracted toward the timing of the presentation of the transient sound when the sound was presented before the physical visual motion offset, which resulted in the decreased magnitude of RM and consequently induced backward displacement. When the transient sound was presented after the physical visual motion offset, the perceived visual offset position of the visual target did not differ from the condition in which the sound was absent. In addition, our results also imply that this effect might not be confined to a visual stimulus presented at the periphery that moves to the foveal region. However, these issues require further empirical examination.
We also analyzed the average response times for completing the mouse-pointing task in each trial. Response times were not affected by different auditory stimulus timings [visual field of start position, F(1,30) = 0.967; timing of the auditory signal, F(7,210) = 0.873; interaction, F(7,210) = 0.250; all p > 0.05].
Experiment 2
In Experiment 2, we investigated whether the spatial contingency between auditory signals and visual events would modulate the auditory influence on the perceived offset position of the visual motion. We presented a lateralized transient auditory signal to either the left or the right ear with the same visual stimuli used in Experiment 1. The visual target appeared in the left visual field and moved rightward in Experiment 2A; in Experiment 2B, the visual target appeared in the right visual field and moved leftward. The visual field of the start position was treated as a between-subjects variable to reduce the task load for each participant.
Method
Participants
Fifteen paid volunteers participated in Experiment 2A (10 males, 5 females) and another 15 in Experiment 2B (9 males, 6 females). Their ages ranged from 19 to 31 years (mean = 21.79) in Experiment 2A and from 19 to 25 years (mean = 21.72) in Experiment 2B. All participants were right-handed by self-report, except for one left-handed participant in Experiment 2B. All of the participants had normal or corrected-to-normal visual acuity and were naïve as to the purpose of this study.
Stimuli and procedure
The apparatus and visual stimuli of Experiment 2 were the same as those of Experiment 1 except for the following points. In Experiment 2, the auditory signals were presented to the left ear in half of the trials and to the right ear in the other half. The sound was presented 40, 80, or 120 ms before or after the offset of the visual target, or at the same time as the visual offset. Since we did not find any difference between the silent condition and the conditions in which the auditory signal was presented after the physical visual offset in Experiment 1, we did not include a silent condition in Experiment 2. After 10 training trials, the participants performed 10 trials of each experimental condition for a total of 420 trials (2 sound positions × 3 visual offset positions × 7 timings × 10 trials). Trials of all conditions were randomly ordered, so that on each trial the auditory signal could be presented either to the same side as or to the side opposite to the origin of the visual target.
Statistical analysis
The data were submitted to a three-way mixed-design ANOVA followed by post hoc comparisons with the Bonferroni correction with the alpha level set at 0.05.
Results and Discussion
The top and bottom panels of Figure 3 show the results of Experiments 2A and 2B, respectively. We conducted a three-way mixed-design ANOVA, in which the visual field of the start position was the between-subjects factor and the sound contingency and the timing of the auditory signal were within-subject factors. The sound contingency indicates whether the auditory signal was presented to the same side as or to the side opposite to the originating position of the visual target. We found significant main effects of auditory timing [F(6,168) = 77.48, p < 0.001] and sound contingency [F(1,28) = 43.526, p < 0.001]. The main effect of the start position approached significance [F(1,28) = 4.127, p = 0.052]. There were no significant interactions [visual field × sound contingency, F(1,28) = 0.049; visual field × auditory timing, F(6,168) = 0.52; sound contingency × auditory timing, F(6,168) = 1.540; visual field × sound contingency × auditory timing, F(6,168) = 1.010; all p > 0.05]. The results imply that the timing of the auditory signal affected the perceived visual offset position, replicating the findings of Experiment 1. Furthermore, when the sound was presented in the same hemifield as the visual target’s start position, the effect was enhanced (i.e., there was more backward displacement). In addition, the results of Experiment 2B seemed to shift positively along the Y-axis, suggesting the possibility that RM was generally more pronounced in Experiment 2B.
Figure 3. Results of Experiments 2A (top) and 2B (bottom). The horizontal and vertical axes represent, respectively, the different sound presentation conditions and the perceived deviation from the physical visual offset position in degrees of visual angle. A negative value in the deviation from visual offset (Y-axis) means that the perceived visual offset position was behind the actual visual offset position. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each condition. Data points marked with an asterisk (*) indicate perceived positions that differ significantly from 0.
Experiment 2A
We compared the cell means of the perceived visual offset position in Experiment 2A against zero to test whether there was a significant displacement from the actual visual offset position in each condition (Table 2). The adjusted p-value required for significance with the Bonferroni correction was 0.007 (0.05/7) when comparing the cell means to zero. When the auditory signal was presented to the same side from which the visual target appeared, the perceived visual offset positions significantly differed from zero in the −120, −80, and −40 ms conditions [t(14) = 5.02, t(14) = 4.25, and t(14) = 4.09, respectively; all p < 0.007]. When the auditory signal was presented in the hemifield opposite to that from which the target appeared, the perceived visual offset positions significantly differed from zero in the −120, −80, and −40 ms conditions [t(14) = 4.72, t(14) = 4.04, and t(14) = 4.02, respectively; all p < 0.007].
We also compared the cell means between the different sound positions in the −120, −80, and −40 ms conditions, in which significant displacements were observed. The adjusted p-value required for significance with the Bonferroni correction was 0.017 (0.05/3). When the auditory signal was presented to the same side as the origin of the visual target, the backward displacement was larger [t(14) = 2.44 and t(14) = 2.51 for the −120 and −80 ms conditions, respectively; all p < 0.017].
Experiment 2B
The results of Experiment 2B differed from those of Experiment 2A when the cell means were compared to zero (Table 2). The adjusted p-value required for significance with the Bonferroni correction was 0.007 (0.05/7) when comparing the cell means to zero. In Experiment 2B, significant forward displacements (i.e., RM) were observed in the conditions with 40, 80, and 120 ms delays and with the sound presented to the side opposite to the origin of the visual target [t(14) = 3.45, t(14) = 3.27, and t(14) = 3.51 for the 40, 80, and 120 ms conditions, respectively; all p < 0.007]. However, there was no significant difference among these three cell means (all p > 0.017, the p-value required for significance with the Bonferroni correction being 0.05/3 = 0.017). RM was observed when the sound was presented to the opposite side, implying that when the auditory signal was presented to the opposite side (i.e., to the side toward which the visual target moved), it attracted the offset position of the visual target, resulting in larger forward displacements.
Conversely, significant backward displacements were observed only in the −120 and −80 ms conditions when the sound occurred on the same side as the visual target [t(14) = 3.25 and t(14) = 3.21 for the −120 and −80 ms conditions, respectively; both p < 0.007], but the difference between these two conditions was not significant [t(14) = 1.91, p = 0.15]. That is, a significant backward shift was observed only when the sound was presented on the same side as the visual target. It seems that RM (i.e., forward displacement) was more evident in Experiment 2B. However, similar to the results of Experiment 2A, the displacement observed when an auditory signal was presented before the physical offset of the visual object reflected a net outcome of RM and of a process, induced by the transient auditory signal, that decreased RM or even produced backward displacement of the perceived visual offset position in some conditions; this effect was stronger when the auditory signal was presented to the side from which the visual target appeared. Hubbard (2005) indicated that displacement of the perceived visual offset position is influenced by multiple factors such as RM, representational gravity, and characteristics of the context. The results of the present study imply that a transient auditory signal closely associated with the visual offset also influences the displacement of the perceived visual offset from the physical visual offset position.
A consistent effect of motion direction on RM has not been reported for horizontal motion (Hubbard, 2005). Several previous researchers have suggested that forward displacement of horizontally moving targets is larger in the left visual field (Halpern and Kelly, 1993; White et al., 1993). However, this conflicts with our results, in which larger RM was observed for the target originating in the right visual field. Since we did not compare rightward and leftward motion within each visual field and with sounds presented from different directions, the present study cannot rule out the possibility that motion direction influenced the enhancement of RM.
Other research has shown that the attentional mechanisms in the left hemisphere tend to distribute attentional resources within the right visual field, while the attentional mechanisms in the right hemisphere distribute attentional resources across both left and right visual fields. Therefore, there might be a slight bias of spatial attention favoring the left visual field (Mesulam, 1999). It is possible that processing speed or acuity is slightly different between left and right visual fields. However, this issue needs to be tested in future investigations.
Averaged response times for completing the mouse-pointing task in each trial were not affected by different auditory onset times in Experiment 2 [visual field of start position, F(1,28) = 0.057; timing of the auditory signal, F(6,168) = 1.646; sound contingency, F(1,28) = 0.403; visual field × sound contingency, F(1,28) = 1.080; visual field × auditory timing, F(6,168) = 1.741; sound contingency × auditory timing, F(6,168) = 1.522; visual field × sound contingency × auditory timing, F(6,168) = 1.344; all p > 0.05].
Experiment 3
The results of Experiment 1 implied that the perceived offset position of a moving visual object shifts backward when a transient auditory signal is presented before the physical visual offset. In addition, we observed larger backward displacement when the auditory signal was presented earlier. We interpreted this finding to mean that the perceived timing of the visual motion offset is attracted toward the timing of the presentation of the transient sound, which results in a decreased magnitude of RM and induces backward displacement. Experiment 2 showed that the backward shift of the perceived visual offset position is larger when the transient auditory signal is presented to the same side as the visual field from which the moving object originates. It is possible to argue that a sound presented to the same side as the visual object might be heard earlier (perhaps because attention is biased toward the side where the visual object appeared) and consequently shift the perceived visual offset position backward more strongly (i.e., the effect is temporal). Alternatively, the spatial information of the sound relative to the visual target might shift the perceived visual offset position toward the side of the auditory signal without influencing the timing judgment (i.e., a spatial attraction of the visual offset by the auditory signal). Experiment 3 was conducted to examine whether the perceived relative timing between the visual and auditory events differs when the sound is presented to the same side as or the side opposite to the visual object. Although RM was observed only in Experiment 2B, the results of Experiments 2A and 2B were similar. For this reason, we used the same visual and auditory stimuli as in Experiment 2A, but asked participants to perform a temporal-order judgment task.
Method
Participants
Fifteen paid volunteers participated in the experiment (9 males, 6 females). Their ages ranged from 20 to 25 years (mean = 22.4) and all were right-handed. All the participants had normal or corrected-to-normal visual acuity and were naïve as to the purpose of this study.
Stimuli and procedures
The apparatus and stimuli were identical to those used in Experiment 2A. Participants were asked to focus on the fixation cross and observe the moving object. The transient auditory signal was presented 120, 80, or 40 ms before the visual offset; simultaneously with the visual offset; or 40, 80, or 120 ms after the visual offset. The participants were asked to judge whether the auditory signal was presented before or after the offset of the moving disk. After 10 training trials, 10 experimental trials in each condition were presented for a total of 420 trials (2 sound positions × 3 visual offset positions × 7 sound timings × 10 trials). Trials of all conditions were randomly ordered.
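The dependent measure in Experiment 3 is the proportion of trials on which the participant judged that the target disappeared before the sound, computed separately for each sound timing and sound contingency. The following Python sketch shows one way to aggregate such trial-level responses; the column names and the tiny inline data set are hypothetical placeholders, not the authors' data.

```python
import pandas as pd

# Illustrative trial-level responses: 'target_first' is 1 if the participant
# judged that the disk disappeared before the sound, and 0 otherwise.
trials = pd.DataFrame({
    "participant":  ["p01", "p01", "p01", "p01", "p01", "p01"],
    "sound_timing": [-120, -40, 0, 40, 80, 120],   # ms relative to the visual offset
    "same_side":    [True, False, True, False, True, False],
    "target_first": [0, 0, 1, 1, 1, 1],
})

# Proportion of "target disappeared first" responses per participant,
# sound contingency, and sound timing (each real cell would hold 10 trials).
proportions = (trials
               .groupby(["participant", "same_side", "sound_timing"])["target_first"]
               .mean()
               .rename("prop_target_first")
               .reset_index())
print(proportions)
```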
Statistical analysis
The data were submitted to a two-way repeated-measures ANOVA.
Results and Discussion
Figure 4 shows the results of Experiment 3. A two-way repeated-measures ANOVA revealed that the main effect of sound timing was significant [F(6,84) = 18.214, p < 0.001], while the main effect of sound contingency was not significant [F(1,14) = 0.90, p = 0.358]. No interaction was observed between sound timing and sound contingency [F(6,84) = 0.735, p = 0.623]. Thus, the proportion of “target disappeared first” responses increased with the delay of the auditory signal, and, more importantly, this proportion did not differ between the same-side and opposite-side conditions. This suggests that the spatial information of the auditory signal did not affect the judgment of the relative timing between the auditory and visual events. Therefore, the enhanced displacement induced by a sound presented in the same visual field as the visual target in Experiment 2 resulted from the spatial information of the sound relative to the visual target, which produced a larger spatial attraction of the visual offset. The effect of the sound’s spatial information did not interact with the effect of its temporal information.
Figure 4. Results of Experiment 3. The horizontal and vertical axes represent the different sound presentation conditions and the proportion of “target disappeared first” responses, respectively. Error bars represent within-participants SEMs (Loftus and Masson, 1994; Cousineau, 2005) for each condition.
Averaged response times for completing the temporal-order judgment task in each trial were not affected by the auditory timing or sound contingency in Experiment 3 [sound contingency, F(1,14) = 0.852; auditory timing, F(6,84) = 1.229; interaction, F(6,84) = 1.205; all p > 0.05].
General Discussion
The present study reports several novel findings. First, a transient auditory signal presented before the visual offset of a moving object shifted the perceived visual offset position backward, as if it truncated the visual trajectory (Experiment 1). Second, when the auditory signal was lateralized, the sound’s spatial information (on the same or the opposite side as the visual target) influenced the perceived visual offset position; the visual offset position tended to be attracted toward the side of the sound presentation (Experiment 2). Third, the spatial information of the lateralized sound did not influence the judgment of visual offset timing, implying that the effect of the lateralized sound in Experiment 2 was mainly in the spatial domain (Experiment 3). Fourth, the effect of the lateralized sound differed for visual targets starting from the left or the right visual field. For a visual target appearing in the left visual field and moving rightward, RM was not observed, and only a sound presented before the physical visual offset shifted the perceived visual offset position backward; a lateralized sound from the same side as the visual target shifted the perceived visual offset position toward the side of the sound more strongly than did a lateralized sound from the opposite visual field. For a visual target appearing in the right visual field and moving leftward, RM was observed when the auditory signal was presented from the opposite side after the physical visual offset. When the auditory signal was presented before the physical visual offset, RM was not observed, while the backward displacement of the perceived visual offset position was enhanced by a sound from the same side as the visual target (Experiment 2). We interpret these results to mean that the auditory signal may influence the perceived offset position of the moving object through both spatial and temporal processes. The temporal information of the auditory signal influenced the perceived offset timing of the visual object as if it truncated the visual trajectory. In addition, when the auditory signal occurred in the same hemifield as the visual target, enhanced backward displacement was observed relative to when the auditory signal occurred in the opposite hemifield.
The results of Teramoto et al. (2010) suggest that the close association between the auditory and visual signals accomplished by onset synchrony is necessary for a presented sound to have an effect on the perceived position of a visual offset. Their results also suggest that a transient auditory signal presented around the moment of visual motion offset has no influence on the perceived visual offset position when another sound is presented at the onset of the motion. The findings of the present study seem inconsistent with those of Teramoto et al. (2010). One possible source of the discrepancy between their findings and ours is that Teramoto et al. (2010) presented auditory signals at both the onset and the offset of the visual motion, whereas we presented an auditory signal only at or near the offset of the visual motion. The auditory signal at the onset of the motion might start a duration estimation process that counteracts the auditory influence on the visual offset. To address this question, we performed a supplementary experiment (N = 5) in which a sound was presented at both the onset and the offset of the visual motion. However, the same results as in Experiment 1 were again observed [F(7,28) = 7.016, p < 0.01]: a tendency toward larger backward displacement was observed when the sound was presented before the offset of the visual target, and the perceived visual offset positions were not influenced by a sound presented after the visual motion offset. Therefore, the absence of an effect of the offset sound in Teramoto et al.’s (2010) study does not seem to result from the sound presented at the onset.
Another source of the discrepancy could be differences in the way responses were acquired. We asked participants to report the visual offset position directly by clicking a mouse, and we observed backward displacement in all experiments but RM only in Experiment 2B (around 0.2°), whereas Teramoto et al. (2010) measured the visual offset with probe judgments and observed robust RM (around 0.3°–0.6°). Previous research has shown that RM is larger when participants report the offset position by pointing with a mouse (Kerzel et al., 2001). This enhancement might result from the separate processes or representations subserving motor actions and cognitive judgments (Goodale and Milner, 1992). While Goodale and Milner’s model suggests that hand movements are not “deceived” by visual illusions, other researchers have indicated that the mental extrapolation that calculates a visual object’s position by analyzing its speed and trajectory occurs in the motor system to a larger degree than in the visual system (Yamagishi et al., 2001; Kerzel, 2003). Therefore, more localization errors occur with motor-oriented measurement methods. A response that depends more upon perception-for-action might lead to larger localization errors both when forward and when backward displacement occurs. In the present study, backward displacement was induced by transient sounds, and a response depending more upon perception-for-action might allow for a stronger effect of the auditory signal on the perceived offset position of the visual stimulus than a response depending more upon perception-for-identification.
Previous research has shown that a transient visual stimulus presented at the moment of visual motion offset affects the perceived offset position of a visual target. Müsseler et al. (2002) presented a visual flash simultaneously with the offset of a moving visual target and asked participants to judge the target’s position when the flash appeared. They observed no RM; rather, the perceived visual offset position was displaced backward relative to the actual visual offset position, similar to our observations. Although the stimulus parameters and procedure were different, their findings point to the possibility that this intramodal interaction (the effect of visual transients on visual localization) might extend to audiovisual interaction; that is, both visual and auditory transient signals presented before the visual motion offset could induce backward displacement of the perceived visual offset position. This will be an interesting avenue for future investigations.
In addition, when a briefly presented stationary visual stimulus is aligned with the final portion of a moving target’s trajectory, memory for the location of the stationary object is displaced in the direction of motion of the moving target (Hubbard, 2008). It has been suggested that RM of the moving target influences the representation of the stationary object’s location, causing the stationary object to be displaced in the direction of the moving target’s motion. This also implies the possibility of a general mechanism coding both location and motion information, such that information about the stationary object and the moving object influences the perceived position of each other.
The auditory system is generally superior to the visual system in terms of temporal perception, and the visual system is generally superior to the auditory system in terms of spatial perception. Therefore, vision can provide more accurate spatial information, while audition can provide more accurate temporal information. The modality precision hypothesis suggests that the modality with the highest precision with regard to the required task tends to be dominant in multimodal interactions (Shipley, 1964; Welch and Warren, 1980, 1986; Spence and Squire, 2003). In the present study, we found that the perceived visual offset position was shifted backward when the auditory signal was presented before the visual offset. This implies that the perceived timing of the visual motion offset was attracted to the presentation timing of the auditory signal, consequently inducing the backward displacement. This is consistent with auditory superiority for temporal perception (e.g., the temporal ventriloquism effect; Vroomen and de Gelder, 2004). On the other hand, our results also suggest that the effect of the lateralized sound was spatial rather than temporal, a finding that cannot be explained by the modality precision hypothesis. Significant spatial effects from audition to vision do seem to exist, particularly when blurred, poorly localized visual stimuli are presented (Alais and Burr, 2004). Teramoto et al. (2012) have demonstrated that spatial aspects of sound can modulate visual motion perception, suggesting that the visual and auditory modalities influence each other in motion processing. Taken together, our results indicate that auditory information influences visual perception (at least the perceived position of a visual offset) via both temporal and spatial processes.
Maus and Nijhawan (2009) proposed a dual-process model to explain differences in how the visual system processes the positions of abruptly vanishing and gradually disappearing objects. The first process calculates the position of a moving object in the near future by analyzing its speed and trajectory. When the moving object disappears abruptly, the second process modulates the forward displacement. This modulation mechanism relies on accurate spatial information provided by the transient of the abrupt offset of the moving object: a stronger transient leads to more accurate localization of the moving object because the retinal off-transient helps the representation of the final position win the competition for perceptual awareness. The present findings could be interpreted to mean that the modulation mechanism relies not only on visual information provided by the retinal off-transient, but also on information provided by a transient auditory signal that is temporally and spatially close to the visual motion offset. If the transient auditory signal is firmly associated with the visual motion offset, the neural system also uses the temporal and spatial information provided by the auditory signal to modulate possible overshoots. The present study suggests the possibility that the visual system integrates auditory information presented before and after the offset of visual motion.
However, the results of Teramoto et al.’s study are not consistent with Maus and Nijhawan’s account or with the present study. Teramoto et al. suggested that a sustained sound during visual motion is necessary for audiovisual integration to enhance or reduce RM, whereas the present study observed an effect of a transient auditory signal on the perceived visual offset. Discrepancies in experimental paradigms, parameters, and stimuli, however, prevent direct comparisons between the present study (or Maus and Nijhawan’s account) and Teramoto et al.’s study. Perhaps a sustained sound influences audiovisual integration in a different way than a transient sound does. This will also be an interesting question for future investigation.
Nevertheless, signals from different sensory modalities are not combined indiscriminately. We observed the backward displacement mainly when an auditory signal was presented 120 or 80 ms before the actual visual offset. However, we observed that the spatial information of an auditory signal modulated RM only when the sound was presented 80 ms after the physical visual offset (Experiment 2B). This might imply that the temporal window during which the visual system integrates auditory information is approximately 100 ms before and after the visual motion offset. This is consistent with the temporal window of the sound-induced illusory flash (Shams et al., 2002) and of multisensory integration in superior colliculus neurons in the mammalian brain (Meredith et al., 1987).
In conclusion, a transient auditory signal presented before or after the offset of physical motion of a visual stimulus can modulate the perceived visual offset position. The magnitude of the backward or forward shift depends on the spatial relation between the auditory and the visual stimulus. In order to elucidate the underlying mechanism of these results, future experiments should be conducted to investigate how closely visual and auditory information must correspond and whether the auditory effect on visual offset occurs when the visual object moves toward the peripheral field. In the present experiments, the visual field, motion direction, and sound position were confounded, and therefore we cannot rule out the possibility that the observed effects were induced by a combination of these factors. Further investigations are warranted to address this issue.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
This research was supported by the Japan Science and Technology Agency (CREST) and the Grants-in-Aid for Scientific Research (23240034) from the Ministry of Education, Culture, Sports, Science and Technology.
References
Alais, D., and Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Curr. Biol. 14, 257–262.
Arnott, S. R., and Goodale, M. A. (2006). Distorting visual space with sound. Vision Res. 46, 1553–1558.
Courtney, J. R., Motes, M. A., and Hubbard, T. L. (2007). Multi- and unisensory visual flash illusion. Perception 36, 516–524.
Cousineau, D. (2005). Confidence intervals in within-subject designs: a simpler solution to Loftus and Masson method. Tutor. Quant. Methods Psychol. 1, 42–45.
Fendrich, R., and Corballis, P. M. (2001). The temporal cross-capture of audition and vision. Percept. Psychophys. 63, 719–725.
Freyd, J. J., and Finke, R. A. (1984). Representational momentum. J. Exp. Psychol. Learn. Mem. Cogn. 10, 126–132.
Fujisaki, W., Shimojo, S., Kashino, M., and Nishida, S. (2004). Recalibration of audio-visual simultaneity. Nat. Neurosci. 7, 773–778.
Goodale, M. A., and Milner, A. D. (1992). Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25.
Halpern, A. R., and Kelly, M. H. (1993). Memory biases in left versus right implied motion. J. Exp. Psychol. Learn. Mem. Cogn. 19, 471–484.
Heron, J., Whitaker, D., and McGraw, P. V. (2004). Sensory uncertainty governs the extent of audio-visual interaction. Vision Res. 44, 2875–2884.
Hubbard, T. L. (2005). Representational momentum and related displacements in spatial memory: a review of the findings. Psychon. Bull. Rev. 12, 822–851.
Hubbard, T. L. (2006). Computational theory and cognition in representational momentum and related types of displacement: a reply to Kerzel. Psychon. Bull. Rev. 13, 174–177.
Hubbard, T. L. (2008). Representational momentum contributes to motion induced mislocalization of stationary objects. Vis. Cogn. 16, 44–67.
Hubbard, T. L., and Bharucha, J. J. (1988). Judged displacement in apparent vertical and horizontal motion. Percept. Psychophys. 44, 211–221.
Kerzel, D. (2000). Eye movements and visible persistence explain the mislocalization of the final position of a moving target. Vision Res. 40, 3703–3715.
Kerzel, D. (2003). Mental extrapolation of target position is strongest with weak motion signals and motor responses. Vision Res. 43, 2623–2635.
Kerzel, D., Jordan, J. S., and Müsseler, J. (2001). The role of perception in the mislocalization of the final position of a moving target. J. Exp. Psychol. Learn. Mem. Cogn. 27, 829–840.
Loftus, G. R., and Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychon. Bull. Rev. 1, 476–490.
Maus, G. W., and Nijhawan, R. (2009). Going, going, gone: localizing abrupt offsets of moving objects. J. Exp. Psychol. Learn. Mem. Cogn. 35, 611–626.
Meredith, M. A., Nemitz, J. W., and Stein, B. (1987). Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. J. Neurosci. 7, 3215–3229.
Mesulam, M. M. (1999). Spatial attention and neglect: parietal, frontal and cingulate contributions to the mental representation and attentional targeting of salient extrapersonal events. Philos. Trans. R. Soc. Lond. B Biol. Sci. 354, 1325–1346.
Meyer, G. F., Wuerger, S. M., Röhrbein, F., and Zetzsche, C. (2005). Low-level integration of auditory and visual motion signals requires spatial co-localisation. Exp. Brain Res. 166, 538–547.
Müsseler, J., Stork, S., and Kerzel, D. (2002). Comparing mislocalizations with moving stimuli: the Fröhlich effect, the flash-lag, and representational momentum. Vis. Cogn. 9, 120–138.
Ono, F., and Kitazawa, S. (2011). Shortening of subjective visual intervals followed by repetitive stimulation. PLoS ONE 6:e28722. doi:10.1371/journal.pone.0028722
Pelli, D. G. (1997). The videotoolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10, 437–442.
Shams, L., Kamitani, Y., and Shimojo, S. (2002). Visual illusion induced by sound. Cogn. Brain Res. 14, 147–152.
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., and Kingstone, A. (2002). The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Cogn. Brain Res. 14, 139–146.
Spence, C., and Squire, S. B. (2003). Multisensory integration: maintaining the perception of synchrony. Curr. Biol. 13, R519–R521.
Teramoto, W., Hidaka, S., Gyoba, J., and Suzuki, Y. (2010). Auditory temporal cues can modulate visual representational momentum. Atten. Percept. Psychophys. 72, 2215–2226.
Teramoto, W., Hidaka, S., Sugita, Y., Sakamoto, S., Gyoba, J., and Iwaya, Y. (2012). Sounds can alter the perceived direction of a moving visual object. J. Vis. 12, 1–12.
Vroomen, J., and de Gelder, B. (2004). Temporal ventriloquism: sound modulates the flash-lag effect. J. Exp. Psychol. Hum. Percept. Perform. 30, 513–518.
Watanabe, K., and Yokoi, K. (2006). Object-based anisotropies in the flash-lag effect. Psychol. Sci. 17, 728–735.
Watanabe, K., and Yokoi, K. (2007). Object-based anisotropic mislocalization by retinotopic motion signals. Vision Res. 47, 1662–1667.
Watanabe, K., and Yokoi, K. (2008). Dynamic distortion of visual position representation around moving objects. J. Vis. 8, 1–11.
Welch, R. B., DuttonHurt, L. D., and Warren, D. H. (1986). Contributions of audition and vision to temporal rate perception. Percept. Psychophys. 39, 294–300.
Welch, R. B., and Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychol. Bull. 88, 638–667.
Welch, R. B., and Warren, D. H. (1986). “Intersensory interactions,” in Handbook of Perception and Human Performance, eds K. R. Boff, L. Kaufman, and J. P. Thomas (New York: Wiley), 25.1–25.36.
White, H., Minor, S. W., Merrell, J., and Smith, T. (1993). Representational-momentum effects in the cerebral hemispheres. Brain Cogn. 22, 161–170.
Yamagishi, N., Anderson, S. J., and Ashida, H. (2001). Evidence for dissociation between the perceptual and visuomotor systems in humans. Proc. Biol. Sci. 268, 973–977.
Keywords: motion offset, audiovisual interaction, representational momentum, visual motion representation, auditory transients
Citation: Chien S-e, Ono F and Watanabe K (2013) A transient auditory signal shifts the perceived offset position of a moving visual object. Front. Psychology 4:70. doi: 10.3389/fpsyg.2013.00070
Received: 18 July 2012; Accepted: 01 February 2013;
Published online: 21 February 2013.
Edited by:
Takahiro Kawabe, Nippon Telegraph and Telephone Corporation, Japan
Reviewed by:
Stephen R. Arnott, Baycrest Centre for Geriatric Care, Canada
Timothy Hubbard, Texas Christian University, USA
Copyright: © 2013 Chien, Ono and Watanabe. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
*Correspondence: Sung-en Chien, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan. e-mail: chiensungen@gmail.com