Ostensive signals support learning from novel attention cues during infancy

Wu, Rachel; Tummeltshammer, Kristen Swan; Gliga, Teodora; Kirkham, Natasha Zoe

doi:10.3389/fpsyg.2014.00251

ORIGINAL RESEARCH article

Front. Psychol., 25 March 2014

Sec. Developmental Psychology

Volume 5 - 2014 | https://doi.org/10.3389/fpsyg.2014.00251

Rachel Wu¹*

Kristen S. Tummeltshammer²

Teodora Gliga²†

Natasha Z. Kirkham²†

¹Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
²Department of Psychological Sciences, Centre for Brain and Cognitive Development, Birkbeck, University of London, London, UK

Social attention cues (e.g., head turning, gaze direction) highlight which events young infants should attend to in a busy environment and, recently, have been shown to shape infants' likelihood of learning about objects and events. Although studies have documented which social cues guide attention and learning during early infancy, few have investigated how infants learn to learn from attention cues. Ostensive signals, such as a face addressing the infant, often precede social attention cues. Therefore, it is possible that infants can use ostensive signals to learn from other novel attention cues. In this training study, 8-month-olds were cued to the location of an event by a novel non-social attention cue (i.e., flashing square) that was preceded by an ostensive signal (i.e., a face addressing the infant). At test, infants predicted the appearance of specific multimodal events cued by the flashing squares, which were previously shown to guide attention to, but not to inform specific predictions about, the multimodal events (Wu and Kirkham, 2010). Importantly, during the generalization phase, the attention cue continued to guide learning of these events in the absence of the ostensive signal. Subsequent experiments showed that learning was less successful when the ostensive signal was absent, even if an interesting but non-ostensive social stimulus preceded the same cued events.

Introduction

By the first few months of life, infants follow social cues (i.e., head turn and gaze direction, D'Entremont, 2000; Senju and Csibra, 2008) to isolate events in a busy multimodal environment. While there is a large literature documenting when young infants begin to follow these social cues, recent work has demonstrated that social cues not only direct infants' attention, but also their subsequent learning about objects in cued locations (e.g., Yoon et al., 2008; Wu and Kirkham, 2010; Wu et al., 2011). In Wu and Kirkham (2010), 8-month-olds were presented with two identical audio-visual events simultaneously in two different locations on a computer screen. Infants' attention was oriented to one of the events using either a social cue (a face saying “Hi baby, look at this!” and turning toward the target event) or a non-social cue (a red flashing square that surrounded the target event). Both of these cues directed attention equally, as measured by equal gaze time to cued events. However, only the infants exposed to the social cue predicted the location of the cued events, suggesting that social attention cues shape the likelihood and content of learning about events during infancy.

Social cues are often preceded by ostensive signals (i.e., a smiling face making eye-contact while addressing the infant in infant-directed speech) in both natural and laboratory environments (see Csibra and Gergely, 2009). A number of studies have highlighted the importance and effectiveness of ostensive signals when directing infants' attention and learning. For example, eye contact or infant-directed speech is necessary for gaze shifts to successfully orient attention in 4- and 6-month-old infants (Farroni et al., 2003; Senju and Csibra, 2008). A few recent studies suggest that ostensive signals also promote learning from gaze shifts (Wu and Kirkham, 2010; Wu et al., 2011) and pointing (Yoon et al., 2008). Ostensive signals seem to tell infants when to pay attention and work in conjunction with social attention cues (e.g., gaze and head direction) to tell infants what to learn. These signals have been suggested to enhance learning about the attended stimulus (Csibra and Gergely, 2009), although the exact underlying neural mechanisms are not known.

If ostensive signals support learning from social cues such as gaze shifts, it is possible that infants can use ostensive signals to support learning from other novel attention cues. Leekam et al. (2010) showed that by 3 years of age children were able to use a replica cue (e.g., a miniature version of a target container) to find stickers hidden underneath the actual target container only if the replica cue was presented with an ostensive signal (i.e., smiling face with eye contact). Perhaps the pairing of an ostensive signal with a novel cue is essential for infants to learn about cued events, as well as the function of the novel cue itself. Although this phenomenon has been documented during early childhood, there has yet to be a study testing whether young infants can also learn in this manner. A few infant studies, however, have shown that pairing familiar auditory social stimuli with unfamiliar auditory stimuli scaffolds learning from the latter. For example, infants are better at extracting statistical rules from sequences of non-social stimuli (e.g., tones) if they first heard those rules instantiated in social stimuli (i.e., speech; Marcus et al., 2007). Also, infants are better at word segmentation if the stimuli are presented in infant-directed rather than adult-directed speech (Thiessen et al., 2005). While these studies show that speech as a social stimulus can boost infants' learning, it is still unclear whether ostensive stimuli can help infants learn about novel visual attention cues and cued events.

We tested this hypothesis by presenting infants with a training and generalization paradigm that involved pairing a visual ostensive signal with a novel attention cue that successfully orients attention but does not produce learning about objects (Wu and Kirkham, 2010). The present eye-tracking study modified the paradigm from Wu and Kirkham (2010) with 8-month-olds. Across three experiments, infants' ability to learn from a novel attention cue (i.e., a red flashing square) following training with or without an ostensive signal was investigated. In the first experiment, infants were trained on the novel cue paired with an ostensive signal (Ostensive Signaling). In the second experiment, infants were given the same exposure to the novel cue in the absence of an ostensive signal (No Signaling). Given the large difference in stimulus presentation between including and omitting ostensive signals, Experiment 3 investigated whether including a stimulus on the screen that was social in nature but not ostensive could account for any benefits found in the Ostensive Signaling condition (Social Non-Ostensive Signaling).

In all three experiments, infants were familiarized with two identical dynamic multimodal objects in opposite corners of the screen and a flashing square that consistently cued the location of one of the objects. The cued familiarization trials were followed by test trials, in which infants heard the sound associated with the two objects without the appearance of the objects. Longer looking toward the previously cued location associated with the appropriate objects was taken as a measure of successful learning. Infants as young as 3 months of age succeed in this paradigm (e.g., Richardson and Kirkham, 2004; Wu and Kirkham, 2010; Kirkham et al., 2012). In the Training phase of each experiment, a different central stimulus preceded the cued events: (1) an engaging face smiling and speaking to the infant (Ostensive Signaling), (2) no central stimulus (No Signaling), or (3) two puppets speaking to each other (Social Non-Ostensive Signaling). The Generalization phases were identical in all three experiments, displaying only the flashing cue during the audio-visual events.

Can ostensive signals promote learning from novel cues that infants do not learn from otherwise? This study tested whether cued multimodal learning demonstrated during Test trials depended on the presence of ostensive signals during Training. Based on previous findings (e.g., Wu and Kirkham, 2010; Wu et al., 2011), we predicted that: (1) the presence of ostensive signals during Training would help infants learn to locate cued events during Test trials, (2) the presence of novel cues alone would not be sufficient for infants to show learning of the cued events, and (3) the presence of social non-ostensive signals during Training would not facilitate learning of the cued events, given their proposed lack of ability to enhance infants' learning as effectively as ostensive signals (e.g., Csibra and Gergely, 2009). Consistent with our hypothesis, we predicted that during the Generalization phase, infants trained with the ostensive signal preceding the novel cue would continue to show learning on test trials in the absence of the ostensive signal, in contrast to infants who were not exposed to this signaling.

To clarify, ostensive signals (e.g., infant-directed speech, eye-contact, smiling face) differ from social attention cues (e.g., eye gaze, head turn) because only the latter direct infants' attention to a specific location. Novel attention cues in this paper refer to the flashing red square, as that was the only cue that directed attention in our study. We paired ostensive signals (that do not direct attention) with novel attention cues (that directed attention to a specific location) to investigate whether such pairing would allow infants to learn about cued events (as is the case with social attention cues, Wu and Kirkham, 2010). The social non-ostensive signal in this study refers to the muppet video used in Experiment 3.

Experiment 1: Ostensive Signaling

Experiment 1 investigated the role of ostensive signals in supporting learning from a novel attention-directing cue. A dynamic face stimulus was paired with a novel flashing cue in a multimodal spatial learning paradigm. Previous research has shown that infants at this age do not learn from this attention-directing cue alone (Wu and Kirkham, 2010).

Methods

Participants

Sixteen 8-month-old infants (5 girls, 11 boys, M = 8 months, 14 days, range: 7;24–9;12) participated in this experiment (e.g., Wu and Kirkham, 2010; Wu et al., 2011). One additional infant was excluded from analyses due to fussiness (i.e., completing only 1 out of 8 blocks). Infants were recruited via local-area advertisements and given t-shirts for participating.

Apparatus

Infants' looks were monitored using a Tobii 1750 eye-tracker. All dynamic stimuli were presented on the 17-inch monitor attached to the Tobii eye-tracking unit using Tobii's ClearView AVI presentation software with sounds played through stereo external speakers. The experimenter monitored whether infants were attending to the screen through an external video camera mounted on top of the Tobii screen. Infants' looks were recorded with the ClearView software. The animated object clips were created using Adobe Photoshop 7 and Macromedia Director MX 2004 (Richardson and Kirkham, 2004), and the live face clip was filmed using Macintosh iMovie (version 4.0.1). All movie clips were assembled using Final Cut Express HD 3 (Apple).

Stimuli and procedure

Infants sat in a car seat 60 cm from the Tobii system, at eye level with the center of the screen, while their caregivers sat behind them. A five-point infant calibration was used, and the experiment started after at least four points were correctly calibrated.

All infants were shown two sequences of stimuli: A Training phase followed by a Generalization phase. Within each phase, infants saw four blocks of stimuli, each of which consisted of six familiarization trials and two test trials (see Figure 1 for a schematic and examples of the stimuli).

Figure 1. Schematic of one block of familiarization and test trials from the Ostensive Signaling condition (Training and Generalization phases). The presentation of familiarization events was pseudo-randomized among infants (i.e., ABABBA or BABAAB), and test trial order was counterbalanced. All stimuli were in full color on a black background. The gray box around a frame represents a red flashing cue.

Training phase. The familiarization trials in the Training phase began with a 4-s video clip of an ostensive signal, which was presented in the center of the screen. The video subtended 2.86 × 4.29°, and the face occupied approximately half of the scene. We used the initial ostensive stimulus from Wu and Kirkham (2010) and Wu et al. (2011) as the ostensive signal in this study: A female face looked at the infant, said, “Hi baby, look at this!,” and froze as a still image with a smile directed at the infant. Then, a pair of identical audio-visual objects appeared inside white square frames, which were located in diagonally opposing corners of the screen (e.g., lower right and upper left) and subtended 2.39 × 2.39°. Following the trial setup of Wu and Kirkham (2010), a red flashing square (i.e., the novel attention cue) appeared around the lower object simultaneously with the audio-visual events. The face did not offer any directional information because it only spoke and looked out at the infant without turning or shifting gaze. The two identical multimodal objects and the frozen face remained on the screen until the end of the trial 5 s later. A stationary kaleidoscopic attention getter with ringing sounds played between trials to re-engage the infant with the screen. Across the training phase, there were two different pairs of multimodal objects (e.g., two cats making bloop sounds, and two buses making whoosh sounds) with each pair appearing on three out of six familiarization trials per block. One pair appeared in the bottom left and top right frames on half of the trials, while the other pair appeared in the bottom right and top left frames on the other half of the trials.

After six familiarization trials, two test trials were presented. Test trials consisted of a blank screen containing only the empty white frames. During each test trial, the sound associated with a particular pair of objects played for 5 s (e.g., the bloop sound associated with the two cats), while the four white frames remained empty in the corners of the screen. After the test trials, the next block began. Infants saw four blocks of trials, each consisting of a succession of six familiarization trials and two test trials. The same two pairs of audio-visual events were shown for all four blocks within the Training phase. Presentation of the pairs was randomized within subjects, and pair locations were counterbalanced between subjects.

Generalization phase. The Generalization phase immediately followed the Training phase. No central stimulus was presented during familiarization trials in the Generalization phase, so they were 4 s shorter than the familiarization trials in the Training phase. The familiarization trials in the Generalization phase displayed two new audio-visual pairs (e.g., two ducks making brring sounds and two dogs making boing sounds), which were presented with a single red flashing square surrounding one of the two events on a given trial. The audio-visual events were counterbalanced between participants, such that half the infants saw the pairs of audio-visual animations during Training that the other half saw during Generalization. Mirroring the Training phase sequence, the Generalization phase consisted of six familiarization trials followed by two test trials, repeated over four blocks.
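The trial structure described above can be summarized in a short sketch. The block counts, trial counts, and durations are taken from the text; the function name `build_phase`, the tuple layout, and the two assumptions flagged in the docstring are illustrative only and are not part of the original procedure description.

```python
# Values from the text: 4 blocks per phase, 6 familiarization + 2 test trials
# per block, a 4-s ostensive clip, and 5-s audio-visual events / test sounds.
FAMILIARIZATION_ORDERS = ("ABABBA", "BABAAB")  # orders given in the Figure 1 caption
BLOCKS_PER_PHASE = 4
SIGNAL_S = 4  # ostensive clip, Training phase only
EVENT_S = 5   # audio-visual events, or the sound played on test trials


def build_phase(phase, fam_order, test_order="AB"):
    """Return (phase, block, trial_type, pair, duration_s) tuples for one phase.

    Assumptions (not stated in the text): every block reuses the same
    familiarization order, and each test trial probes one of the two pairs.
    """
    if fam_order not in FAMILIARIZATION_ORDERS:
        raise ValueError(f"unexpected familiarization order: {fam_order}")
    # Training familiarization trials include the 4-s clip before the events.
    fam_duration = EVENT_S + (SIGNAL_S if phase == "Training" else 0)
    trials = []
    for block in range(1, BLOCKS_PER_PHASE + 1):
        trials += [(phase, block, "familiarization", pair, fam_duration)
                   for pair in fam_order]
        trials += [(phase, block, "test", pair, EVENT_S) for pair in test_order]
    return trials


session = build_phase("Training", "ABABBA") + build_phase("Generalization", "ABABBA")
total_s = sum(trial[-1] for trial in session)
```

Under these assumptions, one session in the Ostensive Signaling condition contains 64 trials and 416 s of stimulus time, not counting the attention getters shown between trials.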

Data reduction and analysis

Data were acquired and analyzed using Tobii's ClearView software. Within each trial, two of four framed locations contained objects that were paired with a particular sound (a bottom corner and the opposite diagonal corner; see Figure 2). Across trials, two different locations were cued (bottom right or bottom left corner of the screen, depending on which pair of animations was present). Thus, four areas of interest (AOIs, see Figure 2) were manually delimited for all trials around the four corner frames. We measured the accumulated looking time within each of these locations for the 5 s during which audio-visual events were visible (or in the case of test trials, the corresponding 5 s during which accompanying sounds were played). The standard temporal filter of 100 ms and spatial filter of 30 pixels were used to define fixations. For each AOI, we reported the proportional looking time, which was calculated in each trial for every infant by dividing the total looking time in that AOI by the total looking time in all four AOIs.
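As an illustration of the proportional looking time measure, the following minimal sketch divides the looking time in each AOI by the summed looking time across all four AOIs for a single trial. The AOI labels, the example values, and the choice to return `None` for a trial with no looking to any AOI are assumptions made for the example; the original analysis was carried out in ClearView.

```python
def proportional_looking(looking_ms):
    """Convert per-AOI looking times (ms) for one trial into proportions.

    Returns None when the infant did not look at any AOI on that trial
    (an assumption; the text does not say how such trials were handled).
    """
    total = sum(looking_ms.values())
    if total == 0:
        return None
    return {aoi: time / total for aoi, time in looking_ms.items()}


# Hypothetical looking times for one familiarization trial (not real data).
example_trial = {
    "cued_object": 2000,
    "noncued_object": 1000,
    "cued_empty": 500,
    "noncued_empty": 500,
}
proportions = proportional_looking(example_trial)
```

Because each proportion is computed relative to looking within the four AOIs only, the four values for a trial always sum to 1, regardless of how long the infant looked elsewhere on the screen.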

Figure 2. Areas of interest (AOIs) delineated for familiarization and test trials. The four AOIs were identical in area.

Given our prediction of ostensive signals supporting learning from novel attention cues, we analyzed the two phases (Training and Generalization) separately. This allowed us to analyze the effect of training (differing only in the central stimulus prior to the learning events) on generalization across experiments. Within each phase we investigated looking behavior during the familiarization trials and the test trials separately. While the analysis of the familiarization trials allowed us to describe the distribution of attention in response to the presence of ostensive signals and flashing cues, the analyses of the test trials provide the crucial evidence for cued learning. Infants had to integrate two sources of information during test trials: the location of multimodal objects, and the location of cued events. Accordingly, analyses examined the effects of Object (increased looking to the diagonally opposite locations that contained identical objects) and Cue (increased looking to the locations that were surrounded by red flashing square cues). The following outcomes were possible: (1) a significant Cue × Object interaction, indicating that infants learned about cued object-sound pairings, (2) a main effect of Object, such that infants looked more at both cued and non-cued locations of the objects paired with the corresponding sound, or (3) a main effect of Cue, showing that infants looked equally at both cued locations, independent of where the objects had appeared. The absence of any effects would indicate that infants distributed their looking equally to all four locations and did not learn from this paradigm. A significant Cue × Object interaction was followed up by a planned post-hoc t-test that compared looking to the Cued and Non-cued object locations.
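To make the logic of the Cue × Object interaction concrete, the sketch below uses a standard property of 2 × 2 within-subjects designs: the interaction F with (1, n − 1) degrees of freedom equals the squared one-sample t statistic of each infant's difference of differences. This is an illustrative reformulation, not the software procedure used in the study; the function name, the cell labels, and the four rows of values are hypothetical and do not come from the data.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(rows):
    """Cue x Object interaction for a 2 x 2 within-subjects design.

    Each row holds one infant's proportional looking to four locations:
    (cued_object, noncued_object, cued_other, noncued_other), where "object"
    marks the locations of the pair matching the test sound.
    Returns (F, (df_effect, df_error)).
    """
    # Difference of differences: the cue advantage at the object locations
    # minus the cue advantage at the other locations.
    contrasts = [co - no - (cx - nx) for co, no, cx, nx in rows]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical values for four infants (illustration only, not real data).
rows = [
    (0.50, 0.30, 0.10, 0.10),
    (0.60, 0.20, 0.10, 0.10),
    (0.55, 0.25, 0.10, 0.10),
    (0.40, 0.30, 0.15, 0.15),
]
f_value, dfs = interaction_f(rows)
```

A positive mean contrast corresponds to outcome (1) above: the advantage for the cued location is specific to the locations where the sounding objects had appeared.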

Results

Ostensive Signaling condition: training phase

Familiarization trials. A two-way (Cued location × Object location) within-subjects ANOVA revealed main effects of Cue [F(1, 15) = 27.98, p < 0.001, partial η² = 0.65] and Object [F(1, 15) = 1578.23, p < 0.001, partial η² = 0.99], and a significant interaction between the two [F(1, 15) = 28.44, p < 0.001, partial η² = 0.66]. As expected, infants followed the cue to the targeted object and spent more time looking at it than at the identical object in the diagonally opposite location, planned post-hoc: t(15) = 5.37, p < 0.001, Cohen's d = 2.56 (Table 1, Figure 3).
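For readers who wish to relate the reported F statistics to the effect sizes, partial η² can be recovered from F and its degrees of freedom with the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The helper below is a minimal sketch of that identity; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects of Cue and Object from the familiarization trials above.
cue_effect = partial_eta_squared(27.98, 1, 15)
object_effect = partial_eta_squared(1578.23, 1, 15)
```

Applied to the main effects of Cue and Object reported above, this gives values that round to 0.65 and 0.99. Because the published F values are themselves rounded, recomputed effect sizes may differ from the reported ones in the last digit.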

Table 1. Mean proportional looking times during familiarization and test trials to four areas of interest (AOIs) in the Ostensive Signaling, No Signaling, and Social Non-Ostensive Signaling conditions.

Figure 3. Familiarization and test trials for all three conditions. All stimuli were in full color on a black background. The bar graphs display the results from the familiarization and test trials. *p < 0.03.

Test trials. A 2 (Cued location) × 2 (Object location) ANOVA yielded a trend toward a significant main effect of Cue [F(1, 15) = 4.19, p = 0.06, partial η² = 0.22], no main effect of Object [F(1, 15) = 2.18, p = 0.16, partial η² = 0.13], and a significant Cue × Object interaction [F(1, 15) = 9.11, p = 0.01, partial η² = 0.38]. Based on the significant interaction, planned post-hoc comparisons [t(15) = 3.37, p = 0.004, Cohen's d = 1.39] revealed that, during test trials, infants looked longer to the correct object location that had previously been cued.

Ostensive Signaling condition: generalization phase

Familiarization trials. A 2 (Cued location) × 2 (Object location) ANOVA yielded main effects of Cue [F(1, 15) = 26.28, p < 0.001, partial η² = 0.64] and Object [F(1, 15) = 2629.98, p < 0.001, partial η² = 0.99], and a significant interaction between the two [F(1, 15) = 24.44, p < 0.001, partial η² = 0.62]. Again, as expected, infants followed the cues to the targeted object, looking longer at it than at the identical object in the diagonally opposite location, planned post-hoc Cued Object vs. Non-cued Object: t(15) = 5.09, p < 0.001, Cohen's d = 2.13.

Test trials. A 2 (Cued location) × 2 (Object location) ANOVA yielded a trend toward a significant main effect of Cue [F(1, 15) = 3.39, p = 0.086, partial η² = 0.18], a significant main effect of Object [F(1, 15) = 9.64, p = 0.007, partial η² = 0.39], and a significant Cue × Object interaction [F(1, 15) = 6.09, p = 0.03, partial η² = 0.29]. Infants continued to look longer at the correct object location that had previously been cued during familiarization compared to the non-cued correct object, planned post-hoc: t(15) = 2.43, p = 0.03, Cohen's d = 1.04.

Discussion

Infants' attention was successfully directed to the cued audio-visual event during the familiarization trials in both the Training and the Generalization phases. During test trials in the Training phase, when infants were presented with four blank white frames and the sound from one of the audio-visual pairs, infants looked to the appropriate cued locations. They associated the correct sound with the cued locations where the corresponding objects had previously appeared. This result extends previous findings showing the same type of learning with ostensive signals paired with social attention cues (head turn and gaze shift; Wu and Kirkham, 2010). Infants' performance during test trials in the Generalization phase was also consistent with our hypothesis. Infants looked to the cued correct location that had been previously associated with the presented sound. In Wu and Kirkham (2010) infants of a similar age did not learn the multimodal pairing when presented with only the red flashing square as a cue. Therefore, we suggest that the addition of the Training phase that paired the ostensive signal with the flashing cue could have supported learning from the flashing cue during both the Training and Generalization phases. There is, however, an alternative hypothesis: Perhaps just extended exposure to the red flashing square cue could have supported learning at least by the Generalization phase, and the preceding ostensive signal was not necessary for specific multimodal learning. The following No Signaling experiment was undertaken to investigate this alternative hypothesis.

Experiment 2: No Signaling

In Experiment 2, a new group of infants was presented with the same stimuli as in Experiment 1, with one critical difference: the absence of ostensive signals during the Training phase. Thus, in this experiment, infants saw similar Training and Generalization phases. This experiment tested the alternative hypothesis that extended exposure to the novel cue is sufficient for infants to learn about the audio-visual events. It is possible that although the ostensive signal can support learning from the flashing squares, mere extended exposure to the novel attention cue could also support this learning.