The influence of scene context on object recognition is independent of attentional focus

Munneke, Jaap; Brentari, Valentina; Peelen, Marius

doi:10.3389/fpsyg.2013.00552

ORIGINAL RESEARCH article

Front. Psychol., 20 August 2013

Sec. Perception Science

Volume 4 - 2013 | https://doi.org/10.3389/fpsyg.2013.00552

Jaap Munneke ^1,2^*
Valentina Brentari ¹
Marius V. Peelen ¹

1. Center for Mind/Brain Sciences, University of Trento, Trento, Italy
2. Department of Cognitive Psychology, Vrije Universiteit Amsterdam, Netherlands

Abstract

Humans can quickly and accurately recognize objects within briefly presented natural scenes. Previous work has provided evidence that scene context contributes to this process, demonstrating improved naming of objects that were presented in semantically consistent scenes (e.g., a sandcastle on a beach) relative to semantically inconsistent scenes (e.g., a sandcastle on a football field). The current study was aimed at investigating which processes underlie the scene consistency effect. Specifically, we tested: (1) whether the effect is due to increased visual feature and/or shape overlap for consistent relative to inconsistent scene-object pairs; and (2) whether the effect is mediated by attention to the background scene. Experiment 1 replicated the scene consistency effect of a previous report (Davenport and Potter, 2004). Using a new, carefully controlled stimulus set, Experiment 2 showed that the scene consistency effect could not be explained by low-level feature or shape overlap between scenes and target objects. Experiments 3a and 3b investigated whether focused attention modulates the scene consistency effect. By using a location cueing manipulation, participants were correctly informed about the location of the target object on a proportion of trials, allowing focused attention to be deployed toward the target object. Importantly, the effect of scene consistency on target object recognition was independent of spatial attention, and was observed both when attention was focused on the target object and when attention was focused on the background scene. These results indicate that a semantically consistent scene context benefits object recognition independently of the focus of attention. We suggest that the scene consistency effect is primarily driven by global scene properties, or “scene gist”, that can be processed with minimal attentional resources.

INTRODUCTION

The human visual system is extraordinarily adept at detecting, categorizing, and naming objects embedded in natural scenes. The properties of this ability have been studied extensively (Henderson and Hollingworth, 1999; Bar, 2004; Torralba et al., 2006; Fabre-Thorpe, 2011; Wolfe et al., 2011). Many objects are usually found in specific contexts: a car is found on a road, a deer in a forest. Prior research has shown that the availability of scene context (i.e., a semantically consistent background) facilitates the detection and recognition of objects within that scene (Biederman et al., 1982; De Graef et al., 1990; Joubert et al., 2007; but see Hollingworth and Henderson, 1998). However, the precise mechanisms responsible for this facilitative effect remain elusive. A better understanding of these mechanisms is crucial for gaining further insight into how objects and scenes are interactively processed by the visual system.

Convincing evidence for the influence of scene context on object processing was provided by studies that manipulated the semantic consistency of target objects and the natural scenes they were presented in. Such studies present target objects in either semantically consistent (e.g., a microwave in a kitchen) or semantically inconsistent (e.g., a microwave in a forest) natural scenes. Effects of semantic consistency between scene and object stimuli have been found using eye movement measurements (Loftus and Mackworth, 1978; Henderson et al., 1999; Brockmole and Henderson, 2008; Vo and Henderson, 2009, 2011), behavioral measurements (Davenport and Potter, 2004; Joubert et al., 2007; Fize et al., 2011), and electrophysiological measures (Ganis and Kutas, 2003; Mudrik et al., 2010). Furthermore, effects of scene context on object processing have been reported for multiple levels of object processing, ranging from the rapid detection of superordinate object categories (e.g., animals; Fize et al., 2011) to the naming of objects at the subordinate level (Davenport and Potter, 2004). These tasks differ in many ways. For example, animal detection likely involves the matching of incoming visual information to an attentional “template” of animal-diagnostic shape features to inform a quick present/absent decision (Duncan and Humphreys, 1989; Treisman, 2006). By contrast, in object naming tasks, the to-be-named object is not known before stimulus onset, and successful task performance relies on detailed recognition of the object (e.g., recognizing a person as a priest; Davenport and Potter, 2004). Given these differences, it is plausible that scene context affects these tasks in different ways. In the present study, we focus on the effect of scene context on the naming of objects.

Important evidence for the facilitative effect of scene context on object naming comes from a study by Davenport and Potter (2004). In their study, participants were presented with natural scenes composed of a background scene with a foreground object pasted into it. The object could be semantically related to the scene background (e.g., a priest in a church) or could lack this semantic relationship (e.g., a priest on a football field). The scene containing the object was presented for a brief duration (80 ms), ensuring that any effects observed were not due to eye movements. At the end of each trial, participants were asked to type in the name of the perceived foreground object, with the background scene being irrelevant to the task. Participants named objects more accurately when they appeared in a semantically consistent scene than in a semantically inconsistent scene, showing that the background, despite being task irrelevant, was processed sufficiently to influence processing of the target object. The first aim of the current study was to replicate the findings of Davenport and Potter (2004) to establish their reliability, and to test whether they generalize to another language and population (Italian).

Despite the convincing results of Davenport and Potter (2004; which were successfully replicated in the current study), it is possible that the effect observed was not due to semantic influences of scene context, but rather to differential overlap of low-level visual features (such as color) and/or object shape between the background scene and the foreground object. For example, when a sandcastle (object) is presented on a beach background (scene), object and background share a number of low-level features such as color and texture. In contrast, a sandcastle presented on a field of grass does not share these low-level features with its background (see Figure 1A for additional examples). Note that this concern does not apply to the same degree to earlier studies addressing related questions using line drawings (e.g., Biederman et al., 1982). The goal of the current Experiment 2 was to rule out influences of differential visual and shape overlap and thus to provide a more stringent test of whether the semantic consistency of background scene and foreground object influences object naming.

FIGURE 1

A final aim of the current study was to investigate the influence of attentional focus on the scene consistency effect. In the study by Davenport and Potter (2004), and in the current Experiment 1, target objects were presented close to the center of the screen. However, their precise location was not known before stimulus onset. Therefore, participants would have initially attended the background scene while locating the target object. In Experiment 3, we tested whether attentive processing of the background scene is required for the scene consistency effect to emerge. Testing the influence of attention on the scene consistency effect could provide information about the types of scene properties that drive the scene consistency effect. For example, the processing of global scene statistics has been shown to be independent of attentional resources, unlike the identification of more detailed scene properties such as other objects in the scene (Ariely, 2001; Chong and Treisman, 2003).

EXPERIMENT 1

The aim of this experiment was to replicate the consistency effect reported by Davenport and Potter (2004), using their original stimuli. As we employed the same experimental set-up and design, we expected to find that participants would recognize objects with a higher accuracy when presented on a semantically consistent background, compared to when the same object was presented on a semantically inconsistent background.

MATERIALS AND METHODS

Participants

Twelve participants took part in this experiment. Participants’ ages ranged from 24 to 35 years (mean ± SD = 27.8 ± 3.19 years; one male). All participants had normal or corrected-to-normal vision. Written informed consent was obtained prior to the start of the experiment. Participants were rewarded with course credit or a monetary reward.

Stimuli

Experiment 1 utilized the original stimulus set used by Davenport and Potter (2004) along with the same experimental design. Participants were presented with 28 images of natural scenes containing an object pasted into the foreground, in such a way that both the object and the background were clearly visible. On half the trials the natural scene contained an object that was semantically consistent with its background whereas the other half of the trials showed an object inconsistent with its background (see Figures 1A,B). Consistency of the scene-object pairing was counterbalanced over participants in such a way that half the participants would see a certain target object in a semantically consistent setting, whereas the other half of the participants would see the same object in a semantically inconsistent setting. All scenes were presented in the center of the screen and subtended a visual angle of 17.64° by 10.54°. Size and location of the target objects varied over the different natural scenes [average horizontal × vertical dimensions: 7.93° (SD = 3.64) × 7.35° (SD = 1.84)], but the foreground object was always clearly distinguishable as the target object.
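The counterbalancing scheme described above can be illustrated with a minimal sketch. The parity rule, the function names, and the two-group labeling are assumptions made for illustration only; the source states only that each object appeared in a consistent scene for half the participants and in an inconsistent scene for the other half, with half of each participant's trials being consistent.

```python
# Hypothetical sketch of the counterbalancing logic; the parity rule and all
# names below are illustrative assumptions, not the authors' actual procedure.
N_OBJECTS = 28  # number of scene-object images reported in the text


def condition(object_index: int, group: int) -> str:
    """Return the scene condition for one object in one participant group (0 or 1)."""
    return "consistent" if (object_index + group) % 2 == 0 else "inconsistent"


def build_trial_list(group: int) -> list:
    """Pair every object index with its condition for the given group."""
    return [(i, condition(i, group)) for i in range(N_OBJECTS)]


group_a = build_trial_list(0)
group_b = build_trial_list(1)
```

Under this rule, every object that is consistent for one group is inconsistent for the other, and each group sees 14 consistent and 14 inconsistent trials.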

Procedure

Participants were seated in a dimly lit room at approximately 60 cm from the computer monitor. All stimuli were presented on a 19″ CRT monitor. Figure 2 shows the time course of a typical trial.
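For readers who wish to relate the reported visual angles to physical stimulus size, the standard conversion can be sketched as follows. This assumes the approximate 60 cm viewing distance stated above and a stimulus centered on the line of sight; the function names are illustrative.

```python
import math

# Approximate viewing distance reported in the Procedure; treated here as exact.
VIEWING_DISTANCE_CM = 60.0


def angle_to_size_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical extent (cm) of a centered stimulus subtending angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def size_to_angle_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (degrees) subtended by a centered stimulus of size_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Scene dimensions reported in the Stimuli section (17.64 deg by 10.54 deg).
scene_width_cm = angle_to_size_cm(17.64)
scene_height_cm = angle_to_size_cm(10.54)
```

Because the viewing distance was only approximate, the resulting sizes should be read as estimates rather than as the exact on-screen dimensions.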

FIGURE 2

Participants started each trial manually by pressing any key. Once the trial started, participants were presented with a fixation cross (300 ms) followed by a blank period (200 ms). After the blank period, the natural scene containing the object appeared (80 ms), immediately followed by a mask (200 ms). Masks consisted of a 4-by-5 grid of pieces of a random set of cut-up scenes that were never used as stimuli. Following the mask, the Italian word for “answer” (risposta) appeared and participants typed in the name of the target object. Participants were instructed to be as specific as possible when naming the object (e.g., “priest” rather than “person”). Responses were unspeeded and only checked for accuracy. Prior to the experiment, participants performed six practice trials, using scenes and objects that were not used in the main experiment. Unlike the original Davenport and Potter (2004) study, which was conducted in English, the current study was conducted in Italian.
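The trial sequence can be summarized as a simple timeline. The sketch below only accumulates the durations reported in the text to obtain event onsets relative to trial start; the data structure and names are illustrative assumptions, and no display or timing library is implied.

```python
# Event durations in milliseconds, as reported in the Procedure.
TRIAL_EVENTS = [
    ("fixation", 300),
    ("blank", 200),
    ("scene", 80),
    ("mask", 200),
]


def event_onsets(events):
    """Return a dict mapping each event name to its onset (ms from trial start)."""
    onsets = {}
    elapsed = 0
    for name, duration in events:
        onsets[name] = elapsed
        elapsed += duration
    # The unspeeded response prompt follows the final timed event.
    onsets["response_prompt"] = elapsed
    return onsets


onsets = event_onsets(TRIAL_EVENTS)
```

On this timeline the scene appears 500 ms after trial start and the response prompt follows at 780 ms.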

RESULTS AND DISCUSSION

All answers provided by the participants were checked for accuracy by five independent raters who saw only the presented object, not the background, ensuring that the semantic background could not influence their ratings. Raters were instructed that only specific names should be considered correct (e.g., for the example in Figure 1A, “priest”, “pastor”, or “clergyman” would all be correct, whereas “person” would be incorrect). If three or more raters judged a given answer to be correct, the answer was scored as correct. Figure 3 shows the average percentage of correct responses for both consistent and inconsistent trials. A paired-samples t-test showed that participants responded more accurately on consistent than on inconsistent trials [77.4% vs. 56.0%; t(11) = 4.377, p = 0.001].
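The scoring rule and the paired comparison can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are assumptions, and the example values are toy numbers chosen for arithmetic convenience, not data from the study.

```python
import math
from statistics import mean, stdev


def is_correct(rater_votes, threshold=3):
    """Majority rule: an answer counts as correct if at least `threshold` raters accept it."""
    return sum(1 for vote in rater_votes if vote) >= threshold


def paired_t(condition_a, condition_b):
    """Paired-samples t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Toy example (hypothetical values, NOT the study's data).
example_votes = [True, True, True, False, False]  # three of five raters accept
toy_consistent = [80, 70, 90, 60]                 # percent correct per participant
toy_inconsistent = [60, 60, 70, 50]
t_value, df = paired_t(toy_consistent, toy_inconsistent)
```

With twelve participants, as in Experiment 1, the same computation yields 11 degrees of freedom, matching the reported t(11).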

FIGURE 3

The results of Experiment 1 replicated the findings of Davenport and Potter (2004) and show that they generalize to another population and to object naming in a language other than English (Italian). When participants attempted to recognize an object in a briefly presented scene, a semantically consistent background led to higher accuracy in recognizing the target object than a semantically inconsistent background, even though the background was not directly relevant to the task.

EXPERIMENT 2

Experiment 1 showed that scene context influences object processing. It is not clear from these results, however, which parts or properties of the background scene are responsible for the observed consistency effect. As outlined in the introduction, one possible reason for the consistency effect in Experiment 1 could be the greater overlap in shape and low-level features, such as color, between the consistent scene and the target object, as illustrated in Figure 1A. Experiment 2 was designed to investigate this possibility. To do so, a new stimulus set was created, again consisting of backgrounds with target objects pasted in the foreground. In order to control for the influence of overlap in color, all stimuli were converted to gray-scale. Additionally, objects and scenes were chosen in such a way that semantically consistent and inconsistent objects in a given scene shared an overall similar shape (see Figure 4). Finally, in order to enhance the influence of scene gist on object recognition, the inconsistent condition always consisted of an indoor scene paired with an outdoor object or an outdoor scene paired with an indoor object.

FIGURE 4