- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Faces spontaneously capture attention. However, which special attributes of a face underlie this effect is unclear. To address this question, we investigate how gist information, specific visual properties, and differing amounts of experience with faces affect the time required to detect a face. Three visual search experiments examined how rapidly human observers detect Mooney face images. Mooney images are two-toned, ambiguous images; they were used because they preserve gist information while limiting low-level image properties. Results from the experiments show: (1) Although upright Mooney faces were searched inefficiently, they were detected more rapidly than inverted Mooney face targets, demonstrating the important role of gist information in guiding attention toward a face. (2) Several specific Mooney face identities were searched efficiently while others were not, suggesting the involvement of specific visual properties in face detection. (3) When participants were given unambiguous gray-scale versions of the Mooney face targets prior to the visual search task, the targets were detected significantly more efficiently, suggesting that prior experience with Mooney faces improves the ability to extract gist information for rapid face detection. However, a week of training on Mooney face categorization did not lead to even more efficient visual search of Mooney face targets. In summary, these results reveal that specific local image properties cannot account for how faces capture attention; nor can gist information alone. Prior experience facilitates the effect of gist on visual search of faces, making faces a special object category for guiding attention.
Introduction
Faces capture our attention. Humans are able to saccade toward a face in as little as 100 ms, whereas it is difficult to saccade away from faces (Crouzet et al., 2010). Faces are also detected efficiently in visual search tasks (Hershler and Hochstein, 2005; Williams et al., 2005; cf. VanRullen, 2006; Doi and Ueda, 2007). It has been postulated that emergent properties in a scene are perceived before more intricate details are processed (Doi and Ueda, 2007). Because faces are detected so quickly, they may contain an emergent property that guides our attention in order to process informative social cues. Previous behavioral testing has demonstrated that humans can correctly detect a face in a scene displayed for as little as 12 ms, too brief for attention to be allocated to a specific location (Graham and Meng, 2011). Electrophysiological studies also reveal that selective brain responses to faces occur within 100 ms (Crouzet et al., 2010; Cauchoix et al., 2014). Prior research investigating which properties of a face capture our attention has focused on facial expressions (Williams et al., 2005) and direction of eye gaze (Doi and Ueda, 2007) among distractors with neutral facial expressions or opposing eye gaze. However, category information has also been shown to guide attention (Yang and Zelinsky, 2009). Which properties of the category ‘face’ guide our attention remains highly controversial (Li et al., 2002; Rousselet et al., 2003; Evans and Treisman, 2005; Hershler and Hochstein, 2005; VanRullen, 2006; Palermo and Rhodes, 2007; Rossion and Caharel, 2011). Here we specifically evaluate the effects of gist information, individual features, and the amount of prior experience with the target faces on the efficiency of face detection.
The reverse hierarchy theory of visual processing proposes that overall gist information is processed pre-attentively (Hochstein and Ahissar, 2002). Gist is considered to be the meaningful information one can extract in an instant (Oliva, 2005; Loschky and Larson, 2008), and this information guides attention to emergent properties of an image for further scrutiny (Hochstein and Ahissar, 2002). Mooney images (Mooney, 1957) are two-toned, ambiguous images made by thresholding a gray-scale image (Figure 1A). Although visually degraded, upright Mooney faces share the same gist and configural information with normal face pictures. Mooney images are also controlled for low-level features and experience, making them an ideal candidate for investigating the role of gist information in face detection. Gist information in upright face images has been shown to be important in face detection. For instance, when a face target is inverted or scrambled, disrupting the gist of a face, search efficiency is destroyed and neural responses in face-responsive areas are diminished (Brown et al., 1997; Hershler and Hochstein, 2005). Also, under continuous flash suppression (i.e., a flashing Mondrian pattern is presented to one eye and a static image to the other eye, suppressing the static image from awareness), upright faces break through suppression faster than inverted faces (Jiang et al., 2007). Developmental research has further presented evidence that newborns attend to upright face patterns more than their inverted counterparts, suggesting an innate preference for the gist of a face (Morton and Johnson, 1991; Nelson, 2001). It has been hypothesized that face detection may occur through an innate and automatically faster subcortical route (Johnson, 2005). If this is the case, the gist of faces, which includes both social and emotional information, may be rapidly processed through the subcortical pathway, and the rapidness of face detection should then be independent of the details of specific features.
FIGURE 1. Examples of Mooney Stimuli and Paradigm. (A) Top row: Examples of Mooney face targets both upright (first three) and inverted (last three). Bottom row: Examples of distractor stimuli. (B) An example of a visual display containing a Mooney face target (at the top position) and 5 Mooney, non-face distractors.
However, image-level visual properties, such as spatial frequency and skin color, have also been implicated in affecting the efficiency of face detection. For example, VanRullen (2006) replaced the amplitude spectrum of face images with that of car images and thereby destroyed search efficiency for faces, suggesting that the amplitude spectrum of a face underlies its pre-attentive processing. It has also been reported that EEG activity correlating with image-level properties, such as face size, could be used to accurately categorize visual stimuli as faces within 94 ms of stimulus onset (Cauchoix et al., 2014). This suggests that individual feature information may be involved in guiding attention to faces for fast processing. Investigating visual search of Mooney faces allows us to tease apart the possible effects of gist information and individual features. If it is the gist information in a face that captures our attention, we should find efficient detection of Mooney face images regardless of manipulations to any residual low-level features.
Using Mooney images also allows us to examine how prior experience may modulate the effects of gist information and individual features on rapid face detection. Recognition of Mooney images is known to be heavily modulated by top–down effects of prior experience (Dolan et al., 1997; Hsieh et al., 2010; Gorlin et al., 2012). The influence of being social animals and the tremendous amount of experience humans have with faces have been proposed to underlie the attention-grabbing nature of faces (Diamond and Carey, 1986; Gauthier et al., 2000). Based on this hypothesis, all categories in which an individual is an expert should have processing advantages similar to those of faces. Indeed, behavioral and neural effects similar to those found for faces have been found for objects of expertise. Diamond and Carey (1986) found that dog show judges showed an inversion effect for dog breed recognition. Moreover, the fusiform face area (Kanwisher et al., 1997), an area of the lateral fusiform gyrus that responds more to faces than to other tested non-face stimuli, has been reported to respond to categories of expertise (Gauthier et al., 1999). However, it is not clear how visual experience shapes face processing (Le Grand et al., 2001a,b; Fine et al., 2003; Ostrovsky et al., 2006; Lorenzino and Caudek, 2015). Whereas perceptual learning of feature conjunctions is possible (Wang et al., 1994; Carrasco et al., 1998), large amounts of visual experience, and eventually expertise, with faces may also underlie efficient face detection and rapid face processing by enhancing the extraction of gist information from Mooney face images.
In summary, what properties of a face capture attention remains unclear. To address this question, here we conducted a series of three visual search experiments. Visual search is a classic psychophysical paradigm for investigating visual attention. A search is considered efficient when a target is detected independently of the number of distractors in the display. If a target is searched efficiently, it captures our attention (Treisman and Gelade, 1980). It has been postulated that efficient search is invoked when there is a single-feature difference between target and distractors. However, face images are searched very efficiently, despite the absence of a clear, distinctive single-feature difference between faces and non-face objects (Hershler and Hochstein, 2005; Yang et al., 2011). We further combined visual search with Mooney images. Using Mooney images allows for control of low-level features and experience while maintaining gist information, making them an advantageous tool for investigating the effect of gist on guiding attention. Moreover, recognizing the object content in Mooney images based merely on local features is impossible; therefore, holistic processing is necessary for recognizing Mooney faces (McKone, 2004; Farzin et al., 2009). If Mooney faces were searched efficiently, it would suggest that holistic, gist information of a face is enough to guide attention. On the other hand, if observers rely on image-level visual features to rapidly detect faces, searching for a Mooney face among non-face Mooney images would not be efficient. Lastly, if observers rely on conceptual knowledge and experience to rapidly detect faces, all searches would be inefficient unless prior information about the target was provided.
Experiment 1
Methods
Participants
Twenty-eight (18 female) students from Dartmouth College volunteered to participate in Experiment 1. All participants gave written, informed consent and had normal or corrected-to-normal visual acuity. All participants received course credit or were compensated for their time. Sample sizes were chosen to be comparable with those of other similar visual search studies (Wolfe, 1998; Tong and Nakayama, 1999). These procedures were approved by the Committee for the Protection of Human Subjects at Dartmouth College and conducted in accordance with the 1964 Declaration of Helsinki.
Materials and Procedure
A set of 50 gray-scale face images and 100 gray-scale non-face images were transformed into Mooney images for the experiment. The face images consisted of frontward facing, male and female faces, cropped to exclude hair and ears. The non-face images were cropped parts of scenes and objects. To create Mooney images, MATLAB with SHINE toolbox was used (Willenbockel et al., 2010). First, the median luminance of each gray-scale image was found. Next, the images were manipulated such that all of the pixels in the image with the median luminance value or higher were changed to white, and all of the pixels in the image with luminance values lower than the median were changed to black.
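For concreteness, the median-luminance thresholding described above can be sketched in a few lines of MATLAB. This is only an illustrative sketch of the binarization step, not the actual stimulus-generation code; the file names are hypothetical, and in practice the stimuli were prepared with the SHINE toolbox.

img = imread('face_01.jpg');            % load one source photograph (hypothetical file name)
if size(img, 3) == 3
    img = rgb2gray(img);                % ensure a single gray-scale channel
end
img = double(img);
thresh = median(img(:));                % median luminance across all pixels
mooney = uint8(255 * (img >= thresh));  % at or above the median -> white; below -> black
imwrite(mooney, 'face_01_mooney.png');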
The experiment was coded using MATLAB with Psychtoolbox on a 21-inch Dell P1130 CRT monitor with a refresh rate of 85 Hz and spatial resolution of 1280 × 1024 pixels (Brainard, 1997). The visual search display was similar to a previously published design (Tong and Nakayama, 1999), with a black fixation cross in the center of a gray screen and 2, 4, or 6 images, positioned angularly around the fixation point at 30, 90, 150, 210, 270, and 330° (see Figure 1B). For trials with less than 6 stimuli, positions were randomly selected among the six options. Each image was ∼5° from the fixation point and subtended ∼5° of visual angle.
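A minimal sketch of how such a display could be laid out is given below, assuming a hypothetical pixels-per-degree conversion factor; the original Psychtoolbox code is not reproduced here.

ppd = 40;                                     % assumed pixels per degree of visual angle
center = [1280 1024] / 2;                     % screen center in pixels
angles = [30 90 150 210 270 330];             % the six possible item positions (deg)
setSize = 4;                                  % 2, 4, or 6 items per trial
idx = randperm(numel(angles), setSize);       % randomly choose positions for this trial
x = center(1) + 5 * ppd * cosd(angles(idx));  % ~5 deg eccentricity, horizontal coordinate
y = center(2) - 5 * ppd * sind(angles(idx));  % vertical coordinate (screen y increases downward)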
In a randomized mixed design, each participant was tested with 2400 trials, composed of 1200 target-present trials (600 upright face targets and 600 inverted face targets) and 1200 target-absent trials (600 upright distractor displays and 600 inverted distractor displays). Each condition also had an equal number of trials with 2, 4, or 6 images in the display array. In the target-present trials, the search target was randomly chosen from the 50 Mooney face images (see examples in Figure 1A, top row) and distractors were randomly chosen from the 100 Mooney non-face images (see examples in Figure 1A, bottom row). In the target-absent trials, all images were randomly chosen from the 100 Mooney non-face images. No image was presented more than once in the same trial. Target and distractors were presented upright in the upright condition and upside-down in the inverted condition. Participants were instructed to maintain fixation at the center of the screen and to search for a face in each trial, pressing the ‘F’ key if a face was present and the ‘J’ key if there was no face, responding as quickly as possible while maintaining high accuracy. A tone played if an incorrect response was made. Each trial ended with the participant’s response, and instructions then appeared asking the participant to press the spacebar to begin the next trial, in order to minimize possible position aftereffects from the previous trial. This also gave participants a chance to take a break after any trial if needed. Every 600 trials the experiment paused and participants had to take a break before they could begin the next 600 trials.
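The balanced design described above can be illustrated with the following sketch of a trial list (variable names are ours; the original experiment code is not shown). Each of the 12 cells defined by target presence (2), inversion (2), and set size (3) receives 200 trials, giving 2400 trials in randomized order.

nPerCell = 200;                                          % 12 cells x 200 trials = 2400 trials
[present, inverted, setSize] = ndgrid([0 1], [0 1], [2 4 6]);
cells = [present(:), inverted(:), setSize(:)];           % the 12 unique condition cells
trialList = repmat(cells, nPerCell, 1);                  % equal number of trials per cell
trialList = trialList(randperm(size(trialList, 1)), :);  % shuffle the trial order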
Data Analysis
Accuracy rates for each condition were computed to examine the possibility of speed-accuracy trade-offs. Data analysis focused on the RTs of correctly answered trials. The trials containing the slowest 2.5% of RTs as well as the quickest 2.5% of RTs were trimmed off to exclude outliers. A three-way ANOVA was conducted with set size, inversion, and target presence as within-subject factors.
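As an illustration, the trimming step could be implemented as follows for a single participant (rtCorrect is an assumed vector of RTs from correctly answered trials); the repeated-measures ANOVA itself could then be run with standard routines such as MATLAB's fitrm/ranova.

lo = prctile(rtCorrect, 2.5);                            % cutoff for the fastest 2.5% of RTs
hi = prctile(rtCorrect, 97.5);                           % cutoff for the slowest 2.5% of RTs
rtTrimmed = rtCorrect(rtCorrect > lo & rtCorrect < hi);  % keep the central 95% of RTs
meanRT = mean(rtTrimmed);                                % condition mean entered into the ANOVA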
Results
Accuracy rates ranged from 85.8 to 93.5% correct across all conditions. Overall, the averaged accuracy rate across subjects for upright trials (92.2%) was greater than for inverted trials (88.5%), with no evidence of speed–accuracy trade-offs. For the correct trials, averaged RTs by set size for each condition are plotted in Figure 2. The three-way ANOVA revealed significantly faster RTs for upright than inverted trials [black lines vs. gray lines: F(1,27) = 75.17, p < 10⁻¹⁰, ηp² = 0.74], while target-present trials were significantly faster than target-absent trials [solid lines vs. dotted lines: F(1,27) = 43.91, p < 10⁻⁶, ηp² = 0.62]. The effect of set size was also highly significant [F(2,54) = 73.84, p < 10⁻¹¹, ηp² = 0.73], showing that the Mooney face targets were not searched efficiently. Significant interactions were found between inversion and target presence [F(1,27) = 49.29, p < 10⁻⁷, ηp² = 0.65], set size and target presence [F(2,54) = 37.31, p < 10⁻¹¹, ηp² = 0.58], and inversion and set size [F(2,54) = 17.34, p < 10⁻⁵, ηp² = 0.39]. The three-way interaction between inversion, set size, and target presence was not significant [F(2,54) = 1.06, p = 0.35, ηp² = 0.04].
FIGURE 2. Mean reaction times as a function of search array set size in Experiment 1. Upright Mooney faces were searched more efficiently than inverted Mooney faces. Black lines, upright condition; gray lines, inverted condition; solid lines, target-present; dashed lines, target-absent. Search reaction time slopes for each condition are shown in ms/item to the right of the corresponding lines. Error bars represent ±1 SEM.
These results demonstrate that gist information contributes significantly to rapid face detection but does not fully explain how faces capture attention. Upright Mooney face targets were detected more rapidly (635 vs. 672 ms at set size 2) and more efficiently (45 ms/item vs. 73 ms/item) than inverted targets. However, upright Mooney faces were detected with a significant main effect of set size (the black, solid line in Figure 2 is not flat), suggesting the involvement of attention. Indeed, search for Mooney face stimuli was less efficient than previously reported for intact face pictures as search targets (Hershler and Hochstein, 2005). Given that image-level features were equalized to a great extent in Mooney images, it is possible that features specific to different intact face pictures underlie the faster detection and efficient search in the previous study. If that were the case, residual, non-equalized features in certain Mooney faces could potentially enable them to be searched more efficiently than others. To test this possibility, in Experiment 2 we used a block design with an individual Mooney face target for each block. If search efficiency differed across Mooney face targets, it would suggest that specific individual-level features guide attention to enhance search efficiency. However, if all upright faces were searched with equal efficiency, it would suggest that individual-level features are not used to rapidly differentiate face from non-face, since those features would not aid search speed.
Experiment 2
Methods
Participants
Twenty-four (13 female) students from Dartmouth College volunteered to participate in Experiment 2. All participants had normal or corrected to normal visual acuity. All participants were unaware of the purpose of the experiment and had not participated in an experiment with the same set of images. All participants gave written, informed consent and received course credit or compensation for their time. These procedures were approved by the Committee for the Protection of Human Subjects at Dartmouth College and conducted in accordance with the 1964 Declaration of Helsinki.
Materials and Procedure
Six Mooney face target images were randomly selected from the 50 faces in Experiment 1 to be the targets in Experiment 2. Distractors were the same as in Experiment 1. One Mooney face target was used for each block. Each participant had the same six Mooney faces as the search targets; however, three of them were presented upright and the other three were inverted. Which three of the six were shown upright and which were shown inverted was counterbalanced between participants. To ensure that practice could not cause the search of upright Mooney faces to be faster than that of inverted Mooney faces, the first three blocks had upright Mooney faces as the targets, whereas the last three blocks had inverted Mooney faces as the targets, so that any benefit of practice would favor the inverted condition. All of the distractors were upright regardless of whether the block had an upright or inverted target; as shown by Experiment 1, inversion of the Mooney non-face distractors did not have any effect. Each block contained 360 trials with 50% target-present trials and 50% target-absent trials.
The target was shown at the beginning of each block along with the same instructions from Experiment 1. Participants could study the target for as long as they liked before pressing any key to start the block. Trials were set up identically to Experiment 1. Participants were asked to take breaks between blocks, but they could also take a break before any trial if they wanted.
Data Analysis
Accuracies were analyzed as in Experiment 1. Only the RTs of correctly answered trials were used, and the outliers were excluded using the same criteria as Experiment 1. A mixed-model four-way ANOVA was performed on the remaining RTs with set size, identity, inversion, and target presence as the factors. Next, two three-way ANOVAs were conducted on both the target-present and target-absent conditions separately, with set size, identity, and inversion as the three factors.
Results
The overall accuracy rate was high for both upright Mooney face conditions (97.1%) and inverted Mooney face conditions (96.5%), with no evidence of speed–accuracy trade-offs. Figure 3 shows the RTs by set size for each identity and each condition. The four-way ANOVA on RTs revealed significant main effects of identity [F(5,50) = 27.78, p < 10⁻¹², ηp² = 0.74], set size [F(2,20) = 51.67, p < 10⁻⁸, ηp² = 0.84], inversion [F(1,10) = 6.19, p < 0.05, ηp² = 0.38], and target presence [F(1,10) = 46.58, p < 10⁻⁵, ηp² = 0.82]. The interaction between identity and set size was also significant [F(10,100) = 20.82, p < 10⁻¹³, ηp² = 0.68], showing that some identities were searched more efficiently than others. Significant interactions were also found between set size and inversion [F(2,20) = 17.62, p < 10⁻⁵, ηp² = 0.64], identity and target presence [F(5,50) = 13.31, p < 10⁻⁸, ηp² = 0.57], and set size and target presence [F(2,20) = 31.84, p < 10⁻⁷, ηp² = 0.76]. All other interactions were not significant.
FIGURE 3. Mean reaction times as a function of search array set size for each target image in Experiment 2. Slightly different trends for the search reaction times are observed for different targets. Black lines, upright condition; gray lines, inverted condition; solid lines, target-present; dashed lines, target-absent. Error bars represent ±1 SEM.
For target-present trials, the three-way ANOVA revealed significant main effects of identity [F(5,50) = 19.51, p < 10⁻¹¹, ηp² = 0.66], set size [F(2,20) = 36.22, p < 10⁻⁷, ηp² = 0.78], and inversion [F(1,10) = 8.34, p < 0.05, ηp² = 0.46]. The interaction between identity and set size was significant [F(10,100) = 7.53, p < 10⁻⁹, ηp² = 0.43]. The interaction between set size and inversion [F(2,20) = 12.77, p < 10⁻⁴, ηp² = 0.56] and the three-way interaction between identity, set size, and inversion [F(10,100) = 3.06, p < 0.05, ηp² = 0.24] were also significant. The three-way ANOVA on target-absent trials also revealed significant main effects of identity [F(5,50) = 31.87, p < 10⁻¹⁴, ηp² = 0.76] and set size [F(2,20) = 48.46, p < 10⁻⁸, ηp² = 0.83]. Significant interactions were found between identity and set size [F(10,100) = 12.67, p < 10⁻¹⁰, ηp² = 0.56] and between set size and inversion [F(2,20) = 6.49, p < 0.01, ηp² = 0.39]. Slopes for each identity and condition are further shown in Figure 4. Two of the upright face targets and one of the inverted face targets were searched efficiently, with slopes of less than 10 ms/item.
FIGURE 4. Slopes of the reaction times as a function of search array set size for each target in Experiment 2. While the efficiency of search varies for each target, the main effect of inversion is evident in the majority of targets. Black bar, upright; gray bar, inverted; filled bar, target-present; hollow bar, target-absent. Error bars represent ± 1 SEM.
The search slopes for upright Mooney faces were significantly shallower than those for inverted Mooney faces. Since upright Mooney faces were the targets for the first three blocks and inverted Mooney faces were the targets for the last three blocks, these results are unlikely to have been caused by practice. Moreover, if there had been an effect of practice, the RTs at the end of blocks would have differed from the RTs at the beginning of blocks; no such effect was found. Taken together, because differentiating upright from inverted Mooney faces is impossible using merely local feature information, the results of Experiments 1 and 2 demonstrate that gist information contributes significantly to capturing attention. On the other hand, the highly significant main effect of identity in Experiment 2 also suggests the involvement of individual-level visual properties in pre-attentive face detection.
Experiment 3
In a between-subject design, this experiment tested how different amounts of prior experience with Mooney images affected search efficiency. Participants were divided into three groups: Group 1 was not given any prior information, Group 2 was given unambiguous conceptual information, and Group 3 was trained with Mooney images prior to participating in the present experiment, in addition to receiving the conceptual information given to Group 2.
Methods
Participants
Twenty-nine students from Dartmouth College and the two authors (in total 14 females) participated in Experiment 3. All participants, except the authors, received course credit or were compensated for their time. All participants had normal or corrected to normal visual acuity and gave written, informed consent. These procedures were approved by the Committee for the Protection of Human Subjects at Dartmouth College and conducted in accordance with the 1964 Declaration of Helsinki.
Materials and Procedure
The procedure was the same as in Experiment 2 except that: (1) three different Mooney faces were used for this experiment, and all participants were tested with these Mooney faces both upright and inverted, so there was no between-participant comparison for the interaction of face identity and inversion; (2) for participants in Group 2 (N = 11) and Group 3 (N = 10), gray-scale versions of the Mooney face targets were shown alongside the target at the beginning of each block, so participants had unambiguous prior knowledge about the features and aspects of the target; (3) Group 3 included the two authors, who were most familiar with the Mooney stimuli. Participants in this group had additionally completed a separate study that involved learning to categorize 300 Mooney images as face/non-face over 7 days.
Data Analysis
Accuracies were computed and the RTs of correct trials were trimmed using the same criteria as in Experiments 1 and 2. Next, to examine search efficiency, the slope of reaction time as a function of set size was computed separately for each identity and each group. A four-way ANOVA was then conducted on these slopes with identity, inversion, and target presence as within-subject factors and group as the between-subjects factor. The slope data were then separated by target presence, and two three-way ANOVAs were conducted with identity and inversion as within-subject factors and group as the between-subjects factor.
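For illustration, the slope for one identity and condition could be computed as below, assuming meanRT holds the mean correct RT at set sizes 2, 4, and 6 (variable names are ours, not from the original analysis code).

setSizes = [2 4 6];
coeffs = polyfit(setSizes, meanRT, 1);   % linear fit: RT = slope * setSize + intercept
searchSlope = coeffs(1);                 % search efficiency in ms per additional item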
Results
Accuracy rates ranged from 92.9 to 97.8% correct for upright targets and 93.3 to 96.6% correct for inverted targets. Accuracy was not significantly different between participant groups [F(2,28) = 0.76, p = 0.48, ηp² = 0.05], suggesting no speed–accuracy trade-offs.
The search reaction time slopes for each target identity and each group are shown in Figure 5. The four-way ANOVA on the slopes of reaction time by set size revealed significant main effects of target identity [F(2,56) = 5.23, p < 0.01, ηp² = 0.16], inversion [F(1,28) = 20.88, p < 10⁻⁴, ηp² = 0.43], target presence [F(1,28) = 29.16, p < 10⁻⁵, ηp² = 0.51], and group [F(2,28) = 6.53, p < 0.01, ηp² = 0.32]. No four-way, three-way, or two-way interactions were significant (F < 2.5, p > 0.05). Post hoc tests revealed that Group 1 was significantly different from both Group 2 and Group 3 (p < 0.05, Bonferroni corrected), whereas Groups 2 and 3 were not significantly different from each other (p = 0.74). The three-way ANOVA performed on search efficiency slopes of target-present trials also revealed significant main effects of target identity [F(2,56) = 4.93, p < 0.05, ηp² = 0.15], inversion [F(1,28) = 14.2, p < 0.001, ηp² = 0.34], and group [F(2,28) = 5.99, p < 0.01, ηp² = 0.30]. The interaction between target identity and inversion was also significant [F(2,56) = 3.72, p < 0.05, ηp² = 0.12]. Post hoc tests again revealed that Group 1 was significantly different from both Group 2 and Group 3 (p < 0.05, Bonferroni corrected), whereas Groups 2 and 3 were not significantly different from each other (p = 0.51). The three-way ANOVA performed on search efficiency slopes of target-absent trials revealed significant main effects of inversion [F(1,28) = 17.0, p < 0.001, ηp² = 0.38] and group [F(2,28) = 5.24, p < 0.05, ηp² = 0.27]. Neither the main effect of target identity nor any interaction was significant in the target-absent condition (F < 2.5, p > 0.05). Post hoc tests revealed that Group 1 was significantly different from Group 2 (p < 0.05, Bonferroni corrected) and marginally significantly different from Group 3 (p = 0.019, uncorrected). Again, Groups 2 and 3 were not significantly different from each other (p = 0.89).
FIGURE 5. Slopes of the reaction times as a function of search array set size for each target identity and each participant group in Experiment 3. The efficiency of search for the Mooney face targets increased with more experience (A = Group 1, B = Group 2, C = Group 3). Black bar, upright; gray bar, inverted; filled bar, target-present; hollow bar, target-absent. Error bars represent ± 1 SEM.
Discussion
By using Mooney images in visual search experiments, we were able to investigate how different attributes of a face guide attention for rapid visual detection. Our results suggest that while upright gist information allows for more rapid face detection, Mooney images do not provide enough information on their own to support pre-attentive detection. In Experiment 3, the combination of gist information, specific features, and prior experience made Mooney faces capture attention in a manner similar to intact face pictures. Through merely analyzing image-level features, it is impossible to differentiate upright from inverted Mooney faces (Mooney, 1957). Nonetheless, participants were significantly more efficient at detecting upright than inverted Mooney faces. Participants were also more accurate at detecting upright than inverted Mooney faces, ruling out a speed–accuracy trade-off. According to the reverse hierarchy theory of visual processing (Hochstein and Ahissar, 2002), search for upright Mooney faces should be efficient when gist is provided. Our results suggest a refinement of this theory by revealing that efficient search for Mooney faces also relies on both the features of individual targets and prior experience with Mooney images.
Gist abstraction may include detecting whether a holistic face pattern is present in the display (e.g., to differentiate social from non-social scenes), and such processing does not necessarily require focused attention (Li et al., 2002; Rousselet et al., 2003; Evans and Treisman, 2005; Hershler and Hochstein, 2005; Furey et al., 2006; VanRullen, 2006; Palermo and Rhodes, 2007; Rossion and Caharel, 2011). Although visually degraded, upright Mooney faces share the same gist with normal face pictures and, therefore, guided attention to enable more rapid detection in visual search. On the other hand, unlike pictures of faces, Mooney face targets were usually not searched efficiently (i.e., no ‘pop-out’ effect). Beyond whatever information remains in Mooney images that can be used to differentiate face vs. non-face and upright vs. inverted faces, a contribution of low-level visual properties to the spontaneous capture of attention cannot be ruled out. Moreover, not all Mooney faces were searched with the same efficiency, as revealed by the significant main effect of target identity in Experiment 2. By using Mooney images, we equalized low-level features among the targets to a great extent (McKone, 2004). However, the significant main effect of identity and the significant interaction between identity and inversion suggest that individual-level differences between targets still affect search efficiency. This result cannot be fully explained by the reverse hierarchy theory, since a significant effect of individual-level features contradicts the idea that gist alone is processed first. Because our stimuli were degraded Mooney images, some of them may match a holistic/configural face-pattern template better than others (Farah et al., 1998). However, if a certain feature distinguished the target from the distractors, the processing of faces would not necessarily precede the processing of specific features. Some information beyond what is present in our Mooney images therefore appears to be necessary to differentiate face vs. non-face as rapidly as previously reported for pictures of faces (Hershler and Hochstein, 2005). In Experiment 3, different levels of conceptual information and experience were tested, and significant differences were found between the tested groups independent of target identity. Moreover, post hoc analyses revealed that providing unambiguous face information (i.e., a picture of the face), rather than mere familiarity with Mooney images, facilitated the search efficiency of Mooney face targets, suggesting that conceptual, top–down knowledge aids how faces capture attention. In addition, the between-group effect was also significant in target-absent trials, revealing that top–down, experience-driven information can also aid the ability to quickly conclude that there is no face in a search display. The biased competition model of selective attention proposes that attention should not only facilitate the detection of targets but also suppress the processing of distractors (Desimone and Duncan, 1995). While the recognition of Mooney images is heavily modulated by top–down effects of prior experience (Dolan et al., 1997; Hsieh et al., 2010; Gorlin et al., 2012), prior experience also appears to help in determining that a face is absent from a search display. Consistent with this notion, our results suggest that experience facilitates the gist extraction of Mooney face targets independently of target identity.
Given that participants in our Experiment 3 had, at most, a week of training with Mooney images, it remains possible that more training (such as a lifetime's worth) could lead to efficient search of all Mooney faces as well as enhanced effects of local features. Note that the search slopes for about half of our upright Mooney face targets already fell below 10 ms/item in Experiments 2 and 3. The lack of detailed local visual features in Mooney images may explain why not all of the upright Mooney face targets were searched efficiently, but information from local visual features cannot be the main cause of rapid face detection, as discussed above. How, then, could a Mooney face readily capture attention? Cortical pathways starting from the primary visual cortex have been the main focus of vision research. However, additional subcortical pathways involving the superior colliculus, the pulvinar, and the amygdala are known to process visual information as well (Jones et al., 1976; Schiller and Malpeli, 1977; Tamietto and de Gelder, 2010). Neural responses through the cortical pathways are heavily modulated by attention (Kastner and Ungerleider, 2000). By contrast, implicit social and affective processing of face stimuli has been shown to involve the subcortical pathway, which is much faster (Whalen et al., 1998; Todorov et al., 2005). This pathway need not be modulated by attention (Whalen et al., 1998), making it a possible route for explaining efficient search for faces. In addition, recent eye-tracking studies revealed that saccades can be independent of perception (Lisi and Cavanagh, 2015). As face detection presumably occurs before any other face-specific processing, visual search for faces and rapid saccades to faces may also share subcortical mechanisms, independent of the cortical processing of faces that leads to conscious but relatively slow perception. Future studies using neuroimaging techniques, such as EEG and fMRI, should provide further insights into the neural mechanisms underlying rapid face detection with Mooney images.
Author Contributions
JG and MM designed the study. JG and MM conducted the study. JG and MM wrote the paper.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgment
Work supported by the National Science Foundation under Grant Number 1157121.
References
Brainard, D. (1997). The psychophysics toolbox. Spat. Vis. 10, 433–436. doi: 10.1163/156856897X00357
Brown, V., Huey, D., and Findlay, J. M. (1997). Face detection in peripheral vision: do faces pop out? Perception 26, 1555–1570. doi: 10.1068/p261555
Carrasco, M., Ponte, D., Rechea, C., and Sampedro, M. J. (1998). “Transient structures”: the effects of practice and distractor grouping on within-dimension conjunction searches. Percept. Psychophys. 60, 1243–1258. doi: 10.3758/BF03206173
Cauchoix, M., Barragan-Jason, G., Serre, T., and Barbeau, E. (2014). The neural dynamics of face detection in the wild revealed by MVPA. J. Neurosci. 34, 846–854. doi: 10.1523/JNEUROSCI.3030-13.2014
Crouzet, S., Kirchner, H., and Thorpe, S. (2010). Fast saccades toward faces: face detection in just 100 ms. J. Vis. 10, 1–17. doi: 10.1167/10.4.16
Desimone, R., and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222. doi: 10.1146/annurev.ne.18.030195.001205
Diamond, R., and Carey, S. (1986). Why faces are and are not special: an effect of expertise. J. Exp. Psychol. Gen. 115, 107–117. doi: 10.1037/0096-3445.115.2.107
Doi, H., and Ueda, K. (2007). Searching for a perceived stare in the crowd. Perception 36, 773–780. doi: 10.1068/p5614
Dolan, R., Fink, G., Rolls, E., Booth, M., Holmes, A., Frackowiak, R., et al. (1997). How the brain learns to see objects and faces in an impoverished context. Nature 389, 596–599. doi: 10.1038/39309
Evans, K., and Treisman, A. (2005). Perception of objects in natural scenes: is it really attention free? J. Exp. Psychol. Hum. Percept. Perform. 31, 1476–1492. doi: 10.1037/0096-1523.31.6.1476
Farah, M., Wilson, K., Drain, M., and Tanaka, J. (1998). What is “special” about face perception? Psychol. Rev. 105, 482–498. doi: 10.1037/0033-295X.105.3.482
Farzin, F., Rivera, S., and Whitney, D. (2009). Holistic crowding of Mooney faces. J. Vis. 9, 1–15. doi: 10.1167/9.6.18
Fine, I., Wade, A. R., Brewer, A. A., May, M. G., Goodman, D. F., Boynton, G. M., et al. (2003). Long-term deprivation affects visual perception and cortex. Nat. Neurosci. 6, 915–916. doi: 10.1038/nn1102
Furey, M., Tanskanen, T., Beauchamp, M., Avikainen, S., Uutela, K., Hari, R., et al. (2006). Dissociation of face-selective cortical responses by attention. Proc. Natl. Acad. Sci. U.S.A. 103, 1065–1070. doi: 10.1073/pnas.0510124103
Gauthier, I., Skudlarski, P., Gore, J., and Anderson, A. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. Nat. Neurosci. 3, 191–197. doi: 10.1038/72140
Gauthier, I., Tarr, M., Anderson, A., Skudlarski, P., and Gore, J. (1999). Activation of the middle fusiform “face area” increases with expertise in recognizing novel objects. Nat. Neurosci. 2, 568–573. doi: 10.1038/9224
Gorlin, S., Meng, M., Sharma, J., Sugihara, H., Sur, M., and Sinha, P. (2012). Imaging prior information in the brain. Proc. Natl. Acad. Sci. U.S.A. 109, 7935–7940. doi: 10.1073/pnas.1111224109
Graham, D., and Meng, M. (2011). Artistic representations: clues to efficient coding in human vision. Vis. Neurosci. 28, 371–379. doi: 10.1017/S095252381000162
Hershler, O., and Hochstein, S. (2005). At first sight: a high-level pop out effect for faces. Vision Res. 45, 1707–1724. doi: 10.1016/j.visres.2004.12.021
Hochstein, S., and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies in the visual system. Neuron 36, 791–804. doi: 10.1016/S0896-6273(02)01091-7
Hsieh, P., Vul, E., and Kanwisher, N. (2010). Recognition alters the spatial pattern of fMRI activation in early retinotopic cortex. J. Neurophysiol. 103, 1501–1507. doi: 10.1152/jn.00812.2009
Jiang, Y., Costello, P., and He, S. (2007). Processing of invisible stimuli: advantage of upright faces and recognizable words in overcoming interocular suppression. Psychol. Sci. 18, 349–355. doi: 10.1111/j.1467-9280.2007.01902.x
Johnson, M. (2005). Subcortical face processing. Nat. Rev. Neurosci. 6, 766–774. doi: 10.1038/nrn1766
Jones, E., Burton, H., Saper, C., and Swanson, L. (1976). Midbrain, diencephalic and cortical relationships of the basal nucleus of Meynert and associated structures in primates. J. Comp. Neurol. 167, 385–420. doi: 10.1002/cne.901670402
Kanwisher, N., McDermott, J., and Chun, M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.
Kastner, S., and Ungerleider, L. (2000). Mechanisms of visual attention in the human cortex. Annu. Rev. Neurosci. 23, 315–341. doi: 10.1146/annurev.neuro.23.1.315
Le Grand, R., Mondloch, C. J., Maurer, D., and Brent, H. P. (2001a). Early visual experience and face processing. Nature 410:890. doi: 10.1038/35073749
Le Grand, R., Mondloch, C. J., Maurer, D., and Brent, H. P. (2001b). Early visual experience and face processing: correction. Nature 412:786. doi: 10.1038/35090636
Li, F., VanRullen, R., Koch, C., and Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proc. Natl. Acad. Sci. U.S.A. 99, 9596–9601. doi: 10.1073/pnas.092277599
Lisi, M., and Cavanagh, P. (2015). Dissociation between the perceptual and saccadic localization of moving objects. Curr. Biol. 25, 1–6. doi: 10.1016/j.cub.2015.08.021
Lorenzino, M., and Caudek, C. (2015). Task-irrelevant emotion facilitates face discrimination learning. Vision Res. 108, 56–66. doi: 10.1016/j.visres.2015.01.007
Loschky, L., and Larson, A. (2008). Localized information is necessary for scene categorization, including the natural/man-made distinction. J. Vis. 8, 1–9. doi: 10.1167/8.1.4
McKone, E. (2004). Isolating the special component of face recognition: peripheral identification and a Mooney face. J. Exp. Psychol. Learn. Mem. Cogn. 30, 181–197. doi: 10.1037/0278-7393.30.1.181
Mooney, C. (1957). Age in the development of closure ability in children. Can. J. Psychol. 11, 219–226.
Morton, J., and Johnson, M. (1991). CONSPEC and CONLEARN: a two-process theory of infant face recognition. Psychol. Rev. 98, 164–181. doi: 10.1037/0033-295X.98.2.164
Nelson, C. (2001). The development and neural bases of face recognition. Infant Child Dev. 10, 3–18. doi: 10.1002/icd.239
Oliva, A. (2005). “Gist of the scene,” in Neurobiology of Attention, eds L. Itti, G. Rees, and J. Tsotsos (San Diego, CA: Elsevier), 251–256.
Ostrovsky, Y., Andalman, A., and Sinha, P. (2006). Vision following extended congenital blindness. Psychol. Sci. 17, 1009–1014. doi: 10.1111/j.1467-9280.2006.01827.x
Palermo, R., and Rhodes, G. (2007). Are you always on my mind? A review of how face perception and attention interact. Neuropsychologia 45, 75–92. doi: 10.1016/j.neuropsychologia.2006.04.025
Rossion, B., and Caharel, S. (2011). ERP evidence for the speed of face categorization in the human brain: disentangling the contribution of low-level visual cues from face perception. Vision Res. 51, 1297–1311. doi: 10.1016/j.visres.2011.04.003
Rousselet, G., Mace, M., and Fabre-Thorpe, M. (2003). Is it an animal? Is it a human face? Face processing in upright and inverted natural scenes. J. Vis. 3, 440–455. doi: 10.1167/3.6.5
Schiller, P., and Malpeli, J. (1977). Properties and tectal projections of monkey retinal ganglion cells. J. Neurophysiol. 40, 428–445.
Tamietto, M., and de Gelder, B. (2010). Neural bases of the non-conscious perception of emotional signals. Nat. Rev. Neurosci. 11, 697–709. doi: 10.1038/nrn2889
Todorov, A., Mandisodza, A., Goren, A., and Hall, C. (2005). Inferences of competence from faces predict election outcomes. Science 308, 1623–1626. doi: 10.1126/science.1110589
Tong, F., and Nakayama, K. (1999). Robust representations for faces: evidence from visual search. J. Exp. Psychol. Hum. Percept. Perform. 25, 1016–1035. doi: 10.1037/0096-1523.25.4.1016
Treisman, A., and Gelade, G. (1980). A feature-integration theory of attention. Cogn. Psychol. 12, 97–136. doi: 10.1016/0010-0285(80)90005-5
VanRullen, R. (2006). On second glance: still no high-level pop-out effect for faces. Vision Res. 46, 3017–3027. doi: 10.1016/j.visres.2005.07.009
Wang, Q., Cavanagh, P., and Green, M. (1994). Familiarity and pop-out in visual search. Percept. Psychophys. 56, 495–500. doi: 10.3758/BF03206946
Whalen, P., Rauch, S., Etcoff, N., McInerney, S., Lee, M., and Jenike, M. (1998). Masked presentations of emotional facial expressions modulate amygdala activity without explicit knowledge. J. Neurosci. 18, 411–418.
Willenbockel, V., Sadr, J., Fiset, D., Horne, G., Gosselin, F., and Tanaka, J. (2010). Controlling low-level image properties: the SHINE toolbox. Behav. Res. Methods 42, 671–684. doi: 10.3758/BRM.42.3.671
Williams, M., Moss, S., Bradshaw, J., and Mattingley, J. (2005). Look at me, I’m smiling: visual search for threatening and nonthreatening facial expressions. Vis. Cogn. 12, 29–50. doi: 10.1080/13506280444000193
Wolfe, J. (1998). What can 1 million trials tell us about visual search? Psychol. Sci. 9, 33–39.
Yang, H., Gors, J., and Meng, M. (2011). Face-semblance leads to faster visual search and breaking interocular suppression. J. Vis. 11:661. doi: 10.1167/11.11.661
Keywords: face detection, attention, parallel search, Mooney image, object recognition
Citation: Goold JE and Meng M (2016) Visual Search of Mooney Faces. Front. Psychol. 7:155. doi: 10.3389/fpsyg.2016.00155
Received: 22 September 2015; Accepted: 27 January 2016;
Published: 12 February 2016.
Edited by:
Frédéric Gosselin, University of Montreal, Canada
Copyright © 2016 Goold and Meng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ming Meng, ming.meng@dartmouth.edu