Neuroscience Institute, Princeton University, Princeton, NJ, USA
In general, individuals look where they attend and next intend to act. Many animals, including our own species, use observed gaze as a deictic (“pointing”) cue to guide behavior. Among humans, these responses are reflexive and pervasive: they arise within a fraction of a second, act independently of task relevance, and appear to undergird our initial development of language and theory of mind. Human and nonhuman animals appear to share basic gaze-following behaviors, suggesting the foundations of human social cognition may also be present in nonhuman brains.
When we need to know what another individual thinks, we look to their eyes. In so doing, we learn not just about their visual focus, but also make inferences about their private thoughts and intentions, and about the messages they explicitly communicate to others. It is likely that distinct neural systems have evolved to process two crucial types of gaze information: direct and deictic (“pointing”) gaze.
Direct gaze is associated with predation and with the likelihood that an individual will approach or engage the observer: Because direct gaze is an unambiguous stimulus with tremendous evolutionary significance, neural responses are relatively automatic (von Grunau and Anston, 1995; Senju and Hasegawa, 2005), innate (Batki et al., 2000; Farroni et al., 2002; see also Grossmann et al., 2007), and mediated by evolutionarily conserved subcortical systems (Sewards and Sewards, 2002; Senju and Johnson, 2009).
By contrast, if observed gaze is averted, its direction is primarily relevant to individuals adapted to life in a social group. Deictic gaze indicates spatial attention, suggests future actions, and defines the target of facial signals (van Hoof, 1967; Argyle and Cook, 1976). Our ability to attend to the same thing as an observed individual appears to be a foundation for more sophisticated social skills such as a theory of mind (Baron-Cohen, 1994; Gomez, 2009); conversely, humans who are unable to or uninterested in sharing attention are understood to suffer symptoms of the autism spectrum disorders (APA, 1994).
Gaze-following behavior sits at the intersection of several major strains of scientific research, and has thus been reviewed through several lenses over the past two decades: ethological (Emery, 2000; Itakura, 2004; Emery and Clayton, 2009; Rosati and Hare, 2009), psychological (Frischen et al., 2007), developmental and clinical (Nation and Penny, 2008), and neuroscientific (Nummenmaa and Calder, 2009). I draw on these disciplines to highlight how behavioral research on gaze following informs our understanding of the neural mechanisms of social interaction.
To understand what another individual sees, we must perceive and interpret their body, head and eye posture, extract from these their gaze direction, and then covertly imagine or overtly mimic their perspective and thus relate their observable, physical point-of-view to their private, internal mental state. In this review, I argue that gaze following is foundational to this ability, is present early in development and across many species, and relies upon similar neural systems in humans and related nonhuman animals.
Adult humans attend where others attend, and use gestures and gaze to manipulate the attention of others. At their pinnacle, these skills evoke mutual awareness of shared mental states (Emery, 2000
); at their base, they are founded on a reflexive tendency to follow the gaze of others. The two essential features that describe adult use of deictic gaze are sophistication and automaticity. Typical human adults understand that gaze both constrains what another can see and signals what in the visual field they find most relevant. Our tendency to follow gaze reflects our understanding of one another’s point of view – our understanding of the significance of spatial relationships, our recognition that people can look toward things outside our field of view and that we can see things outside theirs, and our awareness that gaze reflects the mental state of the gazer. Furthermore, we understand that gaze interacts with other communicative signals, and can have explicitly communicative (“ostentive”) significance, either to inform or mislead. For all these reasons, we use gaze to inform our judgments both about our shared environment and also about the person whose gaze we observe (e.g. Bayliss and Tipper, 2006b
; Bayliss et al., 2007
). Nonetheless, we process all this so easily that we barely notice the effort – an automaticity that can be used against us, for example, in the no-look passes of ballgames or magic tricks of illusionists (Kuhn et al., 2009
).
Not only does gaze comprise an important communicative channel among typical adults, it appears to strongly influence our early development (Figure 1). The first study of gaze following by infants reported onset at 2 months, with steep increases in frequency between 7 and 11 months (Scaife and Bruner, 1975). Later work pushed this back slightly (reviewed, Butterworth, 1991), suggesting infants first look in the direction of gaze by 6 months, toward target objects by 12 months, and geometrically to objects beyond their immediate view at 18 months (but cf. Moll and Tomasello, 2004). Geometrical gaze following is particularly intriguing, because it implies successful generalization between allocentric and egocentric space. At this point – starting a little over 1 year of age – infants preferentially follow the gaze of individuals whose eyes are open and uncovered (Brooks and Meltzoff, 2002, 2005) and who have looked toward interesting things (Chow et al., 2008). Likewise, year-old infants actively manipulate attention via pointing hand gestures (Brooks and Meltzoff, 2002; Liszkowski et al., 2004), generally in tandem with either eye contact or deictic gaze.
Figure 1. Ontogeny of primate gaze following. Humans, apes, and monkeys are sensitive to direct gaze at, or soon after, birth. However, their understanding of deictic gaze develops during childhood. Human gaze following arises early in life, with responses to turned heads and averted eyes arising between 2–6 months; gaze following at 10–12 months predicts language acquisition over the next year. Near the 1-year mark, human gaze following becomes more sophisticated: it is contingent on the cue’s eyes being open at 11 months, and on the cue having recently looked at interesting things by 14 months; by 18 months, humans follow gaze geometrically to regions beyond their immediate line of sight. By contrast, much less is known about the development of nonhuman gaze following. Apes and monkeys both appear more sensitive to head direction than to eyes. Both habituate to misleading gaze cues during adolescence, and as adults, follow gaze geometrically and from eye cues; the precise onset of these abilities is uncertain.
Curiously, while humans follow gaze within a year after birth, explicit discrimination of gaze direction arises only in the third year, and remains imprecise for years thereafter (Doherty et al., 2009
). This finding suggests that gaze behaviors may involve multiple substrates with distinct developmental time courses: at minimum a fast-developing pathway for reflexive gaze-following responses and a slower-developing pathway for cognitive gaze comprehension.
Joint attention abilities at 10 months predict the rapidity of subsequent language acquisition (Brooks and Meltzoff, 2005, 2008; Mundy et al., 2007; though cf. Tomasello and Farrar, 1986). Conversely, poor joint attention skills predict the severity of impairment from autism spectrum disorders (see also Klin et al., 2002b; Charman, 2003; Nation and Penny, 2008). Some developmental evidence suggests that initiation of and response to joint attention are separable processes which make independent contributions to social and language development (see Mundy et al., 2007) (note also Brooks and Meltzoff, 2008; Wellman et al., 2008; Pyers and Senghas, 2009). This developmental heterogeneity likely includes independent contributions of social perceptions, reflexes, and motivations to the development of social attention, learning, behavior and cognition (see Klein et al., 2009). If distinct mechanisms with different developmental time courses support the orienting responses and mentalistic interpretations evoked by gaze, these mechanisms may also differ across phylogeny, and may be detectable and dissectible through psychophysical testing of behavior.
Gaze Following by Nonhuman Species
There is a certain inherent difficulty in generalizing deictic social cues across species. Among humans, cues are readily interpreted and categorized: heads, especially eyes, point toward attended regions; bodies, especially hands, point in the direction of intended movement or action. But human hands and eyes are both rather unique. Humans have a developed sense of vision and distinctive eyes: each has a small, single, circular, well-defined fovea, and is pigmented so as to be easily readable by others (Kobayashi and Kohshima, 2001). Human hands are similarly specialized – freed from locomotor constraints by our bipedal stance, we use them to the exclusion of other more typical effectors such as the mouth. In many animals, quite different perceptual and motor interfaces rule – consider that unlike a human, the robin turns its head aside the better to see the worm. The deictic social cues that apply to other species may not be readily apparent to us, nor ours to them.
Nonetheless, group-living animals must coordinate their movements with their group-mates, and predators must likewise coordinate with the movements of their prey. It would be surprising if these processes occurred without some minimal awareness and attention to the intended movements of others (Figure 2
). Most research, however, has examined the ability of animals not to understand one another’s gaze, but to interpret our own. After some debate, it now seems likely that apes and monkeys, at least, will shift their attention in response to human gaze and gesture (chimps, Povinelli and Eddy, 1996
; chimps and an orangutan, but not lower primates, Itakura, 1996
; capuchins, Anderson et al., 1996
; macaques but not lemurs, Anderson and Mitchell, 1999
; all great apes Brauer et al., 2005
; and marmosets, Burkart and Heschl, 2006
). Furthermore, some of these primates have been shown to follow gaze geometrically, indicating at least a limited understanding of another’s point of view (spider monkeys and capuchins, Amici et al., 2009
; apes, Brauer et al., 2005
; marmosets, Burkart and Heschl, 2006
). These studies contrast strikingly with primates’ failure to use human gaze or gesture to locate hidden food – a seeming paradox, and important area of comparative research (Hare and Tomasello, 2004
; Miklosi and Soproni, 2006
; Ruiz et al., 2008
) (for further discussion, see Rosati and Hare, 2009
).
Figure 2. Phylogeny of vertebrate gaze following. Though the most robust evidence for gaze following has emerged from primates, other mammals and birds have also been found to follow gaze. Evidence comes primarily from three groups: domestic mammals including goats and dogs, captive marine mammals including dolphins and seals, and birds including corvids and ibises. Better understanding of the evolution of gaze behavior will require more comparative studies, with a particular eye toward distinguishing both the sophistication and species-specificity of gaze cue responses.
Intriguingly, use of human deictic cues is also found in nonprimate species; in particular, certain domestic species succeed in those cooperative tasks primates often fail. Both goats (Kaminski et al., 2005) and horses (Maros et al., 2008) have a limited ability to follow pointing cues (however, cf. Miklosi and Soproni, 2006). Impressively, dogs both follow (Soproni et al., 2001; Miklosi et al., 2003) and direct (Miklosi et al., 2000, 2003) human gaze, suggesting a robust ability to share attention with humans. Domestication may select animals partly for their ability to socialize across species boundaries: for example, dogs follow human gaze more readily than do human-reared wolves (Miklosi et al., 2003; see also Hare et al., 2002).
Besides primates and domesticated animals, some birds and marine mammals have been shown to respond to deictic signals. Corvids, including crows, magpies and jays, are known for advanced abilities including tool use (reviewed Clayton and Emery, 2005
) and perhaps mirror recognition (Prior et al., 2008
) – and also possess the ability to geometrically follow human gaze cues (Bugnyar et al., 2004
; reviewed, Emery and Clayton, 2009
). Outside the corvid family, ibises have been shown to follow gaze, but lack the ability to follow gaze geometrically (Loretto et al., 2010
). Among marine mammals, dolphins and seals have some ability to follow pointing gestures (Shapiro et al., 2003), and perhaps also head gaze cues (Pack and Herman, 2004). These results are somewhat surprising, and suggest an important role for domain-general learning: it is difficult to imagine marine mammals evolving an intrinsic ability to decode arm gestures.
While the ability of these animals to track human cues is impressive, the more ethologically relevant question is whether animals use deictic gaze cues in interactions with their own species. Even when animals can successfully perceive and respond to the deictic cues of another species, they may not typically be motivated to attend or react. Surprisingly few studies, however, have addressed the use of gaze cues among nonhuman species – it is difficult to observe gaze shifts from a distance, and more difficult still to arrange controlled and naturalistic interaction. Some primate species, notably chimpanzees, bonobos, and perhaps capuchins, may spontaneously attempt to direct others’ attention through gestures (de Waal, 2003
; Zimmermann et al., 2009
). Similar observations have been made in social hunters such as wolves (see Miklosi and Soproni, 2006
). These observations imply their complement, and recent data have supported the idea that diverse animals can read deictic social cues: chimps, mangabeys, and several species of macaque (Tomasello et al., 1998), domesticated goats (Kaminski et al., 2005), and dogs (Hare and Tomasello, 1999
). Similarly, novel monitoring techniques (e.g. Shepherd and Platt, 2006
) have made it possible to record subtle use of conspecific gaze cues during naturalistic interaction among ringtailed lemurs, who, like all prosimian primates, were not previously known to follow gaze (Shepherd and Platt, 2008
) (Figure 3
).
Figure 3. Lemurs use deictic social cues to guide naturalistic orienting. Upper: New technologies permit human and nonhuman gaze to be recorded during naturalistic interaction. These techniques can reveal subtle patterns of signaling which are difficult to detect in the field or evoke in the laboratory. Michael Platt and I recorded infrared video (A) of a lemur’s eye, as reflected in a dichroic mirror (B), while simultaneously recording (C) the scene in front of the lemur and transmitting (D) both data sets to a computer for extraction of gaze location. Though lemurs reportedly ignore human gaze cues, they nonetheless co-oriented with one another in natural settings, and tended to follow the gaze of individuals they had recently looked at. Lower: Co-orienting statistics: across the analyzed videos, the lemur subject tended to look in the direction of the outward red lines, and avoid the direction of the inward blue lines, relative to observed lemurs’ body (larger circle) and head (smaller circle) axes (methods, Shepherd and Platt, 2006
; results, Shepherd and Platt, 2008
).
The ability of other primates to follow human gaze appears to grow throughout development (Ferrari et al., 2000
, 2008
; Tomasello et al., 2001
; Okamoto et al., 2002
) (Figure 1
). An important theme has emerged that while humans follow eye gaze from an early age, other apes initially follow only head direction, responding to eye gaze alone only in adulthood (Tomasello et al., 2007
). This finding, together with the observation that human eyes are unusually visible (Kobayashi and Kohshima, 2001
), suggests the “cooperative eye” hypothesis: that adaptations promoting joint attention may have evolved among human ancestors as we became more interdependent (Tomasello et al., 2005
, 2007
).
Studies of interacting animals hold three major benefits for humans. First, comparisons between species may help us learn how social communication and orienting evolved prior to the advent of human language. Second, identifying the foundations for human abilities in nonhuman animals increases the tractability with which we investigate how specific genes and neural circuits contribute to social behavior. Third, because humans modify basic gaze behaviors in response to cultural and contextual pressures (Argyle and Cook, 1976
; Kleinke, 1986
) – likely including awareness of research surveillance – nonhuman animals may provide the most straightforward model for studying gaze behavior in natural, spontaneous social interaction.
To use animals as a model for human gaze-following behavior, however, we must also determine whether they follow gaze in the same manner as humans. What stimuli drive responses to averted gaze: head or eye movement, facial feature configuration, ocular contrast? Does gaze-following behavior arise from a simple orienting reflex in two- or three-dimensional space, or is it instead mediated by the fully-fledged representation of another’s subjective viewpoint? Is gaze following a type of innately-specified reflex, a simple learned association, or a conscious behavioral strategy? As we have discussed, at least some animals use both head and eye cues to define gaze and follow gaze geometrically, but even of these, many seem to have surprising difficulty using observed gaze to guide behavior. Definitive answers to these questions will require psychophysical decomposition of animal responses to gaze. The answers to these questions determine not only the suitability of animal models for the study of human social cognition, but have implications for the evolved neural architecture and cognitive processes that shape social behavior.
The human ability to represent other minds, researchers have argued, arose through development of a sequence of explicitly social modules. Specifically, Baron-Cohen (1994)
argued that the four crucial modules were an eye direction detector, intentionality detector, shared attention mechanism, and theory of mind mechanism. Emery and Perrett refined this idea slightly (see Emery, 2000
), proposing two modular detectors: one each for direct and deictic attention. Both proposals were grounded in Fodor’s (1983)
framework, in which modularity “is fundamentally a matter of information encapsulation” and acts to facilitate efficient, speedy, reflexive processing.
Such claims made strong predictions about how deictic gaze should influence attention. Attention had traditionally been dichotomized as either reflexive or voluntary (Jonides, 1981; Posner and Cohen, 1984; Muller and Rabbitt, 1989). Reflexive orienting (exogenous, automatic, bottom-up, stimulus-driven) was evoked locally by abrupt changes in a region of space, with attentional deployments arising and fading quickly. In contrast, voluntary attention (endogenous, conscious, top-down, or goal-directed) was evoked by distant, complex or symbolic cues that made task-relevant predictions, with resulting attentional deployments that were slow and sustained.
This framework sets up three key questions about psychophysical responses to observed gaze: First, do they result from a reflexive social module, from voluntary, domain-general orienting decisions, or via multiple pathways? Second, how does reflexive gaze following relate to comprehension of gaze behavior and viewpoint? Third, how does gaze-following behavior vary across individuals, and what can this tell us about the neural mechanisms of gaze response?
Reflexivity
Friesen and Kingstone (1998)
showed that nonpredictive eye gaze cues influenced subjects’ reaction times and accuracy at detecting, localizing, and discriminating peripheral targets. Furthermore, the time course strongly suggested that these attentional effects were reflexive, arising after only 105 ms. The next year, Langton and Bruce (1999)
found parallel results using head-gaze cues, and Driver et al. (1999)
reported that subjects followed gaze even when cues were counterpredictive of eventual target location (Figure 4
). Together, these results strongly supported the existence of a reflexive, informationally-encapsulated module mediating human gaze following.
Figure 4. Humans follow gaze reflexively. (A) Faces gazing left or right were presented for 100, 300, or 700 ms, followed by a response target which appeared opposite gaze four times as often as it appeared in the gazed direction. (B) For the first half-second after cue onset, subjects responded faster to targets appearing in the direction of observed gaze – despite their knowledge these targets were less likely (adapted from Driver et al., 1999
).
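The logic of these cueing experiments can be made concrete with a short sketch. The measure usually reported is a cueing effect: mean reaction time to targets opposite the gazed direction minus mean reaction time to targets in the gazed direction, computed separately for each cue-to-target interval. A positive value means gaze was followed. In the counterpredictive design described in Figure 4, targets appeared in the gazed direction on only one trial in five, so a positive effect at short intervals indicates orienting against the subject's own expectations. The function name, the trial format, and all reaction times below are hypothetical and invented purely for illustration; they are not data from any cited study.

```python
from statistics import mean


def cueing_effect(trials):
    """Return {interval_ms: mean RT(opposite gaze) - mean RT(gazed side)}.

    `trials` is an iterable of (interval_ms, target_at_gazed_side, rt_ms)
    tuples. This trial format is an assumption made for this sketch.
    """
    by_cell = {}
    for interval_ms, at_gazed_side, rt_ms in trials:
        by_cell.setdefault((interval_ms, at_gazed_side), []).append(rt_ms)

    effects = {}
    for interval_ms in sorted({key[0] for key in by_cell}):
        gazed = by_cell.get((interval_ms, True))
        opposite = by_cell.get((interval_ms, False))
        if gazed and opposite:  # skip intervals missing either condition
            effects[interval_ms] = mean(opposite) - mean(gazed)
    return effects


# Invented reaction times (ms), chosen only to show the sign convention.
trials = [
    (100, True, 300.0), (100, True, 310.0),
    (100, False, 320.0), (100, False, 330.0),
    (700, True, 310.0), (700, True, 310.0),
    (700, False, 300.0), (700, False, 300.0),
]
effects = cueing_effect(trials)
print(effects)  # positive at 100 ms (gaze followed), negative at 700 ms
```

In this toy data set the effect is positive at the short interval and reverses at the long one, which is the qualitative pattern a reflexive response followed by voluntary, expectation-based orienting would produce.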
Importantly, this gaze-following reflex appears to arise early in development, consistent with its proposed importance in guiding social learning. In the laboratory, young infants followed photographed eye gaze (and not tongue movement); however, gaze following in the youngest infants was masked by a reluctance to disengage the face cue (Hood et al., 1998
; Farroni et al., 2000
; see also Reid and Striano, 2005
). Biological motion appears to be crucial to these early responses to gaze, and though infants correctly ignored tongue movements (Hood et al., 1998
), their tendency to follow gaze was confounded when gaze shifts were produced by translating the face outline rather than the eyes themselves (Farroni et al., 2000
). Importantly, gaze following by infants may be contingent on ostentive signals: mutual gaze, and especially infant-directed speech, increases the likelihood a gaze cue will be followed (Farroni et al., 2003
; Senju and Csibra, 2008
). These results suggest a gaze-following reflex operates both in adults and in the youngest children ever tested, but that gaze responses may be sensitive to context.
Gaze following by nonhuman animals, too, appears reflexive. Deaner and Platt (2003)
showed that humans and macaques reflexively orient attention in response to nonpredictive gaze cues; furthermore, the shared dynamics of human and monkey responses implied shared mechanisms (Figure 5
) (see also Emery et al., 1997
; Shepherd et al., 2006
; but cf. Shepherd et al., in press
). This further supported the notion that primate gaze-following behavior was mediated by a specialized, reflexive neural module.
Figure 5. Anthropoid primates follow gaze with similar sub-second dynamics. (A) Monkeys and humans performed an identical task, in which they fixated a central face gazing left or right, and then looked toward a peripheral target. The target was not predicted by the gaze direction of the cue. (B) Monkeys and humans were faster to look toward targets appearing in the direction of cue gaze, independent of whether head or eye-only cue images were used. (C) The fixation positions of monkeys and humans shifted slightly in the direction of gaze, and did so with similar time course. Such fixation shifts are thought to result from microsaccadic drift biased in the direction of attention (adapted from Deaner and Platt, 2003
).
However, gaze following can be modulated by social context (reviewed, Frischen et al., 2007). We have already noted that ostentive cues potentiate gaze following in infants; similarly, emotional expressions may sometimes potentiate reflexive gaze following in both humans (Mathews et al., 2003; Hori et al., 2005; Putman et al., 2006; but see Hietanen and Leppanen, 2003) and monkeys (Goossens et al., 2008; but see Paukner et al., 2007), and likewise dominance has been found to influence gaze following in both species (humans, Jones et al., 2010; monkeys, Shepherd et al., 2006). Finally, familiarity (Deaner et al., 2007) and sexual interest (Khurana et al., 2009) have been suggested to influence human gaze following. These data indicate that gaze following cannot be modular in Fodor’s strictest sense: if social context can dampen or accentuate gaze-following responses, then the gaze-following mechanism is not informationally encapsulated.
However, contextual variables that enhance gaze following generally appear to operate by increasing social cue saliency (e.g. Ristic and Kingstone, 2005
; Birmingham et al., 2008
). A number of studies have linked joint attention deficits to a failure to fixate the eye region (e.g. Klin et al., 2002a
; Adolphs et al., 2005
; Dalton et al., 2005
). Failure to fixate the eyes deprives the brain of high-resolution visuosocial information, and may indicate a broader insensitivity to (or avoidance of) social stimuli. Eye gaze perception may be especially disrupted when presented outside the fovea (Burton et al., 2009
), and while gaze-following responses have not been explicitly probed in the periphery, several studies suggest that overt attention toward an individual increases the likelihood their gaze will be followed (by humans inspecting photographs, Dukewich et al., 2008
; Fletcher-Watson et al., 2008
; by interacting lemurs, Shepherd and Platt, 2008
). It is clear that saliency modulates gaze following, but less clear whether this is the only means of affecting the gaze-following response. If gaze following is modulated by social context in a manner that does not alter cue saliency – for example, by changing whether viewed individuals are cooperators or competitors – this would strongly militate against a distinct gaze-following module. For example, the recent finding that gaze-following responses may reverse in some salient real-world interactions (Nummenmaa et al., 2009
) further undermines the case for an encapsulated gaze-following module.
Diverse deictic cues are known to quickly and reflexively drive attention, including photographed eye gaze, head gaze, gaze in schematic faces, eyes alone, and ambiguous faces (at least once recognized as such – see Ristic and Kingstone, 2005
) (reviewed, Frischen et al., 2007
). Recently, however, it has been argued that symbolic nonsocial cues may also drive reflexive orienting. Though early studies found nonpredictive arrows to have little or no effect on orienting (e.g. Jonides, 1981
; more recently, Ricciardelli et al., 2002
; Friesen et al., 2004
), it now appears likely that arrows exert fast, reflexive influence on attention (e.g. Tipples, 2002
; Kuhn and Kingstone, 2009
). If nonpredictive symbolic cues reflexively shift attention, then responses to deictic gaze may be wholly or partly mediated by generic, domain-general learning processes (reviewed, Birmingham and Kingstone, 2009
; Kingstone, 2009
). This, too, would strongly militate against an innate, modular mechanism for following gaze.
In summary, sub-second gaze-following responses appear to be reflexive, consistent with mediation by an encapsulated neural module shared across (at least) all primates. However, the context sensitivity and broad selectivity of these responses suggest that they are well-integrated with other social processes, and may be mediated by multiple neuronal pathways.
Gaze Following Vs. Gaze Perception
The idea that gaze-following behaviors might involve separate systems – one fast, innate and reflexive, one “slow” and nuanced – may help resolve seeming contradictions in the developmental, comparative, and psychophysical literature encountered above. However, any such approach must carefully distinguish the cues which effectively stimulate fast and slow gaze responses. In particular, it is relevant whether gaze cues are decoded equivalently when influencing reflexive gaze following and when informing gaze perception.
Two sets of results suggest gaze following and gaze perception might involve dissociable mechanisms: the first regards the precision with which gaze is resolved, the second, how gaze responses integrate conflicting deictic cues. As Doherty et al. (2009)
report in young children, it is possible for gaze following to occur in the absence of precise gaze perception (e.g. via motor contagion, reviewed Blakemore and Frith, 2005
; note also Bayliss and Tipper, 2006a
). However, the precision of gaze-following responses is not typically tested, and in a naturalistic change-detection paradigm, gaze-following responses appeared broadly tuned (Langton et al., 2006
). By contrast, adults can discriminate small differences in both deictic (Bock et al., 2008
) and direct (Gamer and Hecht, 2007
) gaze, treating gaze direction as a cone of approximately 6° width (see also Calder et al., 2008
).
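The idea of gaze direction as a cone can be illustrated geometrically. The sketch below tests whether a target lies inside a cone centered on the gaze direction, by comparing the angle between the gaze vector and the eye-to-target vector against half the cone width. Treating the roughly 6° figure as the full angular width of the cone (so a 3° half-angle) is an assumption of this sketch, as are the function names and coordinates; the cited studies measured perceptual judgments and did not use this code.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def in_gaze_cone(eye, gaze_dir, target, cone_width_deg=6.0):
    """True if `target` lies within a cone of full width `cone_width_deg`
    centered on `gaze_dir` and originating at `eye`.

    The default width of 6 degrees is taken as the FULL width (assumption).
    """
    to_target = tuple(t - e for t, e in zip(target, eye))
    return angle_between_deg(gaze_dir, to_target) <= cone_width_deg / 2.0


eye = (0.0, 0.0, 0.0)
gaze = (0.0, 0.0, 1.0)  # looking straight along the z axis
near_target = (math.tan(math.radians(2.0)), 0.0, 1.0)  # 2 degrees off axis
far_target = (math.tan(math.radians(5.0)), 0.0, 1.0)   # 5 degrees off axis
print(in_gaze_cone(eye, gaze, near_target), in_gaze_cone(eye, gaze, far_target))
```

A broadly tuned gaze-following response of the kind described above would correspond to a much wider acceptance angle than the narrow cone that adults report in explicit judgments.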
Another argument regards the different gaze cues being used. Our eyes strongly constrain our visual attention, while our head and body orientations pose weaker constraints on attention and action planning: Eyes thus make stronger predictions about the spatial location of objects of interest. Similarly, when humans shift attention during natural behavior, inertia typically causes the eyes to lead the head, which in turn leads the body (Suzuki et al., 2008): Eyes thus make stronger predictions about the timing of visual stimuli than other somatic cues. In fact, extreme postural conflicts may sharpen this temporal prediction, strongly implying a recent or abrupt gaze shift. These observations pose the question of how gaze responses differ when head and eye orientations conflict. Current evidence suggests both that humans follow eye gaze more reliably than head direction (see Tomasello et al., 2007), and that gaze following of head cues is greater when a head is turned relative to the body than when at rest (Hietanen, 2002). By contrast, however, perceptual judgments of head gaze are confounded when head direction conflicts with the eyes (Langton, 2000; but cf. Ricciardelli and Driver, 2008) or with pointing gestures (Langton and Bruce, 2000).
Eye gaze can be discriminated using crude luminance cues, while parsing head orientation would seem to require more complex and flexible configural processing. For example, manipulating periocular luminance can reverse gaze percepts (Sinha, 2000; Ando, 2002, 2004; though note Olk et al., 2008). Discussing their finding that children perform poorly at explicit gaze discrimination long after they reflexively follow gaze, Doherty et al. (2009) proposed that luminance-based gaze discrimination is innate and crude, while configural gaze discrimination is learned and precise. However, this observation sits uncomfortably alongside the proposal by Tomasello et al. (2007) of a uniquely-human “cooperative eye”: if responses to ocular luminance cues are the more primitive form of gaze following, why is it so difficult for nonhuman animals to follow gaze using eye cues alone?
Individual Differences in Gaze Following
Gaze cues are ubiquitous in typical human social development, and a failure to respond to gaze cues is associated with pathological linguistic and social development. While congenitally blind humans do develop language and a theory of mind, including relatively normal “mirroring” (Bedny et al., 2009
) and “mentalizing” (Ricciardi et al., 2009
) networks, they often experience developmental delays in language and social behavior (Hobson and Bishop, 2003
). In sighted individuals, an absence of gaze following both eliminates an important developmental cue and may be symptomatic of deeper dysfunction in social perception, motivation, or attention. Changes in visuosocial orienting are associated with a number of mental illnesses, including social anxiety (Bradley et al., 1997
; Compton, 2003
; Horley et al., 2004
), schizophrenia (Kington et al., 2000
; Franck et al., 2002
; Langdon et al., 2006
), and especially autism. Autistic individuals lack the desire and ability “to share enjoyment, interests, or achievements with other people” (APA, 1994
), and while the root causes of autism remain controversial, there is general consensus that failures of joint attention are among the best predictors of autism in early childhood (reviewed, Nation and Penny, 2008
).
Theorists have proposed that social deficits in autism may represent an extreme of natural testosterone-linked variation across individuals (Baron-Cohen, 2002
). On average, women respond more strongly than men to social cues (Geary, 1998
) and fetal testosterone is reported to negatively impact both social attention and social relationships among human juveniles (Knickmeyer and Baron-Cohen, 2006
). In both humans and macaques, females follow gaze more than males (humans, Bayliss et al., 2005
; monkeys, Paukner et al., 2007
). Furthermore, testosterone-linked social dominance appears to suppress gaze following by macaques (Shepherd et al., 2006
). These findings suggest that biological factors, especially those linked to sex hormones, may have a role in the neurobiology of gaze following.
Psychophysical variability in gaze following is interesting not just because of what it tells us about species, developmental, and individual differences in behavior, but because it suggests a framework with which to consider evolutionary, developmental, and individual differences in neural mechanisms. I outline our present state of knowledge about these neural mechanisms below.
Biologically-relevant stimuli are processed through both a fast, crude subcortical stream that is largely conserved across vertebrates, and through a slower, more nuanced cortical network that in primates is highly derived (Sewards and Sewards, 2002
; Vuilleumier, 2002a
; Johnson, 2005
). Gaze sensitivity has been reported in both pathways, though deictic gaze representations are most strongly supported in cortex: in particular, two meta-analyses have identified gaze sensitivity in the superior temporal sulcus (STS) and the dorsal and ventral frontoparietal attention networks (Grosbras et al., 2005
; Nummenmaa and Calder, 2009
). Neither the “simple” question of how these perceptions influence orienting, nor the more complex question of how they interact with cognitive processes such as shared attention and theory of mind, has been definitively answered. We have, however, begun to trace pathways by which deictic gaze signals are processed in the brain (Figure 6
).
Figure 6. Potential mechanisms for gaze-following behavior. Two general pathways (shown here, schematically) could relate observed gaze to visual attention. At left, in blue, visual information travels from the retina to the lateral geniculate (1), the early visual areas (2), the social processing areas along the superior temporal sulcus (3), and finally toward attention control circuitry including the lateral intraparietal area, frontal eye fields, and superior colliculus (4). At right, in red, a hypothesized subcortical pathway travels directly from the retina to the superior colliculus (1), to the pulvinar nucleus of the thalamus (2), and to the amygdala (3). The subcortical pathway could influence attention locally in the superior colliculus or pulvinar, or via projections from amygdala to the early visual areas (4).
The Subcortical Pathway
The subcortical visual pathway, in humans, is hypothesized to flow from the retina to the superior colliculus, the pulvinar, and the amygdala (Morris et al., 1999
; Johnson, 2005
; Jiang and He, 2006
). Each of these regions can modulate processing in other parts of the brain and thus influence attention; furthermore, each receives descending projections from socially-activated cortices such as the fusiform gyrus, extrastriate body area and superior temporal sulcus (colliculus, Fries, 1984
; amygdala, Ghashghaei and Barbas, 2002
; pulvinar, Romanski et al., 1997
; Stefanacci and Amaral, 2002
). The amygdala, in particular, is sensitive to observed gaze (monkey electrophysiology, Gothard et al., 2007
; Hoffman et al., 2007
; human imaging, Kawashima et al., 1999
) and is known to play a role in social saliency processing (Morris et al., 1999
; Vuilleumier, 2002a
; Adolphs, 2008
). Intriguingly, the amygdala is sexually dimorphic (Goldstein et al., 2001
), and its dysfunction may contribute to autism (Schultz, 2005
), suggesting it is a nexus through which testosterone could influence gaze responses. The amygdala has not yet been shown to represent deictic gaze, as opposed to threat and flirtation-linked eye contact signals; but lesion studies now implicate amygdala in both intentional and reflexive gaze-following behaviors (Akiyama et al., 2007
; Okada et al., 2008
). Moreover, while the amygdala does not directly project to the visual orienting system, and may only coarsely differentiate regions of visual space, these characteristics are consistent with the fast gaze-following responses discussed above.
The Superior Temporal Sulcus
Social processing areas may be among those cortices homologous across all primates (Tootell et al., 2003
; Rosa and Tweedale, 2005
) and perhaps other mammals as well (Kendrick et al., 2001
). The first neurons sensitive to observed gaze were reported in macaques near the superior temporal sulcus (STS) (Perrett et al., 1982
, 1985
), and imaging subsequently revealed similar gaze-sensitivity near human STS (reviewed, Allison et al., 2000
), especially when observing surprising or incongruous gaze behavior (Pelphrey et al., 2003
).
However, the STS is a large area that contains multiple subregions. Cellular structure and connectivity vary across both its width and length, and while posterior regions communicate both with posterior parietal and frontal areas, anterior regions communicate preferentially with frontal and visual cortices (Seltzer and Pandya, 1989
, 1991
). Neurons in the middle anterior upper bank of the STS represent gaze direction independently of whether it arises through head or eye posture (Perrett et al., 1992
); notably, while more caudal neurons respond symmetrically to gaze averted to either the right or left, anterior neurons differentiate deictic gaze direction (De Souza et al., 2005
; see also Jellema et al., 2000
). Though imaging studies have shown peak gaze sensitivity in posterior STS (Allison et al., 2000
; c.f. Grosbras et al., 2005
; Nummenmaa and Calder, 2009
), a recent adaptation study by Calder et al. (2007)
showed that human neurons with deictic gaze sensitivity are concentrated in anterior STS, just as in macaques. Two split-brain patients reflexively followed gaze only in one visual hemifield, consistent with cortical mediation by a single hemisphere specialized for face processing (Kingstone et al., 2000
). Lesions of STS are rare, but one patient with a large right superior temporal gyrus lesion had difficulty perceiving (Akiyama et al., 2006a
) and failed to reflexively follow (Akiyama et al., 2006b
) gaze.
The Extended Social Processing Network
Core visuosocial areas in the fusiform gyrus and STS interact with an extended face processing network, integrating body and face perceptions with contextual signals from areas including hippocampus, amygdala, and orbitofrontal cortex (OFC) (Ishai et al., 2005
; Vuilleumier and Pourtois, 2007
). These interlinked areas comprise a functional circuit that modulates perceptual and sensorimotor processing based on emotional and mnemonic associations (Sabbagh, 2004
; Smith et al., 2006
; Vuilleumier and Pourtois, 2007
); interestingly, each of these structures is sexually dimorphic (Goldstein et al., 2001
). Perception of averted gaze has been reported to activate neurons in the dorsal medial prefrontal cortex (Calder et al., 2002
; see also Nummenmaa and Calder, 2009
), possibly joined by the left superior frontal gyrus during bouts of coordinated joint attention (Williams et al., 2005
). Lesion data from one patient suggest that frontal areas F7, F10 and F11 may be necessary for social and symbolic attention (Vecera and Rizzo, 2004
). The extended social processing network is particularly interesting in that it comprises a network through which individual and contextual variables might modulate gaze-following responses.
The Frontoparietal Attention Networks
Cortical social perception areas project to frontoparietal attention areas (c.f. Corbetta and Shulman, 2002
) including macaque area 7A and the lateral intraparietal area (LIP) (Seltzer and Pandya, 1991
) and the supplementary and frontal eye fields (SEF and FEF) (Seltzer and Pandya, 1989
). To mediate orienting decisions, these areas must balance the costs and benefits of attention shifts, including those resulting from either social or nonsocial stimuli (reviewed, Klein et al., 2009
). Of these areas, LIP is especially intriguing, because it receives projections from the STS and integrates oculomotor rewards (intrinsic social, Klein et al., 2008
; instructed nonsocial, Platt and Glimcher, 1999
) into a unified saliency map (Colby and Goldberg, 1999
).
My colleagues and I have recently reported (Shepherd et al., 2009
) that some neurons in LIP act as mirror neurons (reviewed, Rizzolatti and Craighero, 2004
) for attentional states, responding both when orienting toward a region of space and when observing another individual oriented toward this same location. Strikingly, however, these neurons were outnumbered, in our study, by neurons that became less active when nonpredictive gaze cues were oriented toward their response field. These findings indicate that LIP neurons weigh deictic social information when computing visual saliency, even when this information arises from outside their local response fields. However, they further suggest that the fastest reflexive gaze-following responses may arise outside LIP, and that, under these conditions, LIP acted primarily to tamp down a prepotent but task-irrelevant gaze-following response. In this account, LIP primarily mediates those deictic gaze responses which are modulated by intrinsic social reward (c.f. Shepherd et al., 2006
) and predictive value (c.f. Friesen et al., 2004
).
It is important to note that human frontoparietal expansion has resulted in uncertain homology between human and nonhuman parietal lobes (Orban et al., 2004
); nonetheless, two lines of inquiry have suggested that human parietal lobes, like those of the monkey, play an important role in joint attention. First, the same study that revealed deictic gaze processing in anterior STS also identified directionally-selective neurons in the human inferior parietal lobule (Calder et al., 2007
). Second, several studies by Saxe and colleagues have strongly implicated the right temporoparietal junction in mentalizing (Saxe and Kanwisher, 2003
; Saxe and Wexler, 2005
; note also Mitchell, 2008
), and it is quite possible that these abstract perspective-taking abilities require similar neuronal computations to their more concrete variants. Both sets of findings are consonant with an fMRI meta-analysis finding overlapping activations during orienting shifts and gaze perception (Grosbras et al., 2005
; c.f. Nummenmaa and Calder, 2009
). However, in four patients for whom parietal lesions had caused neglect, gaze following was unaffected – and moreover, compensated for the lesions by reflexively directing attention into the otherwise neglected hemifield (Vuilleumier, 2002b
). These findings seem strikingly at odds with electrophysiological and imaging results suggesting involvement of posterior parietal cortex in joint attention, and warrant further investigation.
Though neuroscience has made great strides over the past century, we have only a rudimentary understanding of how the brain mediates the social behaviors that fill – and give meaning to – our lives. Our tendency to follow gaze is a relatively simple, stereotyped, measurable window into the operation and development of the social brain, and one that interacts with diverse aspects of social cognition.
In adult humans, gaze following is both automatic and sophisticated. We are sensitive to direct gaze from birth, a sensitivity that appears widespread among vertebrates, and deictic gaze responses are observed in infant brains and behavior by 4 months of age. Infant participation in shared attention correlates with later language learning and theory of mind development, likely both through direct contributions and because it indexes underlying social interest and motivation. Geometrical gaze following, in particular, involves a generalization from egocentric to allocentric space and thus seems like an important foundation for perspective-taking abilities and the attribution of mental states. Curiously, however, human toddlers capable of geometrical gaze following nonetheless have difficulty when asked to explicitly report gaze direction. Gaze following is widespread among primates and may also be evident in nonprimate species including dogs, marine mammals, and some birds, but even species that follow gaze geometrically often have difficulty using gaze to guide behavioral decisions. Such cooperative gaze behavior may have limited adaptive utility in species that lack cooperative social interactions. Social partnerships, including those between humans and between species during domestication, likely act to facilitate the evolution of joint attention abilities.
Gaze-following behavior appears partially reflexive, but is, at minimum, modulated by factors affecting cue saliency. Though gaze signals are parsed and attended through socially-specific processing mechanisms, it remains uncertain whether gaze following operates via a specialized module, via domain-general learning mechanisms, or both. It is quite possible that multiple mechanisms exist, and are differentially active across developmental timepoints, species, or pathologies. Distinct visual cues, such as periocular luminance or facial feature configuration, may drive these mechanisms. Across both normal and pathological populations, varying levels of joint attention suggest that underlying neural mechanisms are sensitive to biological factors, notably including suppression by testosterone.
The pathways by which deictic gaze cues influence orienting remain unknown, but likely include a fast and crude subcortical pathway as well as a slower, more nuanced cortical pathway. Important questions include whether amygdala neurons differentiate between averted gaze directions; how posterior and anterior superior temporal sulcus regions differ in their contribution to social behavior; and, definitively, whether gaze-following responses can be suppressed through reversible inactivation of amygdala, LIP or FEF, or posterior or anterior STS.
More generally, both neural and behavioral studies should note two facts: first, deictic gaze responses are spatially selective and are not adequately investigated by contrasting averted vs. direct gaze; second, at least some deictic gaze responses are reflexive and occur independent of task relevance or behavioral goals. Whether gaze following arises through a specialized module or an integrated facet of social processing, it clearly lies at an important hinge in evolution and development, crucially influencing our interactions with our peers. Further research will illuminate these issues and potentially suggest means of intervening when individuals fail to exhibit typical joint attention behaviors.
Social interactions are difficult to evoke in a laboratory setting. For this reason, it is important we identify simple and robust behaviors through which we can probe the neural mechanisms of social interaction. We are only now beginning to relate this one relatively simple and well-defined behavior – gaze following – to its possible neural underpinnings. Already, a fascinating picture is emerging of how we, as a species and as individuals, learn to understand one another. To gain insight into others’ beliefs and desires, we need a window into the mind. To find one, we look to the eyes.
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This text reflects the innerving influence of Michael Platt’s and Asif Ghazanfar’s laboratories, the support of a NAAR predoctoral training grant and Princeton University’s Quantitative and Computational Neuroscience T32 training grant (NIH R90 DA023419-02). Contents are solely the responsibility of the author and do not necessarily represent the official views of NIH. Special thanks to Ipek Kulahci, Darshana Narayanan, Asif Ghazanfar, Arwen Long, and Chandramouli Chandrasekaran. Chimpanzee photo in Figure 1
used under (cc) license courtesy photos8.com.