Department of Physiology, College of Medicine, University of Arizona, Tucson, AZ, USA
Scanpaths (the succession of fixations and saccades during spontaneous viewing) contain information about the image but also about the viewer. To determine the viewer-dependent factors in the scanpaths of monkeys, we trained three adult males (Macaca mulatta) to look for 3 s at images of conspecific facial expressions with either direct or averted gaze. The subjects showed significant differences on four basic scanpath parameters (number of fixations, fixation duration, saccade length, and total scanpath length) when viewing the same facial expression/gaze direction combinations. Furthermore, we found differences between monkeys in feature preference and in the temporal order in which features were visited on different facial expressions. Overall, the between-subject variability was larger than the within-subject variability, suggesting that scanpaths reflect individual preferences in allocating visual attention to various features in aggressive, neutral, and appeasing facial expressions. Individual scanpath characteristics were brought into register with the genotype for the serotonin transporter regulatory gene (5-HTTLPR) and with behavioral characteristics such as expression of anticipatory anxiety and impulsiveness/hesitation in approaching food in the presence of a potentially dangerous object.
Every day, primates make hundreds of thousands of eye movements to locate and explore the most relevant details of their visual world. The succession of saccades and fixations carried out while exploring an image (the scanpath) results from successive re-allocation of attention from one detail to another (Yarbus, 1967; Burman and Segraves, 1994). Some scanpath components can be predictably attributed to properties of the visual scene. High light intensity, contrast, color, orientation, flicker, and movement have all been shown to be capable of predicting attended locations (Mackworth and Morandi, 1967; Itti and Koch, 2000; Guo et al., 2003; Foulsham and Underwood, 2008). In addition, it is clear that factors internal to the individual determine the importance of regions in a visual scene; e.g., expertise, memory, social knowledge, or emotional state can predictably modify scanpaths (e.g., Buswell, 1935; Yarbus, 1967; Zangemeister et al., 1995; Adolphs et al., 2005; Hannula et al., 2007; Humphrey and Underwood, 2009).
Primates explore the faces of others to extract information about identity, sex, age, health, reproductive status, emotional state, and relative social status. Scanpaths over faces contain a degree of order or predictability that suggests a hierarchy between facial features. The eyes are the first and the most extensively explored feature (Walker-Smith et al., 1977; Keating and Keating, 1982; Nahm et al., 1997; Parr et al., 2000; Pelphrey et al., 2002; Gothard et al., 2004, 2009; Dahl et al., 2007, 2009). The saliency of the eyes appears to result from top-down processes given that monkeys saccade towards the eye region even if the eyes are removed (Guo, 2007) or low-pass filtered (Gothard et al., 2009). The relative proportion of time spent looking at the eyes, mouth, ears, etc., depends on facial expression (Gothard et al., 2004); however, the scanpaths on various facial expressions are not sufficiently expression-specific to reliably predict each expression. Clearly there are other factors that contribute to the order in which features are fixated, the time spent exploring each feature, and the tendency to revisit them after the initial fixation.
Recent observations suggest that grouping individuals based on personality measures can add a degree of predictability to scanpaths. Along these lines, scanpath patterns in healthy subjects have been correlated with optimism/pessimism (Isaacowitz, 2005) or neuroticism (Perlman et al., 2009), and the majority of neuropsychiatric disorders are accompanied by scanpath abnormalities, many of which relate to viewing faces. Autistic individuals show decreased attention to facial features, especially the eyes, and a deficit in recognizing facial emotion (Hobson, 1986; Pelphrey et al., 2002). Social withdrawal in schizophrenia and social phobia are often accompanied by a deficit of looking at the eyes (e.g., Williams et al., 1999; Horley et al., 2004).
The goal of this study was to test for similar relationships between scanpaths and individual characteristics in monkeys. We hypothesized that individual behavioral and genetic differences between monkeys would manifest in significantly different visual scanning patterns of images with high socio-emotional value. The first step toward this goal was to quantify and document differences in scanpaths of monkeys looking at facial expressions and relate the individual differences (if any) to behavioral tendencies and the genotype for the serotonin transporter gene.
Our approach takes into account the similarities and differences between humans and monkeys with regard to the information contained in images of facial expressions. In monkeys, the overt or furtive gaze at another individual is linked to the dominance hierarchy; e.g., direct eye contact constitutes a social threat (Chevalier-Skolnikoff, 1973; Redican, 1975) and, in patas monkeys (Erythrocebus patas), an individual’s dominance is reflected in the number of looks it receives from other individuals (McNelis and Boatright-Horowitz, 1998). Furthermore, rhesus macaques are willing to forgo reward to look at a dominant individual and have to be “paid” extra reward to look at low-status conspecifics (Deaner et al., 2005). These preferences reflect an individual perception of social reward and are linked to the length polymorphisms in the serotonin transporter gene (Watson et al., 2009). It appears likely that in monkeys, as in humans, scanpaths are related to temperament, rank, age, and possibly social competence, and that these traits are, at least in part, genetically determined.
All experimental procedures were performed in compliance with the guidelines of the National Institutes of Health for the use of primates in research and were approved by the Institutional Animal Care and Use Committee at the University of Arizona. In preparation for recording eye movements, each monkey was fitted with a head fixation device attached to the skull under isoflurane anesthesia.
Subjects
Scanpaths from three adult macaques (T, Q, and H), weighing 10–14 kg, were used in this study. All three animals were born at the California National Primate Research Center (Davis, CA, USA). Q and H were mother-reared in outdoor enclosures and were brought indoors at 2 years of age, whereas T was peer-reared in indoor housing and moved to an outdoor community after 1 year.
Stimuli
Subject monkeys viewed stimulus images on an LCD computer monitor that spanned 40 × 30 degrees of visual angle (dva) with a refresh rate of 60 Hz. Face stimuli subtended 11.5 dva. All subject monkeys viewed images of conspecific faces on a black background extracted from a large library of monkey face images described in Gothard et al. (2004). Stimulus monkeys included males and females displaying one of four standard facial expressions: lip-smack, fear-grimace, threat, and neutral (Chevalier-Skolnikoff, 1973; Redican, 1975) with either direct or averted gaze (Figure 1). Subject monkeys had neither seen nor interacted with the stimulus monkeys outside of this experimental setup. Monkeys T, Q, and H viewed multiple faces of 9, 12, and 12 individuals, respectively. The pool of individuals viewed by each monkey was only partially overlapping.
Figure 1. Examples of face stimuli. Columns and rows indicate facial expression and direction of gaze, respectively. Each colored contour encloses one of four main facial regions, eye (and brow), mouth, nose, and ears. For each feature, visitation was quantified by scoring time spent looking within the contour.
Behavioral Task and Training
Subject monkeys were seated in a primate chair with their eyes 57 cm from the monitor. They were trained to fixate on a 0.5 dva white square (henceforth called “fixspot”). Successful fixation (maintaining gaze for 100 ms in an area of 2 × 2 dva surrounding the fixspot) was followed by image presentation. The monkeys were then free to view the image for 3 s, provided that their gaze remained within a 15.5 × 15.5 dva area centered on the image. Monkeys T, Q, and H successfully completed 2629, 993, and 927 trials over 9, 12, and 11 experimental days, respectively.
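As an illustration of the viewing geometry, the following minimal sketch converts between physical size on the monitor and degrees of visual angle using the standard relation angle = 2·atan(size / (2·distance)). Only the 57 cm viewing distance and the 11.5 dva face size come from the text; the function names are hypothetical and the flat-screen, on-axis geometry is an assumption.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical size (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 57.0  # eye-to-monitor distance reported in the text

# At 57 cm, 1 cm on the screen subtends very nearly 1 dva.
one_cm_in_dva = visual_angle_deg(1.0, VIEWING_DISTANCE_CM)

# Approximate on-screen size of an 11.5 dva face stimulus (assumes on-axis viewing).
face_size_cm = size_for_angle_cm(11.5, VIEWING_DISTANCE_CM)
```

At this distance centimeters on the screen and degrees of visual angle are nearly interchangeable for small stimuli, which is a common reason for choosing 57 cm.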
Recording Eye Position and Analysis
Eye position was recorded using an infrared camera sampling at 120 Hz (ISCAN, Inc) and collected as an analog signal through a CED Power1401 data acquisition system (Cambridge Electronic Design, UK). Fixations and saccades were extracted from the eye position data using custom MATLAB scripts (The Mathworks Inc, MA, USA). First, eye velocity for each point was computed as the distance traversed by the eye within a 40-ms moving window. Next, a velocity threshold was set at three times the median velocity within a 3-s moving window. Data points that exceeded this threshold were considered saccades. Fixation time midpoints were defined as the center millisecond timepoint between saccades, and fixation locations were defined as the mean eye location in this sub-threshold period. Successive fixations were required to be at least 0.3 dva apart. Due to the temporal smoothing inherent in the moving-window technique, saccade start and end times were found separately from the fixation locations for the sake of increased precision. Using the fixation locations, we transformed absolute eye speed into eye speed relative to the final location for each fixation. Saccade start and end times were then defined as the times at which the eye speed relative to the final fixation destination went above or below a velocity threshold of 30 dva/s. Trials with noisy eye data were excluded from fixation-based analyses, but were used for region-based analysis, where the exact timing of the saccades was not critical.
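The first stage of this procedure (windowed velocity, median-based threshold, fixation location as the mean of each sub-threshold run) can be sketched as follows. This is an illustrative Python approximation, not the authors' MATLAB code: the function names are hypothetical, the window is expressed in samples (a half-window of 2 samples at 120 Hz spans roughly 33 ms rather than exactly 40 ms), the median is taken over the whole trace instead of a 3-s moving window, close fixations are merged as one possible reading of the 0.3 dva rule, and the second-stage refinement of saccade start and end times is omitted.

```python
import math
from statistics import fmean, median

def windowed_path_length(x, y, half_window):
    """Distance travelled by the eye within +/- half_window samples of each point."""
    n = len(x)
    step = [math.hypot(x[i + 1] - x[i], y[i + 1] - y[i]) for i in range(n - 1)]
    return [sum(step[max(0, i - half_window):min(n - 1, i + half_window)])
            for i in range(n)]

def detect_fixations(x, y, sample_rate_hz=120.0, half_window=2,
                     threshold_factor=3.0, min_separation=0.3):
    """Return fixations as dicts with sample span, mean location, and midpoint time."""
    speed = windowed_path_length(x, y, half_window)
    threshold = threshold_factor * median(speed)  # assumption: whole-trace median

    # Collect runs of consecutive sub-threshold samples (candidate fixations).
    runs, start = [], None
    for i, s in enumerate(speed + [math.inf]):  # sentinel closes the final run
        if s <= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None

    fixations = []
    for first, last in runs:
        fx, fy = fmean(x[first:last + 1]), fmean(y[first:last + 1])
        # Merge with the previous fixation if the two are closer than min_separation.
        if fixations and math.hypot(fx - fixations[-1]["x"],
                                    fy - fixations[-1]["y"]) < min_separation:
            first = fixations.pop()["first"]
            fx, fy = fmean(x[first:last + 1]), fmean(y[first:last + 1])
        fixations.append({"first": first, "last": last, "x": fx, "y": fy,
                          "midpoint_ms": 1000.0 * (first + last) / 2.0 / sample_rate_hz})
    return fixations

# Synthetic trace: fixation near x = 0, a 5 dva saccade, then fixation near x = 5.
base_x = [0.0] * 60 + [1.25, 2.5, 3.75] + [5.0] * 61
trace_x = [b + 0.01 * ((i % 3) - 1) for i, b in enumerate(base_x)]  # small jitter
trace_y = [0.0] * len(trace_x)
fixations = detect_fixations(trace_x, trace_y)
```

A median-based threshold adapts to each recording's noise level, but it degenerates to zero on a perfectly still trace, which is why the synthetic example includes a small jitter.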
The number of fixations per trial reported here excludes the first fixation (triggered by the fixspot) and the final fixation of the trial (which was cut short by the end of the trial). Fixation durations were defined as the period of time between the end of one saccade and the beginning of the next. Total scanpath length was calculated as the sum of the distances between successive fixations.
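These definitions can be made concrete with a short sketch. The data layout (a list of fixation locations and a list of saccade start/end times, one saccade between each pair of successive fixations) and all names are hypothetical. The sketch also assumes that saccade length equals the distance between successive fixation locations and that scanpath length is summed over all fixations of the trial, neither of which the text states explicitly.

```python
import math

def scanpath_measures(fixations, saccades):
    """fixations: [(x, y), ...] in dva; saccades: [(start_ms, end_ms), ...],
    where saccades[k] carries the eye from fixations[k] to fixations[k + 1]."""
    distances = [math.hypot(x2 - x1, y2 - y1)
                 for (x1, y1), (x2, y2) in zip(fixations, fixations[1:])]

    # Interior fixations only: the first (fixspot) and last (truncated) are excluded.
    durations = [saccades[k][0] - saccades[k - 1][1]
                 for k in range(1, len(fixations) - 1)]

    return {
        "n_fixations": len(durations),
        "mean_fixation_duration_ms": sum(durations) / len(durations),
        "mean_saccade_length_dva": sum(distances) / len(distances),
        "total_scanpath_length_dva": sum(distances),
    }

# Made-up trial with four fixations and three saccades.
example_fixations = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (0.0, 0.0)]
example_saccades = [(200.0, 230.0), (400.0, 425.0), (700.0, 730.0)]
measures = scanpath_measures(example_fixations, example_saccades)
```

In this made-up trial the two interior fixations last 170 ms and 275 ms, and the three saccades cover 5, 4, and 3 dva.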
In order to quantify spatial allocation of attention, an unbiased rater drew polygons defining four regions: the eye region (eyes proper and brow), the midface (including the nose), mouth (upper lip, lower lip, and chin), and ears. Eyes, midface, and mouth regions were generally adjacent to each other, and all regions extended roughly 1 dva into the surrounding non-feature regions (see Figure 1).
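Region-based scoring of this kind can be sketched with a point-in-polygon test, with looking time divided by polygon area as in the size-normalized values reported in the Results. The ray-casting test and shoelace area formula below are standard techniques chosen for illustration; the source does not say how the authors implemented this step, and the function names, the square region, and the 10 ms sample period are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula); vertices are (x, y) in dva."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_in_polygon(px, py, vertices):
    """Ray-casting test: count edge crossings of a ray going right from (px, py)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def normalized_looking_time(samples, vertices, sample_period_ms):
    """Time spent inside the region, divided by region area (ms per dva^2)."""
    n_inside = sum(1 for px, py in samples if point_in_polygon(px, py, vertices))
    return n_inside * sample_period_ms / polygon_area(vertices)

# Hypothetical 2 x 2 dva region and four gaze samples, three of them inside.
region = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
gaze_samples = [(1.0, 1.0), (1.0, 1.5), (3.0, 3.0), (0.5, 0.5)]
score = normalized_looking_time(gaze_samples, region, sample_period_ms=10.0)
```

Dividing by area keeps a large region from scoring high simply because it covers more of the image.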
For a quantitative scanpath analysis, we used four basic measures: (a) number of fixations, (b) fixation duration, (c) average saccade length, and (d) total scanpath length. The data used for the quantitative analysis contain scanpaths from all three monkeys, who viewed fear-grimace, threat, and neutral expressions with both direct and averted gaze. Equal ratios of these expression-gaze combinations were used for between-monkey comparisons (1326, 501, 657 randomly selected trials of each expression for T, Q, and H, respectively). Thus, all three subjects viewed nine individuals, each with three facial expressions and two gaze directions. Due to the non-normality of the data, we determined statistical significance using the Kruskal-Wallis ANOVA. Post-hoc Tukey-Kramer tests were used to differentiate distributions at the 95% confidence level. All reported statistics were obtained with these methods; χ2 and p-values are given.
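For readers unfamiliar with the test, the Kruskal-Wallis statistic can be computed from pooled ranks as in the minimal sketch below. It is a textbook implementation with tie correction, not the authors' code; the function names and the toy data are hypothetical. The p-value helper uses the closed form exp(−H/2), which holds only for the chi-square distribution with 2 degrees of freedom, i.e., for comparisons among three groups such as the three monkeys. The post-hoc Tukey-Kramer step is not shown.

```python
import math

def kruskal_wallis_h(*groups):
    """Return (H, df): tie-corrected Kruskal-Wallis statistic and degrees of freedom."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0

    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1

def p_value_df2(h):
    """Chi-square survival function, valid ONLY for 2 degrees of freedom."""
    return math.exp(-h / 2.0)

# Toy data: three fully separated groups of three observations each.
h_stat, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
p_value = p_value_df2(h_stat)
```

Because the test operates on ranks rather than raw values, it does not require normally distributed data, which matches the reason the text gives for choosing it.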
Behavioral Measures
The monkeys were subjected to two tests aimed at highlighting behavioral/temperamental differences. The first test measured behavior in a situation of anticipatory anxiety. A familiar caretaker approached their cage at feeding time with a bucket of biscuits, but instead of delivering them to the monkey, the caretaker remained standing in front of the cage for 5 min without making eye contact. The difference between the responses of the three monkeys in this situation was measured by an ethogram that documented the time and frequency of each behavior observed during the trial (see Figure 2). This procedure was carried out only once.
Figure 2. Ethogram accounting for all the spontaneous behaviors. These behaviors occurred during 5 min when a familiar caretaker approached the cage of each monkey with a bucket of food but instead of delivering the food into the monkey’s feeding tray stood in the front of the cage without making eye contact with the monkey. The bar graphs on the right show for each monkey the relative ratio of sitting quietly and pacing.
The second test addressed the impulsivity of the monkeys when confronted with a desirable fruit medley in the presence of a potentially dangerous object (a naturalistic rubber snake). This test was carried out in three sessions. In the first session the monkey was delivered to a duplex cage that was the only object in an otherwise empty and unfamiliar room. A duplex cage is two adjacent cages with a divider between the two halves that the monkeys know how to manipulate. The divider was left slightly open, just enough for the monkey to see that a large bowl of fruit medley was available in the adjacent cage. The monkey was able to open the divider to retrieve the food. On the second day the same procedure was carried out. On the third day, a rubber snake was placed in the path the monkey had taken to the food in the previous two sessions. The time elapsed between opening the divider and eating the food was measured. Based on similar tests in monkeys (Fairbanks et al., 2001; Kalin et al., 2001), impulsivity, or in this case risk-taking, was expected to be negatively correlated with the latency of retrieving the food.
DNA Extraction and Genotyping
Buccal cells were collected from subjects using Omni Swabs (Whatman Scientific, VT). Swab heads were ejected into tubes containing 750 μl of lysis buffer (50 mM Tris pH 8.0, 50 mM EDTA, 25 mM Sucrose, 100 mM NaCl, 1% SDS) for storage. Samples were stored at room temperature until they could be extracted. Immediately prior to extraction the lysis buffer volumes (with the swab head) were normalized to 1 ml, 25 μl of proteinase K (20 mg/ml) was added to all samples, and they were incubated with agitation at 55°C overnight. DNA was isolated from the samples using a protocol modified from the Qiagen BioSprint 96 DNA Blood Kit as follows: The lysate was loaded into two identical 2.2 ml deep well plates. Each plate contained 300 μl lysate, 300 μl Isopropanol, and 300 μl Qiagen buffer AL. The first plate also contained 35 μl of Qiagen MagAttract Suspension G. The BioSprint 96 started in lysis plate one with a premix followed by a 4-min binding step and one bead collection. The beads were then transferred into the second lysis plate with a premix followed by a 4-min binding step and one bead collection. The beads were then transferred to a wash plate containing 800 μl of Qiagen buffer AW1 where they had a 4-min wash step. The beads were then transferred to a second wash plate containing 500 μl of Qiagen buffer AW1 where they had a 3-min wash step. The beads were then transferred to a third wash plate containing 500 μl of Qiagen buffer AW2 where they had a 2-min wash step. The beads were then transferred to a fourth wash plate containing 500 μl of Qiagen buffer AW2 where they had a 2-min wash step. The beads were then transferred to a fifth wash plate containing 500 μl of molecular grade water with a 0 time dip with no release. The beads were then transferred to an elution plate containing 250 μl of Qiagen buffer AE (pre-incubated at 55°C) with a 10-min elution step. The beads were then collected and transferred into the first lysis plate and the above binding, wash, and elution steps were repeated a second time.
Samples were quantitated using a pico green assay and subsequently normalized to 10 ng/μl for use in PCR. Each sample was run in duplicate in a total reaction volume of 20 μl with an input concentration of 10 ng/μl of genomic DNA, PCR buffer (200 mM Tris–HCl pH 8.4, 500 mM KCl), 4 mM MgCl2, 0.2 mM each dNTP, and 0.2 μl Applied Biosystems Platinum Taq Polymerase. PCR primers (rhMUT 5’-TCG ACT GGC GTT GCC GCT CTG AAT GC-3’ and rhINT 5’-CAG GGG AGA TCC TGG GAG GGA-3’), with sequences from Wendland et al. (2006), were run in the PCR at 5 pmol/μl. PCR reactions were then run on an MJ Research PTC 100 thermocycler under the following conditions: a 2-min incubation at 95°C for hot-start Taq activation, followed by 40 cycles of 94°C denaturation for 10 s, 72°C annealing for 30 s, and 72°C extension for 30 s. Samples were then run out on a 3% agarose sizing gel alongside an Invitrogen 1 KB size standard ladder at 90 V for 4 h. The gel was subsequently stained using GelStar Nucleic Acid Stain and imaged on a UVP BioSpectrum imaging system.
Each Monkey had a Different Genotype for the Promoter Region of the Serotonin Transporter Regulatory Gene (5-HTTLPR)
Monkey T was homozygous for the short allele (S/S) of the serotonin transporter gene, monkey Q was homozygous for the long allele (L/L), and monkey H was heterozygous (L/S).
Individual Differences in Temperament and Performance on Behavioral Tests
A summary of individual characteristics of the three monkeys is provided in Table 1.
The three monkeys performed differently on each behavioral test. The first test characterized their behavior during anticipation of food; the second test measured their impulsivity/hesitance to take food that was placed in the vicinity of a rubber snake. The latter was not intended to measure snake fear, but was used solely to compare the behavior of the three monkeys in identical situations.
The ethogram in Figure 2 shows that in response to test one, monkey T spent most of the time pacing. This nervous behavior is reflected to some degree in his scanpaths, which contained stereotypical elements such as regular re-fixation of the eyes and a general tendency to hyperscan the image, e.g., to make frequent, large-amplitude saccades. In a control test (not shown) in which the caretaker was empty-handed, T spent only 6.7% of the time pacing, suggesting that his pacing was a response to the anticipatory stress preceding feeding. Monkeys Q and H sat still for the majority of the trial, with the exception of a short bout of pacing at the beginning of the trial in Q, and a short bout of vocalizations (food coo) in H around the end of the first minute of the test. Aggressive behaviors (head bob) or covert aggression (yawn) were rare in all three monkeys.
In response to the second test (snake exposure), Monkey T retrieved the food after 4 s in the first two trials in which there was no snake in the adjacent cage. When the snake was present, T took 11 s to brush the snake aside and retrieve the food. Monkey Q walked over and ate the food after 20 and 12 s in the trials without the snake and 75 s when the snake was present. Q was the only monkey who, both before and after eating the food, manipulated (poked, grabbed, sniffed, and mouthed) the snake. Monkey H walked into the adjacent cage when the snake was not present after 15 and 6 s in the first and second day, respectively. In the presence of the snake, H jumped back upon noticing the snake and avoided the adjacent cage for the entire remainder of the trial (30 min).
Large Individual Differences in Basic Scanpath Measures
The scanpaths of the three subjects showed both qualitative and quantitative individual differences. An example of the qualitative differences between the scanpaths of monkeys T and Q looking at the same face is shown in Figure 3. Note that the scanpaths of each monkey are relatively reliable across trials.
Figure 3. Example scanpaths generated by the three subjects over the same image. (A) Eye position at each millisecond is represented by a line whose color indicates the time according to the color scheme on the left (blue is eye location at image onset and red is 3000 ms later, immediately before image removal). Note that monkey T looked at the eyes in the first 500 ms, then at the mouth or midface around 1000 ms, and reliably visited the ear region toward the end of the scanning time. Monkey Q also preferred to look at the eyes, but spent more time exploring the other features and his scanpaths were less stereotypical. Monkey H spent more time exploring the mouth, but glanced at least once at the eyes. (B) Average looking time over the entire image across 73 trials (monkey T), 12 trials (monkey Q), and 21 trials (monkey H). Note that these averaged “heat maps” suggest greater inter-monkey similarity than the scanpaths because they lack a temporal dimension; however, they suggest that on average T preferred the eyes, H preferred the mouth, and Q distributed his viewing time across all features.
Individual differences in the number of fixations
The number of fixations was significantly different between monkeys (Kruskal–Wallis (K–W) ANOVA, χ2(2,2481) = 1744.49, p < 0.001) (Figure 4A). The mean numbers of fixations per trial were 15.20 ± 1.96, 7.24 ± 2.45, and 10.28 ± 2.46 for subjects T, Q, and H, respectively. All pairwise differences between monkeys were significant (Tukey-Kramer multiple comparisons procedure at 95% confidence interval).
Figure 4. Individual differences in four basic scanpath measures for 3 s of face scanning. (A) Average number of fixations; (B) Average fixation duration; (C) Average saccade distance; and (D) Total scanpath distance. Significant differences were found among monkeys T (blue), Q (green), and H (magenta) on all four scanpath measures. Bars are median values, whiskers are upper and lower interquartile range. Monkey T made short but frequent fixations. His individual saccades were longer as was the total distance travelled during saccades. In contrast, monkey Q made fewer and longer fixations. Additionally, his saccades were shorter resulting in shorter total scanpath length. Monkey H showed an intermediate, nevertheless significantly different pattern compared to T and Q. (*Tukey-Kramer 95% confidence interval).
Individual differences in fixation duration
The duration of fixations was significantly different between subjects (χ2(2,21671) = 4067.484, p < 0.001). Mean fixation durations were 171 ± 84 ms, 381 ± 312 ms, and 333 ± 227 ms for T, Q, and H, respectively. Median values were 161, 266, and 261 ms, respectively (Figure 4B). As expected, Q, who made fewer fixations, had longer fixation durations. Q and H had significantly longer fixation durations than T, and Q had significantly longer fixations than H (T-K 95% C.I.). Although the median fixation durations were not very different between Q and H, Q’s tendency to engage in longer fixations (“stares”) made the distributions significantly different.
Individual differences in saccade length
Monkeys T, Q, and H made saccades with mean distances of 3.43 ± 1.73, 2.47 ± 1.43, and 2.21 ± 1.69 dva and medians of 3.26, 2.12, and 1.65 dva, respectively (Figure 4C). All differences, including pairwise comparisons, were significant (χ2(2,28115) = 3104.56, p < 0.001; T-K 95% C.I.), indicating that average saccade length can also be used as an individual characteristic of scanpaths.
Individual differences in total scanpath distance
As predicted by the differences in number of fixations and saccade length, the total scanpath distances (with means of 48.25 ± 8.2, 15.71 ± 7.30, and 20.34 ± 8.88 dva and medians of 47.58, 15.12, and 19.95 dva for T, Q, and H, respectively) were also significantly different among monkeys, including post-hoc pairwise comparisons (χ2(2,2481) = 1793.50, p < 0.001; T-K 95% C.I.) (Figure 4D).
Monkeys Prefer to Explore Different Facial Features
To determine whether faces were equally salient stimuli for the three subject monkeys, looking time outside the face was compared across monkeys. Guo et al. (2003) successfully used this measure to assess image saliency. Although monkeys were required to maintain gaze within a bounding box surrounding the image to obtain reward, faces occupied only 32.8 ± 4.1% of the total picture area. Monkeys could therefore choose to look outside of the face and still receive reward. T, Q, and H made a mean of 0.40 ± 1.16, 0.44 ± 1.47, and 0.10 ± 0.67 fixations outside of the face area per trial, respectively. By this measure, H had significantly fewer fixations outside of the face than T and Q, who did not differ from each other (χ2(2,4465) = 89.41, p < 0.001, T-K 95% C.I.). For all monkeys, however, fewer than 10% of trials had more than one fixation outside of the face area, indicating a high level of interest in the face across all monkeys (see the “out” bar in Figure 5).
Figure 5. Monkeys differ in amount of time spent looking in each face region. In the left panel, schematic drawings represent the distribution of looking time on four face regions for each of the three monkeys (looking data for these plots were averaged across facial expressions and gaze directions in order to obtain an overall feature preference). The right panel shows the variability of feature preference for each monkey as well as the time spent looking at non-feature areas of the face and outside the face. Times are normalized by the size of each region in dva2. Bars are medians, whiskers are interquartile ranges.
Inside the face, however, monkeys allocated attention differently to different facial features. As hand-drawn polygons enclosing different facial regions differed in size across images due to differences in head size and rotation, we normalized looking time in a region by the region size. Monkeys differed significantly in the normalized time devoted to eye + brow, mouth, midface, ear, head, and non-feature face regions (eye: χ2(2,2481) = 732.7, p < 0.001; mouth: χ2(2,2481) = 871.4, p < 0.001; midface: χ2(2,2481) = 29.3, p < 0.001; ears: χ2(2,2481) = 539.8, p < 0.001; non-feature: χ2(2,2481) = 116.2, p < 0.001; out: χ2(2,2481) = 119.3, p < 0.001) (Figure 5). As shown in Figure 5, the normalized amount of time spent looking in the eye region was longest for T and shortest for H. Q ranked between T and H. Contrary to the expectation that the eyes would be the most explored feature in faces, monkey H preferred the mouth over the eyes. H’s preference for the mouth has been previously documented in the context of other tasks (Gothard et al., 2009). The midface region was visited more often by monkeys T and Q, who did not differ significantly from each other, than by H (T-K 95% C.I.). T also looked at the ears significantly longer than H, while H looked significantly longer at the ears than Q (T-K 95% C.I.).
After confirming the presence of scanpath differences among monkeys over pooled expressions, we next determined how facial expression and direction of gaze in face images influence the scanpath of each viewer monkey.
Monkeys Respond Differently to the Gaze Direction in Stimulus Faces
We first compared scanpaths for direct and averted gaze across all facial expressions, then determined whether scanpath differences between faces with averted and direct gaze also depended on facial expression.
Scanpaths of monkeys T and Q were sensitive to the direction of gaze in the stimulus images. Gaze direction did not affect the scanpaths of monkey H. Monkeys T and Q made more fixations of shorter duration on direct gaze images than on averted-gaze images, whereas monkey H did not show a similar difference (T: χ2(1,2204) = 36.33, p < 0.001; Q: χ2(1,955) = 26.7, p < 0.001; H: χ2(1,521) = 0.7655, p = 0.38 for fixation number and T: χ2(1,10562) = 20.87, p < 0.001; Q: χ2(1,5656) = 12.3, p < 0.001; H: χ2(1,1759), p = 0.48 for fixation duration). Monkeys T and Q made significantly longer scanpaths over averted versus direct gaze stimuli (Figure 6), whereas H showed no difference between the two gaze directions (T: χ2(1,2204) = 92.12, p < 0.001; Q: χ2(1,955) = 45.6, p < 0.001; H: χ2(1,521) = 1.76, p = 0.18).
Figure 6. The effect of gaze direction on scanpath length. Scanpath length over averted images is increased for all facial expressions in monkey T, three expressions in monkey Q, and none in monkey H. Bars and whiskers represent median scanpath length in dva and interquartile range, respectively.
For T, comparing between direct and averted-gaze variants of each expression revealed that direct lip-smacks (LS), threats (TH), and neutral (NE) facial expressions elicited more fixations than their counterparts with averted gaze. Direct and averted fear-grimaces (FG), however, elicited similar numbers of fixations (direct vs. averted: LS: χ2(1,2204) = 36.33, p < 0.001; TH: χ2(1,731) = 17.31, p < 0.001; NE: χ2(1,738) = 6.84, p = 0.009; FG: χ2(1,307) = 0.021, p = 0.88). For Q, the trend was reversed with averted fear-grimace, threats, and neutral images eliciting more fixations than their direct counterparts (direct vs. averted: LS: χ2(1,158) = 1.61, p = 0.20; FG: χ2(1,160) = 6.93, p = 0.008; TH: χ2(1,313) = 6.59, p = 0.010; NE: χ2(1,318) = 12.2, p < 0.001). T made longer fixations on direct lip-smacks and neutral images but not on direct gaze fear-grimaces and threats (LS: χ2(1,10562) = 20.87, p = 0.01; NE: χ2(1,3554) = 23.70, p < 0.001; FG: χ2(1,1456) = 1.88, p = 0.17; TH: χ2(1,3508) = 0.031, p = 0.86). Q also exhibited a trend towards longer fixations over direct images, with significant effects over fear-grimace and neutral images (LS: χ2(1,934) = 2.22, p = 0.13; FG: χ2(1,948) = 5.31, p = 0.021; TH: χ2(1,1902) = 0.86, p = 0.36; NE: χ2(1,1866) = 5.49, p = 0.019). We did not test H’s gaze preferences individually for expressions since he showed no grouped level differences. It should be noted that for fixation-based measures, H’s analysis did not include any lip-smack images.
Gaze direction shifted attention to different facial features in monkey T. Comparing each direct expression with the same expression with averted gaze, T spent more time visiting the eye region in direct fear-grimaces and threats and less time in direct neutral images compared to averted counterparts (FG: χ2(1,440) = 12.11, p < 0.001; TH: χ2(1,874) = 24.26, p < 0.001; NE: χ2(1,875) = 24.65, p < 0.001). Changes in eye-looking were, at least in part, compensated by an inverse change in mouth-looking; that is, T spent more time viewing the mouth region in direct lip-smack and neutral images and less in fear-grimace and threat images (LS: χ2(1,432) = 10.63, p = 0.0011, FG: χ2(1,440) = 39.89, p < 0.001, TH: χ2(1,874) = 6.74, p < 0.001). Q showed no differences in either eye-looking or mouth-looking between direct and averted versions of any expression (eye: LS: χ2(1,164) = 2.25, p = 0.13; FG: χ2(1,165) = 0.89, p = 0.35; TH: χ2(1,327) = 2.54, p = 0.11; NE: χ2(1,329) = 0.88, p = 0.35; mouth: LS: χ2(1,164) = 1.41, p = 0.24; FG: χ2(1,165) = 2.56, p = 0.11; TH: χ2(1,327) = 0.26, p = 0.61; NE: χ2(1,329) = 1.73, p = 0.19).
Monkeys Scan Facial Expressions Differently
Scanpath parameters for each monkey changed with facial expression (Figure 7). Facial expressions were compared separately for direct and averted-gaze images for all monkeys (recall that H’s lip-smack trials were excluded from fixation-based measures).
Figure 7. Allocation of visual attention to face regions as a function of facial expression and direction of gaze. The color for each face region represents normalized looking time. Rows (A–D) contain schematic facial expressions of lip-smack, fear-grimace, threat, and neutral, respectively. The plots in the blue, green, and purple rectangles labeled T, Q and H correspond to data from each monkey. The line graphs below each face show the percent of trials in which the monkey was attending a given face region at each time point of the 3-s trial. In the first 200–400 ms the eye of the viewer was fixated by default at the center of the image (the average time of first saccade for each monkey is indicated by a vertical dotted line). Note that the temporal structure of the scanpaths of monkey T was more predictable than that of the other two monkeys.
Monkey T
On direct-gaze images, T made more fixations over lip-smacks and threats than fear-grimaces and neutral faces (χ2(3,1096) = 66.53, p < 0.001). The longest fixations were measured on lip-smacks and neutral images, shorter fixations were seen on fear-grimaces, and the fixations were the shortest on threats (χ2(3,5292) = 106.16, p < 0.001). T’s average saccade distance was longer on fear-grimaces and threats than on neutrals, followed by lip-smacks, which attracted the shortest saccades (χ2(3,14237) = 171.5, p < 0.001). On averted-gaze images, T made more fixations over threats than neutral images (χ2(3,1102) = 14.59, p = 0.002). Saccades were the longest on threats, followed by fear-grimaces and neutrals, and the shortest on lip-smacks (χ2(3,13796) = 91.3, p < 0.001). T had significantly longer scanpaths over threat images than all others, for both direct- and averted-gaze images (direct: χ2(3,1096) = 75.99, p < 0.001; averted: χ2(3,1102) = 64.13, p < 0.001).
Monkey Q
Q showed remarkably fewer significant differences in scanning facial expressions. No significant differences were found in number of fixations, fixation duration, average saccade length, or total scanpath distance for either direct or averted-gaze images, when compared separately (direct: number of fixations: χ2(3,465) = 2.28, p = 0.52; fixation duration: χ2(3,2571) = 4.17, p = 0.24; average saccade length: χ2(3,3040) = 5.16, p = 0.16; total scanpath distance: χ2(3,465) = 7.11, p = 0.06; averted: number of fixations: χ2(3,484) = 0.49, p = 0.92; fixation duration: χ2(3,3079) = 3.69, p = 0.30; average saccade length: χ2(3,3567) = 5.64, p = 0.13; total scanpath distance: χ2(3,484) = 5.12, p = 0.16).
Monkey H
On direct-gaze images, H made a similar number of fixations on all facial expressions. We found no significant main effect of expression on fixation duration for H (direct: χ2(2,870) = 1.32, p = 0.52; averted: χ2(2,885) = 3.79, p = 0.15). H made longer saccades on neutrals than threats for direct-gaze images (χ2(2,2124) = 10.8, p = 0.004). On averted-gaze images, H showed more fixations on fear-grimaces than on threats (χ2(2,260) = 6.57, p = 0.037). H made longer saccades on neutrals than on threats and fear-grimaces (χ2(2,2204) = 17.0, p < 0.001). H also made longer scanpaths over threats than fear-grimaces; this effect was significant for averted-gaze images but not for direct-gaze images (direct: χ2(2,257) = 3.70, p = 0.16; averted: χ2(2,260) = 9.98, p = 0.007).
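The four basic scanpath measures compared above (number of fixations, fixation duration, saccade length, and total scanpath length) can be computed per trial from an ordered list of fixations. The following is a minimal sketch, not the analysis code used in this study; the function name `scanpath_parameters`, the `(x, y, duration_ms)` tuple layout, and the use of straight-line distance between successive fixations as saccade length are illustrative assumptions.

```python
import math
from statistics import mean


def scanpath_parameters(fixations):
    """Compute four basic scanpath measures for one trial.

    fixations: list of (x, y, duration_ms) tuples in temporal order
    (hypothetical layout; x and y in any consistent spatial unit).
    """
    durations = [d for _, _, d in fixations]
    # Saccade length is approximated as the straight-line distance
    # between consecutive fixation positions (an assumption).
    saccades = [
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(fixations, fixations[1:])
    ]
    return {
        "n_fixations": len(fixations),
        "mean_fixation_duration": mean(durations) if durations else 0.0,
        "mean_saccade_length": mean(saccades) if saccades else 0.0,
        "scanpath_length": sum(saccades),
    }


# Made-up trial with three fixations, for illustration only.
trial = [(0.0, 0.0, 250), (3.0, 4.0, 300), (3.0, 0.0, 200)]
params = scanpath_parameters(trial)
```

Per-trial values obtained this way would then be grouped by facial expression and gaze direction before any statistical comparison.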
Feature Preference as a Function of Facial Expression
Monkey T
In general, the scanpaths of monkey T followed the rule that the eyes are the most explored feature in faces. Time spent looking at each feature was calculated separately for direct and averted-gaze images.
On faces with direct gaze (leftmost column of face color maps in Figure 7), eye-looking was greatest for lip-smacks (darkest color on eye region in Figure 7A), less for threats and neutrals, and the least for fear-grimaces. Mouth-looking was greatest for fear-grimaces and threats and less for lip-smacks and neutrals. Time spent looking at the midface was greatest for fear-grimaces. Ear-looking was greatest for threats and neutrals. For facial expressions with averted gaze, a similar pattern emerged. Eye-looking was increased for lip-smacks and neutral images. Fear-grimaces elicited the most mouth-looking. Midface-looking was greater for lip-smacks and fear-grimaces than for threats and neutral images. Ear-looking was greater for fear-grimaces than for threats, followed by lip-smacks, and also greater for fear-grimaces than for neutrals (direct, eyes: χ2(3,1312) = 67.1, p < 0.001; mouth: χ2(3,1312) = 187.2, p < 0.001; midface: χ2(3,1312) = 35.1, p < 0.001; ears: χ2(3,1312) = 36.1, p < 0.001; averted, eyes: χ2(3,1309) = 173.3, p < 0.001; mouth: χ2(3,1309) = 405.8, p < 0.001; midface: χ2(3,1309) = 187.1, p < 0.001; ears: χ2(3,1309) = 49.1, p < 0.001).
Monkey Q
On both direct and averted images, Q’s eye-looking behavior did not differ significantly across facial expressions. Q spent more time looking at the mouth in lip-smacks, fear-grimaces, and threats than in neutral images. No differences were observed for midface-looking; however, for direct images, ear-looking was increased in neutral and threat images compared to fear-grimaces. On averted images, only ear-looking over neutral images was increased compared to fear-grimaces (direct, eyes: χ2(3,486) = 7.99, p = 0.46; mouth: χ2(3,486) = 30.1, p < 0.001; midface: χ2(3,486) = 4.45, p = 0.22; ears: χ2(3,486) = 17.5, p < 0.001; averted, eyes: χ2(3,499) = 0.867, p = 0.83; mouth: χ2(3,499) = 52.2, p < 0.001; midface: χ2(3,499) = 2.76, p = 0.43; ears: χ2(3,499) = 10.6, p = 0.014).
Monkey H
While monkeys T and Q were essentially eye-lookers, monkey H was primarily a mouth-looker. On direct gaze images, H looked longest at the mouth of threats and eyes of lip-smacks and neutrals. There were no significant differences between midface-looking among the different facial expressions. Ear-looking was greater for neutral images than for fear-grimaces and threats. On averted-gaze images, a similar pattern emerged, with eye-looking being greatest for lip-smacks and neutral images, mouth-looking being greatest on fear-grimaces, and neutral images attracting more ear-looking than lip-smacks, fear-grimaces, and threats (direct, eyes: χ2(3,299) = 34.77, p < 0.001; mouth: χ2(3,299) = 26.73, p < 0.001; midface: χ2(3,299) = 6.90, p = 0.075; ears: χ2(3,299) = 15.85, p = 0.001; averted, eyes: χ2(3,620) = 87.40, p < 0.001; mouth: χ2(3,620) = 31.56, p < 0.001; midface: χ2(3,620) = 35.93, p < 0.001; ears: χ2(3,620) = 35.93, p < 0.001).
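Feature preference in this section rests on the time spent looking at each face region. As a rough illustration only, the sketch below sums fixation durations per region and expresses them as proportions of the time spent on the labeled regions. The region labels, the function name `normalized_looking_time`, and this particular normalization are assumptions made for the example; the normalization actually used for Figure 7 may differ.

```python
REGIONS = ("eyes", "mouth", "midface", "ears")


def normalized_looking_time(fixations, regions=REGIONS):
    """Return the proportion of looking time spent on each region.

    fixations: list of (region_label, duration_ms) pairs (hypothetical
    layout). Fixations whose label is not in `regions` are ignored.
    """
    totals = dict.fromkeys(regions, 0.0)
    for region, duration in fixations:
        if region in totals:
            totals[region] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals  # no time on any labeled region
    return {region: t / grand_total for region, t in totals.items()}


# Made-up fixations, for illustration only.
example = [("eyes", 600), ("mouth", 300), ("eyes", 100), ("other", 500)]
looking = normalized_looking_time(example)
```

Under these assumptions, an "eye-looker" would show the largest proportion on the eyes and a "mouth-looker" the largest proportion on the mouth.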
Individual Differences in the Temporal Structure of Scanpaths
As Figure 3 suggests, the monkeys differ not only in where they look on a face, but also in when and in what order they fixate each facial feature. If a monkey develops a fixed sequence of feature visitation, then his gaze is expected to be directed to a specified region at a specified time in a large fraction of trials scanning the same image. Indeed, monkey T looked first at the eyes of all faces, then he turned reliably to a second preferred feature, alternating with remarkable predictability between the eyes and whatever the second preferred feature was for a particular facial expression. For example, the probability of finding T’s gaze on the eyes of a threatening face after the first 300 ms of the scanpath is approximately 90%; for Q this probability is 70% and for H it is below 50%.
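The probability of finding gaze on a given region at a given time can be estimated as the fraction of trials in which a fixation on that region covers that time point. The sketch below illustrates one way to do this under stated assumptions: the function name `region_occupancy`, the `(region_label, start_ms, end_ms)` fixation layout, and the 100-ms sampling step are hypothetical choices, not details taken from this study.

```python
def region_occupancy(trials, region, trial_ms=3000, bin_ms=100):
    """Fraction of trials with gaze on `region` at each sampled time.

    trials: list of trials; each trial is a list of
    (region_label, start_ms, end_ms) fixations (hypothetical layout).
    Time is sampled at the start of each bin of width `bin_ms`.
    """
    n_bins = trial_ms // bin_ms
    if not trials:
        return [0.0] * n_bins
    counts = [0] * n_bins
    for trial in trials:
        for b in range(n_bins):
            t = b * bin_ms
            # Count the trial once if any fixation on `region` covers t.
            if any(lbl == region and start <= t < end
                   for lbl, start, end in trial):
                counts[b] += 1
    return [c / len(trials) for c in counts]


# Two made-up 600-ms trials, for illustration only.
trials = [
    [("center", 0, 200), ("eyes", 200, 400), ("mouth", 400, 600)],
    [("center", 0, 200), ("mouth", 200, 400), ("eyes", 400, 600)],
]
eye_curve = region_occupancy(trials, "eyes", trial_ms=600, bin_ms=200)
```

A highly stereotyped viewer would produce curves with values near 1 at specific times, whereas a variable viewer would produce flatter curves.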
The three subjects showed large individual differences in (1) basic scanpath measures, (2) overall feature preference, (3) feature visitation on each facial expression, (4) the influence of gaze direction on scanpaths, and (5) temporal sequence of feature visitation. These differences are likely within the normal range of scanpath variability in macaques, as these three monkeys do not show pathological behaviors. Given that the between-subject differences were typically larger than the within-subject differences, studies using within-subject designs are likely to be more sensitive to experimental effects than studies using between-subject comparisons.
Individual differences in the scanpaths reported here appear to be paralleled by genetic and temperamental differences. Similar observations have been amply documented in human studies (e.g., Bradley et al., 2000; Mogg and Bradley, 2002; Isaacowitz, 2005; Perlman et al., 2009) and a few monkey studies (e.g., Capitanio, 2002; Watson et al., 2009). The significance of carefully characterizing scanpath differences in monkeys lies in the possibility of directly measuring the function of the structures that are involved in allocating visual attention to emotionally and socially relevant stimuli. In monkeys, these structures are accessible for direct neurophysiological scrutiny. This then allows a neural/mechanistic account of the observed behaviors and precise predictions with regard to the contribution of structures such as the amygdala, the frontal eye fields, and posterior parietal cortex to the top-down or bottom-up processes that drive visual scanning of natural images.
We found that monkey T, homozygous for the short allele of the serotonin transporter, had a tendency to hyperscan images of other monkeys. Figure 7 shows that, compared to the other two monkeys, T’s gaze alternates regularly between features. These tendencies were paralleled by similar behaviors, such as pacing anxiously when taunted with food. Based on 3 years of observation, T could be described as an anxious monkey; he has a history of responding with diarrhea to stressful social situations (pair-housing). T is not expressive; he rarely threatens or appeases the caretakers and his behaviors are usually highly predictable, with a tendency for repetitive actions. T regularly “dances” in his cage, a combination of pacing and rocking movements, not unusual for individually housed male monkeys (Lutz et al., 2003). Non-social, potentially dangerous stimuli, such as the rubber snake, did not elicit a fearful reaction, only minor hesitation to approach the food. Some aspects of T’s social development might be attributed to a lack of close contact with the mother in his first year of life; such contact has been shown to compensate for some of the risks inherent in the S/S genotype (Champoux et al., 2002). T and Q showed different scanpaths for faces with direct vs. averted gaze. For all four facial expressions in T and three of four in Q, averted-gaze images triggered longer scanpaths (Figure 6), suggesting that averted gaze indeed facilitates exploration and engages attention (Emery et al., 1997; Ferrari et al., 2000; Deaner and Platt, 2003; Shepherd et al., 2006). This finding correlates with the greater autonomic arousal in monkeys looking at facial expressions with averted gaze, which is likewise correlated with higher activation of the central nucleus of the amygdala (Hoffman et al., 2007).
In contrast, H’s scanpaths are less dynamic and differentiate less between facial expressions and gaze directions than the scanpaths of T. H does not pace when taunted with food; he remains relatively still throughout the trial (although he is the most vocal of the three monkeys), and he is highly expressive. Based on our observations, H tends to respond aggressively and unpredictably to gestures of caretakers that he perceives as threatening, and he shows clear fear of the rubber snake. H is the only mouth-looker, which might be related to his fearfulness. H failed to establish bonds with Q and T, and attempts at pair-housing failed because he got involved in violent fights with each of them. The unfortunate need to separate the monkeys means that we have no data available regarding the current dominance hierarchy.
The scanpaths of monkey Q are significantly different from the scanpaths of T and H on several basic measures. Q had the most restricted scanning: per trial, he made the shortest scanpaths, with far fewer saccades than both T and H. Previous studies have shown that Q has superior face recognition skills compared to T and H (Gothard et al., 2009). Of the three monkeys, Q interacts the most with human caretakers. Q was judicious about the snake; after a brief period of avoiding the novel stimulus, he approached and investigated it. Q’s scanpaths suggest efficiency in scanning faces: he explores primarily the eyes, but otherwise samples the other regions of the face over multiple image presentations, as evidenced by his diffuse heatmap.
Finally, we show that the temporal pattern of feature visitation can be a useful aspect of scanpath analysis that differentiates among individuals (oscillating feature visitation in T where every other saccade targets the eyes) and across facial expressions (the second saccade targets the mouth primarily in fear-grimace and threat images where a large part of the display is expressed by the mouth).
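An oscillating visitation pattern of the kind described for T (eyes, another feature, eyes again) can be quantified from the ordered sequence of visited regions. The sketch below is one hypothetical index, not a measure used in this study: it reports how often a departure from the eyes is followed immediately by a return to the eyes. The function name `eye_return_rate` is an assumption, as is the expectation that the input lists one label per visit (consecutive fixations on the same region already merged).

```python
def eye_return_rate(sequence, anchor="eyes"):
    """Fraction of departures from `anchor` followed by an immediate return.

    sequence: ordered list of visited region labels for one trial
    (hypothetical layout). A departure is anchor -> other; it counts as
    a return if the next visit is the anchor again. A departure at the
    very end of the sequence, with no following visit, is not counted.
    """
    departures = 0
    returns = 0
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        if a == anchor and b != anchor:
            departures += 1
            if c == anchor:
                returns += 1
    return returns / departures if departures else 0.0


# Made-up visit sequence, for illustration only.
visits = ["eyes", "mouth", "eyes", "ears", "eyes", "mouth", "midface"]
rate = eye_return_rate(visits)
```

Values near 1 would indicate that nearly every other visit targets the eyes, whereas values near 0 would indicate scanning that seldom returns directly to the eyes.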
Overall these experiments suggest that detailed analysis of the scanpaths can convey information about the image (e.g., the scanpath length is increased for averted gaze) but also about the individual (alternating regularly between features is correlated with tendencies toward anxious and repetitive behaviors). These predictions are valid, however, only in the context of individual variation.
Although we interpret the differences between the scanpaths of the three monkeys as within a normal range of variability, we note that abnormal scanpaths during face viewing have been documented in many psychiatric disorders including schizophrenia, autism, bipolar disorder, and anxiety disorders (Kee et al., 1998; Williams et al., 1999; Loughland et al., 2002; Pelphrey et al., 2002; Minassian et al., 2005). Indeed, the majority of mental disorders are associated with changes in eye movements. This observation gives scanpaths the potential to be used as powerful behavioral biomarkers. Recently, individual scanpath characteristics in humans have been linked to gene polymorphisms. The serotonin transporter linked polymorphic region (5-HTTLPR) functions in transcriptional control for the gene encoding a synaptic serotonin transporter. Champoux et al. (2002) have shown that certain genotypes interact with adverse early-life experiences to cause differences in orientation, affective, and attentional capabilities later in life. Certain genotypes can also be shown to correlate with scanpath parameters, such as eye-looking and willingness to view faces of high-status monkeys (Watson et al., 2009). The small number of monkeys participating in this study notwithstanding, it appears that genotype is somewhat predictive of the characteristic scanpath features in monkeys as well. Taken together, these findings indicate that further study of scanpaths, especially in experimentally tractable organisms such as rhesus macaques, should be undertaken and should remain informed by the findings of individual variation presented in this study.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work has been supported by NIMH grant MH070836 to K. Gothard and HHMI 52005889 that supported R. Gibboni while a student in the Undergraduate Biological Research Program at the University of Arizona. We thank Dr. Ryan Sprissler, University of Arizona Genetics Core, ARL-Biotechnology, for the expertise, time, and resources devoted to genotyping the monkeys. We thank Christopher Laine for help with data analysis, Dr. Kevin Spitler for data collection, Clayton Mosher for help collecting the data and for editing the manuscript, and the California National Primate Research Center for allowing us to generate the stimulus set used in these studies. We also thank Dr. Michael Cohen for useful comments on the manuscript and Dr. Michael Rand for helping with the behavioral tests.
Wendland, J., Lesch, K. P., Newman, T. K., Timme, A., Gachot-Neveu, H., Thierry, B., and Suomi, S. J. (2006). Differential functional variability of serotonin transporter and monoamine oxidase A genes in macaque species displaying contrasting levels of aggression-related behavior. Behav. Genet. 36, 163–172.