1 Brain Imaging and Analysis Center, Duke University, Durham, NC, USA
2 Center for Neuroeconomic Studies, Duke University, Durham, NC, USA
3 Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
Contingency learning is fundamental to cognition. Knowledge about environmental contingencies allows behavioral flexibility, as executive control processes accommodate the demands of novel or changing environments. Studies of experiential learning have focused on the relationship between actions and the values of associated outcomes. However, outcome values have often been confounded with the physical changes in the outcomes themselves. Here, we dissociated contingency learning into valuative and non-valuative forms, using a novel version of the two-alternative choice task, while measuring the neural effects of contingency changes using functional magnetic resonance imaging (fMRI). Changes in value-relevant contingencies evoked activation in the lateral prefrontal cortex (LPFC), posterior parietal cortex (PPC), and dorsomedial prefrontal cortex (DMPFC) consistent with prior results (e.g., reversal-learning paradigms). Changes in physical contingencies unrelated to value or to action produced similar activations within the LPFC, indicating that LPFC may engage in generalized contingency learning that is not specific to valuation. In contrast, contingency changes that required behavioral shifts evoked activation localized to the DMPFC, supplementary motor, and precentral cortices, suggesting that these regions play more specific roles within the executive control of behavior.
Contingency learning is a fundamental component of cognition. By identifying the relationships between actions and events, humans and other animals can produce goal-directed and flexible behavior that accounts for changes in their environments. Models of goal-directed behavior, such as those of reinforcement learning, have traditionally examined the contingencies between actions and their rewarding or punishing outcomes (Thorndike, 1898; Pavlov, 1928; Skinner, 1938; Herrnstein, 1970). Such models can account for simple reward-seeking behaviors (e.g., foraging), and even describe quite complex aspects of behavior and decision making (Sutton and Barto, 1998). However, effective behavior requires learning not only about relations between actions and received rewards, but also about environmental contingencies that do not themselves influence reward outcomes (Tolman, 1932), such as information about the available options, state space, or stimulus-stimulus contingencies.
Studies of the neural basis of contingency learning – often using reversal tasks in which reward contingencies change unexpectedly – have identified a host of involved brain areas (Cools et al., 2002; O’Doherty et al., 2003; Remijnse et al., 2005; Xue et al., 2008). Collectively, many of these regions have been described as constituting the dorsal executive network (Duncan and Owen, 2000) or central executive (Goldman-Rakic, 1996). This network may contribute to contingency learning in other non-rewarding contexts, as well. For example, regions of lateral and medial prefrontal cortex exhibit increased activation to events that violate local temporal patterns of stimuli, even when those patterns arise because of random chance and even in the absence of explicit awareness (Squires et al., 1976; Huettel et al., 2002; Fan et al., 2007). Such findings fit the broad theory that the dorsal executive network detects environmental changes and implements cognitive processes necessary for modifying behavior appropriately (Wise et al., 1996; Botvinick et al., 2001; Miller and Cohen, 2001; Ridderinkhof et al., 2004; Walton et al., 2004; Mansouri et al., 2009). The resulting behavioral changes are postulated to reflect biasing signals directed at other brain systems (Botvinick et al., 2001; Miller and Cohen, 2001). However, most prior studies of executive control have used tasks with co-occurring contingency changes and engagement of executive control mechanisms.
Here, we adapted the classic two-option reversal-learning paradigm to dissociate multiple forms of contingency learning. Across a series of rapidly presented trials, the environmental contingencies changed in three independent ways: (1) changes in outcome valence, which resulted in a behavioral change on the next trial; (2) changes in outcome magnitude, which affected obtained rewards but did not produce a behavioral change; and (3) changes in the physical effect of an action, through new visual feedback that was completely unrelated to rewards or required actions. We hypothesized that brain areas previously implicated in learning and responding to changes in action-outcome contingencies also process contingency changes that are behaviorally and motivationally irrelevant. If obtained, such results would indicate that these regions support the generalized updating of models about the current environment, including but not limited to the changes in the anticipated value of actions, with other regions contributing to the executive control of action.
Participants
Fourteen healthy, neurologically normal young-adult volunteers (six female; age range: 18–29 years; mean age: 22.4 years) participated in a single fMRI session. All participants acclimated to the testing environment using a mock MRI scanner. Two participants were removed from the analyses, one because of technical issues with stimulus presentation and the other because of excessive head motion. All participants gave written informed consent as part of a protocol approved by the Institutional Review Board of Duke University Medical Center. Subjects’ payment was contingent on the choices made during the experiment (mean payment: $49 out of a possible $50).
Task
Participants engaged in a modified two-alternative choice task (Figure 1A). As a framing story, each participant was told to act as an investment broker who selects between two factories in which to invest. On each trial, one of the two choice options resulted in a monetary gain, while the other resulted in an equal-magnitude loss (range: ±18 to 93¢). The outcome comprised two simultaneously presented visual components: the received value and an abstract visual object (described as the product of the factory). A total of eight different objects were presented across all trials, constructed through the factorial combination of two values for each of three dimensions (shape, color, and orientation of a diagonal slash). Participants were given explicit instructions that the abstract objects were not predictive of future outcomes nor of changes in the value of those outcomes; moreover, they were told that outcomes could change after as few as one trial, but that on average the outcomes would remain stable for several trials at a time. Participants were instructed that their commission (i.e., payment) was proportional to the total amount earned across all decisions.
Figure 1. Task. (A) Example trial structure. Following an initial stimulus display, the onset of the trial was signaled by a change in color of the response circles. When the participant indicated the choice for that trial, the selected option changed color, whereupon the outcome of the trial was indicated by a visual object and a monetary reward. Trials were each 2.8 s, with 1.4 s for response and choice presentation and 1.4 s for outcome presentation. Intertrial intervals ranged between 0.1 and 0.5 s, to prevent subjects from predicting the onset of each trial. (B) Possible outcomes derived from different changes in contingencies. On Standard trials, the outcome was maintained from the prior trial. In Reversal Changes, the action-outcome contingencies switch, and thus the participant loses the amount of money they were expecting to gain. In Value Changes, the participant gains money, but either more or less than they expected. Finally, in Effect Changes, the participant receives the amount of money they were expecting, but the visual stimulus changes.
Value and effect contingencies were altered independently, through four different possible trial types (see Figure 1B). The first type comprised the Standard trials (76% of total), where no stimuli, rewards, or actions changed from the previous trial. The second was a Reversal Change (6% of trials), in which the associated values of each option switched sign (without changing magnitude); this trial type is equivalent to the critical events in the canonical reversal learning task. Given the complementary relationship between the valences of the two options, Reversal Changes should guide the participants to select the other option on the next trial. The third type of trial involved a Value Change (6% of trials), such that the absolute magnitude of the current value was changed, either up or down, without a change of sign. Because the participants still received a positive reward (albeit not the expected amount), they should continue to select the same option on succeeding trials. The fourth type of trial, called Effect Change (12% of trials, split evenly between one- and two-dimensional), involved a behaviorally and motivationally irrelevant contingency change in the visual object associated with each action (i.e., the object shown on the screen). This could involve either a change in one dimension of the object (e.g., color alone) or two dimensions of the object (e.g., both color and shape), with equal probability. However, because the reward value remained constant, the participant should continue selecting the same option on future trials.
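The object construction and the Effect Change manipulation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the stimulus code used in the study: the dimension names follow the text, but the two levels assigned to each dimension and the helper names `all_objects` and `effect_change` are hypothetical.

```python
import itertools
import random

# Dimension names follow the text; the two levels per dimension are
# hypothetical placeholders, since the actual values are not specified.
DIMENSIONS = {
    "shape": ("circle", "square"),
    "color": ("red", "blue"),
    "slash": ("left", "right"),
}


def all_objects():
    """Return the 2 x 2 x 2 = 8 objects from the factorial combination."""
    names = list(DIMENSIONS)
    return [
        dict(zip(names, levels))
        for levels in itertools.product(*DIMENSIONS.values())
    ]


def effect_change(obj, n_dims, rng):
    """Return a copy of obj with n_dims (1 or 2) randomly chosen dimensions flipped."""
    changed = dict(obj)
    for dim in rng.sample(list(DIMENSIONS), n_dims):
        first, second = DIMENSIONS[dim]
        changed[dim] = second if obj[dim] == first else first
    return changed


if __name__ == "__main__":
    rng = random.Random(0)
    current = all_objects()[0]
    print(current, "->", effect_change(current, n_dims=2, rng=rng))
```

Because each dimension has only two levels, a change along a dimension is simply a flip to the other level, so a one- or two-dimensional Effect Change is fully specified by which dimensions are selected.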
Participants fixated on a cross at the center of the display throughout the experiment. Failure to respond in the allotted time (1.4 s after the trial start) resulted in a monetary loss equivalent to the value of the worse option on the current trial. To ensure comprehension of instructions and to provide experience with task contingencies, all participants performed a 7-min behavioral training session before the fMRI experiment.
Participants carried out four runs of this task while in the scanner, each consisting of 150 trials including 10 reversal changes, 10 value changes, 10 one-dimensional effect changes, and 10 two-dimensional effect changes. We used a constant hazard probability for contingency shifts of p ∼ 0.28. Each run also included four non-task pauses, in which the inter-trial interval was extended to 10–20 s. We predetermined the timing and order of contingency shifts to maximize the statistical dissociation between the hypothesized hemodynamic responses evoked by each contingency shift type. The order of runs was randomized across participants. Stimulus presentation and behavioral data collection were carried out using the Psychophysics Toolbox for Matlab (Brainard, 1997), with stimuli presented through MR-compatible LCD goggles and behavioral selections made using the first two fingers of the right hand on an MR-compatible response box.
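As a rough illustration of the run composition described above, the following Python sketch builds a single 150-trial schedule with 10 events of each change type. It is only an approximation under stated assumptions: the positions here are drawn at random, whereas the actual schedules were predetermined to optimize the separation of the hemodynamic responses, and the function name `make_run_schedule` and the trial-type labels are hypothetical.

```python
import random


def make_run_schedule(n_trials=150, n_per_change=10, seed=0):
    """Return a list of trial-type labels for one run.

    Assumption: change trials are placed at random positions (never on the
    first trial, which has no preceding trial to change from). The study
    instead used predetermined, optimized orders.
    """
    rng = random.Random(seed)
    change_types = ["reversal", "value", "effect_1d", "effect_2d"]
    schedule = ["standard"] * n_trials
    n_changes = n_per_change * len(change_types)
    positions = rng.sample(range(1, n_trials), n_changes)
    for i, pos in enumerate(positions):
        # Cycling through the types gives exactly n_per_change of each.
        schedule[pos] = change_types[i % len(change_types)]
    return schedule


if __name__ == "__main__":
    run = make_run_schedule()
    n_changes = sum(label != "standard" for label in run)
    print(n_changes, "change trials out of", len(run))
```

With these counts, 40 of the 150 trials in a run are change trials, a per-trial rate of roughly 0.27, which is in line with the approximate hazard probability given in the text.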
fMRI Data Collection and Analysis
We acquired data with a 4T GE scanner using an inverse-spiral pulse sequence with standard imaging parameters [TR = 2000 ms; TE = 30 ms; 34 axial slices parallel to the AC-PC plane; voxel size of 3.75 × 3.75 × 3.8 mm]. High resolution 3D full-brain SPGR anatomical images were acquired and used for normalizing and co-registering individual participants’ data.
Analyses were performed using FEAT (FMRI Expert Analysis Tool) Version 5.92, part of the FSL package (Smith et al., 2004; Woolrich et al., 2009). The following pre-statistics processing steps were applied: motion correction using MCFLIRT, slice-timing correction, removal of non-brain voxels using BET (Smith, 2002), spatial smoothing with a Gaussian kernel of full-width-half-maximum of 8 mm, and 50 ms high-pass temporal filtering. Registration to high resolution and standard images was carried out using FLIRT (Jenkinson and Smith, 2001).
Our first-level FEAT model contained five factors: three coding the onsets of contingency changes (i.e., Reversal Changes, Value Changes, and Effect Changes) and a fourth, nuisance factor coding the infrequent missed trials. An impulse of unit duration and unit weight was used for each of these events. The fifth factor coded the timing of the non-task pauses by their duration and a unit weight. Each of these factors was then convolved with a double-gamma hemodynamic response function to create the final regressors within our design matrix. Of note, this model uses the Standard trials as a task-related baseline. Thus, activations associated with the performance of Standard trials, such as stimulus presentation and motor response execution, are controlled by comparison to the baseline fMRI signal.
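To make the construction of an event regressor concrete, the sketch below convolves unit-weight events with a double-gamma response function and samples the result once per TR. It is an illustration only, not FEAT's implementation: the gamma shape parameters (6 and 16) and undershoot ratio (1/6) are commonly used canonical values assumed here, the "unit duration" is taken to be 1 s, and the function names `double_gamma_hrf` and `build_regressor` are hypothetical.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Double-gamma response at time t (seconds); parameters are assumed values."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - under_ratio * under


def build_regressor(onsets, n_scans, tr=2.0, dt=0.1, duration=1.0, kernel_len=32.0):
    """Convolve unit-weight events with the HRF and sample one value per scan."""
    n_fine = int(round(n_scans * tr / dt))
    stim = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            stim[i] = 1.0  # unit weight for the duration of the event
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(kernel_len / dt)))]
    conv = [0.0] * n_fine
    for i, s in enumerate(stim):
        if s == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_fine:
                conv[i + k] += s * h * dt  # discrete approximation of the integral
    step = int(round(tr / dt))
    return conv[::step][:n_scans]


if __name__ == "__main__":
    regressor = build_regressor(onsets=[10.0], n_scans=20)
    print([round(v, 3) for v in regressor])
```

The same routine would cover the pause factor by passing each pause's length as `duration`; one regressor per factor then forms a column of the design matrix.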
Second-level FEAT analyses to combine runs within-participants used a fixed-effects model, while third-level, across-participants analyses used FLAME (stages 1 and 2) random-effects analysis, with automatic outlier de-weighting (Woolrich, 2008). All statistical inferences, including data visualization, use whole-brain-corrected cluster-significance thresholds of p < 0.05 (z > 2.3). Finally, to quantify the percent change in activation across different contingency change types, we created spheres with 8 mm radii around centroids of our functionally-defined regions-of-interest (ROIs) using MRICRON (Rorden et al., 2007).
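The spherical ROI step can be illustrated with a short Python sketch that lists the voxels whose centers lie within 8 mm of a centroid. This is not MRICRON's procedure: the default voxel dimensions are taken from the acquisition parameters above purely as an assumption (the ROIs may well have been drawn in a different image space), and `sphere_roi` is a hypothetical helper name.

```python
import math


def sphere_roi(center_vox, radius_mm=8.0, voxel_size=(3.75, 3.75, 3.8)):
    """Return voxel indices whose centers lie within radius_mm of center_vox.

    Assumption: voxel_size defaults to the acquisition resolution reported in
    the text; substitute the resolution of the space the ROI is defined in.
    """
    cx, cy, cz = center_vox
    sx, sy, sz = voxel_size
    reach = [int(math.ceil(radius_mm / s)) for s in voxel_size]
    roi = []
    for dx in range(-reach[0], reach[0] + 1):
        for dy in range(-reach[1], reach[1] + 1):
            for dz in range(-reach[2], reach[2] + 1):
                dist = math.sqrt((dx * sx) ** 2 + (dy * sy) ** 2 + (dz * sz) ** 2)
                if dist <= radius_mm:
                    roi.append((cx + dx, cy + dy, cz + dz))
    return roi


if __name__ == "__main__":
    voxels = sphere_roi((32, 32, 17))
    print(len(voxels), "voxels in the sphere")
```

Averaging the signal over the returned voxels for each condition would then give the per-ROI values that are compared across contingency change types.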
Activation cluster peaks presented in Tables 1, 2, and 4 were produced using FSL. For each table, 30 peaks were determined for each cluster, and labeled with their Harvard-Oxford designation using FSLview. Only the peak voxel for each anatomical designation is listed. Activation Table 3 was produced using MRICRON to calculate the centroid of each cluster.
Behavioral Data
Following a random guess on the first trial, optimal behavior in this task was to select the option that was rewarded on the previous trial (i.e., follow a win-stay/lose-shift strategy, WSLS). A feature of this task is that subjects should engage in one-trial learning, which minimizes the problems of temporal credit assignment that occur within probabilistic reversal learning tasks. Note that given the low likelihood and unpredictability of the reversal change trials, attempts to predict such shifts in value contingencies would reduce overall payment. Thus, we measured behavioral performance in reference to an optimal WSLS strategy. On average, participants performed the WSLS strategy on 99.4% of trials, with no individual run for any participant below 95% performance. All participants described (in their own words) following a WSLS strategy in a post-study questionnaire. In addition, very few trials were missed due to slow responses (mean: 0.7%), with only one participant missing over 2% of trials in any individual run.
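The win-stay/lose-shift benchmark lends itself to a compact definition. The following Python sketch scores a sequence of choices against that strategy; it is illustrative only, assumes the two options are coded 0 and 1 with signed rewards, ignores missed trials, and uses hypothetical function names (`wsls_choice`, `wsls_compliance`).

```python
def wsls_choice(prev_choice, prev_reward):
    """Win-stay/lose-shift: repeat the choice after a gain, switch after a loss."""
    return prev_choice if prev_reward > 0 else 1 - prev_choice


def wsls_compliance(choices, rewards):
    """Fraction of trials (from the second onward) consistent with WSLS.

    Assumptions: options are coded 0/1, rewards are signed amounts, and
    every trial has a response (missed trials are not handled here).
    """
    hits = sum(
        choices[t] == wsls_choice(choices[t - 1], rewards[t - 1])
        for t in range(1, len(choices))
    )
    return hits / (len(choices) - 1)


if __name__ == "__main__":
    choices = [0, 0, 1, 1, 0]
    rewards = [5, -5, 5, 5, -5]
    print(wsls_compliance(choices, rewards))
```

In this toy sequence the last choice switches away from an option that had just paid out, so three of the four scored trials follow the strategy.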
To examine the effects of contingency changes upon subsequent behavior, we evaluated whether participants’ response times were slowed on the trial following each type of contingency change. Following standard trials, the mean response time across participants was 396 ms (SD: 144 ms). Response time increased significantly following each type of contingency change (repeated measures ANOVA, main effect p < 0.05): for reversal changes, 418 ms; for value changes, 411 ms; for one-dimensional effect changes, 410 ms; and for two-dimensional effect changes, 425 ms. Thus, our behavioral data indicate that participants performed at a nearly optimal level throughout the experiment, but were nevertheless influenced by each sort of contingency change.
fMRI Data
We first examined the main effect of Reversal Change, as a contrast between this event regressor and the Standard trials baseline. Significant areas of activation included posterior parietal cortex (PPC), anterior insula cortex (aINS), lateral prefrontal cortex (LPFC), dorsomedial prefrontal cortex (DMPFC), precuneus, supplementary motor cortex (SMC), and precentral cortex (Figure 2A and Table 1). This set of regions replicates that described by previous studies in which contingency changes involved behavioral shifts (Cools et al., 2002; O’Doherty et al., 2003; Remijnse et al., 2005; Xue et al., 2008). For example, Cools et al. (2002) found greater activation in the LPFC for reversal errors relative to probabilistic errors.
Figure 2. Neural activation in response to contingency changes. (A) On Reversal Change trials, the contingencies between actions and their rewarding outcomes switch, so that the participant should select a different action on the subsequent trial. Such changes evoke activation in a dorsal executive network comprising regions of medial and lateral prefrontal cortex, parietal cortex, and insular cortex, among other regions. (B) On Effect Change trials, only the visual effect of the action changes; the outcome has the same value, and the participant does not need to switch behavioral responses. Activation was again observed in lateral prefrontal and parietal cortices, along with regions of visual cortex.
Next, we examined the main effect of Value Change, again as a contrast between this task-related regressor and the Standard trial baseline. No brain regions survived our standard statistical criterion. This absence of activation could be due to the specific bimodal distribution of value changes in our task, which contained both large negative events (i.e., Reversal changes) and small positive and negative changes (i.e., Value changes).
Finally, we identified the main effect of Effect Change, contrasting this regressor with the Standard trial baseline. We found significant activations in the striatum, PPC, LPFC, aINS, and temporal cortices (Figure 2B and Table 2). This pattern of activation contains many of the same regions as observed for Reversal Change trials, consistent with the interpretation that similar networks process each sort of contingency change.
To determine regions whose activation increased significantly for both Reversal Changes (i.e., valuative contingency shifts) and Effect Changes (i.e., non-valuative contingency shifts), we examined the intersection of voxels whose activation increased significantly to both types of change, independently (voxelwise z > 2.3, with whole-brain cluster correction of p < 0.05). This conjunction analysis revealed activations in aINS, PPC, and LPFC (Figure 3 and Table 3). ROI analyses were performed to compare the levels of activation produced by the reversal and effect changes. No significant differences between the reversal and effect changes were found within these co-activated regions (Figures 3B–D).
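The logic of the conjunction analysis can be expressed as a simple set intersection. The Python sketch below assumes that each contrast is available as a mapping from voxel coordinates to z-statistics that has already been cluster-corrected (the correction itself is not implemented here), and the function name `conjunction` is hypothetical.

```python
def conjunction(z_map_a, z_map_b, z_thresh=2.3):
    """Return voxels exceeding z_thresh in both maps.

    Assumption: each map is a dict of voxel coordinate -> z-statistic that
    contains only voxels surviving cluster correction for that contrast.
    """
    return {
        voxel
        for voxel, z_a in z_map_a.items()
        if z_a > z_thresh and z_map_b.get(voxel, float("-inf")) > z_thresh
    }


if __name__ == "__main__":
    reversal = {(1, 2, 3): 3.1, (4, 5, 6): 2.9, (7, 8, 9): 1.0}
    effect = {(1, 2, 3): 2.6, (4, 5, 6): 2.0, (7, 8, 9): 4.2}
    print(conjunction(reversal, effect))
```

A voxel enters the conjunction only if it passes the threshold in each contrast on its own, which is a stricter criterion than thresholding the average of the two maps.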
Figure 3. Conjunction of activations for Reversal Changes and Effect Changes. (A) Shown are regions that exhibited significant activation in both the Reversal Change and Effect Change conditions. Subsequent functional ROI analyses extracted the relative signal change evoked for each trial type – Reversal Changes, Value Changes, and Effect Changes – within the (B) right anterior insula cortex, (C) right posterior parietal cortex (PPC), and (D) right middle frontal gyrus (MFG). Horizontal lines indicate pairs of conditions with significant differences in activation amplitude (p < 0.05).
To identify regions whose activation differed based on the nature of contingency change, we examined the contrasts of Effect Change > Reversal Change and Reversal Change > Effect Change (Figure 4 and Table 4). Effect Changes produced significantly greater activations only in a small number of posterior regions (including the middle temporal gyrus) that have been previously implicated in object processing (Martin et al., 1996; Weisberg et al., 2007). Reversal Changes produced significantly greater activations in the DMPFC, SMC, and precentral gyrus, regions previously implicated in the selection of actions and the production of motor responses. ROI analyses of these regions are presented in Figures 4B–D.
Figure 4. Contrasts between Reversal Changes and Effect Changes. (A) Shown are regions that exhibited significant differences in activation between the Reversal Change and Effect Change conditions. Functional ROI analyses extracted the relative signal change evoked for each trial type – Reversal Changes, Value Changes, and Effect Changes – within the (B) right middle temporal gyrus (MTG), (C) right precentral gyrus (PCG), and (D) dorsomedial prefrontal cortex (DMPFC). Horizontal lines indicate pairs of conditions with significant differences in activation amplitude (p < 0.05).
Humans, like many other animals, possess a high degree of behavioral flexibility. We can learn the values of different actions and can select new courses of behavior when those values change. However, human learning extends well beyond action-value contingencies to include learning about the physical effects of our actions, which can be generalized to novel situations with new valuative contingencies. Although numerous studies have explored the neural basis of valuative contingency learning, many have confounded the change in the reward with the physical change in the rewarding stimulus. Here, we show that, when compared within the same task, the brain regions associated with learning reward-action contingencies are also engaged by behaviorally- and reward-irrelevant contingency changes.
Valuative vs. Non-Valuative Contingency Processing
Our data suggest that the prefrontal and parietal cortex activations associated with value learning could be attributable to the co-occurring physical changes in the rewarding stimulus. This strong interpretation is supported by the differential effects of prefrontal lesions across a range of species. Lesions in the ventromedial prefrontal cortex (including orbitofrontal areas) appear to remove the motivation or ability to learn values and respond appropriately, such as in the reversal-learning task (Doar et al., 1987; Bechara et al., 1994; Dias et al., 1996; Hornak et al., 2004; Izquierdo et al., 2004; Rudebeck and Murray, 2008). In contrast, LPFC lesions disrupt the learning or accessing of contingency information, such as in an extradimensional set shifting task (Owen et al., 1991; Dias et al., 1996; Hornak et al., 2004; see also Barcelo et al., 2007). Alternatively, LPFC could contribute to both sorts of contingencies: the physical effect of our action and the valuation of that effect. These two interpretations cannot be distinguished when examining value learning as a categorical change, as in the reversal learning task, as new value contingencies also reflect newly rewarded stimuli.
The key to determining which of these interpretations is correct is to parametrically dissociate the change in value from the changes in the physical stimulus. Studies that have examined the processing of parametric value signals using fMRI and physiological recording techniques have identified several brain regions that appear to encode such signals, including the medial prefrontal cortex, orbitofrontal cortex, amygdala, dorsal and ventral striatum, nucleus accumbens, and posterior parietal cortex (PPC) (O’Doherty et al., 2001, 2003; Plassmann et al., 2007, 2008; Delgado et al., 2003; Dorris and Glimcher, 2004; Knutson et al., 2005; Kable and Glimcher, 2007; Lau and Glimcher, 2007; Hare et al., 2008; Schiller et al., 2008). Notably absent from this list of value-encoding regions are the MFG, IFG, and aINS, in which activation was evoked during both the Reversal Change and Effect Change trials. This suggests that these lateral prefrontal and insular cortices encode the non-valuative contingency information related to learning about the environment.
Of the brain areas we identified as processing multiple sorts of contingency information, only the PPC has previously been implicated in processing parametric value information (Platt and Glimcher, 1999; Dorris and Glimcher, 2004). This suggests that the PPC may play a role in the integration of actions with both valuative and non-valuative outcomes (Assad, 2003). This agrees with the hypothesis that the PPC acts as a decision map, relating actions to the expected value of their effects (Platt and Glimcher, 1999; Beck et al., 2008; Churchland et al., 2008). Such an integrative role could provide a unitary framework for the myriad functions supported by PPC subregions, such as in multi-modal integration of sensory input (Cohen and Andersen, 2000; Toth and Assad, 2002; Cohen et al., 2004; Mullette-Gillman et al., 2005), the attentional-intentional processes relating to motor planning (Colby and Goldberg, 1999; Snyder et al., 2000), working memory (Stoet and Snyder, 2004; Vingerhoets, 2008), and visuomotor learning (Grafton et al., 2008).
Contingency vs. Control Processing
An important question is how contingency processing interacts with the executive control processes responsible for producing behavioral changes when necessitated by a contingency change. The LPFC and aINS have long been hypothesized to play a critical role in working memory and other executive functions, based upon converging evidence from single-unit (Goldman-Rakic, 1996; Chafee and Goldman-Rakic, 1998), lesion (Doar et al., 1987; Hornak et al., 2004), and fMRI studies (Elliott et al., 1997; Rowe et al., 2000). Our LPFC and aINS activations during both reversal changes and effect changes agree with the interpretation that the prefrontal cortex is involved in forming, updating, and accessing models that relate stimuli to actions (Passingham, 1975; Cohen and Servan-Schreiber, 1992; Grafman et al., 1994; Wise et al., 1996; Miller and Cohen, 2001).
Co-occurring LPFC, aINS, and DMPFC activity has also been observed across a large number of studies examining executive control processes, including auditory detection, pattern detection, working memory, and response selection (Goldman-Rakic, 1996; McCarthy et al., 1997; Duncan and Owen, 2000; Miller and Cohen, 2001; Huettel et al., 2002; Robbins, 2007; Hyafil et al., 2009). Models of executive control processing have suggested that the DMPFC (referred to as anterior cingulate cortex, ACC) detects changes in contingencies and then activates the LPFC, which exerts executive control over behavior by biasing activity in other brain areas (Botvinick et al., 2001; Miller and Cohen, 2001; Ridderinkhof et al., 2004; Walton et al., 2004; Behrens et al., 2007; Mansouri et al., 2009). However, a recent study by Sridharan et al. (2008), using Granger causality analysis, found that the aINS exhibited a causal influence on the LPFC and DMPFC (referred to as ACC, and anterior to our specific DMPFC activation), an inversion of the directionality of influence suggested by the previously mentioned executive function models (see also Markela-Lerenc et al., 2004).
We suggest that these discrepancies reflect, at least in part, the conflation of contingency detection and control processes in many paradigms. For example, although activations of the LPFC during an oddball task are often described in terms of behavioral control or inhibition, these activations have been shown to be produced by contingency changes in the mapping of stimuli to responses independently of changes in the specific motor response (Huettel and McCarthy, 2004). Similarly, Carter et al. (2006) found that activity in the LPFC (specifically, MFG) correlated with the trial-by-trial level of explicit contingency knowledge during a classical conditioning paradigm in which there was no rewarded, or ‘correct’, response. These studies show that the LPFC processes contingency information in the absence of engaged control processes.
Our task allowed the dissociation of contingency and executive control processing within the same task to examine the functional roles of these brain areas. The contrast of Reversal Change > Effect Change allowed us to determine which brain areas are significantly more activated during the engagement of control and concurrent valuation processes, while controlling for contingency changes. We found increased activations in the posterior dorsomedial prefrontal cortex (DMPFC, including dorsal anterior cingulate), supplementary motor cortex (SMC), and precentral cortex during reversal changes contrasted with effect changes. As contingency change detection occurs in both Effect and Reversal Changes, this suggests that the aINS and LPFC process the contingency change (and possibly exert the required cognitive control), with activation of the DMPFC only when control processes are required to produce a change in behavioral response. This is consistent with the directionality found by recent studies (Markela-Lerenc et al., 2004; Sridharan et al., 2008), and suggests dissociable functional roles for these brain areas that invert those proposed by previous models.
Conclusions
We examined the potentially distinct neural mechanisms underlying behaviorally relevant valuative and behaviorally irrelevant non-valuative contingency learning. We found that the brain areas previously suggested to be involved in valuative contingency learning also contribute to the processing of behaviorally and motivationally irrelevant contingency changes. This suggests two key conclusions. First, the processing of value information may co-opt a more general executive system for contingency learning. Second, because non-valuative contingency changes are behaviorally irrelevant, the executive system may play an informational rather than control role in many tasks.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank McKell Carter, John Clithero, Yale Cohen, Brandi Newell, David Smith, and Lihong Wang for comments on data analysis and the manuscript. This research was supported by the US National Institute of Mental Health (NIMH-70685) and by the US National Institute of Neurological Disorders and Stroke (NINDS-41328). SAH is supported by an Incubator Award from the Duke Institute for Brain Sciences.
Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E., Johansen-Berg, H., Bannister, P. R., De Luca, M., Drobnjak, I., Flitney, D. E., Niazy, R. K., Saunders, J., Vickers, J., Zhang, Y., De Stefano, N., Brady, J. M., and Matthews, P. M. (2004). Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23(Suppl. 1), S208–S219.