Testing the somatic marker hypothesis in decisions-from-experience with non-stationary outcome probabilities

Wright, Rebecca J.; Rakow, Tim

doi:10.3389/fpsyg.2023.1195009

ORIGINAL RESEARCH article

Front. Psychol., 28 July 2023

Sec. Decision Neuroscience

Volume 14 - 2023 | https://doi.org/10.3389/fpsyg.2023.1195009

This article is part of the Research Topic: Somatic Marker Hypothesis, Iowa Gambling Task, and Neuroeconomics: The Neural Correlates of Rationality and Emotion in Decision Under Uncertainty

Department of Psychology, University of Essex, Colchester, United Kingdom

Introduction: The Somatic Marker Hypothesis (SMH) posits that in experience-based choice, people develop physiological reactions that mark options as either positive or negative. These somatic markers aid decision making because they differentiate between “good” and “bad” options during pre-choice deliberation.

Methods: We examined this proposed role for somatic states in two decision-from-experience tasks (each N = 36) in which participants selected repeatedly with full feedback (i.e., for obtained and forgone outcomes) between two unlabeled options that returned wins or losses, with half receiving an additional summary of past outcomes. The probabilities of good and bad outcomes changed at an unannounced point. Participants completed a 100-trial game with a switch in the optimal option after trial 40 (Study 1) or a 200-trial game with switch points after trial 40 and trial 120 (Study 2). Skin conductance (SC) was measured continuously as an index of emotional intensity, from which we extracted measures of anticipatory SC (pre-choice) and outcome SC (post-choice).

Results: Participants reliably selected the optimal option prior to any switches. They also altered their choices appropriately when the payoffs changed, though optimal play following payoff switches was reduced. Losses resulted in greater outcome SC than wins, but only in Study 1. Likewise, only in Study 1 was outcome SC greater when the forgone outcome was positive. Anticipatory SC did not reliably predict optimal play in either study.

Discussion: These results provide little support for the SMH. Our studies point to the importance of using diverse tasks and measures and very large sample sizes when testing the role of somatic states in decision making.

Introduction

In decisions-from-experience, one learns about the options by observing past outcomes (Rakow and Newell, 2010). The research tasks used to investigate how people make such decisions reflect a variety of everyday decisions. For example, Lejarraga et al. (2016) showed participants the day-by-day outcomes for different treatment options that differed in their (observed) rate of side effects. Lejarraga et al. (2016) compared participants' treatment preferences with choices for structurally equivalent decisions involving monetary outcomes. Liang et al. (2022) used a microworld to examine decisions in response to rare disasters, comparing reactions to disasters experienced first-hand with disasters observed at a distance (either near or far). In the context of interpersonal decisions, Denrell (2005) noted that the impressions we form about others often arise from samples of observed interactions, which then guide our decisions about who to interact with.

One feature of real-world decisions-from-experience is that the world is not static: the climate changes, financial markets boom or bust, and people or organizations change. Thus, while past events can be a guide to what to expect in the future, the world may change such that successful courses of action become suboptimal. In this study, we use a laboratory task to examine whether and how experience-based choices adapt to such changes, with a specific focus on the role of somatic states in such choices. In doing so, we build on previous investigations that have used the Iowa gambling task to examine the somatic marker hypothesis.

Reversal learning in the Iowa gambling task and other decision-from-experience tasks

The Iowa gambling task (IGT; Bechara et al., 1994) is an example of decision-making from experience. In the IGT, wins and losses from card draws from four decks must be experienced to work out which decks are advantageous. The IGT was devised to test the somatic marker hypothesis (SMH), which posits that emotions play a key role in decision-making under uncertainty (Damasio, 1994). Specifically, as a result of their previous choices, people develop physiological reactions that mark options as either positive or negative. These somatic markers then differentiate “good” options from “bad” ones when those options are considered again. In time, this mechanism guides people away from making bad choices.

The original version of the IGT was set up so that the large losses in a disadvantageous deck with infrequent losses occur relatively late in the deck, thereby giving the initial impression of it being an advantageous deck. To play optimally, participants must therefore perform reversal learning—changing their behavior when an ostensibly advantageous deck is seen to be disadvantageous. Poor reversal learning has been found in patients with impairment to the ventromedial prefrontal cortex (VMPFC). This could, therefore, explain their poor performance on the IGT (Dunn et al., 2006).

Successful (reversal) learning in decisions-from-experience requires response inhibition. For example, to succeed at the IGT, one must inhibit a win–stay/lose–shift response pattern (Restle, 1958) when experiencing a loss from an advantageous deck. VMPFC patients' failure to pick from the advantageous decks could reflect the inability to inhibit this response. To investigate this, Fellows and Farah (2005) shuffled the card order of the IGT decks so that the disadvantageous decks were no longer initially the better decks. Fellows and Farah found that the task performance of VMPFC patients became similar to that of control participants. They concluded that poor IGT performance in this patient group was primarily due to impaired reversal learning. Consistent with this, after examining participants' conscious knowledge in the IGT, Maia and McClelland (2004) concluded that—contrary to the SMH—there was no need to use non-conscious somatic markers to guide choice. Therefore, the poor IGT performance of VMPFC patients could be better explained by poor reversal learning.

However, in response to Maia and McClelland (2004), Bechara et al. (2005) argued that reversal learning is not the only requirement for successful performance in the IGT. Rather, they proposed that the SMH could explain reversal learning via the development of a somatic “stop signal” in response to experiencing bad outcomes from card draws. These markers are greater before selecting from the disadvantageous decks, due to the poorer outcomes experienced, and this facilitates the shift to picking more from the advantageous decks. Further evidence that reversal learning is not the only skill necessary for successful IGT performance comes from Turnbull et al. (2006). People with schizophrenia have been shown to have difficulty with flexible behaviors, but findings on their emotion-based learning are more mixed. To assess both types of ability in this patient group, Turnbull et al. (2006) employed the IGT. Participants initially played 100 trials of the original IGT, but the good decks then shifted in three further 40-trial phases: Decks A and D, A and B, and B and C successively became the good decks during the three shift periods. The participants with schizophrenia played advantageously (comparable to controls) in the first 100 IGT trials. However, performance during the shift phases was near chance-level for the patient group with the most severe negative symptoms, thereby suggesting that good performance on the IGT can be acquired in the presence of poor reversal learning.

In this study, we use a two-option decision-from-experience task where the probability distribution is non-stationary. Thus, the probabilities of good/poor outcomes change at pre-determined (but unsignalled) points during the task. Additionally, we measure skin conductance responses (SCRs) to examine the role of somatic states in a task that requires reversal learning. Because it is debated how much reversal learning is needed in the IGT (Maia and McClelland, 2004, 2005; Bechara et al., 2005), utilizing a simpler decision task with a defined point at which the optimum option changes should help to identify the utility of somatic states in reversal learning. In doing so, our investigation adds to the relatively small body of research that uses a decision task other than the IGT to test the SMH (e.g., Wright and Rakow, 2017). This is important because any theory of everyday decision-making under uncertainty should undergo tests using a range of tasks so that the generalizability and boundary conditions of the theory are established. Another key feature of our investigation is that we measure anticipatory SCRs to test predictions from the SMH. Given the SMH's proposed role for somatic states in guiding choice prospectively and the importance of the SCR data that, almost 30 years ago, brought the SMH to prominence (e.g., Bechara et al., 1994), it reflects poorly on the field that such data are relatively sparse. For example, a meta-analysis by Simonovic et al. (2019) identified only 20 IGT studies in non-clinical populations (published in 16 articles) that used anticipatory SCRs to test the SMH (out of 3,999 articles that mentioned the IGT in the title or abstract). Therefore, our paper is valuable because it adds two further studies using anticipatory SCRs to this small but important body of SMH research. A further key feature of our investigation is that we manipulate (between-subjects) the presence of some descriptive information about the option outcomes. This allows us to examine whether and how descriptive information moderates preference and/or the role of somatic states in decisions-from-experience.

How decision processes differ between decisions-from-experience and decisions-from-description

After considering a description of two options, Option A offering a 90% chance of nothing and a 10% chance of £10 and Option B offering £1 for certain, most individuals choose Option A. This choice reflects a general pattern: when faced with described gambles having small probabilities for the most extreme outcome, most individuals are risk-seeking for gains and risk-averse for losses. Prospect theory (Kahneman and Tversky, 1979) explains such preferences via as-if decision weights, which are transformations of the probabilities for each outcome. Each decision weight is multiplied by the value of its respective outcome, and these products are summed to compute an overall value for each option. According to prospect theory, small probabilities are as-if overweighted. This overweighting makes gambles with a small probability of a large gain highly attractive and makes gambles with a small probability of a large loss very unattractive.
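
To make the as-if overweighting concrete, the following minimal sketch values the two options from the example above. It assumes the one-parameter probability weighting function and power value function from the later, cumulative version of prospect theory (Tversky and Kahneman, 1992), with illustrative parameter values (gamma = 0.61, alpha = 0.88). Neither the functional forms nor the parameter values are taken from this article, and the function names are hypothetical.

```python
def weight(p, gamma=0.61):
    """Inverse-S probability weighting (assumed form and parameter)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)


def value(x, alpha=0.88):
    """Power value function for gains (assumed form and parameter)."""
    return x**alpha


def prospect_value(outcome, p):
    """Value of a simple gamble: `outcome` with probability p, otherwise nothing."""
    return weight(p) * value(outcome)


option_a = prospect_value(10, 0.10)  # 10% chance of 10, otherwise nothing
option_b = prospect_value(1, 1.0)    # 1 for certain

# Both options have the same expected value (1), but the small probability
# in Option A receives a decision weight larger than 0.10.
print(f"w(0.10) = {weight(0.10):.3f}")
print(f"V(A) = {option_a:.2f}, V(B) = {option_b:.2f}")
```

Under these assumed parameters, the decision weight attached to the 10% chance exceeds 0.10, so Option A receives the higher overall value, matching the modal choice in decisions-from-description.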

However, this pattern of preference often reverses if the option payoff distributions are experienced as a sequence of observations—such as by drawing cards in the IGT or by selecting from on-screen buttons or “money machines” to reveal outcomes. In such decisions-from-experience, most individuals prefer Option B in the above example. Barron and Erev (2003) highlighted this “description–experience gap,” which implies the underweighting of rare events in decisions-from-experience.¹ This description–experience gap has since been well-documented (Rakow and Newell, 2010; Wulff et al., 2018) including for some choices that do not involve small probabilities (Ludvig and Spetch, 2011). However, the reasons for this description–experience gap remain a point of debate (Wulff et al., 2018). One possibility is that the neural or somatic representation of outcomes and/or options differs between decisions-from-description and decisions-from-experience (Glöckner et al., 2012). Thus, our tests of the distinctive pattern of somatic responses that the SMH predicts for decisions-from-experience have implications for a wider debate about the processes at work in decisions-from-experience.

Decisions-from-experience with non-stationary payoffs

The majority of studies examining repeated decisions-from-experience utilize gambles for which the probabilities of the outcomes remain stable across all trials (i.e., static probabilities). However, some studies have employed non-stationary distributions, where the probabilities of the payoffs change within the task. These non-stationary payoff distributions are akin to what happens in the original IGT, where Deck B initially appears advantageous because no loss is experienced on the first few deck selections.

Biele et al. (2009) examined dynamic decision environments using one-armed restless bandit problems. Participants chose between a stable safe prospect with a constant medium payoff of 0 and an unstable risky prospect with a payoff of either +1 or −1 depending on its state. The state was determined using a two-state Markov process. If the state was positive at trial t, it remained positive at t + 1 with probability p (either 0.95 or 0.50 in Study 1); and if the state was negative at trial t, it remained negative at t + 1 with probability q (either 0.05 or 0.50). Participants were highly sensitive to the changing probabilities of the payoffs, achieving near-optimal performance. Similar to decisions-from-experience in static environments, underweighting of small probabilities and payoff variability was also observed, as inferred from models fitted to the data, suggesting a reliance on small samples of experiences.
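
The two-state Markov process that drives the risky prospect can be sketched as follows. This is an illustration of the mechanism as described here, not the original task code: the function name, the seed, and the choice to start in the positive state are assumptions.

```python
import random


def simulate_risky_payoffs(n_trials, p_stay_positive, q_stay_negative, seed=0):
    """Simulate the +1/-1 payoffs of the risky prospect over n_trials.

    p_stay_positive: probability that a positive state persists to the next trial.
    q_stay_negative: probability that a negative state persists to the next trial.
    """
    rng = random.Random(seed)
    positive = True  # assumption: the process starts in the positive state
    payoffs = []
    for _ in range(n_trials):
        payoffs.append(1 if positive else -1)
        stay = p_stay_positive if positive else q_stay_negative
        if rng.random() >= stay:  # leave the current state with probability 1 - stay
            positive = not positive
    return payoffs


# With p = 0.95 and q = 0.05, positive states persist and negative states are brief.
payoffs = simulate_risky_payoffs(200, p_stay_positive=0.95, q_stay_negative=0.05)
print(sum(1 for x in payoffs if x == 1) / len(payoffs))
```

The safe prospect always pays 0, so choosing the risky prospect is worthwhile only while the process is likely to be in its positive state.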

Rakow and Miler (2009) examined choices in repeated-choice games with non-stationary payoff distributions. Their participants chose between two “money machines” for either 60 or 100 trials, to obtain as many points as possible. The possible win and loss amounts were stated at the start of the game, and participants saw both the obtained and forgone outcomes of each money machine. One game included a stationary option with a 70% chance of winning 10 points, otherwise a 30% chance of losing 20 points. The non-stationary option started with a 90% chance of winning 10 points (otherwise losing 20 points), which reduced to 50% from trial 20 onwards. Participants were not informed of the options' probability distributions, although they were informed that the options could change over the course of the game. In each game, one machine's probability changed gradually at a rate of either 0.01 per trial over 40 trials (Study 1) or 0.02 per trial over 20 trials (Study 2). The onset of this change (trial 20 vs. 40) and its direction (increasing vs. decreasing win-probability) varied between games. In one condition, participants also had a running history of past outcomes: they were shown cumulative totals for the number of times each machine had delivered a win or loss amount. The results demonstrated reasonably rapid initial learning whereby, after the first few trials, participants generally selected the machine with the better payoffs, but with a slower adaption to pick from the better option after a switch in the payoffs. Providing participants with a history improved the initial learning to pick the optimal machine, but it sometimes hindered the adaption to the subsequent changes in the payoffs compared with when no history was provided.
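
A gradual change of this kind can be sketched as a linear ramp in the win probability. The function below is a hypothetical illustration: its name, its defaults, and the assumption that the first step of the change occurs on the trial after `change_start` are ours, not details reported by Rakow and Miler (2009).

```python
def ramp_win_probability(trial, start_p=0.9, change_start=20,
                         rate=0.01, n_steps=40, decreasing=True):
    """Win probability on a 1-indexed trial for a gradually changing machine.

    The probability is constant up to `change_start`, then moves by `rate`
    per trial for `n_steps` trials, and is constant again afterwards.
    """
    steps_done = min(max(trial - change_start, 0), n_steps)
    delta = rate * steps_done
    return start_p - delta if decreasing else start_p + delta


# Both reported rates produce the same total change of 0.4 (e.g., 0.9 to 0.5):
slow = [ramp_win_probability(t, rate=0.01, n_steps=40) for t in (20, 40, 60, 100)]
fast = [ramp_win_probability(t, rate=0.02, n_steps=20) for t in (20, 30, 40, 100)]
print([round(p, 2) for p in slow])
print([round(p, 2) for p in fast])
```

The two schedules differ only in how quickly the same total shift unfolds, which is what makes them comparable across the two studies.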

Our two studies, reported here, follow a similar design to examine the role of somatic markers when, initially, a non-stationary option stochastically dominates a stationary option but changes to be suboptimal relatively early in the game. This mimics what happens with Deck B in the IGT. Study 1 uses a 100-trial task with a switch in the optimal option after trial 40, and Study 2 uses a 200-trial task with switch points after trials 40 and 120. The processes posited by the SMH predict successful initial learning and subsequent reversal learning in such tasks, and this is tested in our two studies.

According to the SMH, people develop “hunches” about the options they experience, and via somatic responses (which we index via SCRs) mark these options as either positive/safe or negative/risky. This promotes successful decision-making by gradually guiding decision-makers away from repeating bad decisions. In our two studies, we examine whether participants develop a preference for the (current) optimal option and whether this is aided by developing greater anticipatory markers for the suboptimal option. We are also interested in whether participants adapt to a change in an option's payoff distribution, and whether this is also reflected in the anticipatory, and outcome, skin conductance (SC). The SMH predicts that somatic markers develop and are greater for the inferior option. Therefore, after a change in the payoff structure, the SMH predicts that somatic markers will develop and be greater for the new, currently worse, option.

The task that we use presents participants with both the outcome obtained from each choice and the forgone outcome for each trial's unselected option. This allows our participants to experience regret because they can see what they would have obtained from a different course of action. This is a valuable feature of our study because experienced regret has been theorized (Loomes and Sugden, 1982) and found to be an important driver of various kinds of decisions (Zeelenberg and Beattie, 1997; Zeelenberg and Pieters, 1999, 2004; Kareev et al., 2014). Zeelenberg (1999) highlighted that the adaptive function of experienced regret may be to exacerbate the misfortune felt from our mistakes, to help us learn from them. This parallels the assumptions of the SMH that emotional signals help to bias us away from repeating previous poor decisions.

Study 1

Study 1 employs a 100-trial non-stationary decisions-from-experience task adapted from Rakow and Miler (2009) with an immediate switch point after trial 40, where the non-stationary option changes from being the optimal to the suboptimal choice. We also provide a running history of past outcomes to half of the participants.

Based on Rakow and Miler (2009), we expect that participants will have developed a clear preference for the optimal non-stationary option within the first 40 trials but will adapt only partially to the change in payoffs by the end of the task:

H₁: There will be a preference for the optimal option by the second block of 20 trials.

H₂: The number of selections from the optimal option for trials 81–100 will be less than for trials 21–40.

Rakow and Miler (2009) also found that providing a running history of each option's outcomes sometimes hindered participants' adaption to a change in which option was optimal. We therefore test whether:

H₃: Participants provided with a history of the options' outcomes adapt more slowly following the switch, and have fewer selections of the optimal option in the last 60 trials than participants without the history.

Hypothesis H₁ is consistent with the somatic marker hypothesis (SMH), which also predicts reversal learning (though it is unclear to us whether, as we predict in H₂, this will be slower than the initial learning). Because we collect skin conductance (SC) data, we can also test the SMH's explanation for how learning occurs. The SMH posits that in response to positive and negative outcomes experienced after selecting options, emotional biasing signals mark options (Damasio, 1994). Participants should therefore have outcome SCRs that are greater for losses (“punishments”) than for wins (“rewards”). Accordingly, based on the SMH, we predict:

H₄: Outcome SC will be greater for negative outcomes (−10) than for positive outcomes (+10).

As participants will also see the forgone outcome (for the option they did not pick), we will also examine whether the outcome of the chosen option relative to the forgone option's outcome affects the outcome SC. Obtaining a negative outcome (−10) when the forgone option was positive (+10) should result in a higher outcome SC, reflecting an elevated emotional response due to the regret of a missed reward (e.g., Astor et al., 2011):

H₅: Obtaining a negative outcome when the forgone outcome is positive will result in greater outcome SC compared with other combinations of obtained and forgone outcomes.

The SMH's key prediction is that over the course of the task, the outcome responses aid the development of anticipatory SCRs that mark options. The SMH predicts greater anticipatory SC before selecting disadvantageous options than before selecting advantageous options. We examine anticipatory SC and its role in optimal selections in the two phases of the task—before and after the switch. We, therefore, test this prediction derived from the SMH:

H₆: Anticipatory SC responses develop and are greater for selections of the suboptimal option (whichever option that is for a given phase of the task).

Method

Participants

Participants were recruited from the University of Essex, Psychology Department's Volunteer list,² which included university students (the majority) or staff. There were 36 participants (23 female) with a mean age of 24.33 years (SD = 3.80, range 19–35, IQR 22–26). Two participants were excluded due to either not following instructions correctly or apparatus error and were replaced. This sample size is typical for studies that test the SMH via SC data; a meta-analysis by Simonovic et al. (2019) reports that IGT studies with anticipatory SC data for non-clinical populations had a median N = 40.5 (IQR 32–70).³

Apparatus

The SC activity was recorded using a Mind Media NeXus-10, a multi-channel physiological monitoring and feedback platform, with a sampling rate of 32 samples per second. SC activity was recorded continuously throughout the study and, for critical events in each task, a trigger was sent via a button box to mark the SC reading. The SC data were then analyzed using Ledalab, MATLAB-based analysis software designed for SC data analysis (Benedek and Kaernbach, 2010). Continuous decomposition analysis was used, with no downsampling of the data. All data were optimized (Ledalab optimizes data for each participant individually). The minimum amplitude threshold was set to 0.01 μS. Data were exported using SCR event-related activity, 1–4 s after an event for the outcome SC and from 2 s before up until a trigger for anticipatory SC.

Materials and design

A computerized decisions-from-experience task was created in Real Studio, which sent event triggers compatible with the NeXus-10 during the task. On each trial of the “game,” participants selected between two “money machines” represented as on-screen buttons. Both machines paid out a win of +10 or a loss of −10 but with different probabilities of a win. One machine (stationary option) had a probability of 0.5 of winning, fixed for all 100 trials. The other machine (non-stationary) had a probability of 0.7 of winning for the first 40 trials, which switched to a 0.3 probability at trial 41, at which it remained for all subsequent trials. Participants completed the task after two other risk-taking tasks (not reported here). Participants were randomly assigned to either the experience-only or experience-plus-history condition.
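
The payoff schedule just described can be sketched in a few lines. This is an illustrative reconstruction from the stated probabilities, not the Real Studio task itself; the function and constant names are hypothetical.

```python
import random

N_TRIALS = 100
SWITCH_AFTER = 40  # the non-stationary machine is optimal on trials 1-40 only


def win_probabilities(trial):
    """Return (stationary, non_stationary) win probabilities for a 1-indexed trial."""
    return 0.5, (0.7 if trial <= SWITCH_AFTER else 0.3)


def draw_outcomes(trial, rng):
    """Draw the +10/-10 outcome of both machines (full feedback on every trial)."""
    p_stationary, p_non_stationary = win_probabilities(trial)
    stationary = 10 if rng.random() < p_stationary else -10
    non_stationary = 10 if rng.random() < p_non_stationary else -10
    return stationary, non_stationary


def expected_value(p_win, win=10, loss=-10):
    """Expected points per selection for a machine with win probability p_win."""
    return p_win * win + (1 - p_win) * loss


def optimal_option(trial):
    """Name of the machine with the higher win probability on this trial."""
    p_stationary, p_non_stationary = win_probabilities(trial)
    return "non_stationary" if p_non_stationary > p_stationary else "stationary"


rng = random.Random(1)  # arbitrary seed, for reproducibility of the sketch
outcomes = [draw_outcomes(t, rng) for t in range(1, N_TRIALS + 1)]
print(optimal_option(40), optimal_option(41))
```

In expectation, the non-stationary machine is worth about +4 points per selection before the switch and about −4 afterwards, against 0 for the stationary machine throughout.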

Procedure

On arrival at the laboratory, participants read a consent form explaining the payment for their participation and details of the NeXus-10 equipment. Once consent was obtained, using the participants' non-dominant hand, the electrodes were attached to the distal phalange of the first and third digits. To do so, participants cleaned the palm side of their first and third fingertips with an alcohol wipe and the experimenter applied opaque adhesive paste. The sensors were attached, and then, the participant placed their hand palm upwards on a cushion on the desk aiming to keep it as still as possible throughout the experiment. As recommended, the sensors were given 5 min to settle (Figner and Murphy, 2011). A measure of baseline activity was taken to control for individual differences in SC before the start of a task. This was calculated as a ratio of 1:5 of the average time taken to complete the task, which resulted in 3 min of baseline activity recorded.

Before commencing the task, participants were given standardized instructions (see Appendix) which outlined that points could be won or lost on each “go” of the “game” and that win/loss probability could change. The inter-trial interval of the task was set to 6 s to allow for suitable SC recording. Participants used the computer-mouse to select machines. Both machines displayed the outcome (win or lose) on every trial; the machine selected would display “You have won/lost 10 points.” The unselected machine would display “You would have won/lost 10 points.” In the experience-plus-history condition, the win and loss history for each machine was displayed above each machine. This history displayed the previous number of times the machine had won and lost and updated every trial. Once participants had played all 100 trials, their total points remained on screen. The study session lasted ~50 min, for which participants received UK£5 (or course credit).

Measures and data analysis techniques

The measure of SC reported here, and in Study 2, was the mean phasic driver within the response window. Ledalab documentation states that this variable “represents phasic activity within the response window most accurately, but does not fall back on classic SCR amplitudes” (http://www.ledalab.de/documentation.htm, see also Benedek and Kaernbach, 2010). The recording interval from 1 to 4 s after a trial outcome appeared was used for outcome SC, and the 2-s interval before selecting the option was used for the anticipatory SC. Due to the repeated-measures design, all regressions were run using a multilevel random intercepts regression model (Nezlek et al., 2006). Multilevel models are used to assess data that contain a natural hierarchy or clustering of cases within variables. This is appropriate with the current data because the 100 selections (200 in Study 2) represent a cluster of observations for each participant. Multilevel models differ from standard regression models (e.g., ordinary least squares) due to dividing the error variance into separate components. This allows the model to control for the patterns of the structured data: patterns in the error from the model are assumed to have a reliable structure and are not just noise. This technique allows the examination of trial-by-trial data in a principled fashion (e.g., by not treating trials as independent observations).
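
The two response windows can be illustrated with a simple window-mean calculation. This sketch is not Ledalab's implementation: it assumes the phasic driver is already available as a plain list sampled at 32 samples per second, and the function names are hypothetical.

```python
SAMPLING_RATE = 32  # samples per second, as reported for the NeXus-10


def window_mean(signal, start_s, end_s, rate=SAMPLING_RATE):
    """Mean of `signal` from start_s (inclusive) to end_s (exclusive), in seconds."""
    start = max(0, round(start_s * rate))
    end = min(len(signal), round(end_s * rate))
    if end <= start:
        raise ValueError("empty response window")
    window = signal[start:end]
    return sum(window) / len(window)


def anticipatory_sc(driver, choice_time_s):
    """Mean phasic driver in the 2 s before the option was selected."""
    return window_mean(driver, choice_time_s - 2.0, choice_time_s)


def outcome_sc(driver, outcome_time_s):
    """Mean phasic driver from 1 to 4 s after the outcome appeared."""
    return window_mean(driver, outcome_time_s + 1.0, outcome_time_s + 4.0)


# Toy signal: 10 s of data whose value equals the index of the current second.
driver = [float(i // SAMPLING_RATE) for i in range(10 * SAMPLING_RATE)]
print(anticipatory_sc(driver, 4.0))  # averages seconds 2 and 3
print(outcome_sc(driver, 4.0))       # averages seconds 5, 6, and 7
```

With a 6 s inter-trial interval, the outcome window (1–4 s after feedback) and the next trial's anticipatory window (2 s before selection) need not overlap, although whether they do on a given trial depends on how quickly the participant responds.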

Research has shown both inter- and intra-individual variability in the rise and recovery time of SCRs (Edelberg and Muller, 1981; Breault and Ducharme, 1993). The anticipated variation between participants in SC (Figner and Murphy, 2011) was accounted for by entering participants as a level 2 random intercept within the multilevel model. Multilevel modeling was utilized to distinguish within- and between-participant variations in SC (Goldstein, 1995; Hox, 2010). The level 1 variables were at the individual trial level (100 data points in Study 1 and 200 in Study 2) and included participants' selections (e.g., which option they picked and whether the selection was optimal) and SC measures (outcome SC; anticipatory SC). We checked for skew in continuous dependent variables for each participant individually. Outcome SC was found to be positively skewed, so was log₁₀ transformed for all participants, in both Study 1 and Study 2. Any marginal means reported for outcome SC are log₁₀-transformed marginal means. We checked for outliers in all regressions using the Blocked Adaptive Computationally efficient Outlier Nominator (BACON; Billor et al., 2000) procedure, which identifies multivariate outliers in a set of predictor variables and removed those outliers from all regression analyses.
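
The random-intercepts structure described above can be written out in its generic two-level form. The symbols are illustrative labels rather than notation taken from the analysis: i indexes trials, j indexes participants, y is the trial-level dependent variable (e.g., log₁₀ outcome SC), and x is a trial-level predictor.

```latex
\begin{aligned}
\text{Level 1 (trial } i \text{ of participant } j\text{):}\quad
  & y_{ij} = \beta_{0j} + \beta_{1} x_{ij} + e_{ij},
  & e_{ij} &\sim \mathcal{N}(0, \sigma_e^2) \\
\text{Level 2 (participant } j\text{):}\quad
  & \beta_{0j} = \gamma_{00} + u_{j},
  & u_{j} &\sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Here the participant-specific deviation u_j absorbs stable between-participant differences in SC, so that the slope β₁ reflects within-participant, trial-level variation.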

Results

For data presentation and analysis, we grouped trials into five 20-trial blocks. To examine whether participants were picking more from the optimal (non-stationary) option by the second block (trials 21–40), we conducted a one-sample t-test, which found that the mean number of selections from the optimal (non-stationary) option in block 2 (M = 13.78) was significantly greater than 10 (chance-level performance), t₍₃₅₎ = 4.44, p < 0.001, d = 0.74. This fairly large effect supports H₁ that participants would develop a preference for the optimal option within the first 40 trials. A paired-samples t-test comparing the number of optimal selections (from the stationary option) in block 5 (M = 10.36) to the optimal selections (from the non-stationary option) in block 2 (M = 13.78) revealed, in support of H₂, significantly fewer optimal selections in block 5 than in block 2, t₍₃₅₎ = 2.56, p = 0.015, d = 0.43.
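
As a check on the reported effect sizes, the sketch below recovers d from each t statistic. It assumes that d was computed as the mean difference divided by the standard deviation of the scores (or of the paired differences), which is algebraically equal to t divided by the square root of N. That formula is not stated in the text and is an assumption here, as is the function name.

```python
import math


def cohens_d_from_t(t, n):
    """Effect size for a one-sample or paired t-test: d = t / sqrt(n)."""
    return t / math.sqrt(n)


d_block2 = cohens_d_from_t(4.44, 36)  # block 2 vs. chance level of 10
d_paired = cohens_d_from_t(2.56, 36)  # block 5 vs. block 2

print(round(d_block2, 2), round(d_paired, 2))
```

With N = 36, both values round to the effect sizes reported above.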

To examine the effect of having the running history in addition to experiencing the outcomes, the number of times the optimal option was selected was analyzed as five blocks of 20 trials (coding specific to whichever option was optimal in a given block). A repeated-measures ANOVA was performed with the number of times the optimal option was picked as the dependent variable and block and condition (experience-plus-history vs. experience-only) entered as factors, see Figure 1. Mauchly's test for sphericity was statistically significant for block; we report Greenhouse–Geisser corrected degrees of freedom for all ANOVA effects for block (also in Study 2). The main effect of block was medium-to-large and statistically significant, F(1.75, 59.55) = 6.73, p = 0.003, ηₚ² = 0.165. Tukey's post-hoc tests revealed a statistically significant increase in optimal selections from block 1 (M = 11.06) to block 2 (M = 13.78). There was a statistically significant decrease in optimal selections from block 2 (M = 13.78) to block 3 (M = 8.17), the first block of trials after the switch had occurred. Selections of the optimal choice remained significantly lower in block 4 (M = 9.31) and block 5 (M = 10.36) compared with block 2 (all p's < 0.025). No other differences were statistically significant. Neither the main effect of condition nor the block-by-condition interaction was statistically significant (F < 1 for both). We, therefore, fail to find support for H₃ that providing participants with a history of the options' outcomes leads to slower adaption following the switch.
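
The partial eta squared for the main effect of block can be recovered from the F ratio and its degrees of freedom using the standard identity shown in this short sketch (the function name is hypothetical).

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


eta_block = partial_eta_squared(6.73, 1.75, 59.55)
print(round(eta_block, 3))
```

The identity uses only the ratio of the two degrees of freedom, so it gives the same result for the Greenhouse–Geisser corrected values used here as for the uncorrected ones.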

FIGURE 1

Figure 1. Study 1: mean frequency of selection of the optimal option (non-stationary option in blocks 1 and 2, stationary option in blocks 3–5) for each condition of the task. The error bars show the standard error of the mean. The asterisk denotes that a switch in payoffs occurred at the end of block 2.

Outcome SC

To examine H₄, we first regressed outcome SC on the outcome obtained (0 = −10, 1 = +10) and found that outcome SC was greater following a negative outcome than a positive outcome, b = −0.044, z = −2.38, p = 0.017. This supports the SMH and H₄, which predicted that outcome SC would be greater following a negative outcome compared with a positive outcome. To control for the length of time playing the task, and therefore the amount of information obtained about the options, we included the block and its interaction with the outcome obtained in the next step of the regression. The block (variable centered) was now the only significant predictor, with greater outcome SC as the game progressed, b = 0.045, z = 4.65, p < 0.001. Both outcome obtained and its interaction with the block were non-significant, b = 0.025, z = 0.57, p = 0.568, and b = −0.017, z = −1.25, p = 0.210, respectively. Thus, although losses resulted in greater physiological responses, the effect of the obtained outcome on outcome SC was no longer significant once task experience was controlled for. Therefore, we find some, but inconsistent, support for the predictions of the SMH that losses result in greater physiological reactions than wins.

To examine hypothesis H₅, that forgone outcomes can affect the size of outcome SC, we initially regressed outcome SC on outcome combination, i.e., the combination of obtained and forgone outcomes (see Figure 2). The outcome combinations were dummy coded with negative obtained/positive forgone (N_O/P_F) as the reference category, against which we compared positive obtained/positive forgone (P_O/P_F), positive obtained/negative forgone (P_O/N_F), and negative obtained/negative forgone (N_O/N_F). The outcome SC was lower in all outcome combinations compared with the reference category of negative obtained/positive forgone, indicating a greater physiological response when the option selected resulted in a negative outcome and the forgone option resulted in a positive outcome. These differences were significant when N_O/P_F was compared with P_O/P_F, b = −0.073, z = −2.71, p = 0.007, and with P_O/N_F, b = −0.067, z = −2.54, p = 0.011; but not (quite) for the comparison with N_O/N_F, b = −0.048, z = −1.84, p = 0.066.⁴ Controlling for the amount of information obtained, by adding block (variable centered) and its interaction with outcome combination (dummy coded) in the next step, resulted in outcome SC being significantly lower for all outcome combinations compared with the reference category of N_O/P_F: vs. P_O/P_F, b = −0.062, z = −2.24, p = 0.025; vs. P_O/N_F, b = −0.063, z = −2.38, p = 0.017; and vs. N_O/N_F, b = −0.068, z = −2.55, p = 0.011. The main effect of the block was also significant, b = 0.047, z = 3.42, p = 0.001, indicating a tendency for outcome SC to increase across the blocks of the game. The interactions between the outcome-combination dummy variables and block were all non-significant (all z < 1.46, all p > 0.145). Thus, consistent with claims that regret is a powerful emotion, observing that a better outcome could have been obtained leads to increased outcome SC compared with all the other combinations of obtained and forgone outcomes.
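The dummy coding described above can be sketched as follows. This is an illustrative assumption about how such coding might be set up, not the authors' code: the function names and dictionary layout are hypothetical, while the category labels, the ±10 outcomes, and the choice of N_O/P_F as the reference category come from the text.

```python
from typing import Dict

REFERENCE = "N_O/P_F"  # negative obtained / positive forgone (reference)
DUMMIES = ("P_O/P_F", "P_O/N_F", "N_O/N_F")  # non-reference categories


def combination_label(obtained: int, forgone: int) -> str:
    """Label a trial by the sign of its obtained and forgone outcomes (+10 or -10)."""
    o = "P_O" if obtained > 0 else "N_O"
    f = "P_F" if forgone > 0 else "N_F"
    return f"{o}/{f}"


def dummy_code(obtained: int, forgone: int) -> Dict[str, int]:
    """Return one 0/1 indicator per non-reference category.

    A trial in the reference category gets all zeros, so each regression
    coefficient is a contrast between one category and N_O/P_F.
    """
    label = combination_label(obtained, forgone)
    return {name: int(label == name) for name in DUMMIES}


print(combination_label(-10, 10), dummy_code(-10, 10))  # reference: all zeros
print(combination_label(10, -10), dummy_code(10, -10))
```

With this coding, a negative coefficient for a dummy variable means lower outcome SC in that category than after a loss accompanied by a forgone win.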

Figure 2. Study 1: mean outcome skin conductance (μSiemens/second) for obtained and forgone outcome combinations: N_O/P_F, negative obtained/positive forgone (reference category); P_O/P_F, positive obtained/positive forgone; P_O/N_F, positive obtained/negative forgone; N_O/N_F, negative obtained/negative forgone. The error bars show the standard error of the mean.

Anticipatory SC

To examine whether anticipatory SC predicted optimal selections in the task (H₆), we ran separate multilevel logistic regressions for the two phases of the task, which differed in which option was optimal: the first phase included blocks 1 and 2, where the non-stationary option was optimal, and the second phase included blocks 3–5, which followed the switch to the stationary option becoming optimal (see Figure 3). For the first phase, we regressed optimal selection (0 = suboptimal, 1 = optimal) on anticipatory SC (variable centered). We found that higher anticipatory SC predicted selecting the suboptimal option, odds ratio = 0.60, z = −2.38, p = 0.017, in support of the SMH and H₆. In the next step, including block (variable centered) and its interaction with anticipatory SC (variable centered) resulted in the effect of anticipatory SC becoming non-significant, odds ratio = 1.11, z = 0.16, p = 0.871. The effect of the block was significant, indicating that optimal selections increased from block 1 to block 2, odds ratio = 2.09, z = 5.18, p < 0.001. The interaction between anticipatory SC and block was not significant, odds ratio = 1.40, z = 0.85, p = 0.396. During the first 40 trials of the task, participants learned to select the optimal option more often, and although anticipatory SC initially predicted whether a selection was optimal, controlling for task experience eliminated this effect.
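To make the reported odds ratios easier to interpret, the sketch below converts an odds ratio into a change in the probability of an optimal selection. It is a simplified illustration: the baseline probability of 0.5 is a hypothetical value chosen for the example, the function names are assumptions, and the calculation ignores the participant-level random effects of the multilevel model.

```python
import math


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds scale) for a given odds ratio."""
    return math.log(odds_ratio)


def shifted_probability(baseline_p: float, odds_ratio: float, delta: float = 1.0) -> float:
    """Probability of an optimal selection after the predictor rises by `delta` units.

    The baseline odds are multiplied by odds_ratio ** delta and converted
    back to a probability.
    """
    odds = baseline_p / (1.0 - baseline_p)
    new_odds = odds * odds_ratio ** delta
    return new_odds / (1.0 + new_odds)


# Odds ratio reported for anticipatory SC in phase 1 (first step).
print(round(log_odds_coefficient(0.60), 3))      # -0.511
print(round(shifted_probability(0.5, 0.60), 3))  # 0.375
```

Under these assumptions, an odds ratio of 0.60 means that a one-unit increase in centered anticipatory SC lowers the chance of an optimal selection from a hypothetical 50% to 37.5%.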

Figure 3. Study 1: mean anticipatory skin conductance (μSiemens/second) for optimal and suboptimal selections by block. The error bars show the standard error of the mean. The asterisk denotes that a switch in payoffs occurred at the end of block 2.

In phase 2, we initially regressed optimal selection (0 = suboptimal, 1 = optimal) on anticipatory SC (variable centered), but found no significant effect of anticipatory SC, with odds ratio = 0.93, z = −0.52, and p = 0.601. Following the payoff switch which made the stationary option optimal, there was no difference in anticipatory SC before selecting from either option. Including block and its interaction with anticipatory SC (variable centered) in the next step resulted in the block being the only significant predictor, with optimal selections increasing from block 3 to block 5, odds ratio = 1.30, z = 4.32, p < 0.001—indicative of reversal learning after the switch. Anticipatory SC remained non-significant and the interaction with the block was also non-significant, with odds ratio = 1.17, z = 0.98, and p = 0.327. In the last 60 trials of the task, anticipatory SC did not significantly predict optimal selections.⁵

Discussion

Study 1 examined whether participants could adapt to changes in the probabilities of payoffs and whether somatic states indexed by SC plausibly played a role in aiding reversal learning of the payoff structure. Initial learning in the task was successful: participants' rate of selection of the (non-stationary) optimal option was significantly above chance in trials 21–40. The optimal option switched at trial 41. Despite the symmetry in our design, whereby expected value differences between options were of equal size before and after this switch, participants did not choose as successfully after 60 trials in this new environment as they had after 40 trials in the original task environment. We found no significant effect of showing participants a running history of previous outcomes, which has sometimes been found to speed up initial learning but also to increase inertia when the optimal option changes (Rakow and Miler, 2009).

Supporting the SMH, we found significantly greater outcome SC to negative outcomes compared with positive outcomes for the option chosen, though this effect was reduced when the time course of the task was taken into account. This limited effect of outcome valence on outcome SC is perhaps surprising given the role that the SMH ascribes to somatic outcome responses in marking options and the subsequent development of successful IGT performance. This could reflect differences between tasks in the relative scale of losses and gains. In the IGT, the largest losses in both disadvantageous decks are either 2.5 or 12.5 times the size of the $100 nominal reward, whereas, in our non-stationary task, the losses and gains were equal. Although the SMH does not predict greater outcome SC to the greater punishments, the SCR data from the IGT reported by Bechara et al. (1996) seem to show that punishment SCR only reliably exceeds reward SCR in decks where the absolute size of rewards exceeds that of punishments (Decks B and D). This is consistent with the possibility that SC marks risk rather than reward (Tomb et al., 2002). Another possibility is that, because our participants observed both obtained and forgone outcomes, obtained losses have a different status in our task because participants see that, sometimes, they would not have done better by selecting the other option. Indeed, consistent with this, we found greater outcome SC following a positive forgone outcome when the obtained outcome was negative, compared with all other combinations of positive and negative outcomes, a finding that remained (in fact, strengthened slightly) when controlling for block number.

The SMH's key prediction is that, after experiencing different outcomes, anticipatory SC develops and is greater before selecting from disadvantageous options than from advantageous options. This facilitates advantageous decision-making by helping to avoid previously experienced negative outcomes. We found that anticipatory SC was greater before a non-optimal selection (disadvantageous option) in the first 40 trials. This is in the direction the SMH would predict, with a greater physiological response before picking from a disadvantageous option. However, this effect was no longer significant once the block number was included in the regression. We did not find any support for a difference in anticipatory SC between the optimal and non-optimal options developing after the switch, in the last three blocks of the game, with or without controlling for block number. However, we note that, on average, decision-making was not particularly successful in this latter phase that followed the switch.

To further test the role of anticipatory SC in guiding optimal decision-making, we ran a second study. In light of participants' modest adaptation to changes in payoffs, we added more trials post-switch to allow more opportunity to adapt to change. Additionally, to provide further opportunity to examine reversal learning, we added a second switch point at which the optimal option changed. To accommodate these design changes, the length of the game was doubled.

Study 2

Study 2 employs a similar non-stationary payoff game to Study 1 but uses a 200-trial version with two instantaneous switches. The switches occur at trial 41 where the non-stationary option changes from being the optimal to the suboptimal choice and at trial 121 when the non-stationary option changes to be the optimal choice again. The payoff distributions for both the stationary and non-stationary options are the same as in Study 1. Based on Rakow and Miler (2009) and the results from Study 1, we expect that participants will have developed a preference for the optimal (non-stationary) option within the first 40 trials:

H₁: There will be a preference for the optimal option by the second block of 20 trials.

Based on Rakow and Miler (2009) and Study 1, we expect initial learning to be more successful than reversal learning. Therefore, even with an extra 20 trials after a switch compared with Study 1, we predict fewer selections of the optimal option in the fourth 20-trial block following a switch than in the second block from the start of the game:

H₂a: The number of selections from the optimal option for trials 101–120 will be less than for trials 21–40.

H₂b: The number of selections from the optimal option for trials 181–200 will be less than for trials 21–40.
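The trial windows named in these hypotheses follow from the Study 2 schedule, which the sketch below makes explicit. It is an illustration under stated assumptions: the function names and the option labels `"NS"` and `"S"` are hypothetical, whereas the 200-trial length, the 20-trial blocks, and the switches at trials 41 and 121 come from the text.

```python
BLOCK_SIZE = 20
SWITCH_TRIALS = (41, 121)  # first trial of each new phase in Study 2
N_TRIALS = 200


def block_of(trial: int) -> int:
    """1-indexed 20-trial block that contains a 1-indexed trial."""
    return (trial - 1) // BLOCK_SIZE + 1


def phase_of(trial: int) -> int:
    """1-indexed phase: 1 = trials 1-40, 2 = trials 41-120, 3 = trials 121-200."""
    return 1 + sum(trial >= s for s in SWITCH_TRIALS)


def optimal_option(trial: int) -> str:
    """Non-stationary ("NS") is optimal in phases 1 and 3, stationary ("S") in phase 2."""
    return "NS" if phase_of(trial) in (1, 3) else "S"


def block_within_phase(trial: int) -> int:
    """Position of the trial's block within its phase (1 = first block of the phase)."""
    starts = (1,) + SWITCH_TRIALS
    start = starts[phase_of(trial) - 1]
    return (trial - start) // BLOCK_SIZE + 1


# Comparison windows: trials 21-40, 101-120, and 181-200.
for first, last in [(21, 40), (101, 120), (181, 200)]:
    print(first, last, block_of(first), phase_of(first),
          block_within_phase(first), optimal_option(first))
```

Trials 101–120 and 181–200 are each the fourth 20-trial block after a switch (blocks 6 and 10 overall), whereas trials 21–40 are the second block from the start of the game, so both hypotheses compare reversal learning after 80 trials with initial learning after 40 trials.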

We again employ a between-subjects design providing a running history to half of the participants and testing whether it moderates participants' adaptation to a change in the payoff distributions:

H₃: Participants provided with a history of the options' outcomes adapt more slowly following the switches, making fewer selections of the optimal option compared with participants without the history.

We again test whether, as per the SMH, participants have greater outcome SC for the negative outcomes in the task. We also take advantage of our task design to test whether forgone outcomes moderate outcome SC, reflecting affective impulses associated with regret:

H₄: Outcome SC will be greater for negative outcomes (−10) than for positive outcomes (+10).

H₅: Obtaining a negative outcome when the forgone outcome is positive will result in greater outcome SC compared with other combinations of obtained and forgone outcomes.

In Study 1, we found mixed support for the SMH's key prediction that decision makers develop anticipatory SCRs before picking from the options, which are greater for disadvantageous options than advantageous ones. We again examine whether, consistent with the SMH, anticipatory SC predicts optimal selections in each of the three phases of the task:

H₆: Anticipatory SC responses develop and are greater for selections of the suboptimal option (whichever option that is for a given phase of the task).