Adaptive properties of differential learning rates for positive and negative outcomes
1 ENS, Group for Neural Theory, France
2 University of Minnesota, Department of Neuroscience, United States
A central concept in theories of Pavlovian and instrumental learning alike is the prediction error, which signals how much better or worse than expected an outcome turned out to be. The impact of prediction errors is controlled by a "learning rate" parameter, commonly assumed to take a single value for both positive and negative outcomes. However, a single learning rate may not accurately describe how humans actually learn. For instance, Frank et al. (2007) found that subjects learned differentially from positive and negative outcomes on a probabilistic two-choice task. Furthermore, genetic polymorphisms associated with striatal D1- and D2-receptor pathways were independently predictive of the learning rates associated with positive and negative outcomes, suggesting that differential learning rates may be dissociable at the neural level. While the computational consequences of such differential learning rates have rarely been studied, a common suggestion is that such biases, and related asymmetries such as loss aversion, are irrational (e.g. Kahneman and Tversky 1972). In this study we sought to identify conditions in which differential learning rates may be adaptive, by comparing the performance of three reinforcement learning agents on a variety of probabilistic choice tasks: one that learns more from positive than from negative outcomes (gain learner), its opposite (loss learner), and one that learns equally from both (symmetric learner). We found that when two choices both had a high (but different) probability of reward (0.8 and 0.9), the loss learner performed best, while when both choices had a low probability of reward (0.1 and 0.2), the gain learner performed best. In both situations, differential learning rates enabled a better separation of the learned reward estimates. The symmetric learner's estimates, by contrast, converged close to the true reward probabilities, which lie close together and therefore promote unstable choice in the face of stochastic rewards.
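The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the binary 0/1 reward coding, the learning-rate values (0.4, 0.1, 0.25), the initial value estimates of 0.5, the epsilon-greedy exploration rate, and the names `update`, `simulate`, and `AGENTS` are all assumptions made for the example.

```python
import random


def update(q, reward, alpha_pos, alpha_neg):
    """One value update with separate learning rates for positive and
    negative prediction errors."""
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def simulate(p_rewards, alpha_pos, alpha_neg, n_trials=2000, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent on a Bernoulli bandit and return the
    fraction of trials on which it chose the arm with the highest reward
    probability."""
    rng = random.Random(seed)
    q = [0.5] * len(p_rewards)  # assumed initial value estimates
    best = max(range(len(p_rewards)), key=lambda a: p_rewards[a])
    n_best = 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            action = rng.randrange(len(q))  # explore
        else:
            top = max(q)  # exploit, breaking ties at random
            action = rng.choice([a for a, v in enumerate(q) if v == top])
        reward = 1.0 if rng.random() < p_rewards[action] else 0.0
        q[action] = update(q[action], reward, alpha_pos, alpha_neg)
        n_best += action == best
    return n_best / n_trials


# Hypothetical (alpha_pos, alpha_neg) pairs; the abstract gives no values.
AGENTS = {
    "gain learner": (0.4, 0.1),
    "loss learner": (0.1, 0.4),
    "symmetric learner": (0.25, 0.25),
}

if __name__ == "__main__":
    for task in [(0.8, 0.9), (0.1, 0.2)]:
        for name, (a_pos, a_neg) in AGENTS.items():
            scores = [simulate(task, a_pos, a_neg, seed=s) for s in range(50)]
            print(task, name, round(sum(scores) / len(scores), 3))
```

Running the script prints, for each task and agent, the mean fraction of trials on which the better option was chosen, averaged over 50 seeds. The exact numbers depend on the assumed parameters, so they should be read as a qualitative check of the pattern reported in the abstract rather than a reproduction of it.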
We derived analytical expressions for the reward obtained in the steady state as a function of the two learning rates and the distribution of rewards, and showed that these results hold independently of the action selection mechanism used (epsilon-greedy or softmax). Thus, from a reinforcement learning perspective, there are situations in which an agent with different learning rates for positive and negative outcomes performs better than an agent with a single, symmetric learning rate. These results suggest that having different neural systems independently support learning from positive and negative outcomes, with potentially different learning rates, can in fact be adaptive even in simple choice situations. However, real-world learning situations often involve more complex operations than the processing of prediction errors alone; for instance, in serial reversal learning, subjects do not "unlearn" previously learned associations but learn to switch between different world-states. Nevertheless, a wide range of proposals in psychology and economics suggest an asymmetric impact of positive and negative outcomes, including not only loss aversion but also variations in happiness set-point (Frederick and Loewenstein, 1999) and optimism bias (Sharot et al. 2007), all of which can be informed by this reinforcement learning approach.
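The abstract does not reproduce its analytical expressions. As a hedged illustration of why asymmetric learning rates separate the learned values, consider the fixed point of the expected update under assumptions that are not stated in the source: binary rewards $r \in \{0, 1\}$ delivered with probability $p$, a value estimate $Q \in (0, 1)$, and learning rates $\alpha^{+}$ and $\alpha^{-}$ applied to positive and negative prediction errors $\delta = r - Q$.

```latex
% Expected update is zero at the fixed point Q^*:
%   reward (prob. p):        delta = 1 - Q^* > 0, scaled by alpha^+
%   no reward (prob. 1 - p): delta = -Q^*   < 0, scaled by alpha^-
\begin{align}
  p\,\alpha^{+}\,(1 - Q^{*}) - (1 - p)\,\alpha^{-}\,Q^{*} &= 0 \\
  Q^{*} &= \frac{p\,\alpha^{+}}{p\,\alpha^{+} + (1 - p)\,\alpha^{-}}
\end{align}

% Symmetric case: alpha^+ = alpha^- gives Q^* = p.
% Illustrative loss learner with alpha^+ = 0.1, alpha^- = 0.4:
\begin{align}
  Q^{*}(p = 0.8) &= \frac{0.08}{0.08 + 0.08} = 0.5 \\
  Q^{*}(p = 0.9) &= \frac{0.09}{0.09 + 0.04} \approx 0.692
\end{align}
```

Under these assumptions the two options are separated by about 0.19 for the loss learner, against 0.1 for a symmetric learner whose estimates sit at the true probabilities. By symmetry, a gain learner with $\alpha^{+} = 0.4$ and $\alpha^{-} = 0.1$ obtains $Q^{*} \approx 0.308$ and $Q^{*} = 0.5$ for $p = 0.1$ and $p = 0.2$, again a separation of about 0.19. This is the fixed point of the mean update only and may differ from the steady-state reward expressions the authors derived.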
Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.
Presentation Type: Poster Presentation
Topic: Poster session II
Citation:
Caze R and Van Der Meer M (2010). Adaptive properties of differential learning rates for positive and negative outcomes. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010. doi: 10.3389/conf.fnins.2010.03.00166
Copyright:
The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 02 Mar 2010; Published Online: 02 Mar 2010.
* Correspondence: Matthijs Van Der Meer, University of Minnesota, Department of Neuroscience, Minneapolis, United States, mvdm@uwaterloo.ca