Internal bias controls dopamine perceptual decision-related responses

10.1101/431387 ◽

2018 ◽

Author(s):

Stefania Sarno ◽

Manuel Beirán ◽

José Vergara ◽

Román Rossi-Pool ◽

Ranulfo Romo ◽

...

Keyword(s):

Delay Period ◽

Dopamine Neurons ◽

Prediction Errors ◽

Phasic Activity ◽

Perceptual Decision ◽

Reward Prediction ◽

Internal Bias ◽

Dopamine Signaling

Abstract: Dopamine neurons produce reward-related signals that regulate learning and guide behavior. Prior expectations about forthcoming stimuli and internal biases can alter perception and choices and thus could influence dopamine signaling. We tested this hypothesis by studying dopamine neurons recorded in monkeys trained to discriminate between two tactile frequencies separated by a delay period, a task affected by the contraction bias. The bias strongly influenced the animals’ choices and their confidence in their decisions. During decision formation, the phasic activity reflected bias-induced modulations and simultaneously coded reward prediction errors. In contrast, the activity during the delay period was not affected by the bias and was not tuned to the value of the stimuli, but it was temporally modulated, pointing to a role different from that of the phasic activity.

A unified framework for dopamine signals across timescales

10.1101/803437 ◽

2019 ◽

Cited By ~ 13

Author(s):

HyungGoo R. Kim ◽

Athar N. Malik ◽

John G. Mikhael ◽

Pol Bech ◽

Iku Tsutsui-Kimura ◽

...

Keyword(s):

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Unified Framework ◽

Phasic Activity ◽

Reward Prediction ◽

Dynamic Stimulus ◽

Midbrain Dopamine ◽

Gradual Approach ◽

Midbrain Dopamine Neurons

Abstract: Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. Recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independently from somatic spiking activity. Here, we developed novel experimental paradigms using virtual reality that disambiguate RPEs from values. We examined the dopamine circuit activity at various stages including somatic spiking, axonal calcium signals, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all the stages examined. We further show that ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
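The derivative-like computation described in this abstract can be illustrated with the standard temporal-difference error, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). The sketch below is not the authors' model: the function name `td_errors`, the quadratic value function, and the choice gamma = 1 are assumptions made only for illustration. Under those assumptions, a value function that rises convexly toward the reward produces a TD error that ramps up during the approach, while the fully predicted reward at the end produces no error.

```python
import numpy as np

def td_errors(values, rewards, gamma=1.0):
    """TD error for each step: r_t + gamma * V(s_{t+1}) - V(s_t).

    The state after the last step is treated as terminal with value 0.
    """
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.append(values[1:], 0.0)
    return rewards + gamma * next_values - values

# Hypothetical approach to a reward over 10 positions.
n_states = 10
positions = np.arange(1, n_states + 1)
values = (positions / n_states) ** 2   # assumed convex value function
rewards = np.zeros(n_states)
rewards[-1] = 1.0                      # reward delivered on leaving the last state

deltas = td_errors(values, rewards)
```

With gamma = 1 the error before reward is simply the step-to-step change in value, which is why a steepening value function shows up as a ramp in `deltas`.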

Dopamine firing plays a dual role in coding reward prediction errors and signaling motivation in a working memory task

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2113311119 ◽

2022 ◽

Vol 119 (2) ◽

pp. e2113311119

Author(s):

Stefania Sarno ◽

Manuel Beirán ◽

Joan Falcó-Roget ◽

Gabriel Diaz-deLeon ◽

Román Rossi-Pool ◽

...

Keyword(s):

Working Memory ◽

Memory Task ◽

Reaction Times ◽

Delay Period ◽

Prediction Errors ◽

Motivated Behavior ◽

Reward Prediction ◽

Stimulus Perception ◽

Trial Basis ◽

Internal Bias

Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias greatly controlled the DA phasic response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic dopamine activity during WM was not affected by the bias and did not encode the stored frequency. More interestingly, both delay-period activity and phasic responses before the second stimulus negatively correlated with reaction times of the animals after the trial start cue and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation positively correlated with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.

Midbrain dopamine neurons signal aversion in a reward-context-dependent manner

eLife ◽

10.7554/elife.17328 ◽

2016 ◽

Vol 5 ◽

Cited By ~ 47

Author(s):

Hideyuki Matsumoto ◽

Ju Tian ◽

Naoshige Uchida ◽

Mitsuko Watabe-Uchida

Keyword(s):

Dopamine Neurons ◽

Prediction Errors ◽

Dependent Manner ◽

Value Prediction ◽

Signal Value ◽

Aversive Events ◽

High Reward ◽

Midbrain Dopamine ◽

Dopamine Signaling ◽

Reward And Punishment

Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically-identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering the contexts to examine the representation in dopamine neurons and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System

Neural Computation ◽

10.1162/neco.2008.11-07-654 ◽

2008 ◽

Vol 20 (12) ◽

pp. 3034-3054 ◽

Cited By ~ 76

Author(s):

Elliot A. Ludvig ◽

Richard S. Sutton ◽

E. James Kehoe

Keyword(s):

Learning Algorithm ◽

Full Range ◽

Dopamine Neurons ◽

Prediction Errors ◽

Dopamine System ◽

Stimulus Representation ◽

Reward Prediction ◽

Future Reward ◽

Temporal Generalization ◽

External Stimuli

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those when rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
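The microstimulus idea in this abstract can be sketched as a decaying memory trace read out through a set of Gaussian basis functions, so that later microstimuli peak lower and spread over more time steps. This is a minimal illustration rather than the published implementation: the function name `microstimuli` and the parameter values (`decay`, `sigma`, `n_micro`, `n_steps`) are assumptions chosen for the example.

```python
import numpy as np

def microstimuli(n_steps=200, n_micro=6, decay=0.985, sigma=0.08):
    """Return an (n_steps, n_micro) array of microstimulus levels.

    A stimulus at t = 0 leaves a memory trace that decays exponentially.
    Each microstimulus is the trace height multiplied by a Gaussian
    basis function centred on a different trace height.
    """
    t = np.arange(n_steps)
    trace = decay ** t                                # trace height, starts at 1
    # Centres ordered from 1 down to 1/n_micro: the first column peaks earliest.
    centers = np.arange(n_micro, 0, -1) / n_micro
    diff = trace[:, None] - centers[None, :]
    basis = np.exp(-diff ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi)
    return trace[:, None] * basis

features = microstimuli()
peak_times = features.argmax(axis=0)     # time step at which each microstimulus peaks
peak_heights = features.max(axis=0)      # peak level of each microstimulus
```

A TD learner would then form its reward prediction as a weighted sum of the columns of `features` at each time step, which is what gives the model its temporal generalization.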

Dopamine transients delivered in learning contexts do not act as model-free prediction errors

10.1101/574541 ◽

2019 ◽

Cited By ~ 3

Author(s):

Melissa J. Sharpe ◽

Hannah M. Batchelor ◽

Lauren E. Mueller ◽

Chun Yun Chang ◽

Etienne J.P. Maes ◽

...

Keyword(s):

Reinforcement Learning ◽

Associative Learning ◽

Prediction Error ◽

Error Term ◽

Neural Correlates ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Excess Value

Abstract: Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.

Dopamine reward prediction error coding

Dialogues in Clinical Neuroscience ◽

10.31887/dcns.2016.18.1/wschultz ◽

2016 ◽

Vol 18 (1) ◽

pp. 23-32 ◽

Cited By ~ 71

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Reward Prediction Error ◽

Reward Prediction ◽

Negative Prediction ◽

Baseline Activity ◽

Error Coding ◽

Reward Value ◽

Dopamine Signal

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards, an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
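The three response cases in this abstract follow directly from the sign of the difference between received and predicted reward. The snippet below is a minimal sketch of that mapping; the function names and the `tolerance` parameter are hypothetical, and the string labels only paraphrase the abstract's wording.

```python
def reward_prediction_error(received, predicted):
    """Received reward minus predicted reward."""
    return received - predicted

def expected_response(rpe, tolerance=1e-9):
    """Qualitative dopamine response for a given prediction error."""
    if rpe > tolerance:
        return "activation"   # more reward than predicted
    if rpe < -tolerance:
        return "depression"   # less reward than predicted
    return "baseline"         # reward fully predicted

better = expected_response(reward_prediction_error(1.0, 0.5))
as_predicted = expected_response(reward_prediction_error(1.0, 1.0))
worse = expected_response(reward_prediction_error(0.0, 1.0))
```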

How We Learn to Make Decisions: Rapid Propagation of Reinforcement Learning Prediction Errors in Humans

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00509 ◽

2014 ◽

Vol 26 (3) ◽

pp. 635-644 ◽

Cited By ~ 38

Author(s):

Olav E. Krigolson ◽

Cameron D. Hassall ◽

Todd C. Handy

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Human Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Neural Basis ◽

Error Related Negativity ◽

Reward Positivity ◽

Reward Prediction ◽

Feedback Error

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors: discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
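The propagation of prediction errors from reward delivery to choice presentation can be reproduced with a very small learning model. The sketch below is not the authors' computational model: the function `simulate_propagation`, the learning rate `alpha`, the deterministic reward, and the assumption that the choice display arrives unpredicted from a zero-value baseline are all illustrative choices.

```python
def simulate_propagation(n_trials=50, alpha=0.2, reward=1.0):
    """Track prediction errors at choice presentation and at reward delivery.

    The choice display is assumed to appear unpredicted, so the error at
    choice presentation equals the learned value of the choice. The error
    at reward delivery is the reward minus that learned value.
    """
    value = 0.0
    choice_errors, reward_errors = [], []
    for _ in range(n_trials):
        choice_errors.append(value - 0.0)   # baseline expectation before the display is 0
        delta = reward - value              # error when the reward arrives
        reward_errors.append(delta)
        value += alpha * delta              # incremental value update
    return choice_errors, reward_errors

choice_errors, reward_errors = simulate_propagation()
```

Across trials `reward_errors` shrinks toward zero while `choice_errors` grows toward the reward magnitude, the same qualitative pattern the abstract reports for the feedback error-related negativity and the reward positivity.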

Rethinking dopamine as generalized prediction error

10.1101/239731 ◽

2017 ◽

Cited By ~ 2

Author(s):

Matthew P.H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

Abstract: Midbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.

Mesolimbic dopamine adapts the rate of learning from errors in performance

10.1101/2021.05.31.446464 ◽

2021 ◽

Author(s):

Luke T Coddington ◽

Sarah E Lindo ◽

Joshua T Dudman

Keyword(s):

Explanatory Power ◽

Trace Conditioning ◽

Policy Learning ◽

Prediction Errors ◽

Value Functions ◽

Mesolimbic Dopamine ◽

Learning From Errors ◽

Reward Prediction ◽

Dopamine Signaling ◽

Value Learning

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioral policies and indirect learning via value functions. Policy learning and value learning employ distinct algorithms that depend upon evaluation of errors in performance and reward prediction errors, respectively. In animals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction errors; however, to date there has been little consideration of how direct policy learning might inform our understanding. Here we used a comprehensive dataset of orofacial and body movements to reveal how behavioral policies evolve as naive, head-restrained mice learned a trace conditioning paradigm. Simultaneous multi-regional measurement of dopamine activity revealed that individual differences in initial reward responses robustly predicted behavioral policy hundreds of trials later, but not variation in reward prediction error encoding. These observations were remarkably well matched to the predictions of a neural network based model of behavioral policy learning. This work provides strong evidence that phasic dopamine activity regulates policy learning from performance errors in addition to its roles in value learning and further expands the explanatory power of reinforcement learning models for animal learning.
