Temporal Specificity of Reward Prediction Errors Signaled by Putative Dopamine Neurons in Rat VTA Depends on Ventral Striatum

Yuji K. Takahashi; Angela J. Langdon; Yael Niv; Geoffrey Schoenbaum

doi:10.1016/j.neuron.2016.05.015

A VTA GABAergic computational model of dissociated reward prediction error computation in classical conditioning

10.1101/2020.02.06.936997 ◽

2020 ◽

Author(s):

Pramod Kaushik ◽

Jérémie Naudé ◽

Surampudi Bapi Raju ◽

Frédéric Alexandre

Keyword(s):

Classical Conditioning ◽

Computational Model ◽

Prediction Error ◽

Ventral Striatum ◽

Dopamine Neurons ◽

System Level ◽

Reward Prediction Error ◽

Twin Peaks ◽

Reward Prediction ◽

Error Computation

AbstractClassical Conditioning is a fundamental learning mechanism where the Ventral Striatum is generally thought to be the source of inhibition to Ventral Tegmental Area (VTA) Dopamine neurons when a reward is expected. However, recent evidences point to a new candidate in VTA GABA encoding expectation for computing the reward prediction error in the VTA. In this system-level computational model, the VTA GABA signal is hypothesised to be a combination of magnitude and timing computed in the Peduncolopontine and Ventral Striatum respectively. This dissociation enables the model to explain recent results wherein Ventral Striatum lesions affected the temporal expectation of the reward but the magnitude of the reward was intact. This model also exhibits other features in classical conditioning namely, progressively decreasing firing for early rewards closer to the actual reward, twin peaks of VTA dopamine during training and cancellation of US dopamine after training.

Download Full-text

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System

Neural Computation ◽

10.1162/neco.2008.11-07-654 ◽

2008 ◽

Vol 20 (12) ◽

pp. 3034-3054 ◽

Cited By ~ 76

Author(s):

Elliot A. Ludvig ◽

Richard S. Sutton ◽

E. James Kehoe

Keyword(s):

Learning Algorithm ◽

Full Range ◽

Dopamine Neurons ◽

Prediction Errors ◽

Dopamine System ◽

Stimulus Representation ◽

Reward Prediction ◽

Future Reward ◽

Temporal Generalization ◽

External Stimuli

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those when rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.

Download Full-text

Signed Reward Prediction Errors in the Ventral Striatum Drive Episodic Memory

Journal of Neuroscience ◽

10.1523/jneurosci.1785-20.2020 ◽

2020 ◽

pp. JN-RM-1785-20

Author(s):

Cristian B. Calderon ◽

Esther De Loof ◽

Kate Ergo ◽

Anna Snoeck ◽

Carsten N. Boehler ◽

...

Keyword(s):

Episodic Memory ◽

Ventral Striatum ◽

Prediction Errors ◽

Reward Prediction

Download Full-text

Dopamine transients delivered in learning contexts do not act as model-free prediction errors

10.1101/574541 ◽

2019 ◽

Cited By ~ 3

Author(s):

Melissa J. Sharpe ◽

Hannah M. Batchelor ◽

Lauren E. Mueller ◽

Chun Yun Chang ◽

Etienne J.P. Maes ◽

...

Keyword(s):

Reinforcement Learning ◽

Associative Learning ◽

Prediction Error ◽

Error Term ◽

Neural Correlates ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Excess Value

AbstractDopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquired value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.

Download Full-text

Dopamine reward prediction error coding

Dialogues in Clinical Neuroscience ◽

10.31887/dcns.2016.18.1/wschultz ◽

2016 ◽

Vol 18 (1) ◽

pp. 23-32 ◽

Cited By ~ 71

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Reward Prediction Error ◽

Reward Prediction ◽

Negative Prediction ◽

Baseline Activity ◽

Error Coding ◽

Reward Value ◽

Dopamine Signal

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionary beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.

Download Full-text

How We Learn to Make Decisions: Rapid Propagation of Reinforcement Learning Prediction Errors in Humans

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00509 ◽

2014 ◽

Vol 26 (3) ◽

pp. 635-644 ◽

Cited By ~ 38

Author(s):

Olav E. Krigolson ◽

Cameron D. Hassall ◽

Todd C. Handy

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Human Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Neural Basis ◽

Error Related Negativity ◽

Reward Positivity ◽

Reward Prediction ◽

Feedback Error

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors—discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.

Download Full-text

Striatal Dopamine and Reward Prediction Error Signaling in Unmedicated Schizophrenia Patients

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa055 ◽

2020 ◽

Vol 46 (6) ◽

pp. 1535-1546

Author(s):

Teresa Katthagen ◽

Jakob Kaminski ◽

Andreas Heinz ◽

Ralph Buchert ◽

Florian Schlagenhauf

Keyword(s):

Reversal Learning ◽

Prediction Error ◽

Negative Symptoms ◽

Ventral Striatum ◽

Striatal Dopamine ◽

Dopamine Synthesis ◽

Positive Symptoms ◽

Prediction Errors ◽

Reward Prediction Error ◽

Reward Prediction

Abstract Increased striatal dopamine synthesis capacity has consistently been reported in patients with schizophrenia. However, the mechanism translating this into behavior and symptoms remains unclear. It has been proposed that heightened striatal dopamine may blunt dopaminergic reward prediction error signaling during reinforcement learning. In this study, we investigated striatal dopamine synthesis capacity, reward prediction errors, and their association in unmedicated schizophrenia patients (n = 19) and healthy controls (n = 23). They took part in FDOPA-PET and underwent functional magnetic resonance imaging (fMRI) scanning, where they performed a reversal-learning paradigm. The groups were compared regarding dopamine synthesis capacity (Kicer), fMRI neural prediction error signals, and the correlation of both. Patients did not differ from controls with respect to striatal Kicer. Taking into account, comorbid alcohol abuse revealed that patients without such abuse showed elevated Kicer in the associative striatum, while those with abuse did not differ from controls. Comparing all patients to controls, patients performed worse during reversal learning and displayed reduced prediction error signaling in the ventral striatum. In controls, Kicer in the limbic striatum correlated with higher reward prediction error signaling, while there was no significant association in patients. Kicer in the associative striatum correlated with higher positive symptoms and blunted reward prediction error signaling was associated with negative symptoms. Our results suggest a dissociation between striatal subregions and symptom domains, with elevated dopamine synthesis capacity in the associative striatum contributing to positive symptoms while blunted prediction error signaling in the ventral striatum related to negative symptoms.

Download Full-text

All That Glitters … Dissociating Attention and Outcome Expectancy From Prediction Errors Signals

Journal of Neurophysiology ◽

10.1152/jn.00173.2010 ◽

2010 ◽

Vol 104 (2) ◽

pp. 587-595 ◽

Cited By ~ 39

Author(s):

Matthew R. Roesch ◽

Donna J. Calu ◽

Guillem R. Esber ◽

Geoffrey Schoenbaum

Keyword(s):

Orbitofrontal Cortex ◽

Basolateral Amygdala ◽

Ventral Striatum ◽

Recent Literature ◽

Neural Correlates ◽

Dopamine Neurons ◽

Outcome Expectancy ◽

Prediction Errors ◽

Signal Prediction ◽

Unexpected Outcomes

Initially reported in dopamine neurons, neural correlates of prediction errors have now been shown in a variety of areas, including orbitofrontal cortex, ventral striatum, and amygdala. Yet changes in neural activity to an outcome or cues that precede it can reflect other processes. We review the recent literature and show that although activity in dopamine neurons appears to signal prediction errors, similar activity in orbitofrontal cortex, basolateral amygdala, and ventral striatum does not. Instead, increased firing in basolateral amygdala to unexpected outcomes likely reflects attention, whereas activity in orbitofrontal cortex and ventral striatum is unaffected by prior expectations and may provide information on outcome expectancy. These results have important implications for how these areas interact to facilitate learning and guide behavior.

Download Full-text

Neural Coding of Reward-Prediction Error Signals During Classical Conditioning With Attractive Faces

Journal of Neurophysiology ◽

10.1152/jn.01211.2006 ◽

2007 ◽

Vol 97 (4) ◽

pp. 3036-3045 ◽

Cited By ~ 100

Author(s):

Signe Bray ◽

John O'Doherty

Keyword(s):

Neural Coding ◽

Prediction Error ◽

Ventral Striatum ◽

Human Subjects ◽

Visual Stimuli ◽

Dopamine Neurons ◽

Error Signal ◽

Reward Prediction Error ◽

Reward Prediction ◽

Attractive Female

Attractive faces can be considered to be a form of visual reward. Previous imaging studies have reported activity in reward structures including orbitofrontal cortex and nucleus accumbens during presentation of attractive faces. Given that these stimuli appear to act as rewards, we set out to explore whether it was possible to establish conditioning in human subjects by pairing presentation of arbitrary affectively neutral stimuli with subsequent presentation of attractive and unattractive faces. Furthermore, we scanned human subjects with functional magnetic resonance imaging (fMRI) while they underwent this conditioning procedure to determine whether a reward-prediction error signal is engaged during learning with attractive faces as is known to be the case for learning with other types of reward such as juice and money. Subjects showed changes in behavioral ratings to the conditioned stimuli (CS) when comparing post- to preconditioning evaluations, notably for those CSs paired with attractive female faces. We used a simple Rescorla-Wagner learning model to generate a reward-prediction error signal and entered this into a regression analysis with the fMRI data. We found significant prediction error-related activity in the ventral striatum during conditioning with attractive compared with unattractive faces. These findings suggest that an arbitrary stimulus can acquire conditioned value by being paired with pleasant visual stimuli just as with other types of reward such as money or juice. This learning process elicits a reward-prediction error signal in a main target structure of dopamine neurons: the ventral striatum. The findings we describe here may provide insights into the neural mechanisms tapped into by advertisers seeking to influence behavioral preferences by repeatedly exposing consumers to simple associations between products and rewarding visual stimuli such as pretty faces.

Download Full-text

Internal bias controls dopamine perceptual decision-related responses

10.1101/431387 ◽

2018 ◽

Author(s):

Stefania Sarno ◽

Manuel Beirán ◽

José Vergara ◽

Román Rossi-Pool ◽

Ranulfo Romo ◽

...

Keyword(s):

Delay Period ◽

Dopamine Neurons ◽

Prediction Errors ◽

Phasic Activity ◽

Perceptual Decision ◽

Reward Prediction ◽

Internal Bias ◽

Dopamine Signaling

AbstractDopamine neurons produce reward-related signals that regulate learning and guide behavior. Prior expectations about forthcoming stimuli and internal biases can alter perception and choices and thus could influence dopamine signaling. We tested this hypothesis studying dopamine neurons recorded in monkeys trained to discriminate between two tactile frequencies separated by a delay period, a task affected by the contraction bias. The bias greatly controlled the animals’ choices and confidence on their decisions. During decision formation the phasic activity reflected bias-induced modulations and simultaneously coded reward prediction errors. In contrast, the activity during the delay period was not affected by the bias, was not tuned to the value of the stimuli but was temporally modulated, pointing to a role different from that of the phasic activity.

Download Full-text