Neural basis of learning guided by sensory confidence and reward value

Mapping Intimacies ◽

10.1101/411413 ◽

2018 ◽

Cited By ~ 3

Author(s):

Armin Lak ◽

Michael Okun ◽

Morgane Moss ◽

Harsha Gurnani ◽

Karolina Farrell ◽

...

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Neural Basis ◽

Neural Processes ◽

Reward Value ◽

Sensory Evidence ◽

Midbrain Dopamine ◽

Reinforcement Learning Model ◽

Midbrain Dopamine Neurons

Summary: Making efficient decisions requires combining present sensory evidence with previous reward values, and learning from the resulting outcome. To establish the underlying neural processes, we trained mice in a task that probed such decisions. Mouse choices conformed to a reinforcement learning model that estimates predicted value (reward value times sensory confidence) and prediction error (outcome minus predicted value). Predicted value was encoded in the pre-outcome activity of prelimbic frontal neurons and midbrain dopamine neurons. Prediction error was encoded in the post-outcome activity of dopamine neurons, which reflected not only reward value but also sensory confidence. Manipulations of these signals spared ongoing choices but profoundly affected subsequent learning. Learning depended on the pre-outcome activity of prelimbic neurons, but not dopamine neurons. Learning also depended on the post-outcome activity of dopamine neurons, but not prelimbic neurons. These results reveal the distinct roles of frontal and dopamine neurons in learning under uncertainty.
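The two quantities named in this summary can be illustrated with a minimal sketch. Only the definitions of predicted value (reward value times sensory confidence) and prediction error (outcome minus predicted value) come from the summary. The function names, the learning rate, and the additive update rule are illustrative assumptions, not the authors' fitted model.

```python
def predicted_value(reward_value: float, confidence: float) -> float:
    """Predicted value = stored reward value x sensory confidence (from the summary)."""
    return reward_value * confidence


def prediction_error(outcome: float, predicted: float) -> float:
    """Prediction error = outcome - predicted value (from the summary)."""
    return outcome - predicted


def update_reward_value(reward_value: float, outcome: float,
                        confidence: float, learning_rate: float = 0.1):
    """Hypothetical delta-rule update; the learning rate and the additive
    form of the update are assumptions made for illustration only."""
    pv = predicted_value(reward_value, confidence)
    delta = prediction_error(outcome, pv)
    return reward_value + learning_rate * delta, delta


# Same reward, different confidence: a low-confidence correct choice
# yields a larger prediction error than a high-confidence one.
_, delta_low = update_reward_value(1.0, outcome=1.0, confidence=0.6)
_, delta_high = update_reward_value(1.0, outcome=1.0, confidence=0.9)
print(delta_low > delta_high)
```

Under these assumptions, the prediction error after a rewarded trial shrinks as sensory confidence grows, which matches the summary's statement that the error signal reflects confidence as well as reward value.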


Rethinking dopamine as generalized prediction error

10.1101/239731 ◽

2017 ◽

Cited By ~ 2

Author(s):

Matthew P.H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

Abstract: Midbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


A Pallidus-Habenula-Dopamine Pathway Signals Inferred Stimulus Values

Journal of Neurophysiology ◽

10.1152/jn.00158.2010 ◽

2010 ◽

Vol 104 (2) ◽

pp. 1068-1076 ◽

Cited By ~ 100

Author(s):

Ethan S. Bromberg-Martin ◽

Masayuki Matsumoto ◽

Simon Hong ◽

Okihide Hikosaka

Keyword(s):

Reinforcement Learning ◽

Dopamine Neurons ◽

Neural Pathway ◽

Dopamine System ◽

Lateral Habenula ◽

Reward Value ◽

Saccade Task ◽

Midbrain Dopamine ◽

Behavioral Evidence ◽

Midbrain Dopamine Neurons

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.


Rethinking dopamine as generalized prediction error

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2018.1645 ◽

2018 ◽

Vol 285 (1891) ◽

pp. 20181645 ◽

Cited By ~ 32

Author(s):

Matthew P. H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


The effect of effort on reward prediction error signals in midbrain dopamine neurons

Current Opinion in Behavioral Sciences ◽

10.1016/j.cobeha.2021.07.004 ◽

2021 ◽

Vol 41 ◽

pp. 152-159

Author(s):

Shingo Tanaka ◽

Jessica E Taylor ◽

Masamichi Sakagami

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons


Midbrain dopamine neurons encode reward value and its error on heterogeneous time scale

Neuroscience Research ◽

10.1016/j.neures.2011.07.138 ◽

2011 ◽

Vol 71 ◽

pp. e32

Author(s):

Kazuki Enomoto ◽

Naoyuki Matsumoto ◽

Minoru Kimura

Keyword(s):

Time Scale ◽

Dopamine Neurons ◽

Reward Value ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons


Scaling of prediction error does not confirm chaotic dynamics underlying irregular firing using interspike intervals from midbrain dopamine neurons

Neuroscience ◽

10.1016/j.neuroscience.2004.08.003 ◽

2004 ◽

Vol 129 (2) ◽

pp. 491-502 ◽

Cited By ~ 11

Author(s):

C.C. Canavier ◽

S.R. Perla ◽

P.D. Shepard

Keyword(s):

Prediction Error ◽

Chaotic Dynamics ◽

Dopamine Neurons ◽

Interspike Intervals ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons ◽

Irregular Firing


How We Learn to Make Decisions: Rapid Propagation of Reinforcement Learning Prediction Errors in Humans

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00509 ◽

2014 ◽

Vol 26 (3) ◽

pp. 635-644 ◽

Cited By ~ 38

Author(s):

Olav E. Krigolson ◽

Cameron D. Hassall ◽

Todd C. Handy

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Human Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Neural Basis ◽

Error Related Negativity ◽

Reward Positivity ◽

Reward Prediction ◽

Feedback Error

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors—discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.


Subthreshold repertoire and threshold dynamics of midbrain dopamine neuron firing in vivo

10.1101/2020.04.06.028829 ◽

2020 ◽

Author(s):

Kanako Otomo ◽

Jessica Perkins ◽

Anand Kulkarni ◽

Strahinja Stojanovic ◽

Jochen Roeper ◽

...

Keyword(s):

Action Potential ◽

Prediction Error ◽

Dopamine Neuron ◽

Dopamine Neurons ◽

Threshold Dynamics ◽

Action Potential Threshold ◽

Midbrain Dopamine ◽

Potential Threshold ◽

Midbrain Dopamine Neurons

Abstract: The firing pattern of ventral midbrain dopamine neurons is controlled by afferent and intrinsic activity to generate prediction error signals that are essential for reward-based learning. Given the absence of intracellular in vivo recordings in the last three decades, the subthreshold membrane potential events that cause changes in dopamine neuron firing patterns remain unknown. By establishing stable in vivo whole-cell recordings of >100 spontaneously active midbrain dopamine neurons in anaesthetized mice, we identified the repertoire of subthreshold membrane potential signatures associated with distinct in vivo firing patterns. We demonstrate that dopamine neuron in vivo activity deviates from a single spike pacemaker pattern by eliciting transient increases in firing rate generated by at least two diametrically opposing biophysical mechanisms: a transient depolarization resulting in high frequency plateau bursts associated with a reactive, depolarizing shift in action potential threshold; and a prolonged hyperpolarization preceding slower rebound bursts characterized by a predictive, hyperpolarizing shift in action potential threshold. Our findings therefore illustrate a framework for the biophysical implementation of prediction error and sensory cue coding in dopamine neurons by tuning action potential threshold dynamics.


Reward prediction error in the ERP following unconditioned aversive stimuli

Scientific Reports ◽

10.1038/s41598-021-99408-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Harry J. Stewardson ◽

Thomas D. Sambrook

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Dopamine System ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Human Participants

Abstract: Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate other processes than the midbrain dopamine system.
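The temporal difference idea mentioned in this abstract, that the prediction error appears at the earliest moment an outcome becomes predictable, can be sketched with a toy TD(0) simulation. This is an illustration under stated assumptions, not the study's analysis: the two-event trial structure (cue, then outcome), the fixed baseline value of zero before the cue, the parameter values, and the name `simulate_td` are all hypothetical.

```python
def simulate_td(n_trials: int = 200, alpha: float = 0.1,
                gamma: float = 1.0, outcome: float = 1.0):
    """Toy TD(0) learner on a trial made of a cue followed by an outcome.

    Assumptions (illustrative only): the cue arrives unpredictably from a
    baseline state whose value stays at 0, and the trial ends at the outcome.
    """
    v_cue = 0.0                      # learned value of the cue state
    cue_errors, outcome_errors = [], []
    for _ in range(n_trials):
        # Error at cue onset: discounted cue value minus baseline value (0).
        delta_cue = gamma * v_cue - 0.0
        # Error at outcome: received outcome minus what the cue predicted.
        delta_outcome = outcome - v_cue
        v_cue += alpha * delta_outcome   # TD(0) update of the cue value
        cue_errors.append(delta_cue)
        outcome_errors.append(delta_outcome)
    return cue_errors, outcome_errors


cue_errors, outcome_errors = simulate_td()
# Early in learning the error sits at the outcome; late in learning it
# has moved to the cue, the conditioned stimulus in the abstract's terms.
print(cue_errors[0], outcome_errors[0])
print(cue_errors[-1] > 0.99, outcome_errors[-1] < 0.01)
```

In this sketch an unpredicted outcome produces an error at outcome time, while a fully learned cue produces the error at cue time instead, consistent with the abstract's point that the signal can follow both unconditioned and conditioned stimuli.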


Faculty Opinions recommendation of Midbrain dopamine neurons encode a quantitative reward prediction error signal.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1026721.325345 ◽

2005 ◽

Author(s):

Kent Berridge

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Error Signal ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons
