A unified framework for dopamine signals across timescales

Mapping Intimacies ◽

10.1101/803437 ◽

2019 ◽

Cited By ~ 13

Author(s):

HyungGoo R. Kim ◽

Athar N. Malik ◽

John G. Mikhael ◽

Pol Bech ◽

Iku Tsutsui-Kimura ◽

...

Keyword(s):

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Unified Framework ◽

Phasic Activity ◽

Reward Prediction ◽

Dynamic Stimulus ◽

Midbrain Dopamine ◽

Gradual Approach ◽

Midbrain Dopamine Neurons

ABSTRACTRapid phasic activity of midbrain dopamine neurons are thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. Recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independently from somatic spiking activity. Here, we developed novel experimental paradigms using virtual reality that disambiguate RPEs from values. We examined the dopamine circuit activity at various stages including somatic spiking, axonal calcium signals, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all the stages examined. We further show that ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.

Download Full-text

Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework

eLife ◽

10.7554/elife.13665 ◽

2016 ◽

Vol 5 ◽

Cited By ~ 56

Author(s):

Brian F Sadacca ◽

Joshua L Jones ◽

Geoffrey Schoenbaum

Keyword(s):

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Use Value ◽

Value Prediction ◽

Reward Prediction ◽

Common Framework ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons ◽

Do So

Midbrain dopamine neurons have been proposed to signal reward prediction errors as defined in temporal difference (TD) learning algorithms. While these models have been extremely powerful in interpreting dopamine activity, they typically do not use value derived through inference in computing errors. This is important because much real world behavior – and thus many opportunities for error-driven learning – is based on such predictions. Here, we show that error-signaling rat dopamine neurons respond to the inferred, model-based value of cues that have not been paired with reward and do so in the same framework as they track the putative cached value of cues previously paired with reward. This suggests that dopamine neurons access a wider variety of information than contemplated by standard TD models and that, while their firing conforms to predictions of TD models in some cases, they may not be restricted to signaling errors from TD predictions.

Download Full-text

Rethinking dopamine as generalized prediction error

10.1101/239731 ◽

2017 ◽

Cited By ~ 2

Author(s):

Matthew P.H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

AbstractMidbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.

Download Full-text

Rethinking dopamine as generalized prediction error

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2018.1645 ◽

2018 ◽

Vol 285 (1891) ◽

pp. 20181645 ◽

Cited By ~ 32

Author(s):

Matthew P. H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.

Download Full-text

Reward prediction error in the ERP following unconditioned aversive stimuli

Scientific Reports ◽

10.1038/s41598-021-99408-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Harry J. Stewardson ◽

Thomas D. Sambrook

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Dopamine System ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Human Participants

AbstractReinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate other processes than the midbrain dopamine system.

Download Full-text

The effect of effort on reward prediction error signals in midbrain dopamine neurons

Current Opinion in Behavioral Sciences ◽

10.1016/j.cobeha.2021.07.004 ◽

2021 ◽

Vol 41 ◽

pp. 152-159

Author(s):

Shingo Tanaka ◽

Jessica E Taylor ◽

Masamichi Sakagami

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons

Download Full-text

Midbrain dopamine neurons provide teaching signals for goal-directed navigation

10.1101/2021.02.17.431585 ◽

2021 ◽

Author(s):

Karolina Farrell ◽

Armin Lak ◽

Aman B Saleem

Keyword(s):

Virtual Reality ◽

Visual Cues ◽

Task Engagement ◽

Dopamine Neurons ◽

List Type ◽

Prediction Errors ◽

Learning Stage ◽

Midbrain Dopamine ◽

Improved Performance ◽

Midbrain Dopamine Neurons

SummaryIn naturalistic environments, animals navigate in order to harvest rewards. Successful goal-directed navigation requires learning to accurately estimate location and select optimal state-dependent actions. Midbrain dopamine neurons are known to be involved in reward value learning1–13. They have also been linked to reward location learning, as they play causal roles in place preference14,15 and enhance spatial memory16–21. Dopamine neurons are therefore ideally placed to provide teaching signals for goal-directed navigation. To test this, we imaged dopamine neural activity as mice learned to navigate in a closed-loop virtual reality corridor and lick to report the reward location. Across learning, phasic dopamine responses developed to visual cues and trial outcome that resembled reward prediction errors and indicated the animal’s estimate of the reward location. We also observed the development of pre-reward ramping activity, the slope of which was modulated by both learning stage and task engagement. The slope of the dopamine ramp was correlated with the accuracy of licks in the next trial, suggesting that the ramp sculpted accurate location-specific action during navigation. Our results indicate that midbrain dopamine neurons, through both their phasic and ramping activity, provide teaching signals for improving goal-directed navigation.HighlightsWe investigated midbrain dopamine activity in mice learning a goal-directed navigation task in virtual realityPhasic dopamine signals reflected prediction errors with respect to subjective estimate of reward locationA slow ramp in dopamine activity leading up to reward location developed over learning and was enhanced with task engagementPositive ramp slopes were followed by improved performance on subsequent trials, suggesting a teaching role during goal-directed navigation

Download Full-text

Internal bias controls dopamine perceptual decision-related responses

10.1101/431387 ◽

2018 ◽

Author(s):

Stefania Sarno ◽

Manuel Beirán ◽

José Vergara ◽

Román Rossi-Pool ◽

Ranulfo Romo ◽

...

Keyword(s):

Delay Period ◽

Dopamine Neurons ◽

Prediction Errors ◽

Phasic Activity ◽

Perceptual Decision ◽

Reward Prediction ◽

Internal Bias ◽

Dopamine Signaling

AbstractDopamine neurons produce reward-related signals that regulate learning and guide behavior. Prior expectations about forthcoming stimuli and internal biases can alter perception and choices and thus could influence dopamine signaling. We tested this hypothesis studying dopamine neurons recorded in monkeys trained to discriminate between two tactile frequencies separated by a delay period, a task affected by the contraction bias. The bias greatly controlled the animals’ choices and confidence on their decisions. During decision formation the phasic activity reflected bias-induced modulations and simultaneously coded reward prediction errors. In contrast, the activity during the delay period was not affected by the bias, was not tuned to the value of the stimuli but was temporally modulated, pointing to a role different from that of the phasic activity.

Download Full-text

Tonic firing mode of midbrain dopamine neurons continuously tracks reward values changing moment-by-moment

10.1101/2020.09.16.300723 ◽

2020 ◽

Author(s):

Yawei Wang ◽

Osamu Toyoshima ◽

Jun Kunimatsu ◽

Hiroshi Yamada ◽

Masayuki Matsumoto

Keyword(s):

Continuous Monitoring ◽

Action Selection ◽

Dopamine Neurons ◽

Neuron Activity ◽

Tonic Activity ◽

Tonic Firing ◽

Phasic Activity ◽

Firing Mode ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons

AbstractAppropriate actions are taken based on the values of future rewards. The phasic activity of midbrain dopamine neurons signals these values. Because reward values often change over time, even on a subsecond-by-subsecond basis, appropriate action selection requires continuous value monitoring. However, the phasic dopamine activity, which is sporadic and has a short duration, likely fails continuous monitoring. Here, we demonstrate a tonic firing mode of dopamine neurons that effectively tracks changing reward values. We recorded dopamine neuron activity in monkeys during a Pavlovian procedure in which the value of a cued reward gradually increased or decreased. Dopamine neurons tonically increased and decreased their activity as the reward value changed. This tonic activity was evoked more strongly by non-burst spikes than burst spikes producing a conventional phasic activity. Our findings suggest that dopamine neurons change their firing mode to effectively signal reward values, which could underlie action selection in changing environments.

Download Full-text

The timing of action determines reward prediction signals in identified midbrain dopamine neurons

Nature Neuroscience ◽

10.1038/s41593-018-0245-7 ◽

2018 ◽

Vol 21 (11) ◽

pp. 1563-1573 ◽

Cited By ~ 53

Author(s):

Luke T. Coddington ◽

Joshua T. Dudman

Keyword(s):

Dopamine Neurons ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons

Download Full-text

Tonic firing mode of midbrain dopamine neurons continuously tracks reward values changing moment-by-moment

eLife ◽

10.7554/elife.63166 ◽

2021 ◽

Vol 10 ◽

Author(s):

Yawei Wang ◽

Osamu Toyoshima ◽

Jun Kunimatsu ◽

Hiroshi Yamada ◽

Masayuki Matsumoto

Keyword(s):

Animal Behavior ◽

Continuous Monitoring ◽

Dopamine Neurons ◽

Neuron Activity ◽

Behavioral Regulation ◽

Tonic Firing ◽

Phasic Activity ◽

Firing Mode ◽

Midbrain Dopamine ◽

Midbrain Dopamine Neurons

Animal behavior is regulated based on the values of future rewards. The phasic activity of midbrain dopamine neurons signals these values. Because reward values often change over time, even on a subsecond-by-subsecond basis, appropriate behavioral regulation requires continuous value monitoring. However, the phasic dopamine activity, which is sporadic and has a short duration, likely fails continuous monitoring. Here, we demonstrate a tonic firing mode of dopamine neurons that effectively tracks changing reward values. We recorded dopamine neuron activity in monkeys during a Pavlovian procedure in which the value of a cued reward gradually increased or decreased. Dopamine neurons tonically increased and decreased their activity as the reward value changed. This tonic activity was evoked more strongly by non-burst spikes than burst spikes producing a conventional phasic activity. Our findings suggest that dopamine neurons change their firing mode to effectively signal reward values in a given situation.

Download Full-text