Occasion setters determine responses of putative dopamine neurons to discriminative stimuli

2019 ◽  
Author(s):  
Luca Aquili ◽  
Eric M. Bowman ◽  
Robert Schmidt

Abstract
Midbrain dopamine (DA) neurons are involved in the processing of rewards and reward-predicting stimuli, possibly analogous to reinforcement learning reward prediction errors. Here we studied the activity of putative DA neurons (n=41) recorded in the ventral tegmental area of rats (n=6) performing a behavioural task involving occasion setting. In this task an occasion setter (OS) indicated that the relationship between a discriminative stimulus (DS) and reinforcement was in effect, so that reinforcement of bar pressing occurred only after the OS (tone or houselight) was followed by the DS (houselight or tone). We found that responses of putative DA cells to the DS were enhanced when preceded by the OS, as were behavioural responses to obtain rewards. Surprisingly though, we did not find a population response of putative DA neurons to the OS, contrary to predictions of standard temporal-difference models of DA neurons. However, despite the absence of a population response, putative DA neurons exhibited heterogeneous responses at the single-unit level, with some units increasing and others decreasing their activity in response to the OS. Similarly, putative non-DA cells did not respond to the DS at the population level, but showed heterogeneous responses at the single-unit level. The heterogeneity in the responses of putative DA cells may reflect how DA neurons encode context and point to local differences in DA signalling.

2017 ◽  
Author(s):  
Matthew P.H. Gardner ◽  
Geoffrey Schoenbaum ◽  
Samuel J. Gershman

Abstract
Midbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


2017 ◽  
Vol 29 (12) ◽  
pp. 3311-3326 ◽  
Author(s):  
Samuel J. Gershman

The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure (latent inhibition), and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Brian F Sadacca ◽  
Joshua L Jones ◽  
Geoffrey Schoenbaum

Midbrain dopamine neurons have been proposed to signal reward prediction errors as defined in temporal difference (TD) learning algorithms. While these models have been extremely powerful in interpreting dopamine activity, they typically do not use value derived through inference in computing errors. This is important because much real-world behavior – and thus many opportunities for error-driven learning – is based on such predictions. Here, we show that error-signaling rat dopamine neurons respond to the inferred, model-based value of cues that have not been paired with reward and do so in the same framework as they track the putative cached value of cues previously paired with reward. This suggests that dopamine neurons access a wider variety of information than contemplated by standard TD models and that, while their firing conforms to predictions of TD models in some cases, they may not be restricted to signaling errors from TD predictions.
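The "cached value" and TD-error language recurring in these abstracts refers to a simple update rule. As a minimal sketch (the state names, learning rate, and trial structure below are illustrative assumptions, not taken from any of the studies listed), a TD(0) learner computes the prediction error delta = r + gamma*V(s') - V(s) that these papers compare to phasic dopamine:

```python
# Minimal TD(0) sketch: value learning over a cue -> delay -> reward trial.
# All names and parameters here are illustrative, not from the studies above.
gamma = 0.9   # discount factor
alpha = 0.1   # learning rate
V = {"cue": 0.0, "delay": 0.0, "reward": 0.0}

# One trial: cue -> delay (no reward), then delay -> reward (r = 1).
episode = [("cue", "delay", 0.0), ("delay", "reward", 1.0)]

for _ in range(200):  # repeated cue-reward pairings
    for s, s_next, r in episode:
        delta = r + gamma * V[s_next] - V[s]  # reward prediction error
        V[s] += alpha * delta

# With training, the error at reward delivery shrinks and value propagates
# back to the cue: V["delay"] approaches 1.0 and V["cue"] approaches 0.9.
```

Note that in this caching scheme the error is driven only by directly experienced transitions, which is why responses to *inferred* value, as reported in the abstract above, fall outside the standard model.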


2018 ◽  
Vol 285 (1891) ◽  
pp. 20181645 ◽  
Author(s):  
Matthew P. H. Gardner ◽  
Geoffrey Schoenbaum ◽  
Samuel J. Gershman

Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.


2019 ◽  
Author(s):  
HyungGoo R. Kim ◽  
Athar N. Malik ◽  
John G. Mikhael ◽  
Pol Bech ◽  
Iku Tsutsui-Kimura ◽  
...  

Abstract
Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. Recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independently from somatic spiking activity. Here, we developed novel experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages including somatic spiking, axonal calcium signals, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all the stages examined. We further show that ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
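The "derivative-like computation" claim in the abstract above can be illustrated with a short sketch. Under the assumption (illustrative, not fit to any data from the study) that value V(t) rises convexly as reward approaches, the moment-by-moment TD error delta(t) = r(t) + gamma*V(t+1) - V(t) approximates a discounted time derivative of V, and itself ramps upward:

```python
# Sketch: a convexly increasing value trace yields a ramping TD error.
# The quadratic value function below is an illustrative assumption.
gamma = 0.99
T = 10
V = [(t / T) ** 2 for t in range(T + 1)]  # value rising toward reward at t = T

# r(t) = 0 at every step before reward, so delta is the discounted difference.
delta = [gamma * V[t + 1] - V[t] for t in range(T)]

# delta(t) grows as reward nears, echoing slowly ramping dopamine signals,
# even though each delta is just a local, moment-by-moment computation.
```

A linearly increasing V would instead produce a nearly flat delta, which is why the shape of the ramp matters for distinguishing the value account from the RPE account.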


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

Abstract
Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than the midbrain dopamine system.


2017 ◽  
Author(s):  
Samuel J. Gershman

Abstract
The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue pre-exposure (latent inhibition) and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Hideyuki Matsumoto ◽  
Ju Tian ◽  
Naoshige Uchida ◽  
Mitsuko Watabe-Uchida

Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering context when examining representations in dopamine neurons, and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.


2017 ◽  
Vol 129 ◽  
pp. 265-272 ◽  
Author(s):  
Chad C. Williams ◽  
Cameron D. Hassall ◽  
Robert Trska ◽  
Clay B. Holroyd ◽  
Olave E. Krigolson
