Climbing fibers encode a temporal-difference prediction error during cerebellar learning in mice

Nature Neuroscience ◽  
2015 ◽  
Vol 18 (12) ◽  
pp. 1798-1803 ◽  
Author(s):  
Shogo Ohmae ◽  
Javier F Medina

2013 ◽  
Vol 33 (33) ◽  
pp. 13436-13440 ◽  
Author(s):  
A. Rasmussen ◽  
D.-A. Jirenhed ◽  
R. Zucca ◽  
F. Johansson ◽  
P. Svensson ◽  
...  

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
William Heffley ◽  
Court Hull

Classical models of cerebellar learning posit that climbing fibers operate according to a supervised learning rule, instructing changes in motor output by signaling the occurrence of movement errors. However, cerebellar output is also associated with non-motor behaviors and, recently, with modulation of reward-association pathways in the VTA. To test how the cerebellum processes reward-related signals in the same type of classical conditioning behavior typically used to evaluate reward processing in the VTA and striatum, we used calcium imaging to visualize instructional signals carried by climbing fibers across the lateral cerebellum in mice before and after learning. We find distinct climbing fiber responses in three lateral cerebellar regions that can each signal reward prediction. These instructional signals are well suited to guide cerebellar learning based on reward expectation and to enable a cerebellar contribution to reward-driven behaviors, suggesting a broad role for the lateral cerebellum in reward-based learning.
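
As a rough illustration of the supervised learning rule this abstract contrasts with reward prediction, the following is a minimal sketch, assuming a standard textbook account of climbing-fiber-gated plasticity rather than anything from the paper itself; all names and constants are hypothetical.

    import numpy as np

    # Sketch of the classical "supervised" climbing fiber rule: a climbing
    # fiber spike signals a movement error and depresses the parallel
    # fiber -> Purkinje cell synapses that were just active (LTD), while
    # error-free trials allow a slow potentiation back toward baseline (LTP).

    rng = np.random.default_rng(0)
    n_pf = 50                        # parallel fiber inputs
    w = np.full(n_pf, 0.5)           # PF -> Purkinje synaptic weights
    ltd_rate, ltp_rate = 0.05, 0.005

    for trial in range(200):
        pf_active = rng.random(n_pf) < 0.2   # which parallel fibers fired
        cf_spike = rng.random() < 0.5        # climbing fiber = movement error
        if cf_spike:
            w[pf_active] -= ltd_rate * w[pf_active]           # depress active inputs
        else:
            w[pf_active] += ltp_rate * (1.0 - w[pf_active])   # slow recovery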


2019 ◽  
Author(s):  
William Heffley ◽  
Court Hull

Classical models of cerebellar learning posit that climbing fibers operate according to a supervised learning rule, instructing changes in motor output by signaling the occurrence of movement errors. However, cerebellar output is also associated with non-motor behaviors and, recently, with modulation of reward-association pathways in the VTA. To test how the cerebellum processes reward-related signals in the same type of classical conditioning behavior typically used to evaluate reward processing in the VTA and striatum, we used calcium imaging to visualize instructional signals carried by climbing fibers across the lateral cerebellum before and after learning. We find distinct climbing fiber responses in three lateral cerebellar regions that can each signal reward prediction, but not reward prediction errors per se. These instructional signals are well suited to guide cerebellar learning based on reward expectation and enable a cerebellar contribution to reward-driven behaviors.


2019 ◽  
Author(s):  
Etienne JP Maes ◽  
Melissa J Sharpe ◽  
Matthew P.H. Gardner ◽  
Chun Yun Chang ◽  
Geoffrey Schoenbaum ◽  
...  

Reward-evoked dopamine is well established as a prediction error. However, the central tenet of temporal-difference accounts – that similar transients evoked by reward-predictive cues also function as errors – remains untested. To address this, we used two phenomena, second-order conditioning and blocking, to examine the role of dopamine in prediction error versus reward prediction. We show that optogenetically shunting dopamine activity at the start of a reward-predicting cue prevents second-order conditioning without affecting blocking. These results support temporal-difference accounts by providing causal evidence that cue-evoked dopamine transients function as prediction errors.
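
The logic of the experiment can be made concrete with a toy temporal-difference simulation. This is a minimal sketch under standard TD(0) assumptions, not the authors' model; the cue names, learning rate, and shunting flag are all illustrative.

    # Phase 1: cue A is paired with reward, so A acquires value. Phase 2:
    # a new cue B is paired with A alone. Under a TD account, the prediction
    # error evoked at A's onset is what teaches B; clamping it to zero (the
    # analogue of optogenetic shunting) abolishes second-order conditioning.
    # (A's own extinction during phase 2 is omitted for brevity.)

    alpha, gamma = 0.2, 1.0
    V = {"A": 0.0, "B": 0.0}

    def transition(cue, value_of_next_event, shunt=False):
        delta = gamma * value_of_next_event - V[cue]   # error at next event's onset
        if shunt:
            delta = 0.0                                # shunted dopamine transient
        V[cue] += alpha * delta

    for _ in range(100):                    # phase 1: A -> reward (value 1.0)
        transition("A", 1.0)
    for _ in range(100):                    # phase 2: B -> A, no reward
        transition("B", V["A"], shunt=False)   # set shunt=True: B stays near 0

    print(V)   # with shunt=False, B inherits value from A's cue-evoked error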


2020 ◽  
Author(s):  
Ryunosuke Amo ◽  
Akihiro Yamanaka ◽  
Kenji F. Tanaka ◽  
Naoshige Uchida ◽  
Mitsuko Watabe-Uchida

It has been proposed that the activity of dopamine neurons approximates the temporal-difference (TD) prediction error, a teaching signal developed in reinforcement learning, a field of machine learning. However, whether this similarity holds true during learning remains elusive. In particular, some TD learning models predict that the error signal gradually shifts backward in time from reward delivery to a reward-predictive cue, but previous experiments failed to observe such a gradual shift in dopamine activity. Here we demonstrate conditions in which such a shift can be detected experimentally. These shared dynamics of TD error and dopamine activity narrow the gap between machine learning theory and biological brains, tightening a long-sought link.
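
The backward shift the authors refer to falls out of a standard TD(0) simulation. The following is a minimal sketch under textbook assumptions, not the paper's analysis; the horizon, learning rate, and print schedule are arbitrary.

    import numpy as np

    # A cue starts the trial at t = 0 and reward arrives at the final step.
    # Early in training the TD error peaks at the reward and then migrates
    # backward toward the cue as the value function is learned.

    T, alpha, gamma = 10, 0.1, 1.0
    V = np.zeros(T + 1)              # V[T] is the terminal, post-reward state

    for trial in range(300):
        deltas = np.zeros(T)
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0            # reward at the last step
            deltas[t] = r + gamma * V[t + 1] - V[t]   # TD error at step t
            V[t] += alpha * deltas[t]
        if trial in (0, 20, 100, 299):
            print(trial, np.round(deltas, 2))         # watch the peak move backward

    # (In experiments the cue itself arrives unpredicted, so after learning
    #  an error of size V[0] persists at cue onset.)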


Author(s):  
Thomas Boraud

This chapter presents an upgrade of the neural network that implements the reward prediction error. It then compares the final product with the actor-critic model and discusses the similarities and differences. Reinforcement learning algorithms, and more specifically actor-critic models, are currently very successful in the field of decision-making. They are notably related to properties of dopaminergic neurons that have not yet been addressed in previous chapters. It has been demonstrated that dopaminergic neurons respond when the subject receives a reward or when the subject associates a conditional stimulus with the reward, and that this response to the stimulus is proportional to the utility of the reward. In fact, dopaminergic neurons behave exactly like a process that computes a temporal difference: the amplitude of their response when the reward is administered is proportional to the difference between the utility expected at that moment and the reward actually obtained, i.e., the temporal-difference error. The chapter then assesses whether the telencephalic loop is an actor-critic system.
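
The computation the chapter attributes to dopaminergic neurons can be written compactly. Below is a minimal tabular actor-critic sketch in standard textbook form, not the chapter's own model; the state and action counts and learning rates are placeholders.

    import numpy as np

    # The critic's TD error, delta = r + gamma * V(s') - V(s), plays the
    # role assigned to dopaminergic neurons: positive when the outcome is
    # better than expected, negative when it is worse. The same signal
    # trains both the critic (values) and the actor (action preferences).

    n_states, n_actions = 5, 2
    alpha_v, alpha_pi, gamma = 0.1, 0.1, 0.95
    V = np.zeros(n_states)                     # critic: state values
    prefs = np.zeros((n_states, n_actions))    # actor: action preferences
    rng = np.random.default_rng(0)

    def act(s):
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()                           # softmax policy
        return rng.choice(n_actions, p=p)

    def update(s, a, r, s_next, done):
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]                  # TD (reward prediction) error
        V[s] += alpha_v * delta                # critic learns values
        prefs[s, a] += alpha_pi * delta        # actor reinforced by same signal
        return delta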


Risks ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 113 ◽  
Author(s):  
Peter Bossaerts ◽  
Shijie Huang ◽  
Nitin Yadav

In traditional reinforcement learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extension of RL, called distributional RL (disRL), and introduce estimation efficiency, properly adjusting for the differential impact of outliers on the two terms of the RL prediction error in the updating equations. We show that the resulting “efficient distributional RL” (e-disRL) learns much faster and is robust once it settles on a policy. Our paper also provides a brief, nontechnical overview of machine learning, focusing on RL.
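
For orientation, the standard distributional-RL ingredient the paper builds on is the quantile TD update sketched below. This is not the authors' e-disRL, whose efficiency-adjusted weighting is specific to the paper; the Huber constant kappa here is only a crude stand-in for taming heavy-tailed outliers.

    import numpy as np

    # Each quantile theta[i] of the value distribution is nudged toward the
    # targets r + gamma * theta_next[j] with an asymmetric, Huber-clipped
    # step, so the set of thetas tracks the full return distribution rather
    # than only its mean. Clipping limits the pull of extreme rewards.

    n_q, alpha, gamma = 11, 0.05, 0.95
    taus = (np.arange(n_q) + 0.5) / n_q        # quantile levels
    theta = np.zeros(n_q)                      # quantiles of V(s)
    theta_next = np.zeros(n_q)                 # quantiles of V(s')

    def quantile_td_update(r, kappa=1.0):
        for i in range(n_q):
            step = 0.0
            for j in range(n_q):
                u = r + gamma * theta_next[j] - theta[i]   # distributional TD error
                huber = max(-kappa, min(kappa, u))         # clipped error
                step += abs(taus[i] - (1.0 if u < 0 else 0.0)) * huber
            theta[i] += alpha * step / n_q

    quantile_td_update(r=1.0)      # one update toward a unit reward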


1997 ◽  
Vol 20 (2) ◽  
pp. 249-250 ◽  
Author(s):  
Michel Dufossé ◽  
Arthur Kaladjian ◽  
Philippe Grandguillaume

Prefrontal cerebral areas project to Purkinje cells, located in the most lateral part of the cerebellum, via mossy and climbing fibers. The latter carry olivary error signals that reflect the attentional load of the prefrontal cortex. At the cerebellar level, LTP-LTD plasticity allows these Purkinje cells to adaptively reinforce the active pyramidal cells involved in the motor sequence.

