Climbing fibers encode a temporal-difference prediction error during cerebellar learning in mice

Nature Neuroscience ◽  
2015 ◽  
Vol 18 (12) ◽  
pp. 1798-1803 ◽  
Author(s):  
Shogo Ohmae ◽  
Javier F Medina

2013 ◽  
Vol 33 (33) ◽  
pp. 13436-13440 ◽  
Author(s):  
A. Rasmussen ◽  
D.-A. Jirenhed ◽  
R. Zucca ◽  
F. Johansson ◽  
P. Svensson ◽  
...  

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
William Heffley ◽  
Court Hull

Classical models of cerebellar learning posit that climbing fibers operate according to a supervised learning rule, instructing changes in motor output by signaling the occurrence of movement errors. However, cerebellar output is also associated with non-motor behaviors and, recently, with modulation of reward-association pathways in the VTA. To test how the cerebellum processes reward-related signals in the same type of classical conditioning behavior typically used to evaluate reward processing in the VTA and striatum, we used calcium imaging to visualize instructional signals carried by climbing fibers across the lateral cerebellum in mice before and after learning. We find distinct climbing fiber responses in three lateral cerebellar regions that can each signal reward prediction. These instructional signals are well suited to guide cerebellar learning based on reward expectation and to enable a cerebellar contribution to reward-driven behaviors, suggesting a broad role for the lateral cerebellum in reward-based learning.
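
As a rough illustration of the supervised learning rule this abstract contrasts with reward prediction, the following is a minimal sketch, assuming a standard textbook account of climbing-fiber-gated plasticity rather than anything from the paper itself; all names and constants are hypothetical.

    import numpy as np

    # Sketch of the classical "supervised" climbing fiber rule: a climbing
    # fiber spike signals a movement error and depresses the parallel
    # fiber -> Purkinje cell synapses that were just active (LTD), while
    # error-free trials allow a slow potentiation back toward baseline (LTP).

    rng = np.random.default_rng(0)
    n_pf = 50                        # parallel fiber inputs
    w = np.full(n_pf, 0.5)           # PF -> Purkinje synaptic weights
    ltd_rate, ltp_rate = 0.05, 0.005

    for trial in range(200):
        pf_active = rng.random(n_pf) < 0.2   # which parallel fibers fired
        cf_spike = rng.random() < 0.5        # climbing fiber = movement error
        if cf_spike:
            w[pf_active] -= ltd_rate * w[pf_active]           # depress active inputs
        else:
            w[pf_active] += ltp_rate * (1.0 - w[pf_active])   # slow recovery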


2019 ◽  
Author(s):  
William Heffley ◽  
Court Hull

Classical models of cerebellar learning posit that climbing fibers operate according to a supervised learning rule, instructing changes in motor output by signaling the occurrence of movement errors. However, cerebellar output is also associated with non-motor behaviors and, recently, with modulation of reward-association pathways in the VTA. To test how the cerebellum processes reward-related signals in the same type of classical conditioning behavior typically used to evaluate reward processing in the VTA and striatum, we used calcium imaging to visualize instructional signals carried by climbing fibers across the lateral cerebellum before and after learning. We find distinct climbing fiber responses in three lateral cerebellar regions that can each signal reward prediction, but not reward prediction errors per se. These instructional signals are well suited to guide cerebellar learning based on reward expectation and enable a cerebellar contribution to reward-driven behaviors.


2019 ◽  
Author(s):  
Etienne JP Maes ◽  
Melissa J Sharpe ◽  
Matthew P.H. Gardner ◽  
Chun Yun Chang ◽  
Geoffrey Schoenbaum ◽  
...  

Reward-evoked dopamine is well established as a prediction error. However, the central tenet of temporal-difference accounts – that similar transients evoked by reward-predictive cues also function as errors – remains untested. To address this, we used two phenomena, second-order conditioning and blocking, to examine the role of dopamine in prediction error versus reward prediction. We show that optogenetically shunting dopamine activity at the start of a reward-predicting cue prevents second-order conditioning without affecting blocking. These results support temporal-difference accounts by providing causal evidence that cue-evoked dopamine transients function as prediction errors.
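
The logic of the experiment can be made concrete with a toy temporal-difference simulation. This is a minimal sketch under standard TD(0) assumptions, not the authors' model; the cue names, learning rate, and shunting flag are all illustrative.

    # Phase 1: cue A is paired with reward, so A acquires value. Phase 2:
    # a new cue B is paired with A alone. Under a TD account, the prediction
    # error evoked at A's onset is what teaches B; clamping it to zero (the
    # analogue of optogenetic shunting) abolishes second-order conditioning.
    # (A's own extinction during phase 2 is omitted for brevity.)

    alpha, gamma = 0.2, 1.0
    V = {"A": 0.0, "B": 0.0}

    def transition(cue, value_of_next_event, shunt=False):
        delta = gamma * value_of_next_event - V[cue]   # error at next event's onset
        if shunt:
            delta = 0.0                                # shunted dopamine transient
        V[cue] += alpha * delta

    for _ in range(100):                    # phase 1: A -> reward (value 1.0)
        transition("A", 1.0)
    for _ in range(100):                    # phase 2: B -> A, no reward
        transition("B", V["A"], shunt=False)   # set shunt=True: B stays near 0

    print(V)   # with shunt=False, B inherits value from A's cue-evoked error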


2020 ◽  
Author(s):  
Ryunosuke Amo ◽  
Akihiro Yamanaka ◽  
Kenji F. Tanaka ◽  
Naoshige Uchida ◽  
Mitsuko Watabe-Uchida

It has been proposed that the activity of dopamine neurons approximates the temporal-difference (TD) prediction error, a teaching signal developed in reinforcement learning, a field of machine learning. However, whether this similarity holds true during learning remains elusive. In particular, some TD learning models predict that the error signal gradually shifts backward in time from reward delivery to a reward-predictive cue, but previous experiments failed to observe such a gradual shift in dopamine activity. Here we demonstrate conditions in which such a shift can be detected experimentally. These shared dynamics of TD error and dopamine activity narrow the gap between machine learning theory and biological brains, tightening a long-sought link.
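
The backward shift the authors refer to falls out of a standard TD(0) simulation. The following is a minimal sketch under textbook assumptions, not the paper's analysis; the horizon, learning rate, and print schedule are arbitrary.

    import numpy as np

    # A cue starts the trial at t = 0 and reward arrives at the final step.
    # Early in training the TD error peaks at the reward and then migrates
    # backward toward the cue as the value function is learned.

    T, alpha, gamma = 10, 0.1, 1.0
    V = np.zeros(T + 1)              # V[T] is the terminal, post-reward state

    for trial in range(300):
        deltas = np.zeros(T)
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0            # reward at the last step
            deltas[t] = r + gamma * V[t + 1] - V[t]   # TD error at step t
            V[t] += alpha * deltas[t]
        if trial in (0, 20, 100, 299):
            print(trial, np.round(deltas, 2))         # watch the peak move backward

    # (In experiments the cue itself arrives unpredicted, so after learning
    #  an error of size V[0] persists at cue onset.)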


Author(s):  
Thomas Boraud

This chapter presents an upgrade of the neural network that implements the reward prediction error. It then compares the final product with the actor-critic model and discusses the similarities and differences. Reinforcement learning algorithms, and more specifically actor-critic models, are currently very successful in the field of decision-making. They are notably related to properties of dopaminergic neurons that have not yet been addressed in previous chapters. It has been demonstrated that dopaminergic neurons respond when the subject receives a reward or when the subject associates a conditional stimulus with the reward, and that this response to the stimulus is proportional to the utility of the reward. In fact, dopaminergic neurons behave exactly like a process that computes a temporal difference: the amplitude of their response when the reward is administered is proportional to the difference between the utility expected at that moment and the reward actually obtained, i.e., the temporal-difference error. The chapter then assesses whether the telencephalic loop is an actor-critic system.
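
The computation the chapter attributes to dopaminergic neurons can be written compactly. Below is a minimal tabular actor-critic sketch in standard textbook form, not the chapter's own model; the state and action counts and learning rates are placeholders.

    import numpy as np

    # The critic's TD error, delta = r + gamma * V(s') - V(s), plays the
    # role assigned to dopaminergic neurons: positive when the outcome is
    # better than expected, negative when it is worse. The same signal
    # trains both the critic (values) and the actor (action preferences).

    n_states, n_actions = 5, 2
    alpha_v, alpha_pi, gamma = 0.1, 0.1, 0.95
    V = np.zeros(n_states)                     # critic: state values
    prefs = np.zeros((n_states, n_actions))    # actor: action preferences
    rng = np.random.default_rng(0)

    def act(s):
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()                           # softmax policy
        return rng.choice(n_actions, p=p)

    def update(s, a, r, s_next, done):
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]                  # TD (reward prediction) error
        V[s] += alpha_v * delta                # critic learns values
        prefs[s, a] += alpha_pi * delta        # actor reinforced by same signal
        return delta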


Risks ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 113 ◽  
Author(s):  
Peter Bossaerts ◽  
Shijie Huang ◽  
Nitin Yadav

In traditional reinforcement learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extension of RL, called distributional RL (disRL), and introduce estimation efficiency, properly adjusting for the differential impact of outliers on the two terms of the RL prediction error in the updating equations. We show that the resulting “efficient distributional RL” (e-disRL) learns much faster and is robust once it settles on a policy. Our paper also provides a brief, nontechnical overview of machine learning, focusing on RL.
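
For orientation, the standard distributional-RL ingredient the paper builds on is the quantile TD update sketched below. This is not the authors' e-disRL, whose efficiency-adjusted weighting is specific to the paper; the Huber constant kappa here is only a crude stand-in for taming heavy-tailed outliers.

    import numpy as np

    # Each quantile theta[i] of the value distribution is nudged toward the
    # targets r + gamma * theta_next[j] with an asymmetric, Huber-clipped
    # step, so the set of thetas tracks the full return distribution rather
    # than only its mean. Clipping limits the pull of extreme rewards.

    n_q, alpha, gamma = 11, 0.05, 0.95
    taus = (np.arange(n_q) + 0.5) / n_q        # quantile levels
    theta = np.zeros(n_q)                      # quantiles of V(s)
    theta_next = np.zeros(n_q)                 # quantiles of V(s')

    def quantile_td_update(r, kappa=1.0):
        for i in range(n_q):
            step = 0.0
            for j in range(n_q):
                u = r + gamma * theta_next[j] - theta[i]   # distributional TD error
                huber = max(-kappa, min(kappa, u))         # clipped error
                step += abs(taus[i] - (1.0 if u < 0 else 0.0)) * huber
            theta[i] += alpha * step / n_q

    quantile_td_update(r=1.0)      # one update toward a unit reward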


1997 ◽  
Vol 20 (2) ◽  
pp. 249-250 ◽  
Author(s):  
Michel Dufossé ◽  
Arthur Kaladjian ◽  
Philippe Grandguillaume

Prefrontal cerebral areas project to Purkinje cells, located in the most lateral part of the cerebellum, via mossy and climbing fibers. The latter carry olivary error signals that reflect the attentional load of the prefrontal cortex. At the cerebellar level, LTP-LTD plasticity allows these Purkinje cells to adaptively reinforce the active pyramidal cells involved in the motor sequence.

