Causal evidence supporting the proposal that dopamine transients function as a temporal difference prediction error

2019 ◽  
Author(s):  
Etienne JP Maes ◽  
Melissa J Sharpe ◽  
Matthew P.H. Gardner ◽  
Chun Yun Chang ◽  
Geoffrey Schoenbaum ◽  
...  

Reward-evoked dopamine is well established as a prediction error. However, the central tenet of temporal difference accounts – that similar transients evoked by reward-predictive cues also function as errors – remains untested. To address this, we used two phenomena, second-order conditioning and blocking, to examine the role of dopamine in prediction error versus reward prediction. We show that optogenetically shunting dopamine activity at the start of a reward-predicting cue prevents second-order conditioning without affecting blocking. These results support temporal difference accounts by providing causal evidence that cue-evoked dopamine transients function as prediction errors.
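The temporal difference account being tested can be illustrated with a minimal TD(0) simulation (an illustrative sketch with arbitrary parameters, not the authors' task or model): over training, the prediction-error transient migrates from reward delivery to cue onset, producing the cue-evoked signal that the experiment shunts.

```python
import numpy as np

# Minimal TD(0) sketch of Pavlovian conditioning (illustrative parameters,
# not the authors' task or model). States within a trial: 0 = cue onset,
# 1 = reward delivery. The cue itself arrives unpredictably, so the
# pre-cue prediction is fixed at 0.
gamma, alpha, n_trials = 0.95, 0.2, 100
V = np.zeros(2)                      # learned values of the in-trial states
cue_deltas, reward_deltas = [], []

for _ in range(n_trials):
    # TD error at cue onset: the unpredicted cue reveals its learned value
    cue_deltas.append(gamma * V[0] - 0.0)
    # TD error moving from cue to reward delivery (no reward yet, r = 0)
    V[0] += alpha * (0.0 + gamma * V[1] - V[0])
    # TD error at reward delivery (r = 1, terminal state has value 0)
    delta_r = 1.0 + gamma * 0.0 - V[1]
    V[1] += alpha * delta_r
    reward_deltas.append(delta_r)

# Early in training the error sits at reward delivery; late in training it
# has transferred to cue onset, the transient the study shunts.
```

In this toy model the cue-onset error is exactly what a second-order cue would learn from, which is why removing it should abolish second-order conditioning while leaving value-based blocking intact.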

2021 ◽  
Vol 15 ◽  
Author(s):  
Arthur Prével ◽  
Ruth M. Krebs

In a new environment, humans and animals can detect and learn that cues predict meaningful outcomes, and use this information to adapt their responses. This process is termed Pavlovian conditioning. Pavlovian conditioning is also observed for stimuli that predict outcome-associated cues; this second type of conditioning is termed higher-order Pavlovian conditioning. In this review, we focus on higher-order conditioning studies with simultaneous and backward conditioned stimuli. We examine how the results from these experiments pose a challenge to models of Pavlovian conditioning like the Temporal Difference (TD) models, in which learning is mainly driven by reward prediction errors. Contrasting with this view, the results suggest that humans and animals can form complex representations of the (temporal) structure of the task, and use this information to guide behavior, which seems consistent with model-based reinforcement learning. Future investigations involving these procedures could yield important new insights into the mechanisms that underlie Pavlovian conditioning.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than the midbrain dopamine system.


2014 ◽  
Vol 26 (3) ◽  
pp. 467-471 ◽  
Author(s):  
Samuel J. Gershman

Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: a quadratic transformation of proximity to the goal implies approximately linear ramping, as observed experimentally.
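The note's key idea can be checked numerically in a few lines (a sketch with assumed parameters, a unit proximity range and gamma = 1, not the note's exact formulation): with a quadratic value transform, the per-step TD error grows linearly as the goal is approached.

```python
import numpy as np

# Sketch of the representational idea: value as a quadratic transform of
# goal proximity yields a ~linear TD-error ramp. Parameters are assumptions.
gamma = 1.0
x = np.linspace(0.0, 1.0, 101)     # proximity to goal: 0 = start, 1 = goal
V = x ** 2                         # quadratic transform of proximity
delta = gamma * V[1:] - V[:-1]     # per-step TD error (no reward until goal)

# delta at proximity x is (x + dx)**2 - x**2 = 2*x*dx + dx**2, i.e. linear
# in proximity, so the modeled dopamine signal ramps linearly to the goal.
```

The algebra in the final comment is the whole argument: the difference of adjacent quadratic values is affine in proximity, matching the experimentally observed ramps.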


2020 ◽  
Author(s):  
Kate Ergo ◽  
Luna De Vilder ◽  
Esther De Loof ◽  
Tom Verguts

Recent years have witnessed a steady increase in the number of studies investigating the role of reward prediction errors (RPEs) in declarative learning. Specifically, in several experimental paradigms RPEs drive declarative learning, with larger and more positive RPEs enhancing declarative learning. However, it is unknown whether this RPE must derive from the participant’s own response, or whether instead any RPE is sufficient to obtain the learning effect. To test this, we generated RPEs in the same experimental paradigm where we combined an agency and a non-agency condition. We observed no interaction between RPE and agency, suggesting that any RPE (irrespective of its source) can drive declarative learning. This result holds implications for declarative learning theory.


Author(s):  
Michiel Van Elk ◽  
Harold Bekkering

We characterize theories of conceptual representation as embodied, disembodied, or hybrid according to their stance on a number of different dimensions: the nature of concepts, the relation between language and concepts, the function of concepts, the acquisition of concepts, the representation of concepts, and the role of context. We propose to extend an embodied view of concepts, by taking into account the importance of multimodal associations and predictive processing. We argue that concepts are dynamically acquired and updated, based on recurrent processing of prediction error signals in a hierarchically structured network. Concepts are thus used as prior models to generate multimodal expectations, thereby reducing surprise and enabling greater precision in the perception of exemplars. This view places embodied theories of concepts in a novel predictive processing framework, by highlighting the importance of concepts for prediction, learning and shaping categories on the basis of prediction errors.


2018 ◽  
Vol 72 (6) ◽  
pp. 1453-1465 ◽  
Author(s):  
Arthur Prével ◽  
Vinca Rivière ◽  
Jean-Claude Darcheville ◽  
Gonzalo P Urcelay ◽  
Ralph R Miller

Prével and colleagues reported excitatory learning with a backward conditioned stimulus (CS) in a conditioned reinforcement preparation. Their results add to existing evidence of backward CSs sometimes being excitatory and were viewed as challenging the view that learning is driven by prediction error reduction, which assumes that only predictive (i.e., forward) relationships are learned. The results instead were consistent with the assumptions of both Miller’s Temporal Coding Hypothesis and Wagner’s Sometimes Opponent Processes (SOP) model. The present experiment extended the conditioned reinforcement preparation developed by Prével et al. to a backward second-order conditioning preparation, with the aim of discriminating between these two accounts. We tested whether a second-order CS can serve as an effective conditioned reinforcer, even when the first-order CS with which it was paired is a backward CS that elicits no responding. Evidence of conditioned reinforcement was found, despite no conditioned response (CR) being elicited by the first-order backward CS. The evidence of second-order conditioning in the absence of excitatory conditioning to the first-order CS is interpreted as a challenge to SOP. In contrast, the present results are consistent with the Temporal Coding Hypothesis and constitute a conceptual replication in humans of previous reports of excitatory second-order conditioning in rodents with a backward CS. The proposal is made that learning is driven by “discrepancy” with prior experience as opposed to “prediction error.”


2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Maya G. Mosner ◽  
R. Edward McLaurin ◽  
Jessica L. Kinard ◽  
Shabnam Hakimi ◽  
Jacob Parelman ◽  
...  

Few studies have explored neural mechanisms of reward learning in ASD despite behavioral evidence of impaired predictive abilities in this population. To investigate the neural correlates of reward prediction errors in ASD, 16 adults with ASD and 14 typically developing controls performed a prediction error task during fMRI scanning. Results revealed greater activation in the ASD group in the left paracingulate gyrus during signed prediction errors and the left insula and right frontal pole during thresholded unsigned prediction errors. Findings support atypical neural processing of reward prediction errors in ASD in frontostriatal regions critical for prediction coding and reward learning. Results provide a neural basis for impairments in reward learning that may contribute to traits common in ASD (e.g., intolerance of unpredictability).


Author(s):  
Joseph W. Barter ◽  
Suellen Li ◽  
Dongye Lu ◽  
Ryan A. Bartholomew ◽  
Mark A. Rossi ◽  
...  

2014 ◽  
Vol 26 (3) ◽  
pp. 447-458 ◽  
Author(s):  
Ernest Mas-Herrero ◽  
Josep Marco-Pallarés

In decision-making processes, the relevance of the information yielded by outcomes varies across time and situations. It increases when previous predictions are not accurate and in contexts with high environmental uncertainty. Previous fMRI studies have shown an important role of medial pFC in coding both reward prediction errors and the impact of this information on future decisions. However, it is unclear whether these two processes are dissociated in time or occur simultaneously, suggesting that a common mechanism is engaged. In the present work, we studied the modulation of two electrophysiological responses associated with outcome processing—the feedback-related negativity ERP and frontocentral theta oscillatory activity—with the reward prediction error and the learning rate. Twenty-six participants performed two learning tasks differing in the degree of predictability of the outcomes: a reversal learning task and a probabilistic learning task with multiple blocks of novel cue–outcome associations. We implemented a reinforcement learning model to obtain the single-trial reward prediction error and the learning rate for each participant and task. Our results indicated that midfrontal theta activity and feedback-related negativity increased linearly with the unsigned prediction error. In addition, variations of frontal theta oscillatory activity predicted the learning rate across tasks and participants. These results support the existence of a common brain mechanism for the computation of unsigned prediction error and learning rate.
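The single-trial quantities described here can be sketched with a simple delta rule (a hedged illustration; the authors' fitted model and parameter estimates differ, and the outcome probability and learning rate below are assumed values): each trial yields a signed prediction error, its absolute value is the unsigned RPE, and the learning rate scales the value update.

```python
import numpy as np

# Delta-rule sketch of the kind of reinforcement learning model used to
# derive single-trial RPEs (not the paper's actual fitted model).
rng = np.random.default_rng(0)
p_reward, alpha = 0.8, 0.3         # assumed outcome probability, learning rate
V, rpes = 0.0, []

for _ in range(300):
    outcome = float(rng.random() < p_reward)
    delta = outcome - V            # signed reward prediction error
    rpes.append(abs(delta))        # unsigned RPE, one value per trial
    V += alpha * delta             # learning rate scales the value update
```

In model fitting, alpha would be estimated per participant and task (and can itself vary trial by trial in more elaborate models), which is what allows learning rate and unsigned RPE to be related to theta activity on a single-trial basis.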

