Learning the payoffs and costs of actions

10.1101/346114 ◽

2018 ◽

Author(s):

Moritz Möller ◽

Rafal Bogacz

Keyword(s):

Basal Ganglia ◽

Weak Decay ◽

Specific Aspect ◽

Prediction Errors ◽

Negative Consequences ◽

Motivational State ◽

Vertebrate Brain ◽

Reward Prediction ◽

Dopaminergic Modulation ◽

Overall Evaluation

Abstract

A set of sub-cortical nuclei called basal ganglia is critical for learning the values of actions. The basal ganglia include two pathways, which have been associated with approach and avoid behavior respectively, and are differentially modulated by dopamine projections from the midbrain. According to the influential opponent actor learning model, these pathways represent learned estimates of the positive and negative consequences (payoffs and costs) of actions. The level of dopamine release controls to what extent payoffs and costs enter the overall evaluation of actions. How the knowledge about payoff and cost is acquired is still an open question, even though many theories describe learning from feedback in the basal ganglia. We examine whether a set of plasticity rules proposed to model reinforcement learning in the pathways of the basal ganglia is suitable to extract payoffs and costs from a reward prediction error signal. First, we determine the result of such learning, both analytically and via simulations, for different reward schedules that feature payoffs and costs. Then, we combine the plasticity rules with a decision rule to examine the emerging effect of dopaminergic modulation on the willingness to work for reward. We find that the plasticity rules are suitable to infer the mean payoffs and costs of actions, if those occur at different moments in time. Successful learning requires differential effects of positive and negative reward prediction errors on the two pathways, and a weak decay of synaptic weights over trials. We also confirm that dopaminergic modulation produces effects on the willingness to work for reward similar to those observed in classical experiments.

Author summary

The basal ganglia are structures underneath the surface of the vertebrate brain, associated with error driven learning. Much is known about the anatomical and biological features of the basal ganglia; scientists now try to understand the algorithms implemented by these structures. Numerous models aspire to capture the learning functionality, but many of them only cover some specific aspect of the algorithm. Instead of further adding to that pool of partial models, we unify two existing ones - one which captures what the basal ganglia learns, and one that describes the learning mechanism itself. The first model suggests that the basal ganglia keeps track of both positive and negative consequences of frequent opportunities, and weighs these by the motivational state in decisions. It explains how payoff and cost are represented, but not how those representations arise. The other model consists of biologically plausible plasticity rules, which describe how learning takes place, but not how the brain makes use of what is learned. We show that the two theories are compatible. Together, they form a model of learning and decision making that integrates the motivational state as well as the learned payoffs and costs of opportunities.
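The ingredients named in the abstract (two pathway weights, asymmetric effects of positive and negative reward prediction errors, a weak decay, and payoffs and costs arriving at different moments) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's actual plasticity rules: the update equations, the function names `f_eps` and `simulate`, the clipping of weights at zero, and all parameter values are hypothetical choices made for illustration.

```python
def f_eps(x, eps):
    """Pass positive errors in full and scale negative errors by eps (assumed form)."""
    return x if x > 0 else eps * x


def simulate(payoff, cost, trials=500, alpha=0.1, eps=0.2, lam=0.01):
    """Toy two-pathway learner; G and N stand for Go and No-Go weights.

    Assumptions (not taken from the paper): each trial delivers the cost
    first (reward = -cost) and the payoff later (reward = +payoff); the
    prediction is G - N; weights decay by a factor (1 - lam) per step and
    are clipped at zero.
    """
    G, N = 0.0, 0.0
    for _ in range(trials):
        for r in (-cost, payoff):  # cost and payoff occur at different moments
            delta = r - (G - N)    # reward prediction error
            # A positive error mainly strengthens G; a negative error mainly strengthens N.
            G_new = max(0.0, (1 - lam) * G + alpha * f_eps(delta, eps))
            N_new = max(0.0, (1 - lam) * N + alpha * f_eps(-delta, eps))
            G, N = G_new, N_new
    return G, N


if __name__ == "__main__":
    G, N = simulate(payoff=2.0, cost=1.0)
    print(f"G = {G:.3f}, N = {N:.3f}, G - N = {G - N:.3f}")
```

How closely G and N track the payoff and the cost in such a toy model depends on the chosen `alpha`, `eps`, and `lam`; the conditions under which the weights recover the mean payoffs and costs are what the paper itself analyses.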

Theory of reinforcement learning and motivation in the basal ganglia

10.1101/174524 ◽

2017 ◽

Cited By ~ 1

Author(s):

Rafal Bogacz

Keyword(s):

Synaptic Plasticity ◽

Reinforcement Learning ◽

Basal Ganglia ◽

Dopaminergic Neurons ◽

Neural Circuits ◽

Negative Consequences ◽

Striatal Neurons ◽

Motivational State ◽

Level Of Activity ◽

Dopaminergic Modulation

Abstract

This paper proposes how the neural circuits in vertebrates select actions on the basis of past experience and the current motivational state. According to the presented theory, the basal ganglia evaluate the utility of considered actions by combining the positive consequences (e.g. nutrition) scaled by the motivational state (e.g. hunger) with the negative consequences (e.g. effort). The theory suggests how the basal ganglia compute utility by combining the positive and negative consequences encoded in the synaptic weights of striatal Go and No-Go neurons, and the motivational state carried by neuromodulators including dopamine. Furthermore, the theory suggests how the striatal neurons learn separately about the consequences of actions, and how the dopaminergic neurons themselves learn what level of activity they need to produce to optimize behaviour. The theory accounts for the effects of dopaminergic modulation on behaviour, patterns of synaptic plasticity in striatum, and responses of dopaminergic neurons in diverse situations.
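One simple reading of the utility computation described in this abstract is a payoff scaled by the motivational state, minus a cost. The sketch below assumes that linear form; the function names `utility` and `choose`, the action names, and the numbers are hypothetical and are not taken from the paper.

```python
def utility(payoff, cost, motivation):
    """Assumed linear form: positive consequences scaled by motivation, minus negative ones."""
    return motivation * payoff - cost


def choose(actions, motivation):
    """Return the name of the action with the highest utility.

    `actions` maps an action name to a (payoff, cost) pair.
    """
    return max(actions, key=lambda name: utility(*actions[name], motivation))


if __name__ == "__main__":
    actions = {"rest": (0.0, 0.0), "forage": (1.0, 0.6)}
    print(choose(actions, motivation=0.5))  # low motivation: effort outweighs payoff
    print(choose(actions, motivation=1.0))  # high motivation: payoff outweighs effort
```

With these illustrative numbers the effortful action is selected only when the motivational state is high, which mirrors the qualitative effect of motivation on the willingness to work for reward.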

What, If, and When to Move: Basal Ganglia Circuits and Self-Paced Action Initiation

Annual Review of Neuroscience ◽

10.1146/annurev-neuro-072116-031033 ◽

2019 ◽

Vol 42 (1) ◽

pp. 459-483 ◽

Cited By ~ 40

Author(s):

Andreas Klaus ◽

Joaquim Alves da Silva ◽

Rui M. Costa

Keyword(s):

Basal Ganglia ◽

Prediction Errors ◽

Movement Initiation ◽

Reward Prediction ◽

Behavioral Transitions ◽

Main Input ◽

Action Initiation

Deciding what to do and when to move is vital to our survival. Clinical and fundamental studies have identified basal ganglia circuits as critical for this process. The main input nucleus of the basal ganglia, the striatum, receives inputs from frontal, sensory, and motor cortices and interconnected thalamic areas that provide information about potential goals, context, and actions and directly or indirectly modulates basal ganglia outputs. The striatum also receives dopaminergic inputs that can signal reward prediction errors and also behavioral transitions and movement initiation. Here we review studies and models of how direct and indirect pathways can modulate basal ganglia outputs to facilitate movement initiation, and we discuss the role of cortical and dopaminergic inputs to the striatum in determining what to do and if and when to do it. Complex but exciting scenarios emerge that shed new light on how basal ganglia circuits modulate self-paced movement initiation.

Reward Prediction Errors Drive Declarative Learning Irrespective of Agency

10.31234/osf.io/63g9w ◽

2020 ◽

Author(s):

Kate Ergo ◽

Luna De Vilder ◽

Esther De Loof ◽

Tom Verguts

Keyword(s):

Learning Theory ◽

Learning Effect ◽

Steady Increase ◽

Prediction Errors ◽

Experimental Paradigm ◽

Reward Prediction ◽

Declarative Learning

Recent years have witnessed a steady increase in the number of studies investigating the role of reward prediction errors (RPEs) in declarative learning. Specifically, in several experimental paradigms RPEs drive declarative learning, with larger and more positive RPEs enhancing declarative learning. However, it is unknown whether this RPE must derive from the participant’s own response, or whether instead any RPE is sufficient to obtain the learning effect. To test this, we generated RPEs in the same experimental paradigm where we combined an agency and a non-agency condition. We observed no interaction between RPE and agency, suggesting that any RPE (irrespective of its source) can drive declarative learning. This result holds implications for declarative learning theory.

Dissociating the effect of reward uncertainty and timing uncertainty on neural indices of reward prediction errors: A reward positivity (RewP) event-related potential (ERP) study

Biological Psychology ◽

10.1016/j.biopsycho.2021.108121 ◽

2021 ◽

pp. 108121

Author(s):

Alexandra M. Muir ◽

Addison C. Eberhard ◽

Megan S. Walker ◽

Angus Bennion ◽

Mikle South ◽

...

Keyword(s):

Event Related Potential ◽

Prediction Errors ◽

Reward Positivity ◽

Reward Prediction ◽

Timing Uncertainty

Emotion prediction errors guide socially adaptive behavior

10.31234/osf.io/azeyk ◽

2021 ◽

Author(s):

Joseph Heffner ◽

Jae-Young Son ◽

Oriel FeldmanHall

Keyword(s):

Decision Making ◽

Real Time ◽

Adaptive Behavior ◽

Emotional Response ◽

New Method ◽

Prediction Errors ◽

Emotional Experiences ◽

Past Work ◽

Reward Prediction ◽

Expected Outcomes

People make decisions based on deviations from expected outcomes, known as prediction errors. Past work has focused on reward prediction errors, largely ignoring violations of expected emotional experiences, or emotion prediction errors. We leverage a new method to measure real-time fluctuations in emotion as people decide to punish or forgive others. Across four studies (N=1,016), we reveal that emotion and reward prediction errors have distinguishable contributions to choice, such that emotion prediction errors exert the strongest impact during decision-making. We additionally find that a choice to punish or forgive can be decoded in less than a second from an evolving emotional response, suggesting emotions swiftly influence choice. Finally, individuals reporting significant levels of depression exhibit selective impairments in using emotion (but not reward) prediction errors. Evidence for emotion prediction errors potently guiding social behaviors challenges standard decision-making models that have focused solely on reward.

Reward prediction errors drive declarative learning irrespective of agency

Psychonomic Bulletin & Review ◽

10.3758/s13423-021-01952-7 ◽

2021 ◽

Author(s):

Kate Ergo ◽

Luna De Vilder ◽

Esther De Loof ◽

Tom Verguts

Keyword(s):

Prediction Errors ◽

Reward Prediction ◽

Declarative Learning

Reward Prediction Errors Reflect an Underlying Learning Process That Parallels Behavioural Adaptations: A Trial-to-Trial Analysis

Computational Brain & Behavior ◽

10.1007/s42113-019-00069-4 ◽

2019 ◽

Vol 3 (2) ◽

pp. 189-199 ◽

Cited By ~ 2

Author(s):

Chad C. Williams ◽

Cameron D. Hassall ◽

Talise Lindenbach ◽

Olave E. Krigolson

Keyword(s):

Learning Process ◽

Prediction Errors ◽

Reward Prediction ◽

Trial Analysis ◽

Behavioural Adaptations

Who Pays the Price for Parental Education–Occupation Mismatch? Evidence From an Israeli City

SAGE Open ◽

10.1177/2158244019835916 ◽

2019 ◽

Vol 9 (1) ◽

pp. 215824401983591

Author(s):

Yariv Feniger ◽

Anastasia Gorodzeisky ◽

Michal Krumer-Nevo

Keyword(s):

High School ◽

High School Students ◽

Parental Education ◽

Maternal Education ◽

Social Research ◽

Specific Aspect ◽

Negative Consequences ◽

School Students ◽

Science Courses ◽

School Truancy

In recent years, education–occupation mismatch has become an important area of social research. However, little is known about its impact on the intergenerational transmission of educational attainment. This study investigates the possible negative consequences of a specific aspect of parental education–occupation mismatch, also known as overeducation, for high school students. Drawing from a sample of high school students in an Israeli city with a high incidence of overeducation, our analysis suggests that parental education–occupation mismatch does not affect student expectations for progressing to higher education. The results did reveal, however, that maternal education–occupation mismatch is related to school truancy among boys and girls, and that paternal education–occupation mismatch contributes to lower odds of enrollment in advanced science courses, especially among boys.

When theory and biology differ: The relationship between reward prediction errors and expectancy

Biological Psychology ◽

10.1016/j.biopsycho.2017.09.007 ◽

2017 ◽

Vol 129 ◽

pp. 265-272 ◽

Cited By ~ 5

Author(s):

Chad C. Williams ◽

Cameron D. Hassall ◽

Robert Trska ◽

Clay B. Holroyd ◽

Olave E. Krigolson

Keyword(s):

Prediction Errors ◽

Reward Prediction ◽

The Relationship

Abnormal prefrontal cortex processing of reward prediction errors in recently diagnosed patients with bipolar disorder and their unaffected relatives

Bipolar Disorders ◽

10.1111/bdi.12915 ◽

2020 ◽

Vol 22 (8) ◽

pp. 849-859

Author(s):

Julian Macoveanu ◽

Hanne L. Kjærstad ◽

Henry W. Chase ◽

Sophia Frangou ◽

Gitte M. Knudsen ◽

...

Keyword(s):

Bipolar Disorder ◽

Prefrontal Cortex ◽

Prediction Errors ◽

Reward Prediction
