The Homeostatic Logic of Reward

Mapping Intimacies ◽

10.1101/242974 ◽

2018 ◽

Cited By ~ 3

Author(s):

Tobias Morville ◽

Karl Friston ◽

Denis Burdakov ◽

Hartwig R. Siebner ◽

Oliver J. Hulme

Keyword(s):

Reinforcement Learning ◽

Energy Homeostasis ◽

Active Inference ◽

Neural Basis ◽

Homeostatic Control ◽

Dopaminergic Cells ◽

Time Horizons ◽

Metabolic States ◽

Circular Definitions ◽

Insight Into

AbstractEnergy homeostasis depends on behavior to predictively regulate metabolic states within narrow bounds. Here we review three theories of homeostatic control and ask how they provide insight into the circuitry underlying energy homeostasis. We offer two contributions. First, we detail how control theory and reinforcement learning are applied to homeostatic control. We show how these schemes rest on implausible assumptions; either via circular definitions, unprincipled drive functions, or by ignoring environmental volatility. We argue active inference can elude these shortcomings while retaining important features of each model. Second, we review the neural basis of energetic control. We focus on a subset of arcuate subpopulations that project directly to, and are thus in a privileged position to opponently modulate, dopaminergic cells as a function of energetic predictions over a spectrum of time horizons. We discuss how this can be interpreted under these theories, and how this can resolve paradoxes that have arisen. We propose this circuit constitutes a homeostatic-reward interface that underwrites the conjoint optimisation of physiological and behavioural homeostasis.

Download Full-text

Neurocomputational Theories of Homeostatic Control

10.31234/osf.io/s2q46 ◽

2019 ◽

Author(s):

Oliver J Hulme ◽

Thobias Morville ◽

Boris Gutkin

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Autonomic Regulation ◽

Behavioural Control ◽

Active Inference ◽

Homeostatic Control ◽

Inference Models ◽

Key Concepts ◽

Internal States

Homeostasis is a problem for all living agents. It entails predictively regulating internal states within the bounds compatible with survival in order to maximise fitness. This can be achieved physiologically, through complex hierarchies of autonomic regulation, but it must also be achieved via behavioural control. Here we review some of the major theories of homeostatic control and their historical cognates, addressing how they tackle the optimisation of both physiological and behavioural homeostasis. We start with optimal control approaches, setting up key concepts, and expanding on their limitations. We then move onto contemporary approaches, in particularly focusing on a branch of reinforcement learning known as homeostatic reinforcement learning (HRL). Weexplain its main advantages, empirical applications, and conceptual insights. We then outline some challenges to HRL and reinforcement learning in general, and how survival constraints and Active Inference models could circumvent these problems.

Download Full-text

Faculty Opinions recommendation of The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1010152.145565 ◽

2002 ◽

Author(s):

Randall C O'Reilly

Keyword(s):

Reinforcement Learning ◽

Human Error ◽

Error Processing ◽

Neural Basis ◽

Error Related Negativity

Download Full-text

Language statistical learning responds to reinforcement learning principles rooted in the striatum

PLoS Biology ◽

10.1371/journal.pbio.3001119 ◽

2021 ◽

Vol 19 (9) ◽

pp. e3001119

Author(s):

Joan Orpella ◽

Ernest Mas-Herrero ◽

Pablo Ripollés ◽

Josep Marco-Pallarés ◽

Ruth de Diego-Balaguer

Keyword(s):

Reinforcement Learning ◽

Language Learning ◽

Statistical Learning ◽

Dorsal Striatum ◽

Rule Learning ◽

Prediction Errors ◽

Neural Basis ◽

Structural Rules ◽

Learning Principles ◽

Striatal Function

Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental in the learning of words and structural rules. In lack of reliable online measures, statistical word and rule learning have been primarily investigated using offline (post-familiarization) tests, which gives limited insights into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures combined with computational modeling to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate—on 2 different cohorts—that a temporal difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work, therefore, bridges the long-standing gap between language learning and reinforcement learning phenomena.

Download Full-text

Insight into the neural basis of why we feel how we feel during exercise

Frontiers in Human Neuroscience ◽

10.3389/conf.fnhum.2013.213.00047 ◽

2013 ◽

Vol 7 ◽

Author(s):

Tempest Gavin ◽

Parfitt Gaynor ◽

Ekkekakis Panteleimon

Keyword(s):

Neural Basis ◽

Insight Into

Download Full-text

Ramping and phasic dopamine activity accounts for efficient cognitive resource allocation during reinforcement learning

10.1101/381103 ◽

2018 ◽

Author(s):

Minryung R. Song ◽

Sang Wan Lee

Keyword(s):

Experimental Data ◽

Resource Allocation ◽

Reinforcement Learning ◽

Learning Model ◽

Temporal Difference ◽

Value Change ◽

Cognitive Resource ◽

A Value ◽

Insight Into ◽

And Task

AbstractDopamine activity may transition between two patterns: phasic responses to reward-predicting cues and ramping activity arising when an agent approaches the reward. However, when and why dopamine activity transitions between these modes is not understood. We hypothesize that the transition between ramping and phasic patterns reflects resource allocation which addresses the task dimensionality problem during reinforcement learning (RL). By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared it with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent narrows down candidate stimuli for the task; the opposite occurs when the agent needs to re-learn candidate stimuli due to a value change. These results lend insight into how dopamine deals with the tradeoff between cognitive resource and task dimensionality during RL.

Download Full-text

Variability in Action Selection Relates to Striatal Dopamine 2/3 Receptor Availability in Humans: A PET Neuroimaging Study Using Reinforcement Learning and Active Inference Models

Cerebral Cortex ◽

10.1093/cercor/bhz327 ◽

2020 ◽

Vol 30 (6) ◽

pp. 3573-3589 ◽

Cited By ~ 1

Author(s):

Rick A Adams ◽

Michael Moutoussis ◽

Matthew M Nour ◽

Tarik Dahoun ◽

Declan Lewis ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Model Comparison ◽

Behavioral Model ◽

Action Selection ◽

Striatal Dopamine ◽

Active Inference ◽

Inference Models ◽

Positron Emission ◽

Dopamine Signaling

Abstract Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear—especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability—similar to decision ‘noise’ parameters in RL—and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a ‘go/no-go’ task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision ‘noise’ (P = 0.020), and this relationship with D2/3R availability was confirmed with a ‘decision stochasticity’ factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.

Download Full-text

Artistic Production Following Brain Damage: A Study of Three Artists

Leonardo ◽

10.1162/leon_a_00240 ◽

2011 ◽

Vol 44 (5) ◽

pp. 405-410 ◽

Cited By ~ 13

Author(s):

Anjan Chatterjee ◽

Bianca Bromberger ◽

William B. Smith ◽

Rebecca Sternschein ◽

Page Widick

Keyword(s):

Brain Damage ◽

Right Hemisphere ◽

Neural Basis ◽

Left Brain ◽

Right Hemisphere Damage ◽

Right Brain ◽

Art Production ◽

The Right ◽

Left Brain Damage ◽

Insight Into

We know little about the neurologic bases of art production. The idea that the right brain hemisphere is the “artistic brain” is widely held, despite the lack of evidence for this claim. Artists with brain damage can offer insight into these laterality questions. The authors used an instrument called the Assessment of Art Attributes to examine the work of two individuals with left-brain damage and one with right-hemisphere damage. In each case, their art became more abstract and distorted and less realistic. They also painted with looser strokes, less depth and more vibrant colors. No unique pattern was observed following right-brain damage. However, art produced after left-brain damage also became more symbolic. These results show that the neural basis of art production is distributed across both hemispheres in the human brain.

Download Full-text

The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice

Journal of Neurophysiology ◽

10.1152/jn.00046.2016 ◽

2016 ◽

Vol 115 (6) ◽

pp. 3195-3203 ◽

Cited By ~ 13

Author(s):

Simon Dunne ◽

Arun D'Souza ◽

John P. O'Doherty

Keyword(s):

Reinforcement Learning ◽

Experiential Learning ◽

Observational Learning ◽

Ventral Striatum ◽

Learning System ◽

Model Based ◽

Model Free ◽

Human Participants ◽

Open Question ◽

Insight Into

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning.

Download Full-text

Intelligence moderates reinforcement learning: a mini-review of the neural evidence

Journal of Neurophysiology ◽

10.1152/jn.00600.2014 ◽

2015 ◽

Vol 113 (10) ◽

pp. 3459-3461 ◽

Cited By ~ 5

Author(s):

Chong Chen

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dorsolateral Prefrontal Cortex ◽

Anterior Cingulate ◽

Neural Responses ◽

Key Factors ◽

Neural Basis ◽

Dorsal Anterior Cingulate Cortex ◽

Neural Signal ◽

Dorsolateral Prefrontal

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.

Download Full-text

TRADER DYNAMICS IN A MODEL MARKET

International Journal of Theoretical and Applied Finance ◽

10.1142/s0219024900000358 ◽

2000 ◽

Vol 03 (03) ◽

pp. 443-450 ◽

Cited By ~ 34

Author(s):

NEIL F. JOHNSON ◽

MICHAEL HART ◽

PAK MING HUI ◽

DAFANG ZHENG

Keyword(s):

Financial Markets ◽

Historical Data ◽

Market Price ◽

Heterogeneous Population ◽

Market Participants ◽

Time Horizons ◽

Large Fluctuations ◽

Individual Traders ◽

Model Market ◽

Insight Into

We explore various extensions of Challet and Zhang's Minority Game in an attempt to gain insight into the dynamics underlying financial markets. First we consider a heterogeneous population where individual traders employ differing "time horizons" when making predictions based on historical data. The resulting average winnings per trader is a highly non-linear function of the population's composition. Second, we introduce a threshold confidence level among traders below which they will not trade. This can give rise to large fluctuations in the "volume" of market participants and the resulting market "price".

Download Full-text