The Homeostatic Logic of Reward

2018 ◽  
Author(s):  
Tobias Morville ◽  
Karl Friston ◽  
Denis Burdakov ◽  
Hartwig R. Siebner ◽  
Oliver J. Hulme

Energy homeostasis depends on behavior to predictively regulate metabolic states within narrow bounds. Here we review three theories of homeostatic control and ask how they provide insight into the circuitry underlying energy homeostasis. We offer two contributions. First, we detail how control theory and reinforcement learning are applied to homeostatic control. We show how these schemes rest on implausible assumptions: circular definitions, unprincipled drive functions, or neglect of environmental volatility. We argue that active inference can elude these shortcomings while retaining important features of each model. Second, we review the neural basis of energetic control. We focus on a subset of arcuate subpopulations that project directly to, and are thus in a privileged position to opponently modulate, dopaminergic cells as a function of energetic predictions over a spectrum of time horizons. We discuss how this can be interpreted under these theories, and how this can resolve paradoxes that have arisen. We propose that this circuit constitutes a homeostatic-reward interface that underwrites the conjoint optimisation of physiological and behavioural homeostasis.
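
To make the control-theoretic view concrete, the sketch below regulates a single internal variable toward a setpoint with proportional negative feedback, the classic picture of physiological homeostasis that the review contrasts with reinforcement learning and active inference. All names and constants are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not from the paper): proportional negative-feedback control
# of one internal variable toward a setpoint; all numbers are illustrative.

setpoint = 5.0      # desired internal state (e.g. an abstract energy level)
state = 2.0         # current internal state
gain = 0.3          # controller gain
drift = -0.05       # passive metabolic drift away from the setpoint

for t in range(50):
    error = setpoint - state            # deviation from the setpoint
    action = gain * error               # corrective effort proportional to error
    state += action + drift             # state moves under control plus drift

print(f"final state = {state:.2f} (setpoint {setpoint})")
```

Note the small steady-state offset (error stabilises where gain * error cancels the drift), one of the limitations of purely reactive control that the review raises against such schemes.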

2019 ◽  
Author(s):  
Oliver J Hulme ◽  
Tobias Morville ◽  
Boris Gutkin

Homeostasis is a problem for all living agents. It entails predictively regulating internal states within the bounds compatible with survival in order to maximise fitness. This can be achieved physiologically, through complex hierarchies of autonomic regulation, but it must also be achieved via behavioural control. Here we review some of the major theories of homeostatic control and their historical cognates, addressing how they tackle the optimisation of both physiological and behavioural homeostasis. We start with optimal control approaches, setting up key concepts and expanding on their limitations. We then move on to contemporary approaches, focusing in particular on a branch of reinforcement learning known as homeostatic reinforcement learning (HRL). We explain its main advantages, empirical applications, and conceptual insights. We then outline some challenges to HRL and to reinforcement learning in general, and how survival constraints and Active Inference models could circumvent these problems.
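
A minimal sketch of the central HRL idea, reward defined as the reduction of a drive function measuring distance of internal states from their setpoints; the exponents, setpoints, and outcome below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def drive(state, setpoint, n=4, m=2):
    """Drive = distance of the internal state vector from its setpoints."""
    return np.sum(np.abs(setpoint - state) ** n) ** (1.0 / m)

setpoint = np.array([5.0, 3.0])     # hypothetical energy and hydration setpoints
h_before = np.array([2.0, 3.0])     # internal state before an outcome
h_after  = np.array([4.0, 3.0])     # internal state after consuming an outcome

# Reward is the reduction in drive produced by the outcome: positive whenever
# the outcome moves the internal state toward its setpoints.
reward = drive(h_before, setpoint) - drive(h_after, setpoint)
print(f"reward = {reward:.2f}")
```

On this formulation, behaviour that descends the drive landscape maximises cumulative reward, which is how HRL ties behavioural reinforcement learning to physiological regulation.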


PLoS Biology ◽  
2021 ◽  
Vol 19 (9) ◽  
pp. e3001119
Author(s):  
Joan Orpella ◽  
Ernest Mas-Herrero ◽  
Pablo Ripollés ◽  
Josep Marco-Pallarés ◽  
Ruth de Diego-Balaguer

Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental to the learning of words and structural rules. In the absence of reliable online measures, statistical word and rule learning have been investigated primarily with offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures, combined with computational modeling, to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate, in 2 different cohorts, that a temporal difference model, which relies on prediction errors, accounts for participants' online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work therefore bridges the long-standing gap between language learning and reinforcement learning phenomena.
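
As an illustration of the class of model invoked, here is a minimal temporal-difference update driven by prediction errors. It is not the authors' fitted model; the states, reward, and parameters are assumptions for illustration only.

```python
# Minimal TD(0) sketch: value estimates are updated from prediction errors,
# the quantity whose trial-by-trial evolution is related to striatal activity.

alpha, gamma = 0.1, 0.95              # learning rate, discount factor
values = {"cue": 0.0, "outcome": 0.0} # value estimates for two illustrative states

def td_update(state, next_state, reward):
    delta = reward + gamma * values[next_state] - values[state]  # prediction error
    values[state] += alpha * delta                               # move value toward target
    return delta

# One trial: a cue followed by a rewarded outcome
delta = td_update("cue", "outcome", reward=1.0)
print(f"prediction error = {delta:.2f}, V(cue) = {values['cue']:.3f}")
```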


2018 ◽  
Author(s):  
Minryung R. Song ◽  
Sang Wan Lee

Dopamine activity may transition between two patterns: phasic responses to reward-predicting cues, and ramping activity arising as an agent approaches the reward. However, when and why dopamine activity transitions between these modes is not understood. We hypothesize that the transition between ramping and phasic patterns reflects resource allocation, which addresses the task dimensionality problem during reinforcement learning (RL). By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent narrows down candidate stimuli for the task; the opposite occurs when the agent needs to re-learn candidate stimuli due to a value change. These results lend insight into how dopamine deals with the tradeoff between cognitive resources and task dimensionality during RL.
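
A hedged sketch of what task dimensionality means for a TD learner: value is a linear function of a mixed feature vector of task cues plus background environmental stimuli, so credit from each prediction error is spread across all candidate stimuli. This is not the authors' specific modification; the features and parameters are illustrative.

```python
import numpy as np

alpha, gamma = 0.05, 0.95
n_features = 6                        # e.g. 2 task-relevant cues + 4 background stimuli
w = np.zeros(n_features)              # one value weight per candidate stimulus

def td_step(w, x, x_next, reward):
    """One TD update over feature vectors x (current state) and x_next (successor)."""
    delta = reward + gamma * w @ x_next - w @ x   # prediction error
    w += alpha * delta * x                        # credit spread over active features
    return delta

x_cue     = np.array([1., 0., 1., 0., 1., 0.])    # cue co-occurring with background stimuli
x_outcome = np.zeros(n_features)                  # terminal state after reward delivery
delta = td_step(w, x_cue, x_outcome, reward=1.0)
print(f"prediction error = {delta:.2f}; weights = {np.round(w, 3)}")
```

The more background stimuli remain candidates, the thinner the credit assigned to each; pruning them concentrates learning on the task-relevant cue, the resource-allocation intuition behind the hypothesized ramping-to-phasic transition.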


2020 ◽  
Vol 30 (6) ◽  
pp. 3573-3589 ◽  
Author(s):  
Rick A Adams ◽  
Michael Moutoussis ◽  
Matthew M Nour ◽  
Tarik Dahoun ◽  
Declan Lewis ◽  
...  

Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear, especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability—similar to decision ‘noise’ parameters in RL—and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a ‘go/no-go’ task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision ‘noise’ (P = 0.020), and this relationship with D2/3R availability was confirmed with a ‘decision stochasticity’ factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.
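
To illustrate how a precision (inverse-temperature) parameter governs action-selection variability, a minimal softmax sketch follows; the action values and precision levels are assumptions for illustration, not estimates from the study.

```python
import numpy as np

def action_probabilities(values, precision):
    """Softmax over action values; higher precision means less variable choice."""
    z = precision * np.asarray(values, dtype=float)
    z -= z.max()                          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

values = [0.6, 0.4]                       # e.g. hypothetical values of 'go' vs 'no-go'
for precision in (1.0, 5.0, 20.0):
    p = action_probabilities(values, precision)
    print(f"precision={precision:>5}: P(go) = {p[0]:.2f}")
```

At low precision choices are nearly random; at high precision the higher-valued action is selected almost deterministically, which is the behavioural signature related here to striatal D2/3R availability.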


Leonardo ◽  
2011 ◽  
Vol 44 (5) ◽  
pp. 405-410 ◽  
Author(s):  
Anjan Chatterjee ◽  
Bianca Bromberger ◽  
William B. Smith ◽  
Rebecca Sternschein ◽  
Page Widick

We know little about the neurologic bases of art production. The idea that the right brain hemisphere is the “artistic brain” is widely held, despite the lack of evidence for this claim. Artists with brain damage can offer insight into these laterality questions. The authors used an instrument called the Assessment of Art Attributes to examine the work of two individuals with left-brain damage and one with right-hemisphere damage. In each case, their art became more abstract and distorted and less realistic. They also painted with looser strokes, less depth and more vibrant colors. No unique pattern was observed following right-brain damage. However, art produced after left-brain damage also became more symbolic. These results show that the neural basis of art production is distributed across both hemispheres in the human brain.


2016 ◽  
Vol 115 (6) ◽  
pp. 3195-3203 ◽  
Author(s):  
Simon Dunne ◽  
Arun D'Souza ◽  
John P. O'Doherty

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning.


2015 ◽  
Vol 113 (10) ◽  
pp. 3459-3461 ◽  
Author(s):  
Chong Chen

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly in recent years. However, the overlap of these two lines of research, namely how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in the dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.


2000 ◽  
Vol 03 (03) ◽  
pp. 443-450 ◽  
Author(s):  
NEIL F. JOHNSON ◽  
MICHAEL HART ◽  
PAK MING HUI ◽  
DAFANG ZHENG

We explore various extensions of Challet and Zhang's Minority Game in an attempt to gain insight into the dynamics underlying financial markets. First, we consider a heterogeneous population in which individual traders employ differing "time horizons" when making predictions based on historical data. The resulting average winnings per trader are a highly non-linear function of the population's composition. Second, we introduce a threshold confidence level among traders below which they will not trade. This can give rise to large fluctuations in the "volume" of market participants and the resulting market "price".
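
For orientation, a minimal Minority Game sketch with the two extensions described, heterogeneous memory ("time horizon") lengths and a confidence threshold below which agents abstain from trading; all parameter values are illustrative assumptions, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, S, T = 101, 2, 500                  # agents, strategies per agent, time steps
memories = rng.choice([2, 4, 6], N)    # heterogeneous time horizons
threshold = 0.0                        # minimum strategy score required to trade

# Each strategy maps every possible recent history of length m to an action in {-1, +1}.
strategies = [rng.choice([-1, 1], size=(S, 2 ** m)) for m in memories]
scores = np.zeros((N, S))
history = list(rng.choice([0, 1], size=max(memories)))   # global win/loss history bits

volumes, prices = [], [0.0]
for t in range(T):
    actions = np.zeros(N)
    for i, m in enumerate(memories):
        key = int("".join(map(str, history[-m:])), 2)     # index of the recent history
        best = scores[i].argmax()
        if scores[i, best] >= threshold:                  # confident enough to trade
            actions[i] = strategies[i][best, key]
    attendance = actions.sum()                            # excess demand
    winning_bit = int(attendance < 0)                     # the minority side wins
    for i, m in enumerate(memories):
        key = int("".join(map(str, history[-m:])), 2)
        win_action = 1 if winning_bit else -1
        scores[i] += np.where(strategies[i][:, key] == win_action, 1, -1)
    history.append(winning_bit)
    volumes.append(np.count_nonzero(actions))             # number of active traders
    prices.append(prices[-1] + attendance)                # price follows excess demand

print(f"mean volume = {np.mean(volumes):.1f} of {N}; final price = {prices[-1]:.0f}")
```

The number of active traders fluctuates as strategy scores cross the confidence threshold, giving the swings in "volume" and "price" that the abstract describes.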

