Disrupted reinforcement learning during post-error slowing in ADHD

2018 ◽  
Author(s):  
Andre Chevrier ◽  
Mehereen Bhaijiwala ◽  
Jonathan Lipszyc ◽  
Douglas Cheyne ◽  
Simon Graham ◽  
...  

Abstract
ADHD is associated with altered dopamine-regulated reinforcement learning on prediction errors. Despite evidence of categorically altered error processing in ADHD, neuroimaging advances have largely investigated models of normal reinforcement learning in greater detail. Further, although reinforcement learning critically relies on the ventral striatum exerting error-magnitude-related thresholding influences on the substantia nigra (SN) and dorsal striatum, these thresholding influences have never been identified with neuroimaging. To identify such thresholding influences, we propose that error-magnitude-related activities must first be separated from opposite activities in overlapping neural regions during error detection. Here we separate error detection from magnitude-related adjustment (post-error slowing) during inhibition errors in the stop signal task in typically developing (TD) and ADHD adolescents using fMRI. In TD, we predicted that: 1) deactivation of dorsal striatum on error detection interrupts ongoing processing, and should be proportional to right frontoparietal response-phase activity that has been observed in the SST; 2) deactivation of ventral striatum on post-error slowing exerts thresholding influences on, and should be proportional to, activity in dorsal striatum. In ADHD, we predicted that ventral striatum would instead correlate with heightened amygdala responses to errors. We found that deactivation of dorsal striatum on error detection correlated with response-phase activity in both groups. In TD, post-error slowing deactivation of ventral striatum correlated with activation of dorsal striatum. In ADHD, ventral striatum correlated with heightened amygdala activity. Further, heightened activities in the locus coeruleus (norepinephrine), raphe nucleus (serotonin) and medial septal nuclei (acetylcholine), which all compete for control of DA and are altered in ADHD, exhibited altered correlations with SN. All correlations in TD were replicated in healthy adults. Results in TD are consistent with dopamine-regulated reinforcement learning on post-error slowing. In ADHD, results are consistent with heightened activities in the amygdala and non-dopaminergic neurotransmitter nuclei preventing reinforcement learning.

PLoS Biology ◽  
2021 ◽  
Vol 19 (9) ◽  
pp. e3001119
Author(s):  
Joan Orpella ◽  
Ernest Mas-Herrero ◽  
Pablo Ripollés ◽  
Josep Marco-Pallarés ◽  
Ruth de Diego-Balaguer

Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental to the learning of words and structural rules. In the absence of reliable online measures, statistical word and rule learning have been primarily investigated using offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures combined with computational modeling to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate—on 2 different cohorts—that a temporal difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work, therefore, bridges the long-standing gap between language learning and reinforcement learning phenomena.
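The temporal difference mechanism invoked here can be sketched as a minimal delta-rule learner that updates its prediction in proportion to the prediction error. This is a generic illustration, not the authors' fitted model; the function and parameter names are ours:

```python
# Minimal temporal-difference (delta-rule) update: on each trial the
# prediction V moves toward the outcome in proportion to the
# prediction error (outcome minus current prediction).
def td_update(V, reward, alpha=0.1):
    """One learning step; alpha is the learning rate."""
    prediction_error = reward - V
    return V + alpha * prediction_error

# Example: the prediction converges toward a consistently rewarded outcome.
V = 0.0
for _ in range(100):
    V = td_update(V, reward=1.0, alpha=0.1)
```

In model-based fMRI analyses of this kind, the trial-by-trial prediction errors produced by such an update are what get regressed against striatal BOLD activity.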


2021 ◽  
Author(s):  
J Orpella ◽  
E Mas-Herrero ◽  
P Ripollés ◽  
J Marco-Pallarés ◽  
R de Diego-Balaguer

Abstract
Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental to the learning of words and structural rules. In the absence of reliable online measures, statistical word and rule learning have been primarily investigated using offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online statistical learning of language rules combined with computational modelling to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate - on two different cohorts - that a Temporal Difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work, therefore, bridges the longstanding gap between language learning and reinforcement learning phenomena.


2009 ◽  
Vol 21 (7) ◽  
pp. 1332-1345 ◽  
Author(s):  
Thorsten Kahnt ◽  
Soyoung Q Park ◽  
Michael X Cohen ◽  
Anne Beck ◽  
Andreas Heinz ◽  
...  

It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal (DS) and ventral striatum (VS), are differentially involved in reinforcement learning, particularly in the roles of actor and critic. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The different midbrain connections to the DS and the VS seem to play a critical role in this functional distinction. Here, subjects performed a dynamic, reward-based decision-making task during fMRI acquisition. A computational model of reinforcement learning was used to estimate the different effects of positive and negative reinforcements on future decisions for each subject individually. We found that activity in both the DS and the VS correlated with reward prediction errors. Using functional connectivity, we show that the DS and the VS are differentially connected to different midbrain regions (possibly corresponding to the substantia nigra [SN] and the ventral tegmental area [VTA], respectively). However, only functional connectivity between the DS and the putative SN predicted the impact of different reinforcement types on future behavior. These results suggest that connections between the putative SN and the DS are critical for modulating action values in the DS according to both positive and negative reinforcements to guide future decision making.
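The actor/critic division described here can be illustrated with a minimal sketch (class and parameter names are ours, not the model fitted in the study): the critic learns a reward prediction, and the actor updates its action preferences using the critic's prediction error.

```python
import math

# Minimal actor-critic sketch. The critic maintains a reward prediction V;
# the actor maintains action preferences H, chosen via a softmax policy.
# Both are updated by the critic's prediction error.
class ActorCritic:
    def __init__(self, n_actions, alpha_critic=0.1, alpha_actor=0.1):
        self.V = 0.0                # critic's reward prediction
        self.H = [0.0] * n_actions  # actor's action preferences
        self.ac, self.aa = alpha_critic, alpha_actor

    def policy(self):
        """Softmax over action preferences."""
        exps = [math.exp(h) for h in self.H]
        z = sum(exps)
        return [e / z for e in exps]

    def update(self, action, reward):
        delta = reward - self.V            # critic's prediction error
        self.V += self.ac * delta          # critic update
        self.H[action] += self.aa * delta  # actor update
        return delta

# Example: the agent comes to prefer a consistently rewarded action.
agent = ActorCritic(n_actions=2)
for _ in range(50):
    agent.update(0, 1.0)  # action 0 is always rewarded
```

The functional point made in the abstract maps onto the two update lines: the VS/critic carries the prediction error, while the DS/actor uses it to adjust action values.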


2021 ◽  
Author(s):  
Joana Carvalheiro ◽  
Vasco A. Conceição ◽  
Ana Mesquita ◽  
Ana Seara-Cardoso

Abstract
Reinforcement learning, which involves learning from the rewarding and punishing outcomes of our choices, is critical for adjusted behaviour. Acute stress seems to affect this ability, but the neural mechanisms by which it disrupts this type of learning are still poorly understood. Here, we investigate whether and how acute stress blunts neural signalling of prediction errors during reinforcement learning using model-based functional magnetic resonance imaging. Male participants completed a well-established reinforcement learning task involving monetary gains and losses whilst under stress and control conditions. Acute stress impaired participants’ behavioural performance towards obtaining monetary gains, but not towards avoiding losses. Importantly, acute stress blunted signalling of prediction errors during gain and loss trials in the dorsal striatum, with subsidiary analyses suggesting that acute stress preferentially blunted signalling of positive prediction errors. Our results thus reveal a neurocomputational mechanism by which acute stress may impair reward learning.
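One common way to model preferentially blunted positive prediction-error signalling (a generic sketch under our own assumptions, not necessarily the model used in this study) is a delta rule with separate learning rates for positive and negative errors; blunting then corresponds to a reduced positive-error learning rate:

```python
# Delta-rule update with asymmetric learning rates: alpha_pos applies to
# positive prediction errors (better than expected), alpha_neg to negative
# ones. Blunted positive-PE signalling can be modelled as low alpha_pos.
def asymmetric_update(V, reward, alpha_pos=0.3, alpha_neg=0.3):
    pe = reward - V
    alpha = alpha_pos if pe > 0 else alpha_neg
    return V + alpha * pe
```

Under this parameterisation, a stressed learner with reduced alpha_pos would be slower to learn from gains while learning from losses is preserved, matching the behavioural asymmetry reported above.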


Author(s):  
Lidia Bellés ◽  
Andrea Dimiziani ◽  
Stergios Tsartsalis ◽  
Philippe Millet ◽  
François R Herrmann ◽  
...  

Abstract
Background: Impulsivity and novelty preference are both associated with an increased propensity to develop addiction-like behaviors, but their relationship and respective underlying dopamine (DA) underpinnings are not fully elucidated.
Methods: We evaluated a large cohort (n = 49) of Roman high- and low-avoidance rats using single photon emission computed tomography to concurrently measure in vivo striatal D2/3 receptor (D2/3R) availability and amphetamine (AMPH)-induced DA release in relation to impulsivity and novelty preference using a within-subject design. To further examine the DA-dependent processes related to these traits, midbrain D2/3-autoreceptor levels were measured using ex vivo autoradiography in the same animals.
Results: We replicated a robust inverse relationship between impulsivity, as measured with the 5-choice serial reaction time task, and D2/3R availability in ventral striatum, and extended this relationship to D2/3R levels measured in dorsal striatum. Novelty preference was positively related to impulsivity and showed inverse associations with D2/3R availability in dorsal striatum and ventral striatum. A high magnitude of AMPH-induced DA release in striatum predicted both impulsivity and novelty preference, perhaps owing to the diminished midbrain D2/3-autoreceptor availability measured in high-impulsive/novelty-preferring Roman high-avoidance animals, which may amplify the effect of AMPH on DA transmission. Mediation analyses revealed that while D2/3R availability and AMPH-induced DA release in striatum are both significant predictors of impulsivity, the effect of striatal D2/3R availability on novelty preference is fully mediated by evoked striatal DA release.
Conclusions: Impulsivity and novelty preference are related but mediated by overlapping, yet dissociable, DA-dependent mechanisms in striatum that may interact to promote the emergence of an addiction-prone phenotype.


2020 ◽  
Author(s):  
Dongjae Kim ◽  
Jaeseung Jeong ◽  
Sang Wan Lee

Abstract
The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy’s performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.
One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
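The idea of a prediction-error baseline signalling context changes can be sketched as follows. This is a simplified illustration under our own assumptions (a slow running average of recent prediction errors as the baseline), not the authors' model: the baseline-corrected error stays near zero in a stable context and spikes when reward contingencies shift.

```python
# Delta-rule learner that also tracks a slow baseline of its own
# prediction errors; the baseline-corrected error flags context changes.
def run_trials(rewards, alpha=0.2, alpha_b=0.05):
    V, baseline, corrected = 0.0, 0.0, []
    for r in rewards:
        pe = r - V
        corrected.append(pe - baseline)        # baseline-corrected PE
        baseline += alpha_b * (pe - baseline)  # slow running PE baseline
        V += alpha * pe                        # standard value update
    return V, corrected

# A context change (reward contingency jump at trial 50) produces a
# large baseline-corrected prediction error at the change point.
V, corrected = run_trials([0.0] * 50 + [1.0] * 50)
```

The corrected signal is flat over the first, stable block and jumps at the contingency change, which is the sense in which a PE baseline can "signal context changes to improve adaptability."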


2019 ◽  
Author(s):  
A. Wiehler ◽  
K. Chakroun ◽  
J. Peters

Abstract
Gambling disorder is a behavioral addiction associated with impairments in decision-making and reduced behavioral flexibility. Decision-making in volatile environments requires a flexible trade-off between exploitation of options with high expected values and exploration of novel options to adapt to changing reward contingencies. This classical problem is known as the exploration-exploitation dilemma. We hypothesized gambling disorder to be associated with a specific reduction in directed (uncertainty-based) exploration compared to healthy controls, accompanied by changes in brain activity in a fronto-parietal exploration-related network.
Twenty-three frequent gamblers and nineteen matched controls performed a classical four-armed bandit task during functional magnetic resonance imaging. Computational modeling revealed that choice behavior in both groups contained signatures of directed exploration, random exploration and perseveration. Gamblers showed a specific reduction in directed exploration, while random exploration and perseveration were similar between groups.
Neuroimaging revealed no evidence for group differences in neural representations of expected value and reward prediction errors. Likewise, our hypothesis of attenuated fronto-parietal exploration effects in gambling disorder was not supported. However, during directed exploration, gamblers showed reduced parietal and substantia nigra / ventral tegmental area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of clinical status, suggesting alterations in network dynamics in gambling disorder.
In sum, we show that reduced flexibility during reinforcement learning in volatile environments in gamblers is attributable to a reduction in directed exploration rather than an increase in perseveration. Neuroimaging findings suggest that patterns of network connectivity might be more diagnostic of gambling disorder than univariate value and prediction error effects. We provide a computational account of flexibility impairments in gamblers during reinforcement learning that might arise as a consequence of dopaminergic dysregulation in this disorder.
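The three behavioral signatures above (directed exploration, random exploration, perseveration) are often combined in a single softmax choice rule. A generic sketch, with parameter names of our choosing rather than the study's: an uncertainty bonus weighted by phi implements directed exploration, the softmax temperature beta governs random exploration, and a bonus rho for repeating the last choice captures perseveration.

```python
import math

# Softmax choice rule over per-option utilities combining expected value,
# an uncertainty bonus (directed exploration, weight phi), and a
# stickiness bonus for the previously chosen option (perseveration, rho).
def choice_probs(values, uncertainties, last_choice,
                 beta=2.0, phi=1.0, rho=0.5):
    utilities = [
        v + phi * u + (rho if i == last_choice else 0.0)
        for i, (v, u) in enumerate(zip(values, uncertainties))
    ]
    exps = [math.exp(beta * x) for x in utilities]
    z = sum(exps)
    return [e / z for e in exps]
```

In this parameterisation, the specific reduction in directed exploration reported for gamblers would correspond to a smaller uncertainty-bonus weight phi, with beta and rho comparable across groups.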


2019 ◽  
Author(s):  
Erdem Pulcu

Abstract
We are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty [1] (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in the behavioural [2,3] and neural sciences [4–6], as well as in machine learning [7,8]. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. the difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), which can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.
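A nonlinear coupling between prediction errors and learning rates can be sketched as follows. The particular nonlinearity (a sigmoid of the absolute prediction error) and the gain parameter k are illustrative choices of ours, not the update rules proposed in the paper:

```python
import math

# Update rule in which the learning rate is a sigmoidal function of the
# absolute prediction error: near-zero errors barely move the belief,
# while large, surprising errors are learned from almost fully.
def nonlinear_update(V, outcome, k=4.0):
    pe = outcome - V
    # maps |pe| into (0, 1): alpha -> 0 as pe -> 0, alpha -> 1 for large |pe|
    alpha = 2.0 / (1.0 + math.exp(-k * abs(pe))) - 1.0
    return V + alpha * pe
```

The qualitative behaviour this captures is the adaptation described above: in volatile, high-uncertainty environments (large errors) beliefs update quickly, while stable environments (small errors) yield conservative updates.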


2004 ◽  
Vol 91 (5) ◽  
pp. 2259-2272 ◽  
Author(s):  
Neil Schmitzer-Torbert ◽  
A. David Redish

The striatum plays an important role in “habitual” learning and memory and has been hypothesized to implement a reinforcement-learning algorithm to select actions to perform given the current sensory input. Many experimental approaches to striatal activity have made use of temporally structured tasks, which imply that the striatal representation is temporal. To test this assumption, we recorded neurons in the dorsal striatum of rats running a sequential navigation task: the multiple T maze. Rats navigated a sequence of four T maze turns to receive food rewards delivered in two locations. The responses of neurons that fired phasically were examined. Task-responsive phasic neurons were active as rats ran on the maze (maze-responsive) or during reward receipt (reward-responsive). Neither maze- nor reward-responsive neurons encoded simple motor commands: maze responses were not well correlated with the shape of the rat's path, and most reward-responsive neurons did not fire at similar rates at both food-delivery sites. Maze-responsive neurons were active at one or more locations on the maze, but these responses did not cluster at spatial landmarks such as turns. Across sessions, the activity of maze-responsive neurons was highly correlated when rats ran the same maze. Maze responses encoded the location of the rat on the maze and imply a spatial representation in the striatum in a task with prominent spatial demands. Maze-responsive and reward-responsive neurons were two separate populations, suggesting a divergence in striatal information processing of navigation and reward.

