Impaired expected value computations coupled with overreliance on prediction error learning in schizophrenia

2017 ◽  
Author(s):  
D Hernaus ◽  
JM Gold ◽  
JA Waltz ◽  
MJ Frank

Abstract
Background: While many have emphasized impaired reward prediction error (RPE) signaling in schizophrenia, multiple studies suggest that some decision-making deficits may arise from overreliance on RPE systems together with a compromised ability to represent expected value. Guided by computational frameworks, we formulated and tested two scenarios in which maladaptive representation of expected value should be most evident, thereby delineating conditions that may evoke decision-making impairments in schizophrenia.
Methods: In a modified reinforcement learning paradigm, 42 medicated people with schizophrenia (PSZ) and 36 healthy volunteers learned to select the most frequently rewarded option in a 75–25 pair: once when presented with more deterministic (90–10) and once when presented with more probabilistic (60–40) pairs. Novel and old combinations of choice options were presented in a subsequent transfer phase. Computational modeling was employed to elucidate contributions from RPE systems ("actor-critic") and expected value ("Q-learning").
Results: PSZ showed robust performance impairments with increasing value difference between two competing options, which strongly correlated with decreased contributions from expected value-based ("Q-learning") learning. Moreover, a subtle yet consistent contextual choice bias for the "probabilistic" 75 option was present in PSZ, which could be accounted for by a context-dependent RPE in the "actor-critic".
Conclusions: We provide evidence that decision-making impairments in schizophrenia increase monotonically with demands placed on expected value computations. A contextual choice bias is consistent with overreliance on RPE-based learning, which may signify a deficit secondary to the maladaptive representation of expected value. These results shed new light on conditions under which decision-making impairments may arise.
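The two learning systems the model contrasts can be sketched in a few lines. This is an illustrative toy, not the authors' fitted implementation, and all parameter values are assumptions.

```python
def q_update(q, reward, alpha=0.1):
    """Q-learning: nudge the stored expected value toward the outcome."""
    return q + alpha * (reward - q)

def actor_critic_update(pref, v, reward, alpha_actor=0.1, alpha_critic=0.1):
    """Actor-critic: the critic's prediction error trains both the
    critic's state value and the actor's action preference."""
    rpe = reward - v                   # RPE computed against the critic's value
    pref = pref + alpha_actor * rpe    # actor stores a preference, not a value
    v = v + alpha_critic * rpe         # critic tracks the state value
    return pref, v

# A 75%-rewarded option, averaged here as a deterministic 0.75 outcome:
q, pref, v = 0.0, 0.0, 0.0
for _ in range(1000):
    q = q_update(q, 0.75)
    pref, v = actor_critic_update(pref, v, 0.75)
# q converges to the option's expected value (0.75); the actor's
# preference is shaped only indirectly, through the critic's RPEs.
```

Because Q-values store expected value directly, novel pairings in a transfer phase can be compared on a common scale; actor preferences carry no such guarantee, which is why the two systems make different transfer predictions.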

2020 ◽  
Vol 6 (45) ◽  
pp. eabc9321
Author(s):  
David J. Ottenheimer ◽  
Karen Wang ◽  
Xiao Tong ◽  
Kurt M. Fraser ◽  
Jocelyn M. Richard ◽  
...  

A key function of the nervous system is producing adaptive behavior across changing conditions, like physiological state. Although states like thirst and hunger are known to impact decision-making, the neurobiology of this phenomenon has been studied minimally. Here, we tracked evolving preference for sucrose and water as rats proceeded from a thirsty to sated state. As rats shifted from water choices to sucrose choices across the session, the activity of a majority of neurons in the ventral pallidum, a region crucial for reward-related behaviors, closely matched the evolving behavioral preference. The timing of this signal followed the pattern of a reward prediction error, occurring at the cue or the reward depending on when reward identity was revealed. Additionally, optogenetic stimulation of ventral pallidum neurons at the time of reward was able to reverse behavioral preference. Our results suggest that ventral pallidum neurons guide reward-related decisions across changing physiological states.


2021 ◽  
Author(s):  
Rachit Dubey ◽  
Mark K Ho ◽  
Hermish Mehta ◽  
Tom Griffiths

Psychologists have long been fascinated with understanding the nature of Aha! moments, moments when we transition from not knowing to suddenly realizing the solution to a problem. In this work, we present a theoretical framework that explains when and why we experience Aha! moments. Our theory posits that during problem-solving, in addition to solving the problem, people also maintain a meta-cognitive model of their ability to solve the problem as well as a prediction about the time it would take them to solve that problem. Aha! moments arise when we experience a positive error in this meta-cognitive prediction, i.e., when we solve a problem much faster than we expected to solve it. We posit that this meta-cognitive error is analogous to a positive reward prediction error, thereby explaining why we feel so good after an Aha! moment. A large-scale pre-registered experiment on anagram solving supports this theory, showing that people's time prediction errors are strongly correlated with their ratings of an Aha! experience while solving anagrams. A second experiment provides further evidence for our theory by demonstrating a causal link between time prediction errors and the Aha! experience. These results highlight the importance of meta-cognitive prediction errors and deepen our understanding of human meta-reasoning.
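The theory's central quantity is simple to state in code; the sign convention below (positive means faster than predicted) is our assumption for illustration.

```python
def time_prediction_error(predicted_seconds, actual_seconds):
    """Meta-cognitive prediction error: positive when the problem is
    solved faster than the solver predicted -- the proposed Aha! case."""
    return predicted_seconds - actual_seconds

# Predicted a 60 s solve but found the anagram in 15 s: a large
# positive error, which the theory links to a strong Aha! experience.
aha = time_prediction_error(60, 15)      # faster than expected
grind = time_prediction_error(10, 40)    # slower than expected
```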


2019 ◽  
Author(s):  
Melissa J. Sharpe ◽  
Hannah M. Batchelor ◽  
Lauren E. Mueller ◽  
Chun Yun Chang ◽  
Etienne J.P. Maes ◽  
...  

Abstract
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
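The model-free "cached value" mechanism under test can be sketched as a one-state temporal-difference update. This is the generic textbook form, not the paper's model, and the parameters are illustrative.

```python
def td_cue_update(v_cue, reward, alpha=0.2):
    """One cue -> reward trial: the RPE at outcome is cached back into
    the antecedent cue's value (terminal state, so no successor term)."""
    rpe = reward - v_cue
    return v_cue + alpha * rpe

v = 0.0
for _ in range(50):
    v = td_cue_update(v, reward=1.0)
# Under the model-free account, the cue now carries value of its own,
# independent of the specific reward that produced the RPEs -- the
# signature the optogenetic experiments failed to find.
```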


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Alexandre Y. Dombrovski ◽  
Beatriz Luna ◽  
Michael N. Hallquist

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.


2014 ◽  
Vol 26 (3) ◽  
pp. 635-644 ◽  
Author(s):  
Olav E. Krigolson ◽  
Cameron D. Hassall ◽  
Todd C. Handy

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors—discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity and that increased in amplitude with learning.
The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
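The migration of the prediction error from feedback to choice presentation falls out of a standard temporal-difference account. The two-step trial below is a toy sketch with assumed parameters, not the authors' computational model.

```python
def run_trial(v_cue, reward=1.0, alpha=0.3):
    """One cue -> reward trial, returning the prediction errors at
    cue onset (vs. a zero baseline) and at reward delivery."""
    pe_cue = v_cue - 0.0             # unexpected value signaled by the cue
    pe_reward = reward - v_cue       # residual surprise at the outcome
    v_cue = v_cue + alpha * pe_reward
    return v_cue, pe_cue, pe_reward

v, history = 0.0, []
for _ in range(30):
    v, pe_cue, pe_reward = run_trial(v)
    history.append((pe_cue, pe_reward))
# Trial 1: all of the error sits at reward delivery; by trial 30 it has
# propagated to the cue, mirroring the decreasing feedback signal and
# the growing choice-presentation signal described above.
```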


2017 ◽  
Vol 29 (5) ◽  
pp. 1689-1705 ◽  
Author(s):  
Mattia I. Gerin ◽  
Vanessa B. Puetz ◽  
R. James R. Blair ◽  
Stuart White ◽  
Arjun Sethi ◽  
...  

Abstract
Alterations in reinforcement-based decision making may be associated with increased psychiatric vulnerability in children who have experienced maltreatment. A probabilistic passive avoidance task and a model-based functional magnetic resonance imaging analytic approach were implemented to assess the neurocomputational components underlying decision making: (a) reinforcement expectancies (the representation of the outcomes associated with a stimulus) and (b) prediction error signaling (the ability to detect the differences between expected and actual outcomes). There were three main findings. First, the maltreated group (n = 18; mean age = 13), relative to nonmaltreated peers (n = 19; mean age = 13), showed decreased activity during expected value processing in a widespread network commonly associated with reinforcement expectancies representation, including the striatum (especially the caudate), the orbitofrontal cortex, and medial temporal structures including the hippocampus and insula. Second, consistent with previously reported hyperresponsiveness to negative cues in the context of childhood abuse, the maltreated group showed increased prediction error signaling in the middle cingulate gyrus, somatosensory cortex, superior temporal gyrus, and thalamus. Third, the maltreated group showed increased activity in frontodorsal regions and in the putamen during expected value representation. These findings suggest that early adverse environments disrupt the development of decision-making processes, which in turn may compromise psychosocial functioning in ways that increase latent vulnerability to psychiatric disorder.


2020 ◽  
Author(s):  
He A. Xu ◽  
Alireza Modirshanechi ◽  
Marco P. Lehmann ◽  
Wulfram Gerstner ◽  
Michael H. Herzog

Abstract
Drivers of reinforcement learning (RL) beyond reward remain controversial, and novelty and surprise are often used equivocally in this debate. Here, using a deep sequential decision-making paradigm, we show that reward, novelty, and surprise play different roles in human RL. Surprise controls the rate of learning, whereas novelty and the novelty prediction error (NPE) drive exploration. Exploitation is dominated by model-free (habitual) action choices. A theory that takes these separate effects into account predicts on average 73 percent of the action choices of human participants after the first encounter of a reward and allows us to dissociate surprise and novelty in the EEG signal. While the event-related potential (ERP) at around 300ms is positively correlated with surprise, novelty, NPE, reward, and the reward prediction error, the ERP response to novelty and NPE starts earlier than that to surprise.
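The dissociation reported here (surprise sets the learning rate; novelty adds an exploration bonus) can be sketched as follows. The functional forms and constants are our assumptions for illustration, not the fitted model.

```python
def surprise_modulated_update(v, outcome, base_alpha=0.1, gain=0.5):
    """Larger surprise (|prediction error|) yields a larger effective
    learning rate, bounded below base_alpha + gain."""
    pe = outcome - v
    alpha = base_alpha + gain * abs(pe) / (1.0 + abs(pe))
    return v + alpha * pe

def novelty_bonus(visit_count, beta=1.0):
    """Exploration bonus that decays as an option becomes familiar."""
    return beta / (1.0 + visit_count)

small = surprise_modulated_update(0.0, 0.1)    # mild surprise
large = surprise_modulated_update(0.0, 10.0)   # strong surprise
# The strongly surprising outcome is absorbed with a higher effective
# learning rate; never-visited options carry the largest novelty bonus.
```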


2018 ◽  
Author(s):  
Samuel D. McDougle ◽  
Peter A. Butcher ◽  
Darius Parvin ◽  
Faisal Mushtaq ◽  
Yael Niv ◽  
...  

Abstract
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should not only be sensitive to whether the choice itself was suboptimal, but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated if negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
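The credit-assignment idea can be captured by gating the negative RPE when the miss was caused by action execution rather than action selection. The gating form and attenuation factor below are assumptions for illustration, not the paper's model.

```python
def gated_rpe(reward, expected, execution_error, attenuation=0.3):
    """Attenuate the negative teaching signal when the motor system,
    not the choice, produced the non-reward."""
    rpe = reward - expected
    if execution_error and rpe < 0:
        rpe *= attenuation
    return rpe

# Same non-reward, different causes:
choice_miss = gated_rpe(0.0, 0.8, execution_error=False)  # full negative RPE
motor_miss = gated_rpe(0.0, 0.8, execution_error=True)    # attenuated RPE
```

Gating the update this way leaves value estimates for the choice itself intact when the failure was motoric, which matches the participants' greater tolerance of execution-error outcomes.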


2018 ◽  
Author(s):  
Joanne C. Van Slooten ◽  
Sara Jahfari ◽  
Tomas Knapen ◽  
Jan Theeuwes

Abstract
Pupil responses have been used to track cognitive processes during decision-making. Studies have shown that in these cases the pupil reflects the joint activation of many cortical and subcortical brain regions, including those traditionally implicated in value-based learning. However, how the pupil tracks value-based decisions and reinforcement learning is unknown. We combined a reinforcement learning task with a computational model to study pupil responses during value-based decisions and decision evaluations. We found that the pupil closely tracks reinforcement learning both across trials and participants. Prior to choice, the pupil dilated as a function of trial-by-trial fluctuations in value beliefs. After feedback, early dilation scaled with value uncertainty, whereas later constriction scaled with reward prediction errors. Our computational approach systematically implicates the pupil in value-based decisions and the subsequent processing of violated value beliefs. These dissociable influences provide an exciting possibility to non-invasively study ongoing reinforcement learning in the pupil.

