Trading off the cost of conflict against expected rewards

Mapping Intimacies ◽

10.1101/412809 ◽

2018 ◽

Author(s):

Nura Sidarus ◽

Stefano Palminteri ◽

Valérian Chambon

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Cognitive Control ◽

Reversal Learning ◽

Computational Models ◽

Learning Task ◽

Learning Rates ◽

Different Types ◽

Irrelevant Distractors ◽

The Cost

Abstract: Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
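The trade-off described in this abstract can be illustrated with a minimal sketch. It assumes a two-option softmax rule in which going against the distractor subtracts a fixed cost from an option's value, and a delta-rule update whose learning rate is lower on instructed trials. The function names, the parameters (`beta`, `conflict_cost`, `alpha_free`, `alpha_instructed`) and their values are hypothetical and are not taken from the paper's fitted models.

```python
import math


def choice_prob_left(q_left, q_right, distractor, beta=3.0, conflict_cost=0.2):
    """Softmax probability of choosing 'left' given a distractor direction.

    Assumption: choosing against the distractor ('left' or 'right')
    subtracts a fixed conflict cost from that option's value.
    """
    v_left = q_left - (conflict_cost if distractor == "right" else 0.0)
    v_right = q_right - (conflict_cost if distractor == "left" else 0.0)
    return 1.0 / (1.0 + math.exp(-beta * (v_left - v_right)))


def update_value(q, reward, free_choice, alpha_free=0.4, alpha_instructed=0.2):
    """Delta-rule update with a smaller learning rate on instructed trials."""
    alpha = alpha_free if free_choice else alpha_instructed
    return q + alpha * (reward - q)


# Weak evidence: equal values, so the distractor tips the choice.
p_weak = choice_prob_left(0.5, 0.5, distractor="left")
# Strong evidence: the better option wins despite the conflict cost.
p_strong = choice_prob_left(0.9, 0.1, distractor="right")
```

Under these assumptions the distractor dominates only when the two action values are close, which mirrors the reported pattern that conflict was avoided when evidence for either alternative was weak.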

Intermittent Absence of Control during Reinforcement Learning Interferes with Pavlovian Bias in Action Selection

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01515 ◽

2020 ◽

Vol 32 (4) ◽

pp. 646-663 ◽

Cited By ~ 1

Author(s):

Gábor Csifcsák ◽

Eirik Melsæter ◽

Matthias Mittner

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Cognitive Control ◽

Learned Helplessness ◽

Well Being ◽

Learning Task ◽

Healthy Adults ◽

Task Demands ◽

Behavioral Manipulation ◽

Different Response

The ability to control the occurrence of rewarding and punishing events is crucial for our well-being. Two ways to optimize performance are to follow heuristics like Pavlovian biases to approach reward and avoid loss or to rely more on slowly accumulated stimulus–action associations. Although reduced control over outcomes has been linked to suboptimal decision-making in clinical conditions associated with learned helplessness, it is unclear how uncontrollability of the environment is related to the arbitration between different response strategies. This study directly tested whether a behavioral manipulation designed to induce learned helplessness in healthy adults (intermittent loss of control over feedback in a reinforcement learning task; “yoking”) would modulate the magnitude of Pavlovian bias and the neurophysiological signature of cognitive control (frontal midline theta power) in healthy adults. Using statistical analysis and computational modeling of behavioral data and electroencephalographic signals, we found stronger Pavlovian influences and alterations in frontal theta activity in the yoked group. However, these effects were not accompanied by reduced performance in experimental blocks with regained control, indicating that our behavioral manipulation was not potent enough for inducing helplessness and impaired coping ability with task demands. We conclude that the level of contingency between instrumental choices and rewards/punishments modulates Pavlovian bias during value-based decision-making, probably via interfering with the implementation of cognitive control. These findings might have implications for understanding the mechanisms underlying helplessness in various psychiatric conditions.

Altered Reinforcement Learning from Reward and Punishment in Anorexia Nervosa: Evidence from Computational Modeling

Journal of the International Neuropsychological Society ◽

10.1017/s1355617721001326 ◽

2021 ◽

pp. 1-13

Author(s):

Christina E. Wierenga ◽

Erin Reilly ◽

Amanda Bischoff-Grethe ◽

Walter H. Kaye ◽

Gregory G. Brown

Keyword(s):

Anorexia Nervosa ◽

Reinforcement Learning ◽

Computational Models ◽

Learning Task ◽

Maladaptive Behavior ◽

Prediction Errors ◽

Negative Consequences ◽

Diagnostic And Statistical Manual ◽

Learning Rates ◽

Reward And Punishment

ABSTRACT Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior. Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge. Results: AN had lower learning rates than HC following both positive and negative PE (p < .02), and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome. Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.

Intermittent absence of control during reinforcement learning interferes with Pavlovian bias in action selection

10.31234/osf.io/jpq6f ◽

2019 ◽

Cited By ~ 2

Author(s):

Gábor Csifcsák ◽

Eirik Melsæter ◽

Matthias Mittner

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Cognitive Control ◽

Learned Helplessness ◽

Well Being ◽

Learning Task ◽

Healthy Adults ◽

Behavioral Manipulation ◽

Frontal Midline Theta ◽

Different Response

The ability to control the occurrence of rewarding and punishing events is crucial for our well-being. Two ways to optimize performance are to follow heuristics like Pavlovian biases to approach reward and avoid loss, or to rely more on slowly accumulated stimulus-action associations. Although reduced control over outcomes has been linked to suboptimal decision-making in clinical conditions associated with learned helplessness, it is unclear how uncontrollability of the environment is related to the arbitration between different response strategies. This study directly tested whether a behavioral manipulation designed to induce learned helplessness in healthy adults (intermittent loss of control over feedback in a reinforcement learning task; “yoking”) would modulate the magnitude of Pavlovian bias and the neurophysiological signature of cognitive control (frontal midline theta power) in healthy adults. Using statistical analysis and computational modeling of behavioral data and electroencephalographic signals, we found stronger Pavlovian influences and alterations in frontal theta activity in the yoked group. However, these effects were not accompanied by reduced performance in experimental blocks with regained control, indicating that our behavioral manipulation was not potent enough for inducing helplessness and impaired coping ability with task demands. We conclude that the level of contingency between instrumental choices and rewards/punishments modulates Pavlovian bias during value-based decision-making, probably via interfering with the implementation of cognitive control. These findings might have implications for understanding the mechanisms underlying helplessness in various psychiatric conditions.

Reward and punishment reversal learning in major depressive disorder

10.31234/osf.io/aqgx3 ◽

2020 ◽

Author(s):

Dahlia Mukherjee ◽

Alexandre Leo Stephen Filipowicz ◽

Khoi D. Vo ◽

Theodore Satterthwaite ◽

Joe Kable

Keyword(s):

Major Depressive Disorder ◽

Reinforcement Learning ◽

Depressive Disorder ◽

Computational Modeling ◽

Reversal Learning ◽

Performance Metrics ◽

Learning Task ◽

Major Depressive ◽

Learning Rates ◽

Reward And Punishment

Depression has been associated with impaired reward and punishment processing, but the specific nature of these deficits is less understood and still widely debated. We analyzed reinforcement-based decision-making in individuals diagnosed with major depressive disorder (MDD) to identify the specific decision mechanisms contributing to poorer performance. Individuals with MDD (n = 64) and matched healthy controls (n = 64) performed a probabilistic reversal learning task in which they used feedback to identify which of two stimuli had the highest probability of reward (reward condition) or lowest probability of punishment (punishment condition). Learning differences were characterized using a hierarchical Bayesian reinforcement learning model. While both groups showed reinforcement learning-like behavior, depressed individuals made fewer optimal choices and adjusted more slowly to reversals in both the reward and punishment conditions. Our computational modeling analysis found that depressed individuals showed lower learning rates and, to a lesser extent, lower value sensitivity in both the reward and punishment conditions. Learning rates also predicted depression more accurately than simple performance metrics. These results demonstrate that depression is characterized by a hyposensitivity to positive outcomes, which influences the rate at which depressed individuals learn from feedback, but not a hypersensitivity to negative outcomes as has previously been suggested. Additionally, we demonstrate that computational modeling provides a more precise characterization of the dynamics contributing to these learning deficits, and offers stronger insights into the mechanistic processes affected by depression.

Knowledge Acquired from Learning: New Evidence of Hierarchical Conceptualization

Perceptual and Motor Skills ◽

10.2466/pms.1994.79.2.975 ◽

1994 ◽

Vol 79 (2) ◽

pp. 975-993 ◽

Cited By ~ 3

Author(s):

Alberto Montare

Keyword(s):

Sex Differences ◽

Knowledge Acquisition ◽

Reversal Learning ◽

Concept Formation ◽

Learning Task ◽

Learning Rates ◽

Level 3 ◽

Level 1 ◽

New Evidence ◽

Level 2

Following successful inductive acquisition of procedural cognition of a discrimination-reversal learning task, 50 female and 50 male undergraduates articulated declarative cognizance of knowledge acquired from learning. Tests of four hypotheses showed that (1) increasingly higher levels of declarative cognizance were associated with faster learning rates, (2) six new cases of cognition-without-cognizance were observed, (3) students presumably using secondary signalization learned faster than those presumably using primary signalization, and (4) no sex differences in learning rates or declarative cognizance were observed. The notion that explicit levels of declarative cognizance may represent implicit hierarchical conceptualization comprised of four systems of knowledge acquisition led to the conclusions that primary signalization may account for inductive senscept formation at Level 1 and for inductive percept formation at Level 2, whereas emergent secondary signalization may account for inductive precept formation at Level 3 and for inductive concept formation at Level 4.

Online Pandora’s Boxes and Bandits

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33011885 ◽

2019 ◽

Vol 33 ◽

pp. 1885-1892 ◽

Cited By ~ 1

Author(s):

Hossein Esfandiari ◽

MohammadTaghi HajiAghayi ◽

Brendan Lucier ◽

Michael Mitzenmacher

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Approximation Algorithms ◽

Standard Model ◽

Decision Process ◽

Bandit Problems ◽

Knapsack Constraints ◽

Feasibility Constraints ◽

The Cost

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
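For background on the classic (offline) Pandora's box problem that this abstract generalizes, the sketch below computes the reservation value of a single box: the threshold σ at which the expected gain from opening, E[max(v − σ, 0)], equals the opening cost. This illustrates Weitzman's index only and is not the online algorithm of the paper. The discrete value distribution, the function names and the bisection approach are assumptions made for illustration.

```python
def expected_surplus(values, probs, threshold):
    """E[max(v - threshold, 0)] for a discrete prize distribution."""
    return sum(p * max(v - threshold, 0.0) for v, p in zip(values, probs))


def reservation_value(values, probs, cost, iterations=100):
    """Solve E[max(v - sigma, 0)] = cost for sigma by bisection.

    Assumes cost > 0 and probs summing to 1. The expected surplus is
    continuous and non-increasing in sigma, so bisection applies.
    """
    lo = min(values) - cost  # surplus here is at least `cost`
    hi = max(values)         # surplus here is 0, below `cost`
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_surplus(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A box holding 0 or 10 with equal probability, costing 1 to open.
sigma = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
```

In the classic setting, boxes are opened in decreasing order of this index, and search stops once the best prize seen exceeds every remaining index. A higher opening cost lowers the index, which is how the cost of acquiring information enters the decision.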

Reward sensitivity differs depending on global self-esteem in value-based decision-making

Scientific Reports ◽

10.1038/s41598-020-78635-1 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Aya Ogasawara ◽

Yoshiyuki Ohmura ◽

Yasuo Kuniyoshi

Keyword(s):

Decision Making ◽

Computational Models ◽

Learning Task ◽

Self Esteem ◽

Expected Value ◽

Reward Sensitivity ◽

Self Confidence ◽

Maximum Value ◽

Individual Personality ◽

Social Decision Making

Abstract: Global self-esteem is a component of individual personality that impacts decision-making. Many studies have discussed the different preferences for decision-making in response to threats to a person’s self-confidence, depending on global self-esteem. However, studies about global self-esteem and non-social decision-making have indicated that decisions differ due to reward sensitivity. Here, reward sensitivity refers to the extent to which rewards change decisions. We hypothesized that individuals with lower global self-esteem have lower reward sensitivity and investigated the relationship between self-esteem and reward sensitivity using a computational model. We first examined the effect of expected value and maximum value in learning under uncertainties because some studies have shown the possibility of saliency (e.g. maximum value) and relative value (e.g. expected value) affecting decisions, respectively. In our learning task, expected value affected decisions, but there was no significant effect of maximum value. Therefore, we modelled participants’ choices under the condition of different expected value without considering maximum value. We used the Q-learning model, which is one of the traditional computational models in explaining experiential learning decisions. Global self-esteem correlated positively with reward sensitivity. Our results suggest that individual reward sensitivity affects decision-making depending on one’s global self-esteem.
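One common way to express reward sensitivity in a Q-learning model is to scale the reward before it enters the prediction error. The sketch below takes that parameterization as an assumption, since the abstract does not give the paper's exact equations. The names `q_update`, `learn` and `softmax_prob_first` and all parameter values are hypothetical.

```python
import math


def q_update(q, reward, alpha=0.3, reward_sensitivity=1.0):
    """One Q-learning step; the reward is scaled by reward sensitivity."""
    return q + alpha * (reward_sensitivity * reward - q)


def learn(rewards, alpha=0.3, reward_sensitivity=1.0):
    """Run successive updates from an initial value of 0."""
    q = 0.0
    for r in rewards:
        q = q_update(q, r, alpha, reward_sensitivity)
    return q


def softmax_prob_first(q_first, q_second):
    """Probability of picking the first of two options (unit temperature)."""
    return 1.0 / (1.0 + math.exp(-(q_first - q_second)))


# Same reward history, different sensitivities.
q_low = learn([1.0] * 20, reward_sensitivity=0.5)
q_high = learn([1.0] * 20, reward_sensitivity=2.0)
```

With an identical reward history, the learner with higher sensitivity ends with a larger value gap between options and so chooses the rewarded option more reliably, which is the sense in which rewards "change decisions" more for that learner.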

Effects of Depressive Symptoms, Feelings, and Interoception on Reward-Based Decision-Making: Investigation Using Reinforcement Learning Model

Brain Sciences ◽

10.3390/brainsci10080508 ◽

2020 ◽

Vol 10 (8) ◽

pp. 508

Author(s):

Hiroyoshi Ogishima ◽

Shunta Maeda ◽

Yuki Tanaka ◽

Hironori Shimada

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Depressive Symptoms ◽

Emotional Experience ◽

Learning Task ◽

Experimental Manipulation ◽

Detection Task ◽

Control Group ◽

Before And After ◽

Experimental Group

Background: In this study, we examined the relationships between reward-based decision-making in terms of learning rate, memory rate, exploration rate, and depression-related subjective emotional experience, in terms of interoception and feelings, to understand how reward-based decision-making is impaired in depression. Methods: In all, 52 university students were randomly assigned to an experimental group and a control group. To manipulate interoception, the participants in the experimental group were instructed to tune their internal somatic sense to the skin-conductance-response waveform presented on a display. The participants in the control group were only instructed to stay relaxed. Before and after the manipulation, the participants completed a probabilistic reversal-learning task to assess reward-based decision-making using reinforcement learning modeling. Similarly, participants completed a probe-detection task, a heartbeat-detection task, and self-rated scales. Results: The experimental manipulation of interoception was not successful. In the baseline testing, reinforcement learning modeling indicated a marginally-significant correlation between the exploration rate and depressive symptoms. However, the exploration rate was significantly associated with lower interoceptive attention and higher depressive feeling. Conclusions: The findings suggest that situational characteristics may be closely involved in reward exploration and highlight the clinically-meaningful possibility that intervention for affective processes may impact reward-based decision-making in those with depression.

Memory-reliant Post-error Slowing Is Associated with Successful Learning and Fronto-occipital Activity

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00987 ◽

2016 ◽

Vol 28 (10) ◽

pp. 1539-1552 ◽

Cited By ~ 4

Author(s):

Björn C. Schiffler ◽

Rita Almeida ◽

Mathias Granqvist ◽

Sara L. Bengtsson

Keyword(s):

Reinforcement Learning ◽

Cognitive Control ◽

Negative Feedback ◽

Test Performance ◽

Cognitive Task ◽

Inferior Frontal Gyrus ◽

Response Speed ◽

Occipital Cortex ◽

Learning Task ◽

Learning Theories

Negative feedback after an action in a cognitive task can lead to devaluing that action on future trials as well as to more cautious responding when encountering that same choice again. These phenomena have been explored in the past by reinforcement learning theories and cognitive control accounts, respectively. Yet, how cognitive control interacts with value updating to give rise to adequate adaptations under uncertainty is less clear. In this fMRI study, we investigated cognitive control-based behavioral adjustments during a probabilistic reinforcement learning task and studied their influence on performance in a later test phase in which the learned value of items is tested. We provide support for the idea that functionally relevant and memory-reliant behavioral adjustments in the form of post-error slowing during reinforcement learning are associated with test performance. Adjusting response speed after negative feedback was correlated with BOLD activity in right inferior frontal gyrus and bilateral middle occipital cortex during the event of receiving the feedback. Bilateral middle occipital cortex activity overlapped partly with activity reflecting feedback deviance from expectations as measured by unsigned prediction error. These results suggest that cognitive control and feature processing cortical regions interact to implement feedback-congruent adaptations beneficial to learning.

Explaining Valence Asymmetries in Value Learning: A Reinforcement Learning Account

10.31234/osf.io/23kuf ◽

2022 ◽

Author(s):

Chenxu Hao ◽

Lilian E. Cabrera-Haro ◽

Ziyong Lin ◽

Patricia Reuter-Lorenz ◽

Richard L. Lewis

Keyword(s):

Reinforcement Learning ◽

Downstream Processing ◽

Learning Task ◽

Model Parameters ◽

Special Role ◽

Learning Rates ◽

The Asymmetry ◽

Value Learning ◽

Choice Policy ◽

Better Than

To understand how acquired value impacts how we perceive and process stimuli, psychologists have developed the Value Learning Task (VLT; e.g., Raymond & O’Brien, 2009). The task consists of a series of trials in which participants attempt to maximize accumulated winnings as they make choices from a pair of presented images associated with probabilistic win, loss, or no-change outcomes. Despite the task having a symmetric outcome structure for win and loss pairs, people learn win associations better than loss associations (Lin, Cabrera-Haro, & Reuter-Lorenz, 2020). This asymmetry could lead to differences when the stimuli are probed in subsequent tasks, compromising inferences about how acquired value affects downstream processing. We investigate the nature of the asymmetry using a standard error-driven reinforcement learning model with a softmax choice rule. Despite having no special role for valence, the model yields the asymmetry observed in human behavior, whether the model parameters are set to maximize empirical fit, or task payoff. The asymmetry arises from an interaction between a neutral initial value estimate and a choice policy that exploits while exploring, leading to more poorly discriminated value estimates for loss stimuli. We also show how differences in estimated individual learning rates help to explain individual differences in the observed win-loss asymmetries, and how the final value estimates produced by the model provide a simple account of a post-learning explicit value categorization task.
