Linear reinforcement learning: Flexible reuse of computation in planning, grid fields, and cognitive control

2019
Author(s):  
Payam Piray ◽  
Nathaniel D. Daw

Abstract: It is thought that the brain’s judicious allocation and reuse of computation underlies our ability to plan flexibly, but also failures to do so as in habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, we introduce a new model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a learned default policy. This solution exposes connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also gives new insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.

2021
Vol 12 (1)
Author(s):  
Payam Piray ◽  
Nathaniel D. Daw

Abstract: It is thought that the brain’s judicious reuse of previous computation underlies our ability to plan flexibly, but also that inappropriate reuse gives rise to inflexibilities like habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, here we introduce a model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a default policy. This solution demonstrates connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also provides insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.
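The linear approximation described in this abstract admits a compact sketch. The toy problem below (the state layout, default policy, and reward values are illustrative assumptions, not taken from the paper) computes the desirability z = exp(v*) of nonterminal states with a single linear solve and then derives the decision policy by softly maximizing around the default policy:

```python
import numpy as np

# Toy linear-RL sketch (illustrative assumptions throughout):
# two nonterminal states (0, 1) and two terminal states (A, B).
r_N = np.array([-1.0, -1.0])   # per-step reward (a cost) at nonterminal states
r_T = np.array([0.0, -10.0])   # rewards at terminal states A and B

# Default-policy transition probabilities.
T = np.array([[0.0, 0.5],      # nonterminal -> nonterminal
              [0.0, 0.0]])
P = np.array([[0.25, 0.25],    # nonterminal -> terminal
              [0.50, 0.50]])

# Linear solve for desirability z_N = exp(v*) at nonterminal states:
# z_N = M @ P @ exp(r_T), with the temporally abstracted map
# ("default representation") M = (diag(exp(-r_N)) - T)^-1.
M = np.linalg.inv(np.diag(np.exp(-r_N)) - T)
z_N = M @ P @ np.exp(r_T)
v_N = np.log(z_N)              # values of the nonterminal states

# Decision policy: soft maximization around (and weakly biased toward)
# the default policy, pi*(s'|s) proportional to pi_d(s'|s) * z(s').
z_all = np.concatenate([z_N, np.exp(r_T)])
pi_d = np.hstack([T, P])       # default policy over all successor states
pi_star = pi_d * z_all
pi_star /= pi_star.sum(axis=1, keepdims=True)
```

Replanning under a new goal only requires swapping the terminal rewards r_T; the map M can be reused unchanged, which is the source of the model's flexibility and of its quantifiable bias toward the default policy.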


Author(s):  
Stefan Scherbaum ◽  
Simon Frisch ◽  
Maja Dshemuchadse

Abstract: Folk wisdom tells us that additional time to make a decision helps us to refrain from the first impulse to take the bird in the hand. However, the question of why the time to decide plays an important role is still unanswered. Here we distinguish two explanations, one based on a bias in value accumulation that has to be overcome with time, the other based on cognitive control processes that need time to set in. In an intertemporal decision task, we use mouse tracking to study participants’ responses to options’ values and delays, which were presented sequentially. We find that information about options’ delays does indeed lead to an immediate bias that is controlled afterwards, matching the prediction of control processes needed to counter initial impulses. Hence, by using a dynamic measure, we provide insight into the processes underlying short-term-oriented choices in intertemporal decision making.


2018
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Abstract: Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
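As a concrete illustration, conflict cost and condition-dependent learning rates might enter a simple Q-learning model along the following lines (the function names, parameter values, and the exact form of the conflict cost are our own illustrative assumptions, not the authors' fitted model):

```python
import numpy as np

def update_q(q, action, reward, instructed, alpha_free=0.3, alpha_instructed=0.1):
    """Delta-rule update with a lower learning rate on instructed trials
    (illustrative parameter values)."""
    alpha = alpha_instructed if instructed else alpha_free
    q = q.copy()
    q[action] += alpha * (reward - q[action])
    return q

def choice_probs(q, distractor=None, conflict_cost=0.5, beta=5.0):
    """Softmax choice in which the perceived cost of conflicting with an
    external suggestion is subtracted from the value of disagreeing actions."""
    v = q.astype(float).copy()
    if distractor is not None:
        v[np.arange(len(v)) != distractor] -= conflict_cost
    e = np.exp(beta * (v - v.max()))  # numerically stable softmax
    return e / e.sum()
```

With weak evidence (similar Q-values) the conflict cost dominates and the distractor-congruent action wins; with strong evidence the value difference overrides it, capturing the trade-off the modelling results describe.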


2021
Author(s):  
Annik Yalnizyan-Carson ◽  
Blake A Richards

Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, if it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a performance benefit compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
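The core mechanism, an episodic memory cache with forgetting, can be sketched as follows; the class and its oldest-first eviction rule are a minimal illustration, not the authors' implementation:

```python
from collections import OrderedDict

class EpisodicCache:
    """Bounded episodic-control cache: maps (state, action) to the best
    return observed so far; the oldest entries are forgotten when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # (state, action) -> best observed return

    def update(self, state, action, ret):
        key = (state, action)
        best = max(self.memory.pop(key, float("-inf")), ret)
        self.memory[key] = best      # re-insert as the most recent memory
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # forget the oldest memory

    def value(self, state, action, default=0.0):
        return self.memory.get((state, action), default)
```

Acting greedily on value() over the available actions yields episodic control; the capacity bound implements forgetting, and structured state representations of the kind the abstract describes would make evicted memories cheaper to lose.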


2019
Vol 59 (2)
pp. 749
Author(s):  
Robert Wentzel ◽  
Nada Wentzel

A common response following an incident is, ‘What were they thinking?’. This rhetorical question implies blame. While all incidents can be linked to human error, a more insightful and expansive question would be ‘Were they thinking?’. This question leads to identifying broader organisational factors that contributed to the error in decision making. Understanding thinking is critical in taking the next step to prevent harm. Neuroscience provides insight into how we think and how the brain makes decisions, and it reveals an additional risk we refer to as Limbic Risk™. The majority of our thinking is in fact unconscious, automatic and reactive, and stems from the oldest part of our brain, the limbic system. The minority of our decisions are conscious, logical and responsive, and use a newer part of our brain, the pre-frontal cortex (PFC). The ability to use our PFC is significantly impacted by stress. There are five significant stressors: pressure, fatigue, irritation, distraction and complacency. These impair our ability to use our PFC and add Limbic Risk™ to the environment. Traditional safety management focuses on managing external, observable risks, including physical and behavioural risk. Preventing harm requires an expanded perspective to understand, interrupt and prevent Limbic Risk™, and, importantly, to equip leaders with the capability to create a LimbicSafe® environment, given that their influence on others is a significant 70%.


2020
Author(s):  
Milena Rmus ◽  
Samuel McDougle ◽  
Anne Collins

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and behavior.


PeerJ
2020
Vol 8
pp. e9712
Author(s):  
Jordi Camí ◽  
Alex Gomez-Marin ◽  
Luis M. Martínez

Cognitive scientists have paid very little attention to magic as a distinctly human activity capable of creating situations that are considered impossible because they violate expectations and conclude with the apparent transgression of well-established cognitive and natural laws. This illusory experience of the “impossible” entails a very particular cognitive dissonance that is followed by a subjective and complex “magical experience”. Here, from a perspective inspired by visual neuroscience and ecological cognition, we propose a set of seven fundamental cognitive phenomena (from attention and perception to memory and decision-making), plus a preceding pre-sensory stage, that magicians interfere with during the presentation of their effects. In doing so, and using the deconstruction of a classic trick as an example, we show how magic offers novel and powerful insights for the study of human cognition. Furthermore, live magic performances allow us to do so in tasks that are more ecological and context-dependent than those usually exploited in artificial laboratory settings. We thus believe that some of the mysteries of how the brain works may be trapped in the split realities present in every magic effect.


2019
Author(s):  
Zhewei Zhang ◽  
Huzi Cheng ◽  
Tianming Yang

Abstract: The brain makes flexible and adaptive responses in a complicated and ever-changing environment for the organism’s survival. To achieve this, the brain needs to choose appropriate actions flexibly in response to sensory inputs. Moreover, the brain also has to understand how its actions affect future sensory inputs and what reward outcomes should be expected, and it must adapt its behavior based on the actual outcomes. A modeling approach that takes into account the combined contingencies between sensory inputs, actions, and reward outcomes may be the key to understanding the underlying neural computation. Here, we train a recurrent neural network model, based on sequence learning, to predict future events from past event sequences that combine sensory, action, and reward events. We use four exemplary tasks that have been used in previous animal and human experiments to study different aspects of decision making and learning. We first show that the model reproduces the animals’ choice and reaction-time patterns in a probabilistic reasoning task, and that its units’ activities mimic the classical ramping pattern of parietal neurons that reflects the evidence-accumulation process during decision making. We further demonstrate that the model carries out Bayesian inference and, with additional tasks, may support metacognition such as confidence. Finally, we show how the network model achieves adaptive behavior with an approach distinct from reinforcement learning. Our work pieces together many experimental findings in decision making and reinforcement learning and provides a unified framework for the flexible and adaptive behavior of the brain.


2021
Author(s):  
Anahit Mkrtchian ◽  
Vincent Valton ◽  
Jonathan P Roiser

Background: Computational models can offer mechanistic insight into cognition and therefore have the potential to transform our understanding of psychiatric disorders and their treatment. For translational efforts to be successful, it is imperative that computational measures capture individual characteristics reliably. To date, this issue has received little consideration. Methods: Here we examine the reliability of canonical reinforcement learning and economic models derived from two commonly used tasks. Healthy individuals (N=50) completed a restless four-armed bandit and a calibrated gambling task twice, two weeks apart. Results: Reward and punishment processing parameters from the reinforcement learning model showed fair-to-good reliability, while risk/loss aversion parameters from a prospect theory model exhibited good-to-excellent reliability. Both models were further able to predict future behaviour above chance within individuals. Conclusions: These results suggest that reinforcement learning measures, and particularly prospect theory measures, capture relatively reliable decision-making mechanisms that are also unique across individuals, indicating the translational potential of clinically relevant computational parameters for precision psychiatry.
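For concreteness, the risk/loss-aversion parameters referred to above come from a prospect-theory value function of the standard form sketched below (the parameter values and choice rule are illustrative placeholders, not the estimates reported in this study):

```python
import numpy as np

def pt_utility(x, rho=0.8, lam=2.0):
    """Prospect-theory value function: curvature rho (risk attitude) and
    loss aversion lam (illustrative values)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.abs(x) ** rho, -lam * np.abs(x) ** rho)

def p_accept(gain, loss, mu=1.0, rho=0.8, lam=2.0):
    """Probability of accepting a 50/50 gain/loss gamble over a sure zero,
    via a logistic choice rule on the expected-utility difference."""
    eu_gamble = 0.5 * pt_utility(gain, rho, lam) + 0.5 * pt_utility(-loss, rho, lam)
    return 1.0 / (1.0 + np.exp(-mu * eu_gamble))
```

With lam greater than 1 the model rejects symmetric gambles; this loss-averse behavioural signature is the kind of individual characteristic whose test-retest reliability the study quantifies.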

