A common deliberative process underlies model-based planning and patient intertemporal choice

2018 ◽  
Author(s):  
Lindsay E Hunter ◽ 
Aaron M Bornstein ◽  
Catherine A Hartley

Humans and animals consistently forego, or 'discount', future rewards in favor of more proximal, but less valuable, options. This behavior is often thought of in terms of a failure of 'self-control': a lack of inhibition when considering the possibility of immediate gratification. However, rather than overweighting the near-term reward, the same behavior can result from failing to properly consider the far-off reward. The capacity to plan for future gains is a core construct in reinforcement learning (RL), known as 'model-based' planning. Both discounting and model-based planning have been shown to track everyday behaviors, from diet and exercise habits to drug abuse. Here, we show that these two capacities are related via a common mechanism: people who are more likely to deliberate about future reward in an intertemporal choice task, as indicated by the time they spend considering the choice, are also more likely to make multi-step plans for reward in a sequential reinforcement learning task. In contrast, the degree to which people's intertemporal choices were driven by a more automatic bias did not correspond to their planning tendency, and neither did the more standard measure of discounting behavior. These results suggest that the standard behavioral-economic measure of discounting is more fruitfully understood by decomposing it into constituent parts, and that only one of these parts corresponds to the sort of multi-step thinking needed to make plans for the future.
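For reference, the "standard measure of discounting" mentioned above is typically a per-person discount rate k fit to choices under a hyperbolic model, V = A / (1 + kD). A minimal sketch of that baseline model (the softmax choice rule and the parameter values are illustrative assumptions, not the authors' fitting procedure):

```python
import numpy as np

def hyperbolic_value(amount, delay, k):
    """Hyperbolic discounted value: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def p_choose_delayed(amount_now, amount_later, delay, k, temp=1.0):
    """Softmax probability of choosing the delayed option."""
    v_now = hyperbolic_value(amount_now, 0.0, k)
    v_later = hyperbolic_value(amount_later, delay, k)
    return 1.0 / (1.0 + np.exp(-(v_later - v_now) / temp))

# e.g., $20 today vs. $40 in 30 days with discount rate k = 0.05/day
print(p_choose_delayed(20.0, 40.0, 30.0, k=0.05))  # ~0.018: strong preference for the immediate $20
```

Fitting k per subject collapses choice behavior into a single number; the paper's point is that decomposing that number (e.g., separating deliberative from automatic contributions via response times) reveals which component actually tracks planning.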

2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
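The model-based/model-free combination described here is commonly formalized as a weighted mixture of the two systems' action-value estimates passed through a softmax; a minimal sketch under that convention (the weight w, the example values, and the temperature are illustrative, not the authors' fitted model):

```python
import numpy as np

def hybrid_action_values(q_mb, q_mf, w):
    """Weighted mixture of model-based and model-free action values.
    w = 1 -> pure planning; w = 0 -> pure habit."""
    return w * q_mb + (1.0 - w) * q_mf

def softmax(q, beta):
    """Convert action values to choice probabilities."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

# Two advisors: MB values from the learned task structure, MF values from TD updates
q_mb = np.array([0.7, 0.4])
q_mf = np.array([0.2, 0.6])
print(softmax(hybrid_action_values(q_mb, q_mf, w=0.6), beta=3.0))
```

Individual differences in reliance on each system then correspond to differences in the fitted w, which is the kind of variation the study relates to post-task attitudes.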


2016 ◽  
Author(s):  
Evan M. Russek ◽  
Ida Momennejad ◽  
Matthew M. Botvinick ◽  
Samuel J. Gershman ◽  
Nathaniel D. Daw

Abstract

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.

Author Summary

According to standard models, when confronted with a choice, animals and humans rely on two separate, distinct processes to come to a decision. One process deliberatively evaluates the consequences of each candidate action and is thought to underlie the ability to flexibly come up with novel plans. The other process gradually increases the propensity to perform behaviors that were previously successful and is thought to underlie automatically executed, habitual reflexes. Although computational principles and animal behavior support this dichotomy, at the neural level, there is little evidence supporting a clean segregation. For instance, although dopamine — famously implicated in drug addiction and Parkinson's disease — currently only has a well-defined role in the automatic process, evidence suggests that it also plays a role in the deliberative process. In this work, we present a computational framework for resolving this mismatch. We show that the types of behaviors associated with either process could result from a common learning mechanism applied to different strategies for how populations of neurons could represent candidate actions. In addition to demonstrating that this account can produce the full range of flexible behavior observed in the empirical literature, we suggest experiments that could detect the various approaches within this framework.
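The successor representation (SR) central to this framework can itself be learned with the same TD machinery used for value prediction; a minimal tabular sketch under a fixed policy (the state indices, step sizes, and toy chain are illustrative assumptions):

```python
import numpy as np

def td_update_sr(M, s, s_next, alpha, gamma):
    """One TD(0) update of the successor representation M.
    M[s, s'] estimates expected discounted future occupancy of s' from s."""
    n = M.shape[0]
    onehot = np.eye(n)[s]
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] += alpha * td_error
    return M

def sr_values(M, r):
    """State values factorize as V = M @ r."""
    return M @ r

# Tiny 3-state chain: 0 -> 1 -> 2 (absorbing); learn M from transitions
M = np.zeros((3, 3))
for _ in range(200):
    for s, s_next in [(0, 1), (1, 2), (2, 2)]:
        M = td_update_sr(M, s, s_next, alpha=0.1, gamma=0.9)

r = np.array([0.0, 0.0, 1.0])
print(sr_values(M, r))
```

Because values factorize as V = M r, changing the reward vector r revalues every state immediately without relearning M — the flexible, putatively model-based behavior the paper analyzes, achieved with purely TD-style updates.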


Author(s):  
Marco Boaretto ◽  
Gabriel Chaves Becchi ◽  
Luiza Scapinello Aquino ◽  
Aderson Cleber Pifer ◽  
Helon Vicente Hultmann Ayala ◽  
...  

2020 ◽  
Vol 68 (8) ◽  
pp. 612-624
Author(s):  
Max Pritzkoleit ◽  
Robert Heedt ◽  
Carsten Knoll ◽  
Klaus Röbenack

Abstract

In this contribution, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in an offline trajectory planning step to determine an optimal feedback controller, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single cart-pole system, where it shows a significant improvement in data efficiency over model-free RL approaches. We further present experimental results from a test bench, where the proposed algorithm is able, within only a few trials, to approximate a feedback controller that is sufficiently close to optimal for the system.
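A minimal sketch of the model-learning step described above (a small feedforward network fit to one-step transitions; the architecture, dimensions, and placeholder data are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Feedforward net predicting the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Random tensors stand in for transitions (s, a, s') recorded on the real
# system; the offline planner would then optimize trajectories against the
# learned model rather than the physical plant.
model = DynamicsModel(state_dim=4, action_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
s, a, s_next = torch.randn(256, 4), torch.randn(256, 1), torch.randn(256, 4)
for _ in range(100):
    loss = nn.functional.mse_loss(model(s, a), s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Alternating between collecting data on the real system, refitting this model, and replanning is what gives model-based RL its data-efficiency advantage over model-free approaches, which must learn from real interactions alone.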


Author(s):  
Cheng-Yu Kuo ◽  
Andreas Schaarschmidt ◽  
Yunduan Cui ◽  
Tamim Asfour ◽  
Takamitsu Matsubara
