Learning to Coordinate Efficiently: A Model-based Approach

Journal of Artificial Intelligence Research ◽

10.1613/jair.1154 ◽

2003 ◽

Vol 19 ◽

pp. 11-23 ◽

Cited By ~ 15

Author(s):

R. I. Brafman ◽

M. Tennenholtz

Keyword(s):

Reinforcement Learning ◽

Simple Model ◽

Stochastic Games ◽

Convergence Rates ◽

Learning Algorithms ◽

Common Interest ◽

Model Based ◽

Optimal Value ◽

To Receive

In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.


A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms

Neural Computation ◽

10.1162/089976699300016070 ◽

1999 ◽

Vol 11 (8) ◽

pp. 2017-2060 ◽

Cited By ~ 70

Author(s):

Csaba Szepesvári ◽

Michael L. Littman

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Learning Algorithm ◽

Learning Algorithms ◽

Sequential Decision ◽

Q Learning ◽

Markov Games ◽

Optimal Behavior ◽

Risk Sensitive ◽

Optimal Value

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
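As a rough illustration of the synchronous-versus-asynchronous relationship the abstract describes, the sketch below runs both styles of update on a tiny, hypothetical two-state, two-action deterministic MDP and compares the resulting Q-values. The MDP, the constants, and all function names are assumptions made for this example and do not come from the paper; because transitions are deterministic, each sampled backup equals the expected backup, which is a much simpler setting than the one the theorem covers.

```python
import random

# Hypothetical 2-state, 2-action deterministic MDP (an assumption, not from the paper).
# NEXT[s][a] is the successor state; REWARD[s][a] is the immediate reward.
NEXT = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9


def backup(q, s, a):
    """One-step Bellman optimality backup for the pair (s, a)."""
    s_next = NEXT[s][a]
    return REWARD[s][a] + GAMMA * max(q[s_next])


def synchronous_value_iteration(sweeps=500):
    """Update every (s, a) pair at once from the previous table."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        q = [[backup(q, s, a) for a in range(2)] for s in range(2)]
    return q


def asynchronous_q_learning(updates=20000, alpha=0.5, seed=0):
    """Update one randomly chosen (s, a) pair at a time, in place."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(updates):
        s, a = rng.randrange(2), rng.randrange(2)
        q[s][a] += alpha * (backup(q, s, a) - q[s][a])
    return q


def max_gap(q1, q2):
    """Largest absolute difference between two Q-tables."""
    return max(abs(q1[s][a] - q2[s][a]) for s in range(2) for a in range(2))


if __name__ == "__main__":
    q_sync = synchronous_value_iteration()
    q_async = asynchronous_q_learning()
    print("synchronous: ", q_sync)
    print("asynchronous:", q_async)
    print("max gap:     ", max_gap(q_sync, q_async))
```

On this toy problem both procedures settle on the same table, which is the kind of agreement the unified analysis is meant to establish in far more general settings.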


On convergence rates of game theoretic reinforcement learning algorithms

Automatica ◽

10.1016/j.automatica.2019.02.032 ◽

2019 ◽

Vol 104 ◽

pp. 90-101 ◽

Cited By ~ 2

Author(s):

Zhisheng Hu ◽

Minghui Zhu ◽

Ping Chen ◽

Peng Liu

Keyword(s):

Reinforcement Learning ◽

Convergence Rates ◽

Learning Algorithms ◽

Game Theoretic


Episodic Control as Meta-Reinforcement Learning

10.1101/360537 ◽

2018 ◽

Cited By ~ 3

Author(s):

S Ritter ◽

JX Wang ◽

Z Kurth-Nelson ◽

M Botvinick

Keyword(s):

Reinforcement Learning ◽

Episodic Memory ◽

Learning Strategies ◽

Learning Algorithms ◽

Memory System ◽

Generic Model ◽

Model Based ◽

Model Free

Recent research has placed episodic reinforcement learning (RL) alongside model-free and model-based RL on the list of processes centrally involved in human reward-based learning. In the present work, we extend the unified account of model-free and model-based RL developed by Wang et al. (2018) to further integrate episodic learning. In this account, a generic model-free “meta-learner” learns to deploy and coordinate among all of these learning algorithms. The meta-learner learns through brief encounters with many novel tasks, so that it learns to learn about new tasks. We show that when equipped with an episodic memory system inspired by theories of reinstatement and gating, the meta-learner learns to use the episodic and model-based learning algorithms observed in humans in a task designed to dissociate among the influences of various learning strategies. We discuss implications and predictions of the model.


Estimating Scale-Invariant Future in Continuous Time

Neural Computation ◽

10.1162/neco_a_01171 ◽

2019 ◽

Vol 31 (4) ◽

pp. 681-709 ◽

Cited By ~ 6

Author(s):

Zoran Tiganj ◽

Samuel J. Gershman ◽

Per B. Sederberg ◽

Marc W. Howard

Keyword(s):

Reinforcement Learning ◽

Continuous Time ◽

Learning Algorithms ◽

Future Time ◽

Scale Invariant ◽

Model Based ◽

Model Free ◽

Transition Functions ◽

Future Reward ◽

Future Outcomes

Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially discounted future reward using the Bellman equation (model-free algorithms). An important drawback of model-based algorithms is that computational cost grows linearly with the amount of time to be simulated. An important drawback of model-free algorithms is the need to select a timescale required for exponential discounting. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future outcomes. This mechanism efficiently computes an estimate of inputs as a function of future time on a logarithmically compressed scale and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. The representation of future time retains information about what will happen when. The entire timeline can be constructed in a single parallel operation that generates concrete behavioral and neural predictions. This computational mechanism could be incorporated into future reinforcement learning algorithms.
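The contrast between exponential and power-law discounting mentioned in the abstract can be made concrete with a small sketch. The functional forms, the parameters `tau` and `k`, and the helper names below are assumptions chosen for illustration and are not taken from the paper. The idea is that rescaling time by a constant factor changes an exponential discount by an amount that depends on where you are on the timeline, whereas a power-law discount changes by the same factor everywhere.

```python
import math


def exponential_discount(t, tau=10.0):
    """Exponential discount with a fixed timescale tau (hypothetical value)."""
    return math.exp(-t / tau)


def power_law_discount(t, k=1.0):
    """Power-law discount t**(-k), defined for t > 0 (hypothetical exponent)."""
    return t ** (-k)


def rescaling_ratio(discount, t, c=2.0):
    """How much the discount shrinks when time t is stretched to c * t."""
    return discount(c * t) / discount(t)


if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0):
        print(
            t,
            rescaling_ratio(exponential_discount, t),
            rescaling_ratio(power_law_discount, t),
        )
```

For the exponential form the ratio equals exp(-(c - 1) t / tau), so it varies with t and depends on the chosen timescale; for the power-law form it equals c**(-k) at every t, which is one way to read "scale-invariant" here.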


An Experimental Study of Different Approaches to Reinforcement Learning in Common Interest Stochastic Games

Machine Learning: ECML 2004 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30115-8_10 ◽

2004 ◽

pp. 75-86

Author(s):

Avi Bab ◽

Ronen Brafman

Keyword(s):

Experimental Study ◽

Reinforcement Learning ◽

Stochastic Games ◽

Common Interest


A review of motion planning algorithms for intelligent robots

Journal of Intelligent Manufacturing ◽

10.1007/s10845-021-01867-z ◽

2021 ◽

Author(s):

Chengmin Zhou ◽

Bingding Huang ◽

Pasi Fränti

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Q Learning ◽

Learning Network ◽

Gradient Algorithms ◽

Optimal Value ◽

Policy Gradient ◽

Planning Algorithms

Principles of typical motion planning algorithms are investigated and analyzed in this paper. These algorithms include traditional planning algorithms, classical machine learning algorithms, optimal value reinforcement learning, and policy gradient reinforcement learning. Traditional planning algorithms investigated include graph search algorithms, sampling-based algorithms, interpolating curve algorithms, and reaction-based algorithms. Classical machine learning algorithms include multiclass support vector machine, long short-term memory, Monte-Carlo tree search and convolutional neural network. Optimal value reinforcement learning algorithms include Q learning, deep Q-learning network, double deep Q-learning network, and dueling deep Q-learning network. Policy gradient algorithms include policy gradient method, actor-critic algorithm, asynchronous advantage actor-critic, advantage actor-critic, deterministic policy gradient, deep deterministic policy gradient, trust region policy optimization and proximal policy optimization. New general criteria are also introduced to evaluate the performance and application of motion planning algorithms by analytical comparisons. The convergence speed and stability of optimal value and policy gradient algorithms are specially analyzed. Future directions are presented analytically according to principles and analytical comparisons of motion planning algorithms. This paper provides researchers with a clear and comprehensive understanding about advantages, disadvantages, relationships, and future of motion planning algorithms in robots, and paves ways for better motion planning algorithms in academia, engineering, and manufacturing.


Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards

10.32470/ccn.2018.1191-0 ◽

2018 ◽

Author(s):

Paul Krueger ◽

Thomas Griffiths

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Model Free


Cognitive Radio Networks with Reinforcement Learning Algorithms for Spectrum Allocation: A Survey

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/211952020 ◽

2020 ◽

Vol 9 (5) ◽

pp. 8371-8384

Keyword(s):

Reinforcement Learning ◽

Cognitive Radio ◽

Cognitive Radio Networks ◽

Learning Algorithms ◽

Radio Networks ◽

Spectrum Allocation


Model-Based and Model-Free Social Cognition

10.31234/osf.io/ue6j2 ◽

2019 ◽

Author(s):

Leor M Hackel ◽

Jeffrey Jordan Berg ◽

Björn Lindström ◽

David Amodio

Keyword(s):

Reinforcement Learning ◽

Social Cognition ◽

Learning Strategies ◽

Memory Systems ◽

Learning Task ◽

Financial Advisors ◽

Model Based ◽

Model Free ◽

Systems Model ◽

Task Assessment

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


Faculty Opinions recommendation of States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.4125957.4076054 ◽

2010 ◽

Author(s):

Susan Courtney

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Model Based ◽

Model Free
