Diffusion gradient temporal difference for cooperative reinforcement learning with linear function approximation

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it suffers from slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and more stably, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been proposed, but they seldom exploit the advantages of traditional algorithms. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike traditional Q-learning with linear function approximation, the learning mechanism and model structure of MRLS-Q resemble those of a DQN with only one input layer and one linear output layer. It uses experience replay and minibatch training, and it takes the agent’s states rather than the agent’s state-action pairs as inputs. As a result, it can be used alone for low-dimensional problems and can be integrated seamlessly into DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so it achieves better convergence performance whether it is used alone or integrated with DQN. Finally, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
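The model structure described in this abstract (states as inputs, one linear output per action, experience replay, minibatch updates) can be illustrated with a minimal Python sketch. It is not the authors' MRLS-Q: the update below is a plain semi-gradient step rather than the average RLS technique, and the two-state toy problem, the one-hot features, the function names, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random


def q_values(weights, phi):
    """One linear output per action: Q(s, a) = weights[a] . phi(s)."""
    return [sum(w * x for w, x in zip(w_a, phi)) for w_a in weights]


def train_linear_q(replay, features, n_actions, gamma=0.9, alpha=0.5,
                   batch_size=4, n_batches=5000, seed=0):
    """Q-learning with a linear model, trained on minibatches from a replay buffer.

    NOTE: plain semi-gradient updates are used here as a stand-in;
    the abstract's average RLS optimizer is not reproduced.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_actions)]
    for _ in range(n_batches):
        # Sample a minibatch of stored transitions (experience replay).
        batch = [rng.choice(replay) for _ in range(batch_size)]
        # Accumulate the minibatch update before changing any weight.
        grads = [[0.0] * dim for _ in range(n_actions)]
        for s, a, r, s_next in batch:
            target = r + gamma * max(q_values(weights, features[s_next]))
            td_error = target - q_values(weights, features[s])[a]
            for i, x in enumerate(features[s]):
                grads[a][i] += td_error * x
        for a in range(n_actions):
            for i in range(dim):
                weights[a][i] += alpha * grads[a][i] / batch_size
    return weights


# Hypothetical toy problem: two states, action 0 = stay, action 1 = switch,
# reward 1.0 whenever the next state is state 1.
features = [[1.0, 0.0], [0.0, 1.0]]            # one-hot state features
replay = [(0, 0, 0.0, 0), (0, 1, 1.0, 1),      # (state, action, reward, next state)
          (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
weights = train_linear_q(replay, features, n_actions=2)
greedy = [max(range(2), key=lambda a: q_values(weights, phi)[a]) for phi in features]
print(greedy)
```

Because the model maps a state to one value per action, the same weight matrix could in principle sit on top of learned features, which is the role the abstract assigns to MRLS-Q as the last layer of a DQN.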


A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

Operations Research ◽ 10.1287/opre.2020.2024 ◽ 2021

Author(s): Jalaj Bhandari ◽ Daniel Russo ◽ Raghav Singal

Keyword(s): Linear Function ◽ Finite Time ◽ Function Approximation ◽ Gradient Descent ◽ Convergence Rates ◽ Temporal Difference ◽ Temporal Difference Learning ◽ Convergence Results ◽ Linear Function Approximation ◽ Markov Reward

Temporal difference (TD) learning is a simple iterative algorithm widely used for policy evaluation in Markov reward processes. Bhandari et al. prove finite-time convergence rates for TD learning with linear function approximation. The analysis rests on a key insight that establishes rigorous connections between TD updates and those of online gradient descent. In a model where observations are corrupted by i.i.d. noise, convergence results for TD follow by essentially mirroring the analysis for online gradient descent. Using an information-theoretic technique, the authors also provide results for the case in which TD is applied to a single Markovian data stream, where the algorithm’s updates can be severely biased. Their analysis extends seamlessly to TD learning with eligibility traces and to Q-learning for high-dimensional optimal stopping problems.
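For readers unfamiliar with the algorithm under analysis, a minimal Python sketch of TD(0) with a linear value estimate follows. The update adds a step-size multiple of the TD error times the feature vector, which is the step whose resemblance to online gradient descent the abstract refers to. The three-state deterministic cycle, the one-hot features, and the step size are hypothetical choices made so that the fixed point can be checked by hand; the paper itself treats noisy and Markovian observations.

```python
def td0_linear(phi, next_state, reward, gamma=0.9, alpha=0.1, steps=20000):
    """TD(0) policy evaluation with a linear estimate V(s) = theta . phi[s]."""
    theta = [0.0] * len(phi[0])
    s = 0
    for _ in range(steps):
        s_next = next_state[s]
        v = sum(t * x for t, x in zip(theta, phi[s]))
        v_next = sum(t * x for t, x in zip(theta, phi[s_next]))
        # TD error: one-step bootstrapped target minus the current estimate.
        delta = reward[s] + gamma * v_next - v
        # Move theta along the feature vector of the visited state.
        theta = [t + alpha * delta * x for t, x in zip(theta, phi[s])]
        s = s_next
    return theta


# Hypothetical Markov reward process: a cycle 0 -> 1 -> 2 -> 0,
# with reward 1.0 collected when leaving state 0 and 0.0 elsewhere.
phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
next_state = [1, 2, 0]
reward = [1.0, 0.0, 0.0]
theta = td0_linear(phi, next_state, reward)
print(theta)
```

In this toy case the value of state 0 satisfies V(0) = 1 + 0.9^3 V(0), so the estimate for state 0 should approach 1 / (1 - 0.9^3).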


Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation

IEEE Transactions on Intelligent Transportation Systems ◽ 10.1109/tits.2021.3091014 ◽ 2021 ◽ pp. 1-10

Author(s): Lucas N. Alegre ◽ Theresa Ziemke ◽ Ana L. C. Bazzan

Keyword(s): Reinforcement Learning ◽ Linear Function ◽ Real World ◽ Function Approximation ◽ Traffic Signals ◽ Linear Function Approximation ◽ Control Traffic


Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation

SIAM Journal on Mathematics of Data Science ◽ 10.1137/20m1311971 ◽ 2021 ◽ Vol 3 (1) ◽ pp. 298-320

Author(s): Thinh T. Doan ◽ Siva Theja Maguluri ◽ Justin Romberg

Keyword(s): Linear Function ◽ Finite Time ◽ Function Approximation ◽ Temporal Difference ◽ Temporal Difference Learning ◽ Time Performance ◽ Linear Function Approximation
