Two Timescale Stochastic Approximation: Latest Research Papers

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.5779 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 3701-3708

Author(s): Gal Dalal ◽ Balazs Szorenyi ◽ Gugan Thoppe

Keyword(s): Reinforcement Learning ◽ Convergence Rate ◽ Policy Evaluation ◽ Finite Time ◽ High Probability ◽ Temporal Difference ◽ Time Analysis ◽ Difference Methods ◽ Temporal Difference Methods ◽ Two Timescale Stochastic Approximation

Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n$, respectively. Assuming $\alpha_n = n^{-\alpha}$ and $\beta_n = n^{-\beta}$ with $1 > \alpha > \beta > 0$, we show that, with high probability, the two iterates converge to their respective solutions $\theta^*$ and $w^*$ at rates given by $\|\theta_n - \theta^*\| = \tilde{O}(n^{-\alpha/2})$ and $\|w_n - w^*\| = \tilde{O}(n^{-\beta/2})$; here, $\tilde{O}$ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that it in fact happens from some finite time onwards. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square summable ones.
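The two-stepsize structure described in this abstract can be illustrated with a minimal Python sketch. It is not GTD(0), GTD2, or TDC, and it is not taken from the paper: the drift functions, the noise model, the function name `two_timescale_sa`, and the default exponents are all assumptions chosen for illustration. The toy problem has joint root θ* = 1, w* = 2. Following the abstract's convention, θ uses the stepsize n^(-α) and w uses n^(-β) with 1 > α > β > 0, so θ is the slow iterate and w the fast one.

```python
import random


def two_timescale_sa(num_steps, alpha=0.9, beta=0.6, noise_std=1.0, seed=0):
    """Run a toy two-timescale stochastic approximation; return (theta, w).

    Hypothetical drift functions (assumed for illustration, not GTD/TDC):
        h1(theta, w) = 1 - w / 2      (drives theta, the slow iterate)
        h2(theta, w) = 2 * theta - w  (drives w, the fast iterate)
    Their joint root is theta* = 1, w* = 2.
    """
    if not 0.0 < beta < alpha < 1.0:
        raise ValueError("expected 1 > alpha > beta > 0")
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for n in range(1, num_steps + 1):
        alpha_n = n ** -alpha  # decays faster: theta moves on the slow timescale
        beta_n = n ** -beta    # decays slower: w moves on the fast timescale
        # Noisy observations of the drifts, both taken at the current iterates.
        g_theta = 1.0 - w / 2.0 + rng.gauss(0.0, noise_std)
        g_w = 2.0 * theta - w + rng.gauss(0.0, noise_std)
        theta += alpha_n * g_theta
        w += beta_n * g_w
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale_sa(50_000)
    print(f"theta error: {abs(theta - 1.0):.4f}, w error: {abs(w - 2.0):.4f}")
```

Running the script for increasing `num_steps` gives a rough feel for how the two errors shrink at different speeds. A single noisy run is only suggestive and does not verify the stated rates.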


A stability criterion for two timescale stochastic approximation schemes

Automatica ◽ 10.1016/j.automatica.2016.12.014 ◽ 2017 ◽ Vol 79 ◽ pp. 108-114 ◽ Cited By ~ 1

Author(s): Chandrashekar Lakshminarayanan ◽ Shalabh Bhatnagar

Keyword(s): Stochastic Approximation ◽ Stability Criterion ◽ Approximation Schemes ◽ Two Timescale Stochastic Approximation


Two Timescale Analysis of the Alopex Algorithm for Optimization

Neural Computation ◽ 10.1162/089976602760408044 ◽ 2002 ◽ Vol 14 (11) ◽ pp. 2729-2750 ◽ Cited By ~ 11

Author(s): P. S. Sastry ◽ M. Magesh ◽ K. P. Unnikrishnan

Keyword(s): Asymptotic Behavior ◽ Stochastic Approximation ◽ Approximation Method ◽ Gradient Descent ◽ Optimization Technique ◽ Descent Method ◽ Learning Problems ◽ Gradient Descent Method ◽ Gradient Information ◽ Two Timescale Stochastic Approximation

Alopex is a correlation-based, gradient-free optimization technique useful in many learning problems. However, there are no analytical results on the asymptotic behavior of this algorithm. This article presents a new version of Alopex that can be analyzed using techniques from two-timescale stochastic approximation. It is shown that the algorithm asymptotically behaves like a gradient-descent method, though it does not need (or estimate) any gradient information. It is also shown, through simulations, that the algorithm is quite effective.


A Two Timescale Stochastic Approximation Scheme for Simulation-Based Parametric Optimization

Probability in the Engineering and Informational Sciences ◽ 10.1017/s0269964800005362 ◽ 1998 ◽ Vol 12 (4) ◽ pp. 519-531 ◽ Cited By ~ 32

Author(s): Shalabh Bhatnagar ◽ Vivek S. Borkar

Keyword(s): Stochastic Approximation ◽ Perturbation Analysis ◽ Parametric Optimization ◽ Approximation Scheme ◽ Infinitesimal Perturbation Analysis ◽ Simulation Based ◽ Infinitesimal Perturbation ◽ Two Timescale Stochastic Approximation

A two timescale stochastic approximation scheme which uses coupled iterations is used for simulation-based parametric optimization as an alternative to traditional “infinitesimal perturbation analysis” schemes. It avoids the aggregation of data present in many other schemes. Its convergence is analyzed, and a queueing example is presented.

