Regret Bounds for Reinforcement Learning via Markov Chain Concentration
We give a simple optimistic algorithm for which it is easy to derive regret bounds of O(sqrt(t_mix SAT)) after T steps in uniformly ergodic Markov decision processes with S states, A actions, and mixing-time parameter t_mix. These are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters; they could only be improved by replacing t_mix with an alternative mixing-time parameter.
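The paper's algorithm itself is not reproduced here, but the optimism principle it builds on can be illustrated with a minimal sketch. The following is a hedged, UCRL2-style extended value iteration: the empirical transition distribution of each state-action pair is widened by an L1 confidence radius, and the optimistic model shifts that slack toward the currently best-valued state. The confidence-radius formula and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def optimistic_value_iteration(counts, visits, rewards, delta=0.1, iters=50):
    """Illustrative UCRL2-style extended value iteration (a sketch, not the
    paper's exact algorithm).  For each (s, a), the empirical transition
    distribution is widened by an L1 confidence radius, and the optimistic
    model moves probability mass toward the currently best-valued state."""
    S, A = visits.shape
    n = np.maximum(visits, 1)
    p_hat = counts / n[:, :, None]                     # empirical transitions
    # Simplified L1 deviation bound; the exact constants are an assumption.
    radius = np.sqrt(2.0 * np.log(S * A / delta) / n)
    v = np.zeros(S)
    for _ in range(iters):
        q = np.zeros((S, A))
        order = np.argsort(-v)                         # states, best first
        best = order[0]
        for s in range(S):
            for a in range(A):
                p = p_hat[s, a].copy()
                added = min(radius[s, a] / 2.0, 1.0 - p[best])
                p[best] += added                       # optimism: boost best state
                for s2 in order[::-1]:                 # pay from worst states first
                    if s2 == best:
                        continue
                    take = min(p[s2], added)
                    p[s2] -= take
                    added -= take
                    if added <= 1e-12:
                        break
                q[s, a] = rewards[s, a] + p @ v
        v = q.max(axis=1)
        v -= v.min()                                   # keep iterates bounded
    return v, q.argmax(axis=1)

# Toy usage on a random 3-state, 2-action MDP with hypothetical counts.
rng = np.random.default_rng(0)
S, A = 3, 2
counts = rng.integers(1, 10, size=(S, A, S)).astype(float)
visits = counts.sum(axis=2)
rewards = rng.random((S, A))
v, policy = optimistic_value_iteration(counts, visits, rewards)
```

The inner loop is the standard closed-form maximization over the L1 confidence ball: because the mass added to the best state never exceeds the mass available at the other states, the perturbed vector remains a valid probability distribution.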