Online regret bounds for Markov decision processes with deterministic transitions

Ronald Ortner

doi:10.1016/j.tcs.2010.04.005

Online regret bounds for Markov decision processes with deterministic transitions

Theoretical Computer Science ◽

10.1016/j.tcs.2010.04.005 ◽

2010 ◽

Vol 411 (29-30) ◽

pp. 2684-2695 ◽

Cited By ~ 2

Author(s):

Ronald Ortner

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision ◽

Regret Bounds


Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

Lecture Notes in Computer Science - Algorithmic Learning Theory ◽

10.1007/978-3-540-87987-9_14 ◽

2008 ◽

pp. 123-137 ◽

Cited By ~ 5

Author(s):

Ronald Ortner

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision ◽

Regret Bounds


Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11316 ◽

2020 ◽

Vol 67 ◽

pp. 115-128

Author(s):

Ronald Ortner

Keyword(s):

Markov Chain ◽

Reinforcement Learning ◽

Markov Decision Processes ◽

Mixing Time ◽

Decision Processes ◽

Time Parameter ◽

Markov Decision ◽

Regret Bounds ◽

Optimistic Algorithm ◽

S States

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $O(\sqrt{t_{\mathrm{mix}} S A T})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
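To illustrate how a bound of this order scales, the following minimal sketch evaluates the square root of the product of the mixing time parameter, the number of states, the number of actions, and the number of steps. The function name `regret_bound_order` and the parameter values are hypothetical, and constants and logarithmic factors are ignored; it is not the paper's algorithm, only arithmetic on the stated rate.

```python
import math


def regret_bound_order(t_mix, S, A, T):
    """Order of the bound sqrt(t_mix * S * A * T).

    Constants and logarithmic factors are ignored (an assumption of this sketch).
    """
    return math.sqrt(t_mix * S * A * T)


# Hypothetical problem sizes, chosen only for illustration.
for T in (10**4, 10**6, 10**8):
    bound = regret_bound_order(t_mix=10, S=5, A=2, T=T)
    # The bound grows like sqrt(T), so the per-step value bound / T shrinks.
    print(T, bound, bound / T)
```

Under these assumptions, quadrupling the number of steps doubles the bound, so the average per-step regret implied by it decays at rate one over the square root of the number of steps.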


Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2012.00673 ◽

2012 ◽

Vol 38 (5) ◽

pp. 673-687 ◽

Cited By ~ 1

Author(s):

Xin XU ◽

Dong SHEN ◽

Yan-Qing GAO ◽

Kai WANG

Keyword(s):

Dynamical Systems ◽

Markov Decision Processes ◽

Learning Control ◽

Decision Processes ◽

Markov Decision ◽

Research Frontiers


A Framework for Modeling Bounded Rationality: Mis-Specified Bayesian-Markov Decision Processes

SSRN Electronic Journal ◽

10.2139/ssrn.2710475 ◽

2016 ◽

Cited By ~ 1

Author(s):

Ignacio Esponda ◽

Demian Pouzo

Keyword(s):

Bounded Rationality ◽

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision


A Vector Minimum Superharmonic Approach to Solving Infinite-Horizon Discounted Markov Decision Processes

Journal of the Operational Research Society ◽

10.1038/sj/jors/0431109 ◽

1992 ◽

Vol 43 (11) ◽

pp. 1095-1102

Author(s):

D J White

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Markov Decision


A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion

SIAM Journal on Control and Optimization ◽

10.1137/19m1255811 ◽

2020 ◽

Vol 58 (4) ◽

pp. 2535-2566

Author(s):

François Dufour ◽

Alexandre Genadot

Keyword(s):

Convex Programming ◽

Markov Decision Processes ◽

Discrete Time ◽

Decision Processes ◽

Programming Approach ◽

Total Reward ◽

Markov Decision ◽

Reward Criterion


Extreme-point solutions in Markov decision processes

Journal of Applied Probability ◽

10.1017/s002190020002413x ◽

1983 ◽

Vol 20 (04) ◽

pp. 835-842

Author(s):

David Assaf

Keyword(s):

Convex Function ◽

Extreme Point ◽

Markov Decision Processes ◽

Convex Functions ◽

Sufficient Conditions ◽

Decision Processes ◽

Markov Decision ◽

Full Solution

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.
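The extreme-point property mentioned in the abstract can be checked numerically on a toy case. The sketch below is not taken from the paper: the function `f` and the unit square are hypothetical choices, used only to show that the maximum of a convex function over a polytope found by a grid search coincides with the best of its four corners.

```python
from itertools import product


def f(x, y):
    # A convex function (hypothetical example): convex quadratic plus an absolute value.
    return (x - 0.3) ** 2 + (y - 0.6) ** 2 + abs(x - y)


# Extreme points (corners) of the unit square [0, 1] x [0, 1].
corners = list(product([0.0, 1.0], repeat=2))

# Dense grid over the same square; the corners are grid points too.
n = 50
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

best_corner = max(corners, key=lambda p: f(*p))
best_grid = max(grid, key=lambda p: f(*p))

print(best_corner, f(*best_corner))
print(best_grid, f(*best_grid))
```

In this example the search over 2601 grid points and the search over 4 corners return the same maximizer, which is the kind of reduction the abstract describes: once convexity is established, only the extreme points need to be compared.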
