Necessary conditions for the optimality equation in average-reward Markov decision processes

Rolando Cavazos-Cadena

doi:10.1007/bf01448194

Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces

Abstract and Applied Analysis ◽

10.1155/2009/103723 ◽

2009 ◽

Vol 2009 ◽

pp. 1-17 ◽

Cited By ~ 2

Author(s):

Quanxin Zhu ◽

Xinsong Yang ◽

Chuangxia Huang

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Average Reward ◽

Stationary Policy ◽

Optimality Equation ◽

Markov Decision ◽

Average Reward Optimality

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.


Denumerable state continuous time Markov decision processes with unbounded cost and transition rates under average criterion

The ANZIAM Journal ◽

10.1017/s144618110001213x ◽

2002 ◽

Vol 43 (4) ◽

pp. 541-557 ◽

Cited By ~ 10

Author(s):

Xianping Guo ◽

Weiping Zhu

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Transition Rates ◽

Birth And Death Processes ◽

Optimality Equation ◽

Average Criterion ◽

Markov Decision ◽

Unbounded Cost ◽

Queue Model

In this paper, we consider denumerable state continuous time Markov decision processes with (possibly unbounded) transition and cost rates under average criterion. We present a set of conditions and prove the existence of both average cost optimal stationary policies and a solution of the average optimality equation under the conditions. The results in this paper are applied to an admission control queue model and controlled birth and death processes.


A unified approach to adaptive control of average reward Markov decision processes

OR Spectrum ◽

10.1007/bf01740510 ◽

1988 ◽

Vol 10 (3) ◽

pp. 161-166 ◽

Cited By ~ 5

Author(s):

G. Hübner

Keyword(s):

Adaptive Control ◽

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Unified Approach ◽

Markov Decision


Approximate receding horizon approach for Markov decision processes: average reward case

Journal of Mathematical Analysis and Applications ◽

10.1016/s0022-247x(03)00506-7 ◽

2003 ◽

Vol 286 (2) ◽

pp. 636-651 ◽

Cited By ~ 22

Author(s):

Hyeong Soo Chang ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Receding Horizon ◽

Markov Decision


Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States

SSRN Electronic Journal ◽

10.2139/ssrn.3993508 ◽

2021 ◽

Author(s):

Xiangyu Yang ◽

Jiaqiao Hu ◽

Jianqiang Hu

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Q Learning ◽

Continuous States ◽

Markov Decision


Impulsive Control for Continuous-Time Markov Decision Processes

Advances in Applied Probability ◽

10.1239/aap/1427814583 ◽

2015 ◽

Vol 47 (1) ◽

pp. 106-127 ◽

Cited By ~ 6

Author(s):

François Dufour ◽

Alexei B. Piunovskiy

Keyword(s):

Optimal Control ◽

Control Problem ◽

Markov Decision Processes ◽

Control Strategy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Optimal Control Strategy ◽

Optimality Equation ◽

Markov Decision

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action correspondingly to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.


Average Reward Reinforcement Learning for Semi-Markov Decision Processes

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-70087-8_79 ◽

2017 ◽

pp. 768-777

Author(s):

Jiayuan Yang ◽

Yanjie Li ◽

Haoyao Chen ◽

Jiangang Li

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Markov Decision


RVI reinforcement learning for semi-Markov decision processes with average reward

2010 8th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2010.5554785 ◽

2010 ◽

Author(s):

Yanjie Li ◽

Fang Cao

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Markov Decision


Impulsive Control for Continuous-Time Markov Decision Processes

Advances in Applied Probability ◽

10.1017/s0001867800007722 ◽

2015 ◽

Vol 47 (01) ◽

pp. 106-127 ◽

Cited By ~ 2

Author(s):

François Dufour ◽

Alexei B. Piunovskiy

Keyword(s):

Optimal Control ◽

Control Problem ◽

Markov Decision Processes ◽

Control Strategy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Optimal Control Strategy ◽

Optimality Equation ◽

Markov Decision

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action correspondingly to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.


Asymptotic results for the best-choice problem with a random number of objects

Journal of Applied Probability ◽

10.2307/3213614 ◽

1984 ◽

Vol 21 (3) ◽

pp. 521-536 ◽

Cited By ~ 5

Author(s):

Masami Yasuda

Keyword(s):

Integral Equation ◽

Markov Decision Processes ◽

Random Number ◽

Scaling Limit ◽

Decision Processes ◽

Choice Problem ◽

Asymptotic Results ◽

Optimality Equation ◽

Best Choice Problem ◽

Markov Decision

This paper considers the best-choice problem with a random number of objects having a known distribution. The optimality equation of the problem reduces to an integral equation by a scaling limit. The equation is explicitly solved under conditions on the distribution, which relate to the condition for an OLA policy to be optimal in Markov decision processes. This technique is then applied to three different versions of the problem and an exact value for the asymptotic optimal strategy is found.
