A Version of the Euler Equation in Discounted Markov Decision Processes

Journal of Applied Mathematics ◽

10.1155/2012/103698 ◽

2012 ◽

Vol 2012 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

H. Cruz-Suárez ◽

G. Zacarías-Espinoza ◽

V. Vázquez-Guevara

Keyword(s):

Control Problem ◽

Euler Equation ◽

Markov Decision Processes ◽

Optimal Policy ◽

Infinite Horizon ◽

Decision Processes ◽

Value Iteration ◽

Programming Technique ◽

Iteration Functions ◽

Markov Decision

This paper deals with Markov decision processes (MDPs) on Euclidean spaces with an infinite horizon. One approach to studying such MDPs is the dynamic programming (DP) technique, in which the optimal value function is characterized through the value iteration functions. The paper provides conditions that guarantee the convergence of maximizers of the value iteration functions to the optimal policy. Then, using the Euler equation and an envelope formula, the optimal solution of the optimal control problem is obtained. Finally, this theory is applied to a linear-quadratic control problem in order to find its optimal policy.
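As a rough illustration of value iteration on a linear-quadratic problem, the sketch below iterates the Bellman update for a scalar, deterministic, discounted cost-minimization problem. It is not the paper's model or method: the dynamics x_{t+1} = a x_t + b u_t, the stage cost q x^2 + r u^2, the cost-minimizing (rather than reward-maximizing) formulation, and the names `lq_value_iteration`, `a`, `b`, `q`, `r`, `alpha` are all assumptions chosen for illustration. Starting from V_0 = 0, each value iteration function is of the form p_n x^2, and setting the derivative of the stage objective with respect to u to zero gives a linear minimizer u = -k_n x.

```python
def lq_value_iteration(a, b, q, r, alpha, tol=1e-12, max_iter=10_000):
    """Value iteration for min sum_t alpha^t (q x_t^2 + r u_t^2), x_{t+1} = a x_t + b u_t.

    Assumed illustrative setup (scalar, deterministic), not taken from the paper.
    Starting from V_0 = 0, each V_n(x) equals p_n * x^2, so only p_n is tracked.
    Returns the limiting coefficient p, the limiting gain k (policy u = -k x),
    and the list of gains of the successive minimizers.
    """
    def gain(p):
        # First-order condition of  r u^2 + alpha p (a x + b u)^2  in u gives u = -gain(p) x.
        return alpha * a * b * p / (r + alpha * b * b * p)

    def bellman_update(p):
        # Coefficient of x^2 after minimizing over u.
        return q + alpha * a * a * p - (alpha * a * b * p) ** 2 / (r + alpha * b * b * p)

    p = 0.0
    gains = []
    for _ in range(max_iter):
        gains.append(gain(p))
        p_next = bellman_update(p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, gain(p), gains


if __name__ == "__main__":
    p, k, gains = lq_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0, alpha=0.9)
    print(f"value coefficient p = {p:.6f}, feedback gain k = {k:.6f}")
    print("first gains:", [round(g, 4) for g in gains[:5]])
```

In this toy setting the gains of the successive minimizers settle to the gain of the limiting policy, which is the kind of convergence the abstract is concerned with.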

First Order Decision Diagrams for Relational MDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.2489 ◽

2008 ◽

Vol 31 ◽

pp. 431-472 ◽

Cited By ~ 18

Author(s):

C. Wang ◽

S. Joshi ◽

R. Khardon

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Decision Processes ◽

Compact Representation ◽

Iteration Algorithm ◽

Decision Diagrams ◽

Value Iteration ◽

First Order ◽

Relational Structures ◽

Markov Decision

Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
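For orientation, the sketch below shows plain tabular value iteration on a tiny finite MDP. It is not the FODD-based variant described in the abstract, which works at the abstract relational level; it only illustrates the baseline iteration that such variants build on. The function name `value_iteration`, the transition encoding, and the two-state example `TWO_STATE_MDP` are hypothetical.

```python
def value_iteration(transitions, gamma, tol=1e-10, max_iter=100_000):
    """Tabular value iteration for a finite, discounted, reward-maximizing MDP.

    transitions[s][a] is a list of (probability, next_state, reward) triples.
    Returns the approximate optimal values and a greedy policy (one action index per state).
    """
    n_states = len(transitions)
    values = [0.0] * n_states

    def q_value(s, a, v):
        # Expected one-step reward plus discounted value of the successor state.
        return sum(prob * (reward + gamma * v[s_next])
                   for prob, s_next, reward in transitions[s][a])

    for _ in range(max_iter):
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(nv - v) for nv, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    policy = [
        max(range(len(transitions[s])), key=lambda a: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Hypothetical example: action 0 = "stay", action 1 = "move to the other state".
# Staying in state 1 pays reward 1; every other transition pays 0.
TWO_STATE_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # state 1
]

if __name__ == "__main__":
    values, policy = value_iteration(TWO_STATE_MDP, gamma=0.9)
    print("values:", [round(v, 4) for v in values])
    print("greedy policy:", policy)
```

The table here grows with the number of states, which is the dependence on domain size that the abstract says the FODD-based approach avoids.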

A Vector Minimum Superharmonic Approach to Solving Infinite-Horizon Discounted Markov Decision Processes

Journal of the Operational Research Society ◽

10.1038/sj/jors/0431109 ◽

1992 ◽

Vol 43 (11) ◽

pp. 1095-1102

Author(s):

D. J. White

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Markov Decision

Serial and parallel value iteration algorithms for discounted Markov decision processes

European Journal of Operational Research ◽

10.1016/0377-2217(93)90061-q ◽

1993 ◽

Vol 67 (2) ◽

pp. 188-203 ◽

Cited By ~ 3

Author(s):

T.W. Archibald ◽

K.I.M. McKinnon ◽

L.C. Thomas

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Value Iteration ◽

Markov Decision

A K-step look-ahead analysis of value iteration algorithms for Markov decision processes

European Journal of Operational Research ◽

10.1016/0377-2217(94)00208-8 ◽

1996 ◽

Vol 88 (3) ◽

pp. 622-636 ◽

Cited By ~ 5

Author(s):

Meir Herzberg ◽

Uri Yechiali

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Value Iteration ◽

Look Ahead ◽

Markov Decision

A Vector Minimum Superharmonic Approach to Solving Infinite-Horizon Discounted Markov Decision Processes

Journal of the Operational Research Society ◽

10.1057/jors.1992.167 ◽

1992 ◽

Vol 43 (11) ◽

pp. 1095-1102

Author(s):

D. J. White

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Markov Decision

Impulsive Control for Continuous-Time Markov Decision Processes

Advances in Applied Probability ◽

10.1239/aap/1427814583 ◽

2015 ◽

Vol 47 (1) ◽

pp. 106-127 ◽

Cited By ~ 6

Author(s):

François Dufour ◽

Alexei B. Piunovskiy

Keyword(s):

Optimal Control ◽

Control Problem ◽

Markov Decision Processes ◽

Control Strategy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Optimal Control Strategy ◽

Optimality Equation ◽

Markov Decision

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an ε-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of these results is that the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.

Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis

Operations Research ◽

10.1287/opre.42.5.940 ◽

1994 ◽

Vol 42 (5) ◽

pp. 940-946 ◽

Cited By ~ 10

Author(s):

Meir Herzberg ◽

Uri Yechiali

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Markov Decision ◽

One Step ◽

Value Iteration Algorithm

Multi-objective infinite-horizon discounted Markov decision processes

Journal of Mathematical Analysis and Applications ◽

10.1016/0022-247x(82)90122-6 ◽

1982 ◽

Vol 89 (2) ◽

pp. 639-647 ◽

Cited By ~ 39

Author(s):

D. J. White

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Multi Objective ◽

Markov Decision

A Modified Value Iteration Algorithm for Discounted Markov Decision Processes

Journal of Electronic Commerce in Organizations ◽

10.4018/jeco.2015070104 ◽

2015 ◽

Vol 13 (3) ◽

pp. 47-57 ◽

Cited By ~ 1

Author(s):

Sanaa Chafik ◽

Cherki Daoui

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Decomposition Technique ◽

Artificial Data ◽

Markov Decision ◽

Speed Up ◽

Value Iteration Algorithm

Because many real applications involve a large number of states, classical methods become intractable for solving large Markov Decision Processes. A decomposition technique based on the topology of each state in the associated graph, together with parallelization, is a useful way to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that adds parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.

Learning algorithms for Markov decision processes

Journal of Applied Probability ◽

10.1017/s0021900200030825 ◽

1987 ◽

Vol 24 (01) ◽

pp. 270-276

Author(s):

Masami Kurano

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Learning Algorithm ◽

Learning Algorithms ◽

Decision Processes ◽

The State ◽

Reward Structure ◽

Adaptive Policy ◽

Markov Decision ◽

Reward Criterion

This study is concerned with finite Markov decision processes whose dynamics and reward structure are unknown but whose state is exactly observable. We establish a learning algorithm which yields an optimal policy and construct an adaptive policy which is optimal under the average expected reward criterion.
