Exact Decomposition Approaches for Markov Decision Processes: A Survey

2010 ◽ Vol 2010 ◽ pp. 1-19
Author(s): Cherki Daoui, Mohamed Abbad, Mohamed Tkiouat

As classical methods are intractable for solving Markov decision processes (MDPs) with large state spaces, decomposition and aggregation techniques are very useful for coping with large problems. These techniques are, in general, special cases of the classic divide-and-conquer framework: a large, unwieldy problem is split into smaller components, and the parts are solved in order to construct the global solution. This paper reviews most of the decomposition approaches encountered in the associated literature over the past two decades, weighing their pros and cons. We consider several categories of MDPs (average, discounted, and weighted MDPs), and we briefly present a variety of methodologies to find or approximate optimal strategies.
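
To make the divide-and-conquer idea concrete, here is a minimal sketch of one such decomposition for discounted MDPs: the state graph is split into strongly connected components, and value iteration is run on each component in reverse topological order, so the values of downstream components can be treated as constants. The MDP encoding (P, R, actions) and the use of networkx are illustrative assumptions, not notation from the survey.

```python
# A minimal sketch of one decomposition idea for discounted MDPs, assuming
# an MDP encoded as P[s][a] = [(prob, next_state), ...] and R[s][a] = reward,
# with networkx providing the SCC condensation. The state graph is split
# into strongly connected components, which are solved by value iteration
# in reverse topological order so that downstream values are constants.

import networkx as nx

def solve_by_scc_decomposition(states, actions, P, R, gamma=0.9, tol=1e-8):
    g = nx.DiGraph()
    g.add_nodes_from(states)
    for s in states:
        for a in actions(s):  # every state is assumed to have an action
            for prob, t in P[s][a]:
                if prob > 0:
                    g.add_edge(s, t)
    V = {s: 0.0 for s in states}
    # condensation() builds the DAG of SCCs; reversing the topological
    # order solves each component after everything reachable from it.
    cond = nx.condensation(g)
    for c in reversed(list(nx.topological_sort(cond))):
        comp = cond.nodes[c]["members"]
        while True:  # value iteration restricted to this component
            delta = 0.0
            for s in comp:
                best = max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                           for a in actions(s))
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
    return V
```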

Author(s): Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca

In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting when the considered domain is not naturally Markovian but becomes so after careful engineering of an extended state space. The extended states record information from the past that is sufficient to assign rewards by looking only at the last state and action. Non-Markovian Reward Decision Processes (NMRDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces, such as LTLf/LDLf, with the great advantage of higher abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques for handling temporal rewards and to the solutions for engineering them. We first present an approach to compiling temporal rewards that merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for efficient handling of monitoring rewards. Finally, we discuss applications to reinforcement learning.
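
As an illustration of the baseline compilation the abstract builds on (not the paper's transducer-merging algorithm itself), the sketch below pairs each MDP state with the state of a DFA that reads the trace, so the temporal reward becomes Markovian over the extended states. All names (Dfa, label, bonus) are illustrative assumptions.

```python
# A minimal sketch of the baseline compilation that makes a non-Markovian
# reward Markovian: pair each MDP state s with the state q of a DFA that
# reads the trace, and grant the reward whenever the DFA is in an accepting
# state. The paper's contribution goes further (merging several formula
# automata into a single transducer); all names below are illustrative.

class Dfa:
    def __init__(self, init, delta, accepting):
        self.init = init            # initial automaton state
        self.delta = delta          # delta[(q, symbol)] -> q'
        self.accepting = accepting  # states where the temporal reward fires

def compile_reward(P, label, dfa, bonus):
    """Lift an MDP with transitions P[(s, a)] = [(prob, s'), ...] to an
    MDP over extended states (s, q); label(s) is the symbol the DFA reads
    when the trace visits s."""

    def step(ext_state, a):
        s, q = ext_state
        # The automaton advances with every transition, so (s, q) carries
        # exactly the history information the reward needs.
        return [(prob, (s2, dfa.delta[(q, label(s2))]))
                for prob, s2 in P[(s, a)]]

    def reward(ext_state, a, next_ext_state):
        # Markovian now: the reward depends only on the next extended state.
        _, q2 = next_ext_state
        return bonus if q2 in dfa.accepting else 0.0

    def init(s0):
        # Convention (an assumption here): the DFA also reads the label of
        # the initial state.
        return (s0, dfa.delta[(dfa.init, label(s0))])

    return step, reward, init
```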


1988 ◽ Vol 20 (4) ◽ pp. 836-851
Author(s): K. D. Glazebrook

Whittle enunciated an important reduction principle in dynamic programming when he showed that under certain conditions optimal strategies for Markov decision processes (MDPs) placed in parallel to one another take actions in a way which is consistent with the optimal strategies for the individual MDPs. However, the necessary and sufficient conditions given by Whittle are by no means always satisfied. We explore the status of this computationally attractive reduction principle when these conditions fail.
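
In abstract form, the reduction principle says a composite policy for parallel MDPs can be assembled from the components' stand-alone optimal policies plus a priority rule that decides which component to activate; the sketch below shows only that shape. The index function is a placeholder assumption, since computing a genuine Gittins/Whittle index is itself a nontrivial per-component calculation, and the abstract's point is precisely that the principle can fail when Whittle's conditions do not hold.

```python
# An abstract-form sketch of the reduction principle: the composite policy
# for MDPs run in parallel is assembled from the components' stand-alone
# optimal policies plus a priority (index) rule choosing which component to
# activate. The index function is a placeholder assumption; under Whittle's
# conditions such a rule exists, and the paper studies what remains of the
# principle when those conditions fail.

def composite_policy(component_states, local_policies, index):
    """component_states[i]: current state of the i-th MDP.
    local_policies[i](s) -> the i-th MDP's individually optimal action.
    index(i, s)          -> priority of activating component i in state s."""
    # Activate the component with the highest index...
    i = max(range(len(component_states)),
            key=lambda j: index(j, component_states[j]))
    # ...and act in it exactly as its stand-alone optimal policy would.
    return i, local_policies[i](component_states[i])
```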


2004 ◽ Vol 21 ◽ pp. 551-577
Author(s): P. Liberatore

Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to represent both the MDP and its policies compactly. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered.
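
A toy illustration (not from the paper) of the gap the result concerns: over n-bit states, an explicit policy table has 2^n entries, while an equivalent rule-based policy can be constant size. The paper shows there are MDPs for which no such small representation of an optimal policy exists unless the polynomial hierarchy collapses.

```python
# A toy contrast (illustrative assumption, not the paper's construction):
# the same policy represented succinctly as a rule versus explicitly as a
# table over all 2**n states.

from itertools import product

n = 16  # number of state bits; the explicit table below has 2**16 rows

def succinct_policy(state_bits):
    # Constant-size rule: choose the action by the parity of the state bits.
    return "a1" if sum(state_bits) % 2 == 0 else "a2"

def explicit_policy_table():
    # Exponential-size object encoding exactly the same policy.
    return {bits: succinct_policy(bits)
            for bits in product((0, 1), repeat=n)}
```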

