Exact Decomposition Approaches for Markov Decision Processes: A Survey

2010 ◽ Vol 2010 ◽ pp. 1-19
Author(s): Cherki Daoui, Mohamed Abbad, Mohamed Tkiouat

As classical methods are intractable for solving Markov decision processes (MDPs) with large state spaces, decomposition and aggregation techniques are very useful for coping with large problems. These techniques are, in general, special cases of the classic divide-and-conquer framework: a large, unwieldy problem is split into smaller components, and the parts are solved in order to construct the global solution. This paper reviews most of the decomposition approaches encountered in the associated literature over the past two decades, weighing their pros and cons. We consider several categories of MDPs (average, discounted, and weighted MDPs), and we briefly present a variety of methodologies to find or approximate optimal strategies.
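
To make the divide-and-conquer idea concrete, here is a minimal sketch of one such decomposition for discounted MDPs: the state graph is split into strongly connected components, and value iteration is run on each component in reverse topological order, so the values of downstream components can be treated as constants. The MDP encoding (P, R, actions) and the use of networkx are illustrative assumptions, not notation from the survey.

```python
# A minimal sketch of one decomposition idea for discounted MDPs, assuming
# an MDP encoded as P[s][a] = [(prob, next_state), ...] and R[s][a] = reward,
# with networkx providing the SCC condensation. The state graph is split
# into strongly connected components, which are solved by value iteration
# in reverse topological order so that downstream values are constants.

import networkx as nx

def solve_by_scc_decomposition(states, actions, P, R, gamma=0.9, tol=1e-8):
    g = nx.DiGraph()
    g.add_nodes_from(states)
    for s in states:
        for a in actions(s):  # every state is assumed to have an action
            for prob, t in P[s][a]:
                if prob > 0:
                    g.add_edge(s, t)
    V = {s: 0.0 for s in states}
    # condensation() builds the DAG of SCCs; reversing the topological
    # order solves each component after everything reachable from it.
    cond = nx.condensation(g)
    for c in reversed(list(nx.topological_sort(cond))):
        comp = cond.nodes[c]["members"]
        while True:  # value iteration restricted to this component
            delta = 0.0
            for s in comp:
                best = max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                           for a in actions(s))
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
    return V
```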

Author(s): Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca

In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting when the considered domain is not naturally Markovian but becomes so after careful engineering of an extended state space. The extended states record information from the past that is sufficient to assign rewards by looking only at the last state and action. Non-Markovian Reward Decision Processes (NMRDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces, such as LTLf/LDLf, with the great advantage of higher abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques for handling temporal rewards and to the solutions for engineering them. We first present an approach to compiling temporal rewards that merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for efficient handling of monitoring rewards. Finally, we discuss applications to reinforcement learning.
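
As an illustration of the baseline compilation the abstract builds on (not the paper's transducer-merging algorithm itself), the sketch below pairs each MDP state with the state of a DFA that reads the trace, so the temporal reward becomes Markovian over the extended states. All names (Dfa, label, bonus) are illustrative assumptions.

```python
# A minimal sketch of the baseline compilation that makes a non-Markovian
# reward Markovian: pair each MDP state s with the state q of a DFA that
# reads the trace, and grant the reward whenever the DFA is in an accepting
# state. The paper's contribution goes further (merging several formula
# automata into a single transducer); all names below are illustrative.

class Dfa:
    def __init__(self, init, delta, accepting):
        self.init = init            # initial automaton state
        self.delta = delta          # delta[(q, symbol)] -> q'
        self.accepting = accepting  # states where the temporal reward fires

def compile_reward(P, label, dfa, bonus):
    """Lift an MDP with transitions P[(s, a)] = [(prob, s'), ...] to an
    MDP over extended states (s, q); label(s) is the symbol the DFA reads
    when the trace visits s."""

    def step(ext_state, a):
        s, q = ext_state
        # The automaton advances with every transition, so (s, q) carries
        # exactly the history information the reward needs.
        return [(prob, (s2, dfa.delta[(q, label(s2))]))
                for prob, s2 in P[(s, a)]]

    def reward(ext_state, a, next_ext_state):
        # Markovian now: the reward depends only on the next extended state.
        _, q2 = next_ext_state
        return bonus if q2 in dfa.accepting else 0.0

    def init(s0):
        # Convention (an assumption here): the DFA also reads the label of
        # the initial state.
        return (s0, dfa.delta[(dfa.init, label(s0))])

    return step, reward, init
```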


1988 ◽ Vol 20 (4) ◽ pp. 836-851
Author(s): K. D. Glazebrook

Whittle enunciated an important reduction principle in dynamic programming when he showed that under certain conditions optimal strategies for Markov decision processes (MDPs) placed in parallel to one another take actions in a way which is consistent with the optimal strategies for the individual MDPs. However, the necessary and sufficient conditions given by Whittle are by no means always satisfied. We explore the status of this computationally attractive reduction principle when these conditions fail.
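
In abstract form, the reduction principle says a composite policy for parallel MDPs can be assembled from the components' stand-alone optimal policies plus a priority rule that decides which component to activate; the sketch below shows only that shape. The index function is a placeholder assumption, since computing a genuine Gittins/Whittle index is itself a nontrivial per-component calculation, and the abstract's point is precisely that the principle can fail when Whittle's conditions do not hold.

```python
# An abstract-form sketch of the reduction principle: the composite policy
# for MDPs run in parallel is assembled from the components' stand-alone
# optimal policies plus a priority (index) rule choosing which component to
# activate. The index function is a placeholder assumption; under Whittle's
# conditions such a rule exists, and the paper studies what remains of the
# principle when those conditions fail.

def composite_policy(component_states, local_policies, index):
    """component_states[i]: current state of the i-th MDP.
    local_policies[i](s) -> the i-th MDP's individually optimal action.
    index(i, s)          -> priority of activating component i in state s."""
    # Activate the component with the highest index...
    i = max(range(len(component_states)),
            key=lambda j: index(j, component_states[j]))
    # ...and act in it exactly as its stand-alone optimal policy would.
    return i, local_policies[i](component_states[i])
```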


2004 ◽ Vol 21 ◽ pp. 551-577
Author(s): P. Liberatore

Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to represent both the MDP and its policies compactly. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered.
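
A toy illustration (not from the paper) of the gap the result concerns: over n-bit states, an explicit policy table has 2^n entries, while an equivalent rule-based policy can be constant size. The paper shows there are MDPs for which no such small representation of an optimal policy exists unless the polynomial hierarchy collapses.

```python
# A toy contrast (illustrative assumption, not the paper's construction):
# the same policy represented succinctly as a rule versus explicitly as a
# table over all 2**n states.

from itertools import product

n = 16  # number of state bits; the explicit table below has 2**16 rows

def succinct_policy(state_bits):
    # Constant-size rule: choose the action by the parity of the state bits.
    return "a1" if sum(state_bits) % 2 == 0 else "a2"

def explicit_policy_table():
    # Exponential-size object encoding exactly the same policy.
    return {bits: succinct_policy(bits)
            for bits in product((0, 1), repeat=n)}
```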

