Compression of Optimal Value Functions for Markov Decision Processes

Author(s):  
M. J. Kochenderfer ◽  
N. Monath


2020 ◽  
Vol 34 (10) ◽  
pp. 13845-13846
Author(s):  
Nishanth Kumar ◽  
Michael Fishman ◽  
Natasha Danas ◽  
Stefanie Tellex ◽  
Michael Littman ◽  
...  

We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the subspace of states the agent would visit when following any optimal policy from the initial state. This pruning enables tractable, fast planning in large open-world FMDPs.
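A minimal sketch of the pruning idea, under assumed data structures (the paper's actual method additionally restricts attention to states reachable under optimal policies from the initial state): a variable is kept only if the reward reads it or some kept variable's transition distribution conditions on it, so the relevant set is the backward closure of the reward variables under the FMDP's dependency graph.

```python
# Sketch of state-variable pruning for a factored MDP. The data
# structures and the example dependency graph are hypothetical.

def relevant_variables(reward_vars, parents):
    """Backward closure: a variable is relevant if the reward depends
    on it, or if some relevant variable's transition depends on it.

    reward_vars: set of variable names the reward function reads.
    parents:     dict mapping each variable to the set of variables
                 its transition distribution conditions on.
    """
    relevant = set(reward_vars)
    frontier = list(reward_vars)
    while frontier:
        v = frontier.pop()
        for p in parents.get(v, ()):
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    return relevant

# Example dependency structure (invented for illustration):
parents = {
    "robot_pos": {"robot_pos", "action_move"},
    "door_open": {"door_open", "robot_pos"},
    "tv_on":     {"tv_on"},   # never affects the goal below
}
print(sorted(relevant_variables({"door_open"}, parents)))
# -> ['action_move', 'door_open', 'robot_pos']; 'tv_on' is pruned
```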


2013 ◽  
Vol 45 (3) ◽  
pp. 837-859 ◽  
Author(s):  
François Dufour ◽  
A. B. Piunovskiy

In this work, we study discrete-time Markov decision processes (MDPs) with constraints, where all the objectives take the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem using the linear programming approach. Under some technical hypotheses, it is shown that if the associated linear program has an optimal solution, then there exists a randomized stationary policy that is optimal for the MDP, and the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies is a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
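As a concrete illustration of the linear programming approach, here is a minimal sketch of the occupation-measure LP for a tiny constrained MDP. It uses a discount factor to keep the small example well posed, whereas the paper treats undiscounted expected total cost without transience assumptions; the transition kernel, costs, and budget below are invented.

```python
# Occupation-measure LP for a tiny constrained MDP (illustrative only;
# a discounted variant of the undiscounted total-cost setting above).
import numpy as np
from scipy.optimize import linprog

S, A, gamma = 2, 2, 0.9
P = np.zeros((S, A, S))                   # P[s, a, s'] transition kernel
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
c = np.array([[1.0, 2.0], [0.0, 0.5]])    # main cost c(s, a)
d = np.array([[0.0, 1.0], [1.0, 0.0]])    # constraint cost d(s, a)
mu = np.array([1.0, 0.0])                 # initial distribution
budget = 2.0                              # bound on expected d-cost

# Variables x[s, a] >= 0 (discounted occupation measure), flattened.
# Flow: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s').
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(sp == s) - gamma * P[s, a, sp]

res = linprog(c.flatten(),
              A_ub=d.flatten()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=mu, bounds=(0, None))
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)  # randomized stationary policy
print("optimal value:", res.fun)
print("policy:", policy)
```

The randomized stationary policy is read off the occupation measure by normalizing x(s, .), mirroring the sufficiency result stated in the abstract.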


2006 ◽  
Vol 25 ◽  
pp. 75-118 ◽  
Author(s):  
A. Fern ◽  
S. Yoon ◽  
R. Givan

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
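The following sketch shows the general shape of API with a learning step in policy space rather than value-function space: sample states, label each with the action that looks best under rollout estimates of the current policy's returns, and fit a classifier to those labels. The environment interface (sample_state, actions, step), the horizons, and the placeholder learner are all assumptions; the paper's actual learner uses a relational policy language, and its bootstrapping draws training problems from random walks in goal-based domains.

```python
# Sketch of approximate policy iteration with a learned policy
# (a classifier) instead of a learned value function.
import random

def rollout_value(env, state, action, policy, horizon=30, n=20):
    """Monte-Carlo estimate of Q(state, action) under `policy`.
    Assumes env.step(s, a) samples (next_state, reward)."""
    total = 0.0
    for _ in range(n):
        s, r = env.step(state, action)
        ret = r
        for _ in range(horizon):
            s, r = env.step(s, policy(s))
            ret += r
        total += ret
    return total / n

def improve(env, policy, n_states=200):
    """One API iteration: label sampled states with the greedy action
    (estimated by rollout), then fit a learner in policy space."""
    data = []
    for _ in range(n_states):
        s = env.sample_state()
        best = max(env.actions(s),
                   key=lambda a: rollout_value(env, s, a, policy))
        data.append((s, best))
    return fit_policy(data)

def fit_policy(data):
    # Placeholder learner: memorize labels for hashable states and
    # fall back to a dummy action elsewhere. A real learner (e.g. the
    # paper's relational policy learner) generalizes across states.
    table = dict(data)
    return lambda s: table.get(s, "noop")
```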


2011 ◽  
Vol 10 (06) ◽  
pp. 1175-1197 ◽  
Author(s):  
JOHN GOULIONIS ◽  
D. STENGOS

This paper treats the infinite-horizon discounted-cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite time horizon are piecewise linear (p.w.l.) and can be computed exactly by solving a system of linear equations. However, the condition for finite transience is stronger than is needed to ensure p.w.l. value functions. In this paper, we instead introduce the class of periodic policies, whose value functions turn out to be p.w.l. as well. Moreover, we examine a condition more general than finite transience and periodicity that still ensures p.w.l. value functions. We implement these ideas in a replacement problem under Markovian deterioration, investigate periodic policies, and give numerical examples.
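To make the piecewise-linear structure concrete, here is a sketch of one exact dynamic-programming backup for a discounted POMDP: if the current value function is a maximum over a set of alpha-vectors, the backed-up value function is again of that form. This is the standard enumeration backup (exponential in the number of observations, with no pruning), not the paper's procedure for periodic policies.

```python
# One exact DP backup for a discounted POMDP; the result is again a
# piecewise-linear convex value function represented by alpha-vectors.
import itertools
import numpy as np

def backup(Gamma, P, O, r, gamma):
    """Gamma: list of alpha-vectors (arrays over states).
    P[a][s, s']: transition matrix for action a;
    O[a][s', o]: observation probabilities; r[a][s]: reward."""
    nA, nO = len(P), O[0].shape[1]
    new_Gamma = []
    for a in range(nA):
        # g[o][i][s] = sum_{s'} P(s'|s,a) O(o|a,s') alpha_i(s')
        g = [[P[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
             for o in range(nO)]
        # One new vector per assignment of an old vector to each
        # observation branch.
        for choice in itertools.product(range(len(Gamma)), repeat=nO):
            alpha = r[a] + gamma * sum(g[o][choice[o]]
                                       for o in range(nO))
            new_Gamma.append(alpha)
    return new_Gamma   # V(b) = max_alpha b . alpha over this set
```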


2006 ◽  
Vol 43 (3) ◽
pp. 603-621 ◽  
Author(s):  
Huw W. James ◽  
E. J. Collins

This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.
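A small numerical sketch of this setting: a total-cost MDP on a (truncated) countable state space with a cost-free terminating state. Costs here are nonnegative, so the optimal value function is bounded below and, per the results above, value iteration converges. The chain, the costs, and the two actions are invented for illustration.

```python
# Value-iteration sketch for a total-cost MDP with a cost-free
# terminating state (state 0), on a finite truncation of a countable
# state space. Nonnegative costs keep the value function bounded below.
import numpy as np

N = 50                        # truncation of the countable state space
V = np.zeros(N + 1)           # V[0] = 0 at the terminating state
cost = np.ones(N + 1); cost[0] = 0.0

for _ in range(1000):
    V_new = V.copy()
    for s in range(1, N + 1):
        # "safe": full cost, always steps toward termination.
        # "risky": half the cost, but may drift away with prob. 0.5.
        safe  = cost[s] + V[s - 1]
        risky = 0.5 * cost[s] + 0.5 * V[s - 1] + 0.5 * V[min(s + 1, N)]
        V_new[s] = min(safe, risky)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(V[:6])   # ~[0, 1, 2, 3, 4, 5]: the safe action is optimal here
```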


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces, with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that agrees with the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties: its optimal value function is differentiable and strictly convex, its optimal policy is unique, and the optimal value function and optimal policy of the two processes coincide. To complement the theory presented, an example is provided.
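For reference, the Moreau-Yosida regularization (Moreau envelope) of a cost function c with parameter lambda > 0 is the infimal convolution of c with a quadratic, stated here for the Euclidean setting the paper uses:

```latex
c_{\lambda}(x) \;=\; \inf_{y \in \mathbb{R}^{n}}
  \left\{ \, c(y) + \frac{1}{2\lambda}\,\lVert x - y \rVert^{2} \right\},
  \qquad \lambda > 0.
```

When c is proper, convex, and lower semicontinuous, c_lambda is convex and continuously differentiable, which is the source of the smoothness that the regularized process enjoys.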

