Compression of Optimal Value Functions for Markov Decision Processes

Author(s):  
M. J. Kochenderfer ◽  
N. Monath


2020 ◽  
Vol 34 (10) ◽  
pp. 13845-13846
Author(s):  
Nishanth Kumar ◽  
Michael Fishman ◽  
Natasha Danas ◽  
Stefanie Tellex ◽  
Michael Littman ◽  
...  

We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the subspace of states the agent would visit when following any optimal policy from the initial state. This pruning enables tractable, fast planning in large open-world FMDPs.
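A minimal sketch of the pruning idea, under assumed data structures (the paper's actual method additionally restricts attention to states reachable under optimal policies from the initial state): a variable is kept only if the reward reads it or some kept variable's transition distribution conditions on it, so the relevant set is the backward closure of the reward variables under the FMDP's dependency graph.

```python
# Sketch of state-variable pruning for a factored MDP. The data
# structures and the example dependency graph are hypothetical.

def relevant_variables(reward_vars, parents):
    """Backward closure: a variable is relevant if the reward depends
    on it, or if some relevant variable's transition depends on it.

    reward_vars: set of variable names the reward function reads.
    parents:     dict mapping each variable to the set of variables
                 its transition distribution conditions on.
    """
    relevant = set(reward_vars)
    frontier = list(reward_vars)
    while frontier:
        v = frontier.pop()
        for p in parents.get(v, ()):
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    return relevant

# Example dependency structure (invented for illustration):
parents = {
    "robot_pos": {"robot_pos", "action_move"},
    "door_open": {"door_open", "robot_pos"},
    "tv_on":     {"tv_on"},   # never affects the goal below
}
print(sorted(relevant_variables({"door_open"}, parents)))
# -> ['action_move', 'door_open', 'robot_pos']; 'tv_on' is pruned
```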


2013 ◽  
Vol 45 (3) ◽  
pp. 837-859 ◽  
Author(s):  
François Dufour ◽  
A. B. Piunovskiy

In this work, we study discrete-time Markov decision processes (MDPs) with constraints, where all the objectives take the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem using the linear programming approach. Under some technical hypotheses, it is shown that if the associated linear program has an optimal solution, then there exists a randomized stationary policy that is optimal for the MDP, and the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies is a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
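As a concrete illustration of the linear programming approach, here is a minimal sketch of the occupation-measure LP for a tiny constrained MDP. It uses a discount factor to keep the small example well posed, whereas the paper treats undiscounted expected total cost without transience assumptions; the transition kernel, costs, and budget below are invented.

```python
# Occupation-measure LP for a tiny constrained MDP (illustrative only;
# a discounted variant of the undiscounted total-cost setting above).
import numpy as np
from scipy.optimize import linprog

S, A, gamma = 2, 2, 0.9
P = np.zeros((S, A, S))                   # P[s, a, s'] transition kernel
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
c = np.array([[1.0, 2.0], [0.0, 0.5]])    # main cost c(s, a)
d = np.array([[0.0, 1.0], [1.0, 0.0]])    # constraint cost d(s, a)
mu = np.array([1.0, 0.0])                 # initial distribution
budget = 2.0                              # bound on expected d-cost

# Variables x[s, a] >= 0 (discounted occupation measure), flattened.
# Flow: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s').
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(sp == s) - gamma * P[s, a, sp]

res = linprog(c.flatten(),
              A_ub=d.flatten()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=mu, bounds=(0, None))
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)  # randomized stationary policy
print("optimal value:", res.fun)
print("policy:", policy)
```

The randomized stationary policy is read off the occupation measure by normalizing x(s, .), mirroring the sufficiency result stated in the abstract.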


2006 ◽  
Vol 25 ◽  
pp. 75-118 ◽  
Author(s):  
A. Fern ◽  
S. Yoon ◽  
R. Givan

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
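The following sketch shows the general shape of API with a learning step in policy space rather than value-function space: sample states, label each with the action that looks best under rollout estimates of the current policy's returns, and fit a classifier to those labels. The environment interface (sample_state, actions, step), the horizons, and the placeholder learner are all assumptions; the paper's actual learner uses a relational policy language, and its bootstrapping draws training problems from random walks in goal-based domains.

```python
# Sketch of approximate policy iteration with a learned policy
# (a classifier) instead of a learned value function.
import random

def rollout_value(env, state, action, policy, horizon=30, n=20):
    """Monte-Carlo estimate of Q(state, action) under `policy`.
    Assumes env.step(s, a) samples (next_state, reward)."""
    total = 0.0
    for _ in range(n):
        s, r = env.step(state, action)
        ret = r
        for _ in range(horizon):
            s, r = env.step(s, policy(s))
            ret += r
        total += ret
    return total / n

def improve(env, policy, n_states=200):
    """One API iteration: label sampled states with the greedy action
    (estimated by rollout), then fit a learner in policy space."""
    data = []
    for _ in range(n_states):
        s = env.sample_state()
        best = max(env.actions(s),
                   key=lambda a: rollout_value(env, s, a, policy))
        data.append((s, best))
    return fit_policy(data)

def fit_policy(data):
    # Placeholder learner: memorize labels for hashable states and
    # fall back to a dummy action elsewhere. A real learner (e.g. the
    # paper's relational policy learner) generalizes across states.
    table = dict(data)
    return lambda s: table.get(s, "noop")
```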


2011 ◽  
Vol 10 (06) ◽  
pp. 1175-1197 ◽  
Author(s):  
JOHN GOULIONIS ◽  
D. STENGOS

This paper treats the infinite-horizon discounted-cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite time horizon are piecewise linear (p.w.l.) and can be computed exactly by solving a system of linear equations. However, the condition for finite transience is stronger than is needed to ensure p.w.l. value functions. In this paper, we instead introduce the class of periodic policies, whose value functions turn out to be p.w.l. as well. Moreover, we examine a condition more general than finite transience and periodicity that still ensures p.w.l. value functions. We implement these ideas in a replacement problem under Markovian deterioration, investigate periodic policies, and give numerical examples.
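To make the piecewise-linear structure concrete, here is a sketch of one exact dynamic-programming backup for a discounted POMDP: if the current value function is a maximum over a set of alpha-vectors, the backed-up value function is again of that form. This is the standard enumeration backup (exponential in the number of observations, with no pruning), not the paper's procedure for periodic policies.

```python
# One exact DP backup for a discounted POMDP; the result is again a
# piecewise-linear convex value function represented by alpha-vectors.
import itertools
import numpy as np

def backup(Gamma, P, O, r, gamma):
    """Gamma: list of alpha-vectors (arrays over states).
    P[a][s, s']: transition matrix for action a;
    O[a][s', o]: observation probabilities; r[a][s]: reward."""
    nA, nO = len(P), O[0].shape[1]
    new_Gamma = []
    for a in range(nA):
        # g[o][i][s] = sum_{s'} P(s'|s,a) O(o|a,s') alpha_i(s')
        g = [[P[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
             for o in range(nO)]
        # One new vector per assignment of an old vector to each
        # observation branch.
        for choice in itertools.product(range(len(Gamma)), repeat=nO):
            alpha = r[a] + gamma * sum(g[o][choice[o]]
                                       for o in range(nO))
            new_Gamma.append(alpha)
    return new_Gamma   # V(b) = max_alpha b . alpha over this set
```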


2006 ◽  
Vol 43 (3) ◽
pp. 603-621 ◽  
Author(s):  
Huw W. James ◽  
E. J. Collins

This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.
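A small numerical sketch of this setting: a total-cost MDP on a (truncated) countable state space with a cost-free terminating state. Costs here are nonnegative, so the optimal value function is bounded below and, per the results above, value iteration converges. The chain, the costs, and the two actions are invented for illustration.

```python
# Value-iteration sketch for a total-cost MDP with a cost-free
# terminating state (state 0), on a finite truncation of a countable
# state space. Nonnegative costs keep the value function bounded below.
import numpy as np

N = 50                        # truncation of the countable state space
V = np.zeros(N + 1)           # V[0] = 0 at the terminating state
cost = np.ones(N + 1); cost[0] = 0.0

for _ in range(1000):
    V_new = V.copy()
    for s in range(1, N + 1):
        # "safe": full cost, always steps toward termination.
        # "risky": half the cost, but may drift away with prob. 0.5.
        safe  = cost[s] + V[s - 1]
        risky = 0.5 * cost[s] + 0.5 * V[s - 1] + 0.5 * V[min(s + 1, N)]
        V_new[s] = min(safe, risky)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print(V[:6])   # ~[0, 1, 2, 3, 4, 5]: the safe action is optimal here
```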


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces, with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that agrees with the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties: its optimal value function is differentiable and strictly convex, its optimal policy is unique, and the optimal value function and optimal policy of the two processes coincide. To complement the theory presented, an example is provided.
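For reference, the Moreau-Yosida regularization (Moreau envelope) of a cost function c with parameter lambda > 0 is the infimal convolution of c with a quadratic, stated here for the Euclidean setting the paper uses:

```latex
c_{\lambda}(x) \;=\; \inf_{y \in \mathbb{R}^{n}}
  \left\{ \, c(y) + \frac{1}{2\lambda}\,\lVert x - y \rVert^{2} \right\},
  \qquad \lambda > 0.
```

When c is proper, convex, and lower semicontinuous, c_lambda is convex and continuously differentiable, which is the source of the smoothness that the regularized process enjoys.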

