Policy Iteration Based on Stochastic Factorization

Journal of Artificial Intelligence Research ◽

10.1613/jair.4301 ◽

2014 ◽

Vol 50 ◽

pp. 763-803 ◽

Cited By ~ 4

Author(s):

A. M. S. Barreto ◽

J. Pineau ◽

D. Precup

Keyword(s):

Large Scale ◽

Transition Probability ◽

Computational Cost ◽

Approximation Error ◽

Transition Probability Matrix ◽

Policy Iteration ◽

Iteration Algorithm ◽

Markov Decision ◽

Approximate Policy Iteration ◽

Policy Iteration Algorithm

When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited to create a compact version of a Markov decision process (MDP), and hence to reduce the computational cost of dynamic programming. Building on this idea, this paper presents an approximate policy iteration algorithm called policy iteration based on stochastic factorization, or PISF for short. In terms of computational complexity, PISF replaces standard policy iteration's cubic dependence on the size of the MDP with a function that grows only linearly with the number of states in the model. The proposed algorithm also has useful theoretical properties: it always terminates after a finite number of iterations and returns a decision policy whose performance depends only on the quality of the stochastic factorization. In particular, if the approximation error in the factorization is sufficiently small, PISF computes the optimal value function of the MDP. The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economic interest.
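As a minimal sketch of the factor-swapping idea described in the abstract above, the Python snippet below builds a small transition matrix P = DK from two row-stochastic factors and then swaps them to obtain the smaller matrix KD. The matrices D and K, their sizes, and the helper names are illustrative assumptions, not taken from the paper, and the snippet does not implement PISF itself.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_row_stochastic(m, tol=1e-12):
    """True if every entry is non-negative and every row sums to 1."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in m)


# Hypothetical factors: D is 4x2 and K is 2x4, both row-stochastic.
D = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.25, 0.75],
    [0.0, 1.0],
]
K = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

P = matmul(D, K)      # 4x4 transition matrix over the original states
P_bar = matmul(K, D)  # 2x2 transition matrix obtained by swapping the factors

print(len(P), "x", len(P[0]), "stochastic:", is_row_stochastic(P))
print(len(P_bar), "x", len(P_bar[0]), "stochastic:", is_row_stochastic(P_bar))
```

Both products are row-stochastic because the product of two row-stochastic matrices is row-stochastic; the swapped product has as many rows as K, which is where the reduction in size comes from.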

Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces

Risk and Decision Analysis ◽

10.3233/rda-160116 ◽

2017 ◽

Vol 6 (2) ◽

pp. 79-95 ◽

Cited By ~ 1

Author(s):

M. Teresa Robles-Alcaráz ◽

Óscar Vega-Amaya ◽

J. Adolfo Minjárez-Sosa

Keyword(s):

Policy Iteration ◽

Decision Models ◽

Iteration Algorithm ◽

Markov Decision Models ◽

Markov Decision ◽

Approximate Policy Iteration ◽

Policy Iteration Algorithm

Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Journal of Dynamic Systems Measurement and Control ◽

10.1115/1.4032875 ◽

2016 ◽

Vol 138 (6) ◽

Author(s):

Thai Duong ◽

Duong Nguyen-Huu ◽

Thinh Nguyen

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Transition Probability ◽

Transition Probability Matrix ◽

Rate Of Change ◽

Optimal Decision ◽

Iteration Algorithm ◽

Value Iteration ◽

Markov Decision ◽

Value Iteration Algorithm

The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
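To make the setting in the abstract above concrete, here is a rough Python sketch of value iteration applied while the transition probabilities drift slowly over time. Everything in it is an assumption made for illustration: the two-state, two-action MDP, the linear interpolation between two transition models (a simple stand-in for the adiabatic evolution), and the discounted criterion (chosen for brevity, whereas the abstract measures convergence to the optimal average reward).

```python
def bellman_backup(v, P, R, gamma):
    """One value-iteration sweep using P[a][s][s'] and R[a][s]."""
    n = len(v)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
            for a in range(len(P))
        )
        for s in range(n)
    ]


def interpolate(P0, P1, w):
    """Entrywise convex combination (1 - w) * P0 + w * P1 of two models."""
    return [
        [[(1.0 - w) * x + w * y for x, y in zip(r0, r1)] for r0, r1 in zip(m0, m1)]
        for m0, m1 in zip(P0, P1)
    ]


# Hypothetical transition models at the start and end of the drift.
P_START = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.6, 0.4]],
]
P_END = [
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.1, 0.9], [0.8, 0.2]],
]
R = [[1.0, 0.0], [0.5, 0.25]]  # hypothetical rewards in [0, 1]
GAMMA = 0.9                    # assumed discount factor
STEPS = 200                    # more steps means a slower drift per sweep


def run_drifting_value_iteration(steps=STEPS):
    """Apply one Bellman backup per time step while the model drifts."""
    v = [0.0, 0.0]
    for t in range(steps):
        w = t / (steps - 1)
        v = bellman_backup(v, interpolate(P_START, P_END, w), R, GAMMA)
    return v


values = run_drifting_value_iteration()
print(values)
```

Raising STEPS slows the drift between consecutive sweeps, which is the kind of rate-of-change parameter the abstract refers to.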

A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning ◽

10.1109/adprl.2009.4927527 ◽

2009 ◽

Cited By ~ 7

Author(s):

Jun Ma ◽

Warren B. Powell

Keyword(s):

Least Squares ◽

Markov Decision Process ◽

Decision Process ◽

Recursive Least Squares ◽

Iteration Algorithm ◽

Continuous State ◽

Markov Decision ◽

Approximate Policy Iteration ◽

Policy Iteration Algorithm ◽

Action Spaces

The policy iteration algorithm for average reward Markov decision processes with general state space

IEEE Transactions on Automatic Control ◽

10.1109/9.650016 ◽

1997 ◽

Vol 42 (12) ◽

pp. 1663-1680 ◽

Cited By ~ 81

Author(s):

S.P. Meyn

Keyword(s):

State Space ◽

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

General State ◽

Average Reward ◽

Markov Decision ◽

Policy Iteration Algorithm ◽

General State Space

A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

International Journal of Computer Applications ◽

10.5120/ijca2016908033 ◽

2016 ◽

Vol 133 (10) ◽

pp. 28-33 ◽

Cited By ~ 1

Author(s):

Sanaa Chafik ◽

Cherki Daoui

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Markov Decision ◽

Policy Iteration Algorithm

A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes

Operations Research/Computer Science Interfaces Series - Computing Tools for Modeling, Optimization and Simulation ◽

10.1007/978-1-4615-4567-5_9 ◽

2000 ◽

pp. 161-182 ◽

Cited By ~ 4

Author(s):

Ying He ◽

Michael C. Fu ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Simulation Based ◽

Markov Decision ◽

Policy Iteration Algorithm

A policy iteration algorithm for Markov decision processes skip-free in one direction

Proceedings of the 2nd International ICST Conference on Performance Evaluation Methodologies and Tools ◽

10.4108/smctools.2007.1948 ◽

2007 ◽

Cited By ~ 3

Author(s):

J. Lambert ◽

B. Van Houdt ◽

C. Blondia

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Markov Decision ◽

Policy Iteration Algorithm

Policy Iteration for Decentralized Control of Markov Decision Processes

Journal of Artificial Intelligence Research ◽

10.1613/jair.2667 ◽

2009 ◽

Vol 34 ◽

pp. 89-132 ◽

Cited By ~ 28

Author(s):

D. S. Bernstein ◽

C. Amato ◽

E. A. Hansen ◽

S. Zilberstein

Keyword(s):

Probability Distributions ◽

Single Agent ◽

Policy Iteration ◽

Iteration Algorithm ◽

Test Problems ◽

Formal Framework ◽

Finite State ◽

Markov Decision ◽

Policy Iteration Algorithm

Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.

Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Advances in Neural Networks – ISNN 2011 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-21090-7_70 ◽

2011 ◽

pp. 611-619

Author(s):

Jun Wu ◽

Xin Xu ◽

Lei Zuo ◽

Zhaobin Li ◽

Jian Wang

Keyword(s):

Least Squares ◽

Policy Iteration ◽

Iteration Algorithm ◽

Kernel Width ◽

Adaptive Kernel ◽

Policy Iteration Algorithm

Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Automated Technology for Verification and Analysis - Lecture Notes in Computer Science ◽

10.1007/978-3-319-46520-3_2 ◽

2016 ◽

pp. 13-31 ◽

Cited By ~ 2

Author(s):

Alessandro Abate ◽

Milan Češka ◽

Marta Kwiatkowska

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Markov Decision ◽

Approximate Policy Iteration
