scholarly journals Policy Iteration for Decentralized Control of Markov Decision Processes

2009 ◽  
Vol 34 ◽  
pp. 89-132 ◽  
Author(s):  
D. S. Bernstein ◽  
C. Amato ◽  
E. A. Hansen ◽  
S. Zilberstein

Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.

2012 ◽  
Vol 89 (1-2) ◽  
pp. 123-156 ◽  
Author(s):  
Johannes Fürnkranz ◽  
Eyke Hüllermeier ◽  
Weiwei Cheng ◽  
Sang-Hyeun Park

2014 ◽  
Vol 50 ◽  
pp. 763-803 ◽  
Author(s):  
A. M. S. Barreto ◽  
J. Pineau ◽  
D. Precup

When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited to create a compact version of a Markov decision process (MDP), and hence to reduce the computational cost of dynamic programming. Building on this idea, this paper presents an approximate policy iteration algorithm called policy iteration based on stochastic factorization, or PISF for short. In terms of computational complexity, PISF replaces standard policy iteration's cubic dependence on the size of the MDP with a function that grows only linearly with the number of states in the model. The proposed algorithm also enjoys nice theoretical properties: it always terminates after a finite number of iterations and returns a decision policy whose performance only depends on the quality of the stochastic factorization. In particular, if the approximation error in the factorization is sufficiently small, PISF computes the optimal value function of the MDP. The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economical interest.


2019 ◽  
Vol 65 ◽  
pp. 27-45
Author(s):  
René Aïd ◽  
Francisco Bernal ◽  
Mohamed Mnif ◽  
Diego Zabaljauregui ◽  
Jorge P. Zubelli

This work presents a novel policy iteration algorithm to tackle nonzero-sum stochastic impulse games arising naturally in many applications. Despite the obvious impact of solving such problems, there are no suitable numerical methods available, to the best of our knowledge. Our method relies on the recently introduced characterisation of the value functions and Nash equilibrium via a system of quasi-variational inequalities. While our algorithm is heuristic and we do not provide a convergence analysis, numerical tests show that it performs convincingly in a wide range of situations, including the only analytically solvable example available in the literature at the time of writing.


Sign in / Sign up

Export Citation Format

Share Document