A polynomial time bound for Howard's policy improvement algorithm

OR Spectrum ◽  
1986 ◽  
Vol 8 (1) ◽  
pp. 37-40 ◽  
Author(s):  
U. Meister ◽  
U. Holzbaur
1989 ◽  
Vol 3 (3) ◽  
pp. 397-403 ◽  
Author(s):  
P. Whittle

A condition, stated in Eq. (7) of the paper, is given which, together with one simplifying regularity condition, ensures that the policy-improvement algorithm is equivalent to applying the Newton–Raphson algorithm to an optimality condition. It is shown that this condition covers the two known cases of such equivalence, and a further example is noted. The condition is believed to be necessary up to transformations of the problem, but this has not been proved.
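
The Newton–Raphson reading is concrete in the finite discounted case. Below is a minimal NumPy sketch (a hypothetical toy MDP, not an example from the paper) that checks numerically that one policy-iteration step coincides with one Newton step applied to the optimality condition F(v) = max_a(r_a + g·P_a v) − v = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, g = 4, 3, 0.9                          # states, actions, discount
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, s, :] transition probs
r = rng.random((A, S))                       # r[a, s] expected rewards

def q_values(v):
    """One-step lookahead values q[a, s] = r[a, s] + g * sum_j P[a, s, j] v[j]."""
    return r + g * np.einsum('asj,j->as', P, v)

v = np.zeros(S)
for _ in range(50):
    pi = q_values(v).argmax(axis=0)          # greedy (improving) policy
    idx = np.arange(S)
    P_pi, r_pi = P[pi, idx, :], r[pi, idx]
    # Policy evaluation: v_pi solves (I - g P_pi) v = r_pi ...
    v_pi = np.linalg.solve(np.eye(S) - g * P_pi, r_pi)
    # ... which is exactly the Newton step v - J^{-1} F(v) for
    # F(v) = max_a (r_a + g P_a v) - v, with Jacobian J = g P_pi - I.
    F = q_values(v).max(axis=0) - v
    v_newton = v - np.linalg.solve(g * P_pi - np.eye(S), F)
    assert np.allclose(v_pi, v_newton)
    if np.allclose(v_pi, v):                 # policy iteration has converged
        break
    v = v_pi
print("optimal values:", np.round(v, 4))
```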


2000 ◽  
Vol 7 (48) ◽  
Author(s):  
Marcin Jurdzinski ◽  
Jens Vöge

A discrete strategy improvement algorithm is given for constructing winning strategies in parity games, thereby also providing a new solution of the model-checking problem for the modal μ-calculus. Known strategy improvement algorithms, as proposed for stochastic games by Hoffman and Karp in 1966, and for discounted payoff games and parity games by Puri in 1995, work with real numbers and require solving linear programming instances involving high-precision arithmetic. In the present algorithm for parity games these difficulties are avoided by the use of discrete vertex valuations, in which information about the relevance of vertices and certain distances is coded. An efficient implementation of the strategy improvement step is given. Another advantage of the present approach is that it provides a better conceptual understanding and easier analysis of strategy improvement algorithms for parity games. However, it is not yet known whether the present algorithm works in polynomial time; the long-standing problem of whether parity games can be solved in polynomial time remains open.
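
The overall shape of such an algorithm can be sketched generically. The Python skeleton below abstracts the paper's discrete vertex valuations into an `evaluate` callback and a strict order `better`; the names and the toy usage are illustrative stand-ins, not the authors' implementation.

```python
from typing import Any, Callable, Dict, List

Strategy = Dict[str, str]  # player-0 vertex -> chosen successor

def strategy_improvement(
    edges: Dict[str, List[str]],                     # player-0 successor lists
    initial: Strategy,
    evaluate: Callable[[Strategy], Dict[str, Any]],  # valuation of each vertex
    better: Callable[[Any, Any], bool],              # strict order on valuations
) -> Strategy:
    """Generic improvement loop: re-evaluate the current strategy, switch every
    vertex with a strictly better-valued successor, stop when nothing switches."""
    sigma = dict(initial)
    while True:
        val = evaluate(sigma)
        switched = False
        for v, succs in edges.items():
            best = sigma[v]
            for w in succs:
                if better(val[w], val[best]):
                    best = w
            if best != sigma[v]:
                sigma[v], switched = best, True
        if not switched:
            return sigma

# Toy usage with numeric stand-ins for the paper's discrete valuations:
edges = {"u": ["u", "w"], "w": ["u", "w"]}
sigma = strategy_improvement(edges, {"u": "u", "w": "u"},
                             evaluate=lambda s: {"u": 0, "w": 1},
                             better=lambda a, b: a > b)
print(sigma)  # both vertices switch to the higher-valued successor: {'u': 'w', 'w': 'w'}
```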


2015 ◽  
Vol 115 (6-8) ◽  
pp. 612-617 ◽  
Author(s):  
S. Haddadi ◽  
S. Chenche ◽  
M. Cheraitia ◽  
F. Guessoum

Author(s):  
Ari Arapostathis ◽  
Anup Biswas ◽  
Somnath Pradhan

In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.
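
For orientation, a standard formulation of the ergodic risk-sensitive criterion and its multiplicative HJB (principal eigenvalue) equation reads as follows; the notation is illustrative and does not reproduce the paper's exact assumptions.

```latex
% Ergodic risk-sensitive cost of an admissible control U (standard form):
\[
  \mathcal{E}_x(U) \;=\; \limsup_{T \to \infty} \frac{1}{T}
  \log \mathbb{E}_x^{U}\!\left[ \exp\!\left( \int_0^T c(X_t, U_t)\, \mathrm{d}t \right) \right].
\]
% The minimization problem is characterized by the multiplicative
% Hamilton--Jacobi--Bellman (principal eigenvalue) equation
\[
  \min_{u}\, \big[\, \mathcal{L}_u V(x) + c(x, u)\, V(x) \,\big]
  \;=\; \lambda^{*}\, V(x), \qquad V > 0,
\]
% where $\mathcal{L}_u$ is the controlled generator of the diffusion and
% $\lambda^{*}$ is the optimal value. A policy improvement step replaces the
% current Markov control by a pointwise minimizer of the left-hand side,
% evaluated at the principal eigenpair of the current control.
```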


2020 ◽  
Vol 22 (02) ◽  
pp. 2040008 ◽  
Author(s):  
P. Mondal ◽  
S. K. Neogy ◽  
A. Gupta ◽  
D. Ghorui

Zero-sum two-person discounted semi-Markov games with finite state and action spaces are studied in which a collection of states having the Perfect Information (PI) property is mixed with another collection of states having the Additive Reward–Additive Transition and Action Independent Transition Time (AR-AT-AITT) property. For such a PI/AR-AT-AITT mixture class of games, we prove the existence of an optimal pure stationary strategy for each player. We develop a policy improvement algorithm for solving discounted semi-Markov decision processes (the one-player version of semi-Markov games) and use it to obtain a policy-improvement-type algorithm for computing an optimal strategy pair of a PI/AR-AT-AITT mixture semi-Markov game. Finally, we extend our results to the case where the states having the PI property are replaced by a subclass of Switching Control (SC) states.
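
For the one-player building block, the policy-improvement loop for a discounted semi-Markov decision process can be sketched directly. The toy model below is hypothetical: the effective discounts beta[a, s], standing in for E[exp(−α·sojourn(s, a))], are drawn at random rather than derived from transition-time distributions, and the game and AR-AT-AITT structure are not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, s, :] transition probs
r = rng.random((A, S))                       # expected one-step reward
beta = 0.5 + 0.4 * rng.random((A, S))        # effective discounts in (0, 1)

def evaluate(pi):
    """Solve v = r_pi + diag(beta_pi) P_pi v for the stationary policy pi."""
    idx = np.arange(S)
    P_pi, r_pi, b_pi = P[pi, idx, :], r[pi, idx], beta[pi, idx]
    return np.linalg.solve(np.eye(S) - b_pi[:, None] * P_pi, r_pi)

pi = np.zeros(S, dtype=int)
while True:
    v = evaluate(pi)
    # Improvement step: act greedily w.r.t. the semi-Markov Q-values,
    # where discounting varies with the state-action sojourn times.
    q = r + beta * np.einsum('asj,j->as', P, v)
    pi_new = q.argmax(axis=0)
    if np.array_equal(pi_new, pi):           # no switchable state: optimal
        break
    pi = pi_new
print("optimal stationary policy:", pi, "values:", np.round(v, 3))
```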


Stochastics ◽  
2016 ◽  
Vol 89 (1) ◽  
pp. 348-359 ◽  
Author(s):  
Saul D. Jacka ◽  
Aleksandar Mijatović
