The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

1978, Vol 26 (2), pp. 282-304
Author(s): Edward J. Sondik
2011, Vol 10 (06), pp. 1175-1197
Author(s): John Goulionis, D. Stengos

This paper treats the infinite-horizon discounted-cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite time horizon are piecewise linear (p.w.l.) and can be computed exactly by solving a system of linear equations. However, finite transience is a stronger condition than is needed to ensure p.w.l. value functions. In this paper, we introduce, as an alternative, the class of periodic policies, whose value functions also turn out to be p.w.l. Moreover, we examine a condition more general than both finite transience and periodicity that still ensures p.w.l. value functions. We apply these ideas to a replacement problem under Markovian deterioration, investigate periodic policies for it, and give numerical examples.
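
To make the phrase "computed exactly by solving a system of linear equations" concrete, the following is a minimal sketch and not the construction used in either paper. It assumes the policy can be written as a small finite-state controller (each node fixes an action and, for each observation, a successor node), which is one common way to obtain a value function with finitely many linear pieces. All names (`evaluate_controller`, `P`, `Q`, `c`, `act`, `nxt`) and all numbers in the toy two-state replacement model are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate_controller(P, Q, c, beta, act, nxt):
    """Return alpha[n][s]: discounted cost of starting in node n and state s.

    Solves alpha[n][s] = c[a][s]
        + beta * sum_{s2, o} P[a][s][s2] * Q[a][s2][o] * alpha[nxt[n][o]][s2]
    with a = act[n], as one linear system in the unknowns alpha[n][s].
    """
    N, S = len(act), len(c[0])

    def idx(n, s):
        return n * S + s  # flatten (node, state) into one unknown index

    A = [[0.0] * (N * S) for _ in range(N * S)]
    b = [0.0] * (N * S)
    for n in range(N):
        a = act[n]
        for s in range(S):
            A[idx(n, s)][idx(n, s)] += 1.0
            b[idx(n, s)] = c[a][s]
            for s2 in range(S):
                for o in range(len(Q[a][s2])):
                    A[idx(n, s)][idx(nxt[n][o], s2)] -= beta * P[a][s][s2] * Q[a][s2][o]
    x = solve_linear(A, b)
    return [[x[idx(n, s)] for s in range(S)] for n in range(N)]


def value(alpha, belief):
    """Piecewise linear value at a belief: the cheapest linear piece (costs are minimized)."""
    return min(sum(a_s * p for a_s, p in zip(piece, belief)) for piece in alpha)


# Hypothetical toy model: states 0 = good, 1 = deteriorated;
# actions 0 = operate, 1 = replace; two observations.
P = [[[0.8, 0.2], [0.0, 1.0]],   # operate: the machine may deteriorate
     [[1.0, 0.0], [1.0, 0.0]]]   # replace: back to the good state
Q = [[[0.9, 0.1], [0.3, 0.7]],   # operate: noisy signal about the state
     [[0.5, 0.5], [0.5, 0.5]]]   # replace: uninformative signal
c = [[0.0, 4.0],                 # operating cost per state
     [5.0, 5.0]]                 # replacement cost
act = [0, 1]                     # node 0 operates, node 1 replaces
nxt = [[0, 1], [0, 0]]           # node 0 moves to node 1 after observation 1

alpha = evaluate_controller(P, Q, c, 0.9, act, nxt)
cost_at_belief = value(alpha, [0.7, 0.3])
```

Each row of `alpha` is one linear piece over the belief simplex, so `value` is p.w.l. by construction. The hand-written elimination keeps the sketch dependency-free; a library linear solver could be substituted.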

