Bounds and good policies in stationary finite-stage Markovian decision problems

1980 ◽  
Vol 12 (1) ◽  
pp. 154-173 ◽  
Author(s):  
Gerhard Hübner

A stationary Markovian decision model is considered with general state and action spaces where the transition probabilities are weakened to be bounded transition measures (this is useful for many applications). New and improved bounds are given for the optimal value of stationary problems with a large planning horizon if either only a few steps of iteration are carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies which are composed of nearly optimal decisions from the first few steps or from the infinite-stage solution.
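For orientation only, the idea of bounding an optimal value after a few iteration steps can be sketched with the classical MacQueen-type bounds for a discounted, finite-state problem. This is not the paper's construction, which covers general state and action spaces, bounded transition measures, and finite horizons. The toy model (`REWARDS`, `TRANSITIONS`, `BETA`) and all function names are assumptions made up for this illustration.

```python
def bellman_step(v, rewards, transitions, beta):
    """One value-iteration step: (Tv)(s) = max_a [r(s,a) + beta * sum_t p(t|s,a) v(t)]."""
    new_v = []
    for s in range(len(v)):
        best = max(
            rewards[s][a]
            + beta * sum(p * v[t] for t, p in enumerate(transitions[s][a]))
            for a in range(len(rewards[s]))
        )
        new_v.append(best)
    return new_v


def macqueen_bounds(v_prev, v_curr, beta):
    """Bounds on the infinite-horizon value from two successive iterates.

    With d = v_curr - v_prev, monotonicity of T and T(v + c) = Tv + beta*c give
    v_curr + beta/(1-beta) * min(d) <= v* <= v_curr + beta/(1-beta) * max(d).
    """
    diffs = [c - p for c, p in zip(v_curr, v_prev)]
    scale = beta / (1.0 - beta)
    lower = [c + scale * min(diffs) for c in v_curr]
    upper = [c + scale * max(diffs) for c in v_curr]
    return lower, upper


def iterate_with_bounds(rewards, transitions, beta, steps):
    """Run `steps` >= 1 iterations from v_0 = 0 and return (v_n, lower, upper)."""
    v_prev = [0.0] * len(rewards)
    v_curr = bellman_step(v_prev, rewards, transitions, beta)
    for _ in range(steps - 1):
        v_prev, v_curr = v_curr, bellman_step(v_curr, rewards, transitions, beta)
    lower, upper = macqueen_bounds(v_prev, v_curr, beta)
    return v_curr, lower, upper


# Hypothetical two-state, two-action model (illustrative numbers only).
REWARDS = [[1.0, 0.0], [0.0, 2.0]]  # REWARDS[s][a]
TRANSITIONS = [  # TRANSITIONS[s][a][t] = p(t | s, a)
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
BETA = 0.9

if __name__ == "__main__":
    for n in (1, 3, 10):
        _, lo, hi = iterate_with_bounds(REWARDS, TRANSITIONS, BETA, n)
        print(n, lo, hi)
```

In this sketch the gap between the upper and lower bound is proportional to the span of the last increment, so the interval tightens as the increments flatten out, even when the iterates themselves are still far from their limit.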


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 916 ◽  
Author(s):  
Wen Cao ◽  
Chunmei Liu ◽  
Pengfei Jia

Aroma plays a significant role in the quality of citrus fruits and processed products. Citrus volatiles can be detected and analyzed with an electronic nose (E-nose); in this paper, an E-nose is employed to classify juice that has been stored for different numbers of days. Feature extraction and classification are two important requirements for an E-nose. During the training process, a classifier can optimize its own parameters to achieve better classification accuracy, but it cannot choose its input data, which are produced by feature extraction methods, so the classification result is not always ideal. Label consistent KSVD (L-KSVD) is a novel technique that extracts features and classifies the data at the same time, and this joint operation can improve the classification accuracy. We propose an enhanced L-KSVD, called E-LCKSVD, for the E-nose. In E-LCKSVD, we introduce a kernel function into the traditional L-KSVD and present a new initialization technique for its dictionary; finally, the weighted coefficients of the different parts of its objective function are studied, and enhanced quantum-behaved particle swarm optimization (EQPSO) is employed to optimize these coefficients. In the experimental section, we first find that the classification accuracy of KSVD and L-KSVD is improved with the help of the kernel function, which indicates that their ability to deal with nonlinear data is improved. Then, we compare the results of different dictionary initialization techniques and show that our proposed method is better. Finally, we find the optimal values of the weighted coefficients of the objective function of E-LCKSVD, with which the E-nose reaches better performance.


1975 ◽  
Vol 7 (2) ◽  
pp. 330-348 ◽  
Author(s):  
Ulrich Rieder

We consider a non-stationary Bayesian dynamic decision model with general state, action and parameter spaces. It is shown that this model can be reduced to a non-Markovian (resp. Markovian) decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards some general results are presented concerning the restriction on deterministic generalized Markov policies, the criteria of optimality and the existence of Bayes policies. These facts are based on the above transformations and on results of Hinderer and Schäl.


Symmetry ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1107
Author(s):  
Javier Cuesta

We study the relation between almost-symmetries and the geometry of Banach spaces. We show that any almost-linear extension of a transformation that preserves transition probabilities up to an additive error admits an approximation by a linear map, and the quality of the approximation depends on the type and cotype constants of the involved spaces.


2019 ◽  
Vol 56 (3) ◽  
pp. 810-829
Author(s):  
János Flesch ◽  
Dries Vermeulen ◽  
Anna Zseleva

We consider decision problems with arbitrary action spaces, deterministic transitions, and infinite time horizon. In the usual setup when probability measures are countably additive, a general version of Kuhn’s theorem implies under fairly general conditions that for every mixed strategy of the decision maker there exists an equivalent behavior strategy. We examine to what extent this remains valid when probability measures are only assumed to be finitely additive. Under the classical approach of Dubins and Savage (2014), we prove the following statements: (1) If the action space is finite, every mixed strategy has an equivalent behavior strategy. (2) Even if the action space is infinite, at least one optimal mixed strategy has an equivalent behavior strategy. The approach by Dubins and Savage turns out to be essentially maximal: these two statements are no longer valid if we take any extension of their approach that considers all singleton plays.

