OBLIGATION BLACKWELL GAMES AND P-AUTOMATA

2017 · Vol 82 (2) · pp. 420-452
Author(s): KRISHNENDU CHATTERJEE, NIR PITERMAN

Abstract: We generalize winning conditions in two-player games by adding a structural acceptance condition called obligations. Obligations are orthogonal to the linear winning conditions that define whether a play is winning: they are a declaration that player 0 can achieve a certain value from a configuration. If the obligation is met, the value of that configuration for player 0 is 1. We define the value in such games and show that obligation games are determined. For Markov chains with Borel objectives and obligations, and for finite turn-based stochastic parity games with obligations, we give an alternative and simpler characterization of the value function. Based on this simpler definition, we show that the decision problem of winning finite turn-based stochastic parity games with obligations is in NP ∩ co-NP. We also show that obligation games provide a game framework for reasoning about p-automata.
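For context, the parity objective referenced in this abstract is the standard one (stated here as background; the obligation extension is layered on top of it). With a priority function Ω: Q → {0, …, d} over configurations, a play is winning for player 0 exactly when the highest priority visited infinitely often is even:

```latex
\pi = q_0 q_1 q_2 \cdots \text{ is winning for player } 0
\iff
\max\{\, p : \Omega(q_i) = p \text{ for infinitely many } i \,\} \text{ is even}
```

(some formulations use min in place of max). NP ∩ co-NP is the same complexity status known for finite turn-based stochastic parity games without obligations, so the result shows that adding obligations does not worsen this bound.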

Author(s): Nicholay Topin, Manuela Veloso

Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|² · |tr samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
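As a rough illustration of the construction described above, here is a minimal Python sketch. It is not the authors' algorithm: for simplicity it forms abstract states by bucketing states on their learned value estimate, whereas the paper aggregates states using feature-based abstraction; `build_apg`, `value_fn`, and `policy` are illustrative names.

```python
from collections import defaultdict

def build_apg(transitions, value_fn, policy, n_bins=10):
    """Sketch: build an Abstracted Policy Graph (a Markov chain over
    abstract states) from transitions observed under a fixed policy.

    transitions: iterable of (state, next_state) pairs
    value_fn:    maps a state to its learned value V(s)
    policy:      maps a state to the deterministic policy's action
    """
    transitions = list(transitions)

    # Abstraction: bucket states by value estimate (an illustrative
    # stand-in for the paper's feature-based state aggregation).
    values = [value_fn(s) for s, _ in transitions]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0

    def abstract(state):
        return min(n_bins - 1, int((value_fn(state) - lo) / width))

    # Count transitions between abstract states.
    counts = defaultdict(lambda: defaultdict(int))
    actions = defaultdict(set)
    for s, s_next in transitions:
        a, a_next = abstract(s), abstract(s_next)
        counts[a][a_next] += 1
        actions[a].add(policy(s))

    # Normalize counts into a Markov chain: graph[a][a'] = Pr(a -> a').
    graph = {}
    for a, succ in counts.items():
        total = sum(succ.values())
        graph[a] = {a2: c / total for a2, c in succ.items()}
    return graph, actions
```

Each node of the resulting chain can then be annotated with the action the policy takes there, so that an individual decision is explained in the context of the expected future abstract transitions.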


2006 · Vol 36 (1) · pp. 195-205
Author(s): Annika Kangas

In many cases, it may be difficult to obtain explicit information on criteria weights for multicriteria decision analysis. Usually, however, at least the relevant criteria can be assumed to be known, even if their weights are not. In addition, a complete or incomplete rank order of these criteria may be known, and it may be possible to obtain estimates for at least some of the value-function parameters. Some decision support tools, such as stochastic multicriteria acceptability analysis (SMAA), can make use of such incomplete information. The main results of SMAA are the probabilities of each alternative obtaining a given rank, given all the information available. These probabilities can be used for choosing the most recommendable alternative. However, recommendations are risky when the preference information is incomplete. Here, these risks are examined through a simulation based on an earlier forestry decision problem with multiple criteria. (1) The probability that the best alternative is recommended and (2) the expected loss in value-function value due to choosing the wrong alternative are modelled as functions of the characteristics of the true value function and the best alternative. The results show that the quality of decisions improves very quickly as information on the weights improves. Determining at least the complete rank order of the criteria is advisable, especially if importance varies markedly among the criteria.
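To make the SMAA machinery concrete, here is a minimal Python sketch of the kind of weight simulation used in such analyses (illustrative, not the exact setup of this study): weights are sampled uniformly from the simplex, the assumed criterion rank order is imposed by sorting, and rank-acceptability frequencies are tallied.

```python
import numpy as np

rng = np.random.default_rng(0)

def smaa_rank_acceptability(partial_values, n_samples=10_000, ordered=True):
    """Sketch of an SMAA-style simulation under incomplete weight information.

    partial_values: (n_alternatives, n_criteria) array of criterion-wise
                    partial value-function scores for each alternative.
    ordered:        if True, only the rank order of criterion importance
                    is assumed known (criterion 0 most important, etc.).
    Returns b[i, r] = Pr(alternative i obtains rank r).
    """
    n_alt, n_crit = partial_values.shape
    accept = np.zeros((n_alt, n_alt))
    for _ in range(n_samples):
        # Uniform sample from the weight simplex (Dirichlet with unit alphas).
        w = rng.dirichlet(np.ones(n_crit))
        if ordered:
            # Sorting descending gives a uniform sample from the
            # order-constrained part of the simplex.
            w = np.sort(w)[::-1]
        totals = partial_values @ w          # additive value function
        ranks = np.argsort(-totals)          # best alternative first
        for r, i in enumerate(ranks):
            accept[i, r] += 1
    return accept / n_samples
```

The expected value loss of a recommendation can be estimated in the same loop by comparing the value of the recommended alternative against the best alternative under each sampled weight vector.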


2005 · Vol 08 (01) · pp. 123-139
Author(s): MARTIN DAHLGREN, RALF KORN

We consider the valuation of a swing option on stocks under the additional constraint of a minimum time distance between two successive exercise times. We give an explicit characterization of its pricing function as the value function of a multiple optimal stopping problem. The solution of this problem is related to a system of variational inequalities. We prove existence of a solution to this system and discuss the numerical implementation of a valuation algorithm.
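The paper works in continuous time via variational inequalities; purely as an illustration of the exercise-plus-refraction structure, the following Python sketch values such an option by dynamic programming on a binomial tree (all names and parameter choices are illustrative assumptions, not the authors' method).

```python
import math

def swing_option_binomial(S0, K, r, sigma, T, n_steps, n_rights, refraction):
    """Sketch: value a call-style swing option with n_rights exercise rights
    and a minimum gap of `refraction` time steps between two exercises,
    via dynamic programming on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-move probability
    disc = math.exp(-r * dt)
    memo = {}

    def value(step, ups, rights, cooldown):
        # cooldown = steps until the next exercise is allowed (0 = allowed now)
        S = S0 * u**ups * d**(step - ups)
        if step == n_steps:
            return max(S - K, 0.0) if rights > 0 and cooldown == 0 else 0.0
        key = (step, ups, rights, cooldown)
        if key in memo:
            return memo[key]
        cd = max(cooldown - 1, 0)
        cont = disc * (p * value(step + 1, ups + 1, rights, cd)
                       + (1 - p) * value(step + 1, ups, rights, cd))
        best = cont
        if rights > 0 and cooldown == 0:
            rc = max(refraction - 1, 0)    # lockout after exercising now
            ex = max(S - K, 0.0) + disc * (
                p * value(step + 1, ups + 1, rights - 1, rc)
                + (1 - p) * value(step + 1, ups, rights - 1, rc))
            best = max(best, ex)
        memo[key] = best
        return best

    return value(0, 0, n_rights, 0)

# Example (illustrative parameters): 3 rights, at least 5 steps apart.
print(swing_option_binomial(S0=100, K=100, r=0.05, sigma=0.2,
                            T=1.0, n_steps=50, n_rights=3, refraction=5))
```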


1995 · Vol 2 (4) · pp. 335-346
Author(s): B. Dochviri

Abstract: The connection between the optimal stopping problems for an inhomogeneous standard Markov process and the corresponding homogeneous Markov process constructed on the extended state space is established. An excessive characterization of the value function and a limit procedure for its construction are given for the problem of optimal stopping of an inhomogeneous standard Markov process. The form of ε-optimal (optimal) stopping times is also found.
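For orientation, the space-time construction behind this connection is classical (stated here as background, not as the paper's exact formulation): adjoining time to the state turns the inhomogeneous process into a homogeneous one,

```latex
\tilde{X}_s = \bigl(t+s,\; X_{t+s}\bigr), \qquad
v(t,x) = \sup_{\tau} \mathsf{E}_{t,x}\, g(\tau, X_\tau),
```

and on the extended space the value function v is characterized as the smallest excessive majorant of the gain g, obtainable by a limit procedure of successive approximations.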


1996 · Vol 44 (3) · pp. 387-399
Author(s): Eitan Altman, Arie Hordijk, Lodewijk C. M. Kallenberg
