OBLIGATION BLACKWELL GAMES AND P-AUTOMATA

2017 · Vol 82 (2) · pp. 420-452
Author(s): KRISHNENDU CHATTERJEE, NIR PITERMAN

Abstract: We generalize winning conditions in two-player games by adding a structural acceptance condition called obligations. Obligations are orthogonal to the linear winning conditions that define whether a play is winning: they are a declaration that player 0 can achieve a certain value from a configuration. If the obligation is met, the value of that configuration for player 0 is 1. We define the value in such games and show that obligation games are determined. For Markov chains with Borel objectives and obligations, and for finite turn-based stochastic parity games with obligations, we give an alternative and simpler characterization of the value function. Based on this simpler definition, we show that the decision problem of winning finite turn-based stochastic parity games with obligations is in NP ∩ co-NP. We also show that obligation games provide a game framework for reasoning about p-automata.
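For context, the parity objective referenced in this abstract is the standard one (stated here as background; the obligation extension is layered on top of it). With a priority function Ω: Q → {0, …, d} over configurations, a play is winning for player 0 exactly when the highest priority visited infinitely often is even:

```latex
\pi = q_0 q_1 q_2 \cdots \text{ is winning for player } 0
\iff
\max\{\, p : \Omega(q_i) = p \text{ for infinitely many } i \,\} \text{ is even}
```

(some formulations use min in place of max). NP ∩ co-NP is the same complexity status known for finite turn-based stochastic parity games without obligations, so the result shows that adding obligations does not worsen this bound.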

Author(s): Nicholay Topin, Manuela Veloso

Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|² · |tr samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
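As a rough illustration of the construction described above, here is a minimal Python sketch. It is not the authors' algorithm: for simplicity it forms abstract states by bucketing states on their learned value estimate, whereas the paper aggregates states using feature-based abstraction; `build_apg`, `value_fn`, and `policy` are illustrative names.

```python
from collections import defaultdict

def build_apg(transitions, value_fn, policy, n_bins=10):
    """Sketch: build an Abstracted Policy Graph (a Markov chain over
    abstract states) from transitions observed under a fixed policy.

    transitions: iterable of (state, next_state) pairs
    value_fn:    maps a state to its learned value V(s)
    policy:      maps a state to the deterministic policy's action
    """
    transitions = list(transitions)

    # Abstraction: bucket states by value estimate (an illustrative
    # stand-in for the paper's feature-based state aggregation).
    values = [value_fn(s) for s, _ in transitions]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0

    def abstract(state):
        return min(n_bins - 1, int((value_fn(state) - lo) / width))

    # Count transitions between abstract states.
    counts = defaultdict(lambda: defaultdict(int))
    actions = defaultdict(set)
    for s, s_next in transitions:
        a, a_next = abstract(s), abstract(s_next)
        counts[a][a_next] += 1
        actions[a].add(policy(s))

    # Normalize counts into a Markov chain: graph[a][a'] = Pr(a -> a').
    graph = {}
    for a, succ in counts.items():
        total = sum(succ.values())
        graph[a] = {a2: c / total for a2, c in succ.items()}
    return graph, actions
```

Each node of the resulting chain can then be annotated with the action the policy takes there, so that an individual decision is explained in the context of the expected future abstract transitions.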


2006 · Vol 36 (1) · pp. 195-205
Author(s): Annika Kangas

In many cases, it may be difficult to obtain explicit information on criteria weights for multicriteria decision analysis. Usually, however, at least the relevant criteria can be assumed to be known, even if their weights are not. In addition, a complete or incomplete rank order of these criteria may be known, and it may be possible to obtain estimates for at least some of the value-function parameters. Some decision support tools, such as stochastic multicriteria acceptability analysis (SMAA), can make use of such incomplete information. The main results of SMAA are the probabilities of each alternative obtaining a given rank, given all the information available. These probabilities can be used for choosing the most recommendable alternative. However, recommendations are risky when the preference information is incomplete. Here, these risks are examined through a simulation based on an earlier forestry decision problem with multiple criteria. (1) The probability that the best alternative is recommended and (2) the expected loss in value-function value due to choosing the wrong alternative are modelled as functions of the characteristics of the true value function and the best alternative. The results show that the quality of decisions improves very quickly as information on the weights improves. Determining at least the complete rank order of the criteria is advisable, especially if importance varies markedly among the criteria.
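To make the SMAA machinery concrete, here is a minimal Python sketch of the kind of weight simulation used in such analyses (illustrative, not the exact setup of this study): weights are sampled uniformly from the simplex, the assumed criterion rank order is imposed by sorting, and rank-acceptability frequencies are tallied.

```python
import numpy as np

rng = np.random.default_rng(0)

def smaa_rank_acceptability(partial_values, n_samples=10_000, ordered=True):
    """Sketch of an SMAA-style simulation under incomplete weight information.

    partial_values: (n_alternatives, n_criteria) array of criterion-wise
                    partial value-function scores for each alternative.
    ordered:        if True, only the rank order of criterion importance
                    is assumed known (criterion 0 most important, etc.).
    Returns b[i, r] = Pr(alternative i obtains rank r).
    """
    n_alt, n_crit = partial_values.shape
    accept = np.zeros((n_alt, n_alt))
    for _ in range(n_samples):
        # Uniform sample from the weight simplex (Dirichlet with unit alphas).
        w = rng.dirichlet(np.ones(n_crit))
        if ordered:
            # Sorting descending gives a uniform sample from the
            # order-constrained part of the simplex.
            w = np.sort(w)[::-1]
        totals = partial_values @ w          # additive value function
        ranks = np.argsort(-totals)          # best alternative first
        for r, i in enumerate(ranks):
            accept[i, r] += 1
    return accept / n_samples
```

The expected value loss of a recommendation can be estimated in the same loop by comparing the value of the recommended alternative against the best alternative under each sampled weight vector.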


2005 · Vol 08 (01) · pp. 123-139
Author(s): MARTIN DAHLGREN, RALF KORN

We consider the valuation of a swing option on stocks under the additional constraint of a minimum time distance between two successive exercise times. We give an explicit characterization of its pricing function as the value function of a multiple optimal stopping problem. The solution of this problem is related to a system of variational inequalities. We prove existence of a solution to this system and discuss the numerical implementation of a valuation algorithm.
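The paper works in continuous time via variational inequalities; purely as an illustration of the exercise-plus-refraction structure, the following Python sketch values such an option by dynamic programming on a binomial tree (all names and parameter choices are illustrative assumptions, not the authors' method).

```python
import math

def swing_option_binomial(S0, K, r, sigma, T, n_steps, n_rights, refraction):
    """Sketch: value a call-style swing option with n_rights exercise rights
    and a minimum gap of `refraction` time steps between two exercises,
    via dynamic programming on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-move probability
    disc = math.exp(-r * dt)
    memo = {}

    def value(step, ups, rights, cooldown):
        # cooldown = steps until the next exercise is allowed (0 = allowed now)
        S = S0 * u**ups * d**(step - ups)
        if step == n_steps:
            return max(S - K, 0.0) if rights > 0 and cooldown == 0 else 0.0
        key = (step, ups, rights, cooldown)
        if key in memo:
            return memo[key]
        cd = max(cooldown - 1, 0)
        cont = disc * (p * value(step + 1, ups + 1, rights, cd)
                       + (1 - p) * value(step + 1, ups, rights, cd))
        best = cont
        if rights > 0 and cooldown == 0:
            rc = max(refraction - 1, 0)    # lockout after exercising now
            ex = max(S - K, 0.0) + disc * (
                p * value(step + 1, ups + 1, rights - 1, rc)
                + (1 - p) * value(step + 1, ups, rights - 1, rc))
            best = max(best, ex)
        memo[key] = best
        return best

    return value(0, 0, n_rights, 0)

# Example (illustrative parameters): 3 rights, at least 5 steps apart.
print(swing_option_binomial(S0=100, K=100, r=0.05, sigma=0.2,
                            T=1.0, n_steps=50, n_rights=3, refraction=5))
```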


1995 · Vol 2 (4) · pp. 335-346
Author(s): B. Dochviri

Abstract: The connection between the optimal stopping problems for an inhomogeneous standard Markov process and the corresponding homogeneous Markov process constructed on the extended state space is established. An excessive characterization of the value function and a limit procedure for its construction are given for the problem of optimal stopping of an inhomogeneous standard Markov process. The form of ε-optimal (optimal) stopping times is also found.
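For orientation, the space-time construction behind this connection is classical (stated here as background, not as the paper's exact formulation): adjoining time to the state turns the inhomogeneous process into a homogeneous one,

```latex
\tilde{X}_s = \bigl(t+s,\; X_{t+s}\bigr), \qquad
v(t,x) = \sup_{\tau} \mathsf{E}_{t,x}\, g(\tau, X_\tau),
```

and on the extended space the value function v is characterized as the smallest excessive majorant of the gain g, obtainable by a limit procedure of successive approximations.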


1996 · Vol 44 (3) · pp. 387-399
Author(s): Eitan Altman, Arie Hordijk, Lodewijk C. M. Kallenberg
