On the value function in constrained control of Markov chains

Eitan Altman; Arie Hordijk; Lodewijk C. M. Kallenberg

doi:10.1007/bf01193938

OBLIGATION BLACKWELL GAMES AND P-AUTOMATA

Journal of Symbolic Logic ◽

10.1017/jsl.2016.71 ◽

2017 ◽

Vol 82 (2) ◽

pp. 420-452

Author(s):

KRISHNENDU CHATTERJEE ◽

NIR PITERMAN

Keyword(s):

Markov Chains ◽

Decision Problem ◽

Value Function ◽

Acceptance Condition ◽

Parity Games ◽

The Value Function

AbstractWe generalize winning conditions in two-player games by adding a structural acceptance condition called obligations. Obligations are orthogonal to the linear winning conditions that define whether a play is winning. Obligations are a declaration that player 0 can achieve a certain value from a configuration. If the obligation is met, the value of that configuration for player 0 is 1.We define the value in such games and show that obligation games are determined. For Markov chains with Borel objectives and obligations, and finite turn-based stochastic parity games with obligations we give an alternative and simpler characterization of the value function. Based on this simpler definition we show that the decision problem of winning finite turn-based stochastic parity games with obligations is in NP∩co-NP. We also show that obligation games provide a game framework for reasoning about p-automata.

Download Full-text

A note on the value function for constrained control problems

Systems & Control Letters ◽

10.1016/j.sysconle.2005.04.012 ◽

2006 ◽

Vol 55 (1) ◽

pp. 21-26 ◽

Cited By ~ 3

Author(s):

Aurelian Cernea ◽

Hélène Frankowska

Keyword(s):

Value Function ◽

Control Problems ◽

Constrained Control ◽

Constrained Control Problems ◽

The Value Function

Download Full-text

Generation of Policy-Level Explanations for Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33012514 ◽

2019 ◽

Vol 33 ◽

pp. 2514-2521

Author(s):

Nicholay Topin ◽

Manuela Veloso

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Reinforcement Learning ◽

Markov Chains ◽

Time Complexity ◽

Value Function ◽

Worst Case ◽

Policy Level ◽

Individual Decisions ◽

The Value Function

Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|2|tr samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.

Download Full-text

Characterization of the value function of final state constrained control problems with BV trajectories

Communications on Pure and Applied Analysis ◽

10.3934/cpaa.2011.10.1567 ◽

2011 ◽

Vol 10 (6) ◽

pp. 1567-1587 ◽

Cited By ~ 2

Author(s):

Ariela Briani ◽

Hasnaa Zidani

Keyword(s):

Value Function ◽

Control Problems ◽

Constrained Control ◽

Final State ◽

Constrained Control Problems ◽

The Value Function

Download Full-text

Discrete homing problems

Archives of Control Sciences ◽

10.2478/v10170-011-0039-6 ◽

2013 ◽

Vol 23 (1) ◽

pp. 5-18

Author(s):

Mario Lefebvre ◽

Moussa Kounta

Keyword(s):

Optimal Control ◽

Markov Chain ◽

Markov Chains ◽

Cost Function ◽

Discrete Time ◽

Value Function ◽

The Cost ◽

The Value Function

Abstract We consider the so-called homing problem for discrete-time Markov chains. The aim is to optimally control the Markov chain until it hits a given boundary. Depending on a parameter in the cost function, the optimizer either wants to maximize or minimize the time spent by the controlled process in the continuation region. Particular problems are considered and solved explicitly. Both the optimal control and the value function are obtained

Download Full-text

On the forward algorithm for stopping problems on continuous-time Markov chains

Journal of Applied Probability ◽

10.1017/jpr.2021.11 ◽

2021 ◽

Vol 58 (4) ◽

pp. 1043-1063

Author(s):

Laurent Miclo ◽

Stéphane Villeneuve

Keyword(s):

Markov Chains ◽

Optimal Stopping ◽

Continuous Time ◽

Value Function ◽

Constructive Method ◽

Forward Algorithm ◽

Continuous Time Markov Chains ◽

Stopping Set ◽

Optimal Stopping Problems ◽

The Value Function

AbstractWe revisit the forward algorithm, developed by Irle, to characterize both the value function and the stopping set for a large class of optimal stopping problems on continuous-time Markov chains. Our objective is to renew interest in this constructive method by showing its usefulness in solving some constrained optimal stopping problems that have emerged recently.

Download Full-text

The value function for time-related decisions

PsycEXTRA Dataset ◽

10.1037/e653632011-006 ◽

2011 ◽

Author(s):

Anouk Festjens ◽

Siegfried Dewitte ◽

Enrico Diecidue ◽

Sabrina Bruyneel

Keyword(s):

Value Function ◽

The Value Function

Download Full-text

The Equal Tails: A Method to Elicit the Value Function

SSRN Electronic Journal ◽

10.2139/ssrn.893748 ◽

2006 ◽

Author(s):

Manel Baucells ◽

Antonio Villasis

Keyword(s):

Value Function ◽

The Value Function

Download Full-text

Solving flow-shop scheduling problem with a reinforcement learning algorithm that generalizes the value function with neural network

Alexandria Engineering Journal ◽

10.1016/j.aej.2021.01.030 ◽

2021 ◽

Vol 60 (3) ◽

pp. 2787-2800

Author(s):

Jianfeng Ren ◽

Chunming Ye ◽

Feng Yang

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Value Function ◽

Flow Shop ◽

Learning Algorithm ◽

Flow Shop Scheduling ◽

Scheduling Problem ◽

Shop Scheduling ◽

The Value Function ◽

Reinforcement Learning Algorithm

Download Full-text

Pricing Perpetual American Put Options with Asset-Dependent Discounting

Journal of Risk and Financial Management ◽

10.3390/jrfm14030130 ◽

2021 ◽

Vol 14 (3) ◽

pp. 130

Author(s):

Jonas Al-Hadad ◽

Zbigniew Palmowski

Keyword(s):

Value Function ◽

Asset Price ◽

Stopping Times ◽

Martingale Measure ◽

Put Options ◽

American Put Options ◽

Exact Calculations ◽

Negative Exponential ◽

American Put ◽

The Value Function

The main objective of this paper is to present an algorithm of pricing perpetual American put options with asset-dependent discounting. The value function of such an instrument can be described as VAPutω(s)=supτ∈TEs[e−∫0τω(Sw)dw(K−Sτ)+], where T is a family of stopping times, ω is a discount function and E is an expectation taken with respect to a martingale measure. Moreover, we assume that the asset price process St is a geometric Lévy process with negative exponential jumps, i.e., St=seζt+σBt−∑i=1NtYi. The asset-dependent discounting is reflected in the ω function, so this approach is a generalisation of the classic case when ω is constant. It turns out that under certain conditions on the ω function, the value function VAPutω(s) is convex and can be represented in a closed form. We provide an option pricing algorithm in this scenario and we present exact calculations for the particular choices of ω such that VAPutω(s) takes a simplified form.

Download Full-text