Bayesian dynamic programming

1975 ◽  
Vol 7 (2) ◽  
pp. 330-348 ◽  
Author(s):  
Ulrich Rieder

We consider a non-stationary Bayesian dynamic decision model with general state, action and parameter spaces. It is shown that this model can be reduced to a non-Markovian (resp. Markovian) decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards, some general results are presented concerning the restriction to deterministic generalized Markov policies, the criteria of optimality and the existence of Bayes policies. These results are based on the above transformations and on results of Hinderer and Schäl.
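The flavor of the reduction can be seen in a simple conjugate special case: augmenting the state with the posterior over the unknown parameter makes the transition law of the augmented model completely known (it becomes the posterior-predictive probability). The sketch below assumes a Bernoulli observation model with a Beta prior; this toy setup is my illustration, not the paper's general construction.

```python
# Bayesian-to-Markov reduction, Bernoulli/Beta special case: the augmented
# state carries the Beta(alpha, beta) posterior, so transitions of the
# reduced model are completely known.

def posterior_update(alpha: float, beta: float, obs: int) -> tuple:
    """Conjugate update of a Beta posterior after a Bernoulli observation."""
    return (alpha + obs, beta + (1 - obs))

def predictive_prob(alpha: float, beta: float) -> float:
    """P(obs = 1 | posterior): the *known* transition probability of the
    reduced model, obtained by integrating out the unknown parameter."""
    return alpha / (alpha + beta)

# Starting from a uniform prior Beta(1, 1), observe 1 and then 0:
state = (1.0, 1.0)
state = posterior_update(*state, obs=1)   # Beta(2, 1)
state = posterior_update(*state, obs=0)   # Beta(2, 2)
```

The point of the construction is that the decision maker never needs the unknown parameter itself: the posterior pair `(alpha, beta)` is a sufficient statistic, and the predictive probability above plays the role of the "completely known" transition probability in the reduced model.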


2021 ◽  
Author(s):  
Yunfan Su

Vehicular ad hoc networks (VANETs) are a promising technology for improving traffic safety and transportation efficiency and for providing a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to exploit the full performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires the calculation of the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that the reinforcement learning method achieves performance similar to that of dynamic programming, while both outperform the greedy method.
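The model-based step can be sketched with textbook relative value iteration on a toy average-reward MDP. The transition probabilities and rewards below are illustrative numbers only; the thesis's VANET state space, dynamics and decision epochs are not reproduced here.

```python
import numpy as np

# Toy average-reward MDP with 2 states and 2 actions (illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s][a] = next-state distribution
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                 # R[s][a] = expected one-step reward
              [2.0, 0.5]])

def relative_value_iteration(P, R, ref_state=0, tol=1e-9, max_iter=10_000):
    """Return (gain, bias, policy) for an average-reward MDP via RVI."""
    n_states, n_actions = R.shape
    h = np.zeros(n_states)                 # relative value (bias) function
    for _ in range(max_iter):
        Q = R + P @ h                      # Q[s, a] = r(s, a) + E[h(s')]
        h_new = Q.max(axis=1)
        gain = h_new[ref_state]            # subtract the reference value
        h_new = h_new - gain               # ...to keep the iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)              # greedy policy w.r.t. the last Q
    return gain, h, policy
```

At convergence, `gain` approximates the optimal long-run average reward and `policy` the asymptotically optimal stationary policy; this is the quantity the RVI step in the thesis targets.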


1980 ◽  
Vol 12 (1) ◽  
pp. 154-173 ◽  
Author(s):  
Gerhard Hübner

A stationary Markovian decision model with general state and action spaces is considered, in which the transition probabilities are weakened to bounded transition measures (a setting useful for many applications). New and improved bounds are given for the optimal value of stationary problems with a large planning horizon if either only a few steps of iteration are carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies that are composed of nearly optimal decisions from the first few steps or from the infinite-stage solution.
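The flavor of such bounds can be seen in the classical discounted special case: after a few steps of value iteration, MacQueen-style bounds sandwich the optimal value using only the last increment and the contraction modulus. The sketch below uses an ordinary discounted MDP with toy numbers; the paper's bounds for bounded transition measures generalize this idea.

```python
import numpy as np

beta = 0.9                                # contraction modulus (discount factor)
P = np.array([[[0.7, 0.3], [0.4, 0.6]],   # P[s][a] = next-state distribution
              [[0.2, 0.8], [0.9, 0.1]]])
R = np.array([[1.0, 0.5],                 # R[s][a] = expected one-step reward
              [0.0, 2.0]])

def vi_with_bounds(P, R, beta, n_steps):
    """Run n_steps >= 1 value-iteration steps from v = 0 and return the
    iterate plus MacQueen-style lower/upper bounds on the optimal value v*."""
    v = np.zeros(R.shape[0])
    for _ in range(n_steps):
        v_prev = v
        v = (R + beta * (P @ v)).max(axis=1)
    d = v - v_prev                         # last increment
    lower = v + beta / (1.0 - beta) * d.min()
    upper = v + beta / (1.0 - beta) * d.max()
    return v, lower, upper
```

After only a few iterations the gap `upper - lower` is already small, which is exactly the practical appeal of a-posteriori bounds of this kind: one can stop early and still certify how far the computed value is from the optimum.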


1975 ◽  
Vol 12 (4) ◽  
pp. 744-752 ◽  
Author(s):  
Richard L. Tweedie

In many Markov chain models, the immediate characteristic of importance is the positive recurrence of the chain. In this note we investigate whether positivity, and also recurrence, are robust properties of Markov chains when the transition laws are perturbed. The chains we consider are on a fairly general state space: when specialised to a countable space, our results are essentially that, if the transition matrices of two irreducible chains coincide on all but a finite number of columns, then positivity of one implies positivity of both; whilst if they coincide on all but a finite number of rows and columns, recurrence of one implies recurrence of both. Examples are given to show that these results (and their general analogues) cannot in general be strengthened.
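The countable-space statement can be illustrated numerically with a birth-death chain on the nonnegative integers (my example, not one from the note). For such a chain, positive recurrence is equivalent to convergence of the stationary-measure series, and modifying finitely many rows only rescales finitely many terms of that series, so it cannot change whether the series converges.

```python
def stationary_partial_sums(up_prob, n_max):
    """Partial sum and last term of the (unnormalized) stationary measure of a
    birth-death chain on {0, 1, 2, ...} with up-probability up_prob(n) at
    level n and down-probability 1 - up_prob(n) for n >= 1. The chain is
    positive recurrent iff the full series converges."""
    total, weight = 1.0, 1.0
    for n in range(1, n_max + 1):
        # pi_n / pi_0 = prod_{k=1}^{n} up_prob(k-1) / (1 - up_prob(k))
        weight *= up_prob(n - 1) / (1.0 - up_prob(n))
        total += weight
    return total, weight

# Base chain: constant downward drift (up-probability 0.4 at every level).
base = lambda n: 0.4
# Perturbed chain: coincides with the base chain except on the first 5 rows.
perturbed = lambda n: 0.9 if n < 5 else 0.4

total_b, tail_b = stationary_partial_sums(base, 200)
total_p, tail_p = stationary_partial_sums(perturbed, 200)
```

Both tail terms are negligible by level 200: despite the strong upward push on its first five rows, the perturbed chain inherits positive recurrence from the base chain, in line with the finite-rows-and-columns result.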

