Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

2016 ◽  
Vol 138 (6) ◽  
Author(s):  
Thai Duong ◽  
Duong Nguyen-Huu ◽  
Thinh Nguyen

The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment, characterized by a time-invariant transition probability matrix. In many real-world scenarios, however, this assumption does not hold, and the optimal strategy might not deliver the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the rate of convergence to the optimal average reward. We present two examples of queueing systems that make use of our analysis framework.
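As a sketch of the setting, the classic Bellman backup can be applied once per time step while the transition model drifts. The function names, the array shapes (`A` actions × `S` states), and the `P_of_t` interface below are illustrative assumptions, not the paper's notation or its adiabatic analysis:

```python
import numpy as np

def bellman_backup(V, P, R, gamma):
    """One value-iteration sweep: V(s) <- max_a [ R[a, s] + gamma * sum_s' P[a, s, s'] V(s') ]."""
    return (R + gamma * P @ V).max(axis=0)

def value_iteration_nonstationary(P_of_t, R, T, gamma=0.9):
    """Run one Bellman backup per time step while the environment drifts.

    `P_of_t(t)` returns the (A, S, S) transition tensor in force at step t;
    R has shape (A, S). This is only a sketch of value iteration under a
    slowly varying environment, not the paper's convergence analysis.
    """
    S = R.shape[1]
    V = np.zeros(S)
    for t in range(T):
        V = bellman_backup(V, P_of_t(t), R, gamma)
    return V
```

When `P_of_t` is constant this reduces to ordinary value iteration, whose contraction property guarantees convergence; the paper's question is how much of that guarantee survives when `P_of_t` changes slowly between sweeps.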

2015 ◽  
Vol 13 (3) ◽  
pp. 47-57 ◽  
Author(s):  
Sanaa Chafik ◽  
Cherki Daoui

Because many real applications involve a large number of states, classical methods are intractable for solving large Markov decision processes. Decomposition, based on the topology of each state in the associated graph, and parallelization are useful techniques for coping with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that incorporates parallelism. They test their implementation on artificial data using OpenMP and obtain a significant speed-up.
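The parallel-sweep idea can be sketched in a few lines: split the state set into chunks and run the Bellman backup for each chunk concurrently within one iteration. This is a Python/NumPy illustration of the concept, not the authors' OpenMP implementation; the chunking scheme and worker count are arbitrary choices:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_value_iteration(P, R, gamma=0.9, tol=1e-8, n_workers=4, max_iter=1000):
    """Jacobi-style value iteration with the state sweep split across workers.

    P: (A, S, S) transition tensor; R: (A, S) rewards. Each chunk's backup
    reads only the previous iterate V, so the chunks are independent and
    can run in parallel -- the same property OpenMP exploits.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    chunks = np.array_split(np.arange(S), n_workers)

    def backup(idx, V_old):
        # Bellman backup restricted to the states listed in `idx`.
        return (R[:, idx] + gamma * P[:, idx, :] @ V_old).max(axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            parts = list(pool.map(lambda idx: backup(idx, V), chunks))
            V_new = np.concatenate(parts)
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
    return V
```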


Author(s):  
Zengqiang Jiang ◽  
Dragan Banjevic ◽  
Mingcheng E ◽  
Bing Li

In this article, we present a maintenance model for metropolitan train wheels subject to diameter or flange-thickness overruns that includes condition monitoring with periodic inspection. We present a dynamic two-parameter policy based on condition-monitoring information, in which the first parameter is the wheel flange thickness threshold that triggers preventive re-profiling and the second is the recovery value of the wheel flange thickness after preventive re-profiling. The problem is modelled as a semi-Markov decision process that considers wear in terms of the diameter and the flange thickness simultaneously. It is formulated in a two-dimensional state space, defined as the combination of the diameter state and the flange-thickness state. The model also accounts for imperfect wheel maintenance. Its objective is to minimize the long-run expected maintenance cost per unit time. We apply a policy-iteration algorithm as the computational approach to determine the optimal re-profiling policy and use an example to demonstrate the method's effectiveness.
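For context, the policy-iteration loop alternates exact policy evaluation with greedy policy improvement. The sketch below is for a generic finite discounted MDP, not the paper's semi-Markov average-cost formulation; the array shapes and discount factor are assumptions:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite discounted MDP.

    P: (A, S, S) transition tensor; R: (A, S) rewards. Illustrative only --
    the maintenance paper works with a semi-Markov process and a long-run
    average-cost criterion, which changes the evaluation step.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S), :]
        R_pi = R[policy, np.arange(S)]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = (R + gamma * P @ V).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```

Because each evaluation step is an exact linear solve, policy iteration typically converges in far fewer (if more expensive) iterations than value iteration.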


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sang-Ho Oh ◽  
Su Jin Lee ◽  
Juhwan Noh ◽  
Jeonghoon Mo

The extensive utilization of electronic health records (EHRs) and the growth of enormous open biomedical datasets have readied the area for applications of computational and machine learning techniques that reveal fundamental patterns. This study's goal is to develop a medical treatment recommendation system using Korean EHRs together with a Markov decision process (MDP). The sharing of EHRs by the National Health Insurance Sharing Service (NHISS) of Korea has made it possible to analyze Koreans' medical data, which include treatments, prescriptions, and medical check-ups. After considering the merits and effectiveness of such data, we analyzed patients' medical information and recommended optimal pharmaceutical prescriptions for diabetes, which is known to be the most burdensome disease for Koreans. We also propose an MDP-based treatment recommendation system for diabetic patients to help doctors when prescribing diabetes medications. To build the model, we used the 11-year Korean NHISS database. To overcome the challenges of designing an MDP model, we carefully designed the states, actions, reward functions, and transition probability matrices, chosen to balance the tradeoff between realism and the curse of dimensionality.
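To illustrate the kind of design choices the abstract describes, the sketch below writes down an MDP's ingredients for a toy treatment setting. Every state label, action, and number here is an invented placeholder; the paper estimates its states, rewards, and transition matrices from the NHISS data:

```python
import numpy as np

# Toy ingredients of a treatment-recommendation MDP (all values invented).
STATES = ["controlled", "borderline", "uncontrolled"]   # e.g. discretised disease severity
ACTIONS = ["no_change", "add_drug", "switch_drug"]      # prescription decisions

# Reward for landing in each next state (assumed: healthier is better).
R = {"controlled": 1.0, "borderline": 0.0, "uncontrolled": -1.0}

# Transition matrices P[a][i, j]: probability of moving from state i to
# state j under action a. Rows must sum to 1; these numbers are illustrative.
P = {
    "no_change":   np.array([[0.80, 0.15, 0.05],
                             [0.10, 0.60, 0.30],
                             [0.05, 0.25, 0.70]]),
    "add_drug":    np.array([[0.75, 0.20, 0.05],
                             [0.30, 0.50, 0.20],
                             [0.15, 0.35, 0.50]]),
    "switch_drug": np.array([[0.80, 0.15, 0.05],
                             [0.25, 0.55, 0.20],
                             [0.20, 0.40, 0.40]]),
}

# One-step greedy recommendation per state: pick the action whose next-state
# distribution has the highest expected reward (a full MDP solution would
# instead optimise the long-horizon value).
r_vec = np.array([R[s] for s in STATES])
greedy = {s: max(ACTIONS, key=lambda a: P[a][i] @ r_vec)
          for i, s in enumerate(STATES)}
```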


Author(s):  
Bingxin Yao ◽  
Bin Wu ◽  
Siyun Wu ◽  
Yin Ji ◽  
Danggui Chen ◽  
...  

In this paper, an offloading algorithm based on the Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in a Mobile Edge Computing (MEC) system. The distinguishing feature of the algorithm is that an MDP is used to make the offloading decision. The number of tasks in the task queue, the number of accessible edge clouds, and the Signal-to-Noise Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption are used to define the value function of the MDP model, i.e., the objective function. To maximize the value function, the value iteration algorithm is used to obtain the optimal offloading policy. According to this policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or the central cloud, or executed locally. Simulation results show that the proposed algorithm effectively reduces the offloading delay and energy consumption.
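A toy version of such a model can be written down directly: a state pairs a queue length with an SNR level, the reward is a negative weighted sum of delay and energy, and value iteration extracts the policy. All numeric values, the two-action simplification (local vs. edge), and the queue/channel dynamics below are assumptions for the sketch, not the paper's model:

```python
import numpy as np
from itertools import product

# Toy MEC offloading MDP (all numbers assumed for illustration).
Q_MAX, ARRIVAL_P, GAMMA = 2, 0.5, 0.9
STATES = list(product(range(Q_MAX + 1), (0, 1)))      # (queue length, SNR level)
ACTIONS = ("local", "edge")
S, A = len(STATES), len(ACTIONS)

def cost(state, action, w_delay=0.5, w_energy=0.5):
    """Weighted delay/energy cost; a bad channel (snr == 0) slows offloading."""
    _, snr = state
    if action == "local":
        delay, energy = 4.0, 3.0
    else:
        delay, energy = (5.0, 1.0) if snr == 0 else (2.0, 1.0)
    return w_delay * delay + w_energy * energy

# Build the (A, S, S) transition tensor and (A, S) reward matrix:
# one task is served per step, a new one arrives with prob ARRIVAL_P,
# and the channel state is redrawn i.i.d. with prob 1/2 each.
P = np.zeros((A, S, S))
R = np.zeros((A, S))
for a, action in enumerate(ACTIONS):
    for i, (q, snr) in enumerate(STATES):
        R[a, i] = -cost((q, snr), action)
        q_next = max(q - 1, 0)
        for arrive, p_arr in ((0, 1 - ARRIVAL_P), (1, ARRIVAL_P)):
            for snr_next in (0, 1):
                j = STATES.index((min(q_next + arrive, Q_MAX), snr_next))
                P[a, i, j] += p_arr * 0.5

# Value iteration, then the greedy (optimal) offloading policy per state.
V = np.zeros(S)
for _ in range(500):
    V = (R + GAMMA * P @ V).max(axis=0)
policy = (R + GAMMA * P @ V).argmax(axis=0)
```

With these assumed costs, offloading dominates local execution in every state, so the extracted policy always chooses the edge; richer dynamics (queue-dependent delay, multiple clouds) would make the decision state-dependent, as in the paper.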

