Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

2016 ◽  
Vol 138 (6) ◽  
Author(s):  
Thai Duong ◽  
Duong Nguyen-Huu ◽  
Thinh Nguyen

The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment, characterized by a time-invariant transition probability matrix. In many real-world scenarios, however, this assumption does not hold, and the optimal strategy might not deliver the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the rate of convergence to the optimal average reward. We present two examples of queueing systems that make use of our analysis framework.
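As a sketch of the setting, the classic Bellman backup can be applied once per time step while the transition model drifts. The function names, the array shapes (`A` actions × `S` states), and the `P_of_t` interface below are illustrative assumptions, not the paper's notation or its adiabatic analysis:

```python
import numpy as np

def bellman_backup(V, P, R, gamma):
    """One value-iteration sweep: V(s) <- max_a [ R[a, s] + gamma * sum_s' P[a, s, s'] V(s') ]."""
    return (R + gamma * P @ V).max(axis=0)

def value_iteration_nonstationary(P_of_t, R, T, gamma=0.9):
    """Run one Bellman backup per time step while the environment drifts.

    `P_of_t(t)` returns the (A, S, S) transition tensor in force at step t;
    R has shape (A, S). This is only a sketch of value iteration under a
    slowly varying environment, not the paper's convergence analysis.
    """
    S = R.shape[1]
    V = np.zeros(S)
    for t in range(T):
        V = bellman_backup(V, P_of_t(t), R, gamma)
    return V
```

When `P_of_t` is constant this reduces to ordinary value iteration, whose contraction property guarantees convergence; the paper's question is how much of that guarantee survives when `P_of_t` changes slowly between sweeps.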

2015 ◽  
Vol 13 (3) ◽  
pp. 47-57 ◽  
Author(s):  
Sanaa Chafik ◽  
Cherki Daoui

Because many real applications involve a large number of states, classical methods are intractable for solving large Markov decision processes. Decomposition, based on the topology of each state in the associated graph, and parallelization are useful techniques for coping with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that incorporates parallelism. They test their implementation on artificial data using OpenMP and obtain a significant speed-up.
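The parallel-sweep idea can be sketched in a few lines: split the state set into chunks and run the Bellman backup for each chunk concurrently within one iteration. This is a Python/NumPy illustration of the concept, not the authors' OpenMP implementation; the chunking scheme and worker count are arbitrary choices:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_value_iteration(P, R, gamma=0.9, tol=1e-8, n_workers=4, max_iter=1000):
    """Jacobi-style value iteration with the state sweep split across workers.

    P: (A, S, S) transition tensor; R: (A, S) rewards. Each chunk's backup
    reads only the previous iterate V, so the chunks are independent and
    can run in parallel -- the same property OpenMP exploits.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    chunks = np.array_split(np.arange(S), n_workers)

    def backup(idx, V_old):
        # Bellman backup restricted to the states listed in `idx`.
        return (R[:, idx] + gamma * P[:, idx, :] @ V_old).max(axis=0)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            parts = list(pool.map(lambda idx: backup(idx, V), chunks))
            V_new = np.concatenate(parts)
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
    return V
```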


Author(s):  
Zengqiang Jiang ◽  
Dragan Banjevic ◽  
Mingcheng E ◽  
Bing Li

In this article, we present a maintenance model for metropolitan train wheels subject to diameter or flange-thickness overruns that includes condition monitoring with periodic inspection. We present a dynamic two-parameter policy based on condition-monitoring information, in which the first parameter is the wheel flange thickness threshold that triggers preventive re-profiling and the second is the recovery value of the wheel flange thickness after preventive re-profiling. The problem is modelled as a semi-Markov decision process that considers wear in terms of the diameter and the flange thickness simultaneously. It is formulated in a two-dimensional state space, defined as the combination of the diameter state and the flange-thickness state. The model also accounts for imperfect wheel maintenance. Its objective is to minimize the long-run expected maintenance cost per unit time. We apply a policy-iteration algorithm as the computational approach to determine the optimal re-profiling policy and use an example to demonstrate the method's effectiveness.
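For context, the policy-iteration loop alternates exact policy evaluation with greedy policy improvement. The sketch below is for a generic finite discounted MDP, not the paper's semi-Markov average-cost formulation; the array shapes and discount factor are assumptions:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite discounted MDP.

    P: (A, S, S) transition tensor; R: (A, S) rewards. Illustrative only --
    the maintenance paper works with a semi-Markov process and a long-run
    average-cost criterion, which changes the evaluation step.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S), :]
        R_pi = R[policy, np.arange(S)]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = (R + gamma * P @ V).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```

Because each evaluation step is an exact linear solve, policy iteration typically converges in far fewer (if more expensive) iterations than value iteration.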


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sang-Ho Oh ◽  
Su Jin Lee ◽  
Juhwan Noh ◽  
Jeonghoon Mo

The extensive utilization of electronic health records (EHRs) and the growth of enormous open biomedical datasets have readied the area for applications of computational and machine learning techniques that reveal fundamental patterns. This study's goal is to develop a medical treatment recommendation system using Korean EHRs together with a Markov decision process (MDP). The sharing of EHRs by the National Health Insurance Sharing Service (NHISS) of Korea has made it possible to analyze Koreans' medical data, which include treatments, prescriptions, and medical check-ups. After considering the merits and effectiveness of such data, we analyzed patients' medical information and recommended optimal pharmaceutical prescriptions for diabetes, which is known to be the most burdensome disease for Koreans. We also propose an MDP-based treatment recommendation system for diabetic patients to help doctors when prescribing diabetes medications. To build the model, we used the 11-year Korean NHISS database. To overcome the challenges of designing an MDP model, we carefully designed the states, actions, reward functions, and transition probability matrices, chosen to balance the tradeoff between realism and the curse of dimensionality.
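To illustrate the kind of design choices the abstract describes, the sketch below writes down an MDP's ingredients for a toy treatment setting. Every state label, action, and number here is an invented placeholder; the paper estimates its states, rewards, and transition matrices from the NHISS data:

```python
import numpy as np

# Toy ingredients of a treatment-recommendation MDP (all values invented).
STATES = ["controlled", "borderline", "uncontrolled"]   # e.g. discretised disease severity
ACTIONS = ["no_change", "add_drug", "switch_drug"]      # prescription decisions

# Reward for landing in each next state (assumed: healthier is better).
R = {"controlled": 1.0, "borderline": 0.0, "uncontrolled": -1.0}

# Transition matrices P[a][i, j]: probability of moving from state i to
# state j under action a. Rows must sum to 1; these numbers are illustrative.
P = {
    "no_change":   np.array([[0.80, 0.15, 0.05],
                             [0.10, 0.60, 0.30],
                             [0.05, 0.25, 0.70]]),
    "add_drug":    np.array([[0.75, 0.20, 0.05],
                             [0.30, 0.50, 0.20],
                             [0.15, 0.35, 0.50]]),
    "switch_drug": np.array([[0.80, 0.15, 0.05],
                             [0.25, 0.55, 0.20],
                             [0.20, 0.40, 0.40]]),
}

# One-step greedy recommendation per state: pick the action whose next-state
# distribution has the highest expected reward (a full MDP solution would
# instead optimise the long-horizon value).
r_vec = np.array([R[s] for s in STATES])
greedy = {s: max(ACTIONS, key=lambda a: P[a][i] @ r_vec)
          for i, s in enumerate(STATES)}
```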


Author(s):  
Bingxin Yao ◽  
Bin Wu ◽  
Siyun Wu ◽  
Yin Ji ◽  
Danggui Chen ◽  
...  

In this paper, an offloading algorithm based on the Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in a Mobile Edge Computing (MEC) system. The distinguishing feature of the algorithm is that an MDP is used to make the offloading decision. The number of tasks in the task queue, the number of accessible edge clouds, and the Signal-to-Noise Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption are used to define the value function of the MDP model, i.e., the objective function. To maximize the value function, the value iteration algorithm is used to obtain the optimal offloading policy. According to this policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or the central cloud, or executed locally. Simulation results show that the proposed algorithm effectively reduces the offloading delay and energy consumption.
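A toy version of such a model can be written down directly: a state pairs a queue length with an SNR level, the reward is a negative weighted sum of delay and energy, and value iteration extracts the policy. All numeric values, the two-action simplification (local vs. edge), and the queue/channel dynamics below are assumptions for the sketch, not the paper's model:

```python
import numpy as np
from itertools import product

# Toy MEC offloading MDP (all numbers assumed for illustration).
Q_MAX, ARRIVAL_P, GAMMA = 2, 0.5, 0.9
STATES = list(product(range(Q_MAX + 1), (0, 1)))      # (queue length, SNR level)
ACTIONS = ("local", "edge")
S, A = len(STATES), len(ACTIONS)

def cost(state, action, w_delay=0.5, w_energy=0.5):
    """Weighted delay/energy cost; a bad channel (snr == 0) slows offloading."""
    _, snr = state
    if action == "local":
        delay, energy = 4.0, 3.0
    else:
        delay, energy = (5.0, 1.0) if snr == 0 else (2.0, 1.0)
    return w_delay * delay + w_energy * energy

# Build the (A, S, S) transition tensor and (A, S) reward matrix:
# one task is served per step, a new one arrives with prob ARRIVAL_P,
# and the channel state is redrawn i.i.d. with prob 1/2 each.
P = np.zeros((A, S, S))
R = np.zeros((A, S))
for a, action in enumerate(ACTIONS):
    for i, (q, snr) in enumerate(STATES):
        R[a, i] = -cost((q, snr), action)
        q_next = max(q - 1, 0)
        for arrive, p_arr in ((0, 1 - ARRIVAL_P), (1, ARRIVAL_P)):
            for snr_next in (0, 1):
                j = STATES.index((min(q_next + arrive, Q_MAX), snr_next))
                P[a, i, j] += p_arr * 0.5

# Value iteration, then the greedy (optimal) offloading policy per state.
V = np.zeros(S)
for _ in range(500):
    V = (R + GAMMA * P @ V).max(axis=0)
policy = (R + GAMMA * P @ V).argmax(axis=0)
```

With these assumed costs, offloading dominates local execution in every state, so the extracted policy always chooses the edge; richer dynamics (queue-dependent delay, multiple clouds) would make the decision state-dependent, as in the paper.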

