Optimal routeing in two-queue polling systems

2018 ◽  
Vol 55 (3) ◽  
pp. 944-967 ◽  
Author(s):  
I. J. B. F. Adan ◽  
V. G. Kulkarni ◽  
N. Lee ◽  
E. Lefeber

We consider a polling system with two queues, exhaustive service, no switchover times, and exponential service times with rate µ in each queue. The waiting cost depends on the position of the queue relative to the server: it costs a customer c per time unit to wait in the busy queue (where the server is) and d per time unit in the idle queue (where there is no server). Customers arrive according to a Poisson process with rate λ. We study the control problem of how arrivals should be routed to the two queues in order to minimize the expected waiting costs, and we characterize individually and socially optimal routeing policies under three scenarios of available information at decision epochs: no, partial, and complete information. In the complete information case, we develop a new iterative algorithm to determine individually optimal policies (which are symmetric Nash equilibria), and show that such policies can be described by a switching curve. We use Markov decision processes to compute the socially optimal policies. We observe numerically that the socially optimal policy is well approximated by a linear switching curve. We prove that the control policy described by this linear switching curve is indeed optimal for the fluid version of the two-queue polling system.

2017 ◽  
Vol 32 (2) ◽  
pp. 163-178 ◽  
Author(s):  
Kenneth C. Chong ◽  
Shane G. Henderson ◽  
Mark E. Lewis

We consider the problem of routing and admission control in a loss system featuring two classes of arriving jobs (high-priority and low-priority jobs) and two types of servers, in which decision-making for high-priority jobs is forced, and rewards influence the desirability of each of the four possible routing decisions. We seek a policy that maximizes expected long-run reward, under both the discounted reward and long-run average reward criteria, and formulate the problem as a Markov decision process. When the reward structure favors high-priority jobs, we demonstrate that there exists an optimal monotone switching curve policy with slope of at least −1. When the reward structure favors low-priority jobs, we demonstrate that the value function, in general, lacks structure, which complicates the search for structure in optimal policies. However, we identify conditions under which optimal policies can be characterized in greater detail. We also examine the performance of heuristic policies in a brief numerical study.


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
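As an illustration of the finite-horizon dynamic programming mentioned in this abstract, the sketch below runs backward induction on a classical MDP. It is not the quantum algorithm of the paper: the toy two-state model, the dictionaries `P` and `R`, and the function name `backward_induction` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model (not from the paper).
# P[s][a] is a list of (next_state, probability); R[s][a] is the immediate reward.
Transitions = Dict[int, Dict[str, List[Tuple[int, float]]]]
Rewards = Dict[int, Dict[str, float]]


def backward_induction(P: Transitions, R: Rewards, horizon: int):
    """Return the optimal values and a stage-indexed policy for a finite horizon."""
    V = {s: 0.0 for s in P}          # value after the final stage is zero
    policy: List[Dict[int, str]] = []
    for _ in range(horizon):         # work backwards from the last stage
        new_V, decision = {}, {}
        for s in P:
            best_a, best_q = None, float("-inf")
            for a in P[s]:
                # One-step reward plus expected value of the remaining stages.
                q = R[s][a] + sum(prob * V[s2] for s2, prob in P[s][a])
                if q > best_q:
                    best_a, best_q = a, q
            new_V[s], decision[s] = best_q, best_a
        V = new_V
        policy.insert(0, decision)   # policy[0] is the first-stage decision rule
    return V, policy


# Toy instance: staying in state 1 earns 1 per stage, everything else earns 0.
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "move": 0.0},
    1: {"stay": 1.0, "move": 0.0},
}

values, policy = backward_induction(P, R, horizon=3)
```

With a horizon of three stages, a decision maker starting in state 0 does best by moving to state 1 at once and staying there, so the first-stage rule in `policy[0]` prescribes "move" in state 0.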


1995 ◽  
Vol 5 (3) ◽  
pp. 681-719 ◽  
Author(s):  
E. G. Coffman ◽  
A. A. Puhalskii ◽  
M. I. Reiman

1997 ◽  
Vol 29 (01) ◽  
pp. 114-137
Author(s):  
Linn I. Sennott

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.
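The truncation idea in this abstract can be sketched numerically. The example below is an assumed toy model, not one taken from the paper: a discrete-time queue with Bernoulli arrivals (probability `p`), a choice between a free slow service and a costly fast service in each slot, and a holding cost equal to the queue length. The queue is truncated to states 0..N (arrivals to a full queue are lost) and the average cost is estimated by relative value iteration; all parameter values and the function name are hypothetical.

```python
def truncated_average_cost(N, p=0.3, tol=1e-9, max_iter=200_000):
    """Estimate the optimal average cost on the queue truncated to states 0..N."""
    # Assumed actions: name -> (service completion probability, cost per slot).
    actions = {"slow": (0.4, 0.0), "fast": (0.7, 1.0)}
    V = [0.0] * (N + 1)   # relative values, normalised so that V[0] = 0
    g = 0.0               # running estimate of the average cost per slot
    for _ in range(max_iter):
        new_V = []
        for x in range(N + 1):
            best = float("inf")
            for mu, cost in actions.values():
                s = mu if x > 0 else 0.0              # an empty queue cannot be served
                up, down = min(x + 1, N), max(x - 1, 0)
                expected = (p * (1 - s) * V[up]       # arrival, no departure
                            + (1 - p) * s * V[down]   # departure, no arrival
                            + (p * s + (1 - p) * (1 - s)) * V[x])  # no net change
                best = min(best, x + cost + expected)
            new_V.append(best)
        diffs = [n - o for n, o in zip(new_V, V)]
        g = 0.5 * (max(diffs) + min(diffs))
        V = [v - new_V[0] for v in new_V]             # renormalise to keep values bounded
        if max(diffs) - min(diffs) < tol:             # span stopping criterion
            break
    return g


# Average-cost estimates for a growing sequence of truncation levels.
estimates = {N: truncated_average_cost(N) for N in (5, 10, 20, 40)}
```

Under these assumed parameters the queue is stable, so the estimates for the larger truncation levels should settle to a common value, which is the kind of convergence the abstract describes.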


2001 ◽  
Vol 15 (4) ◽  
pp. 557-564 ◽  
Author(s):  
Rolando Cavazos-Cadena ◽  
Raúl Montes-de-Oca

This article concerns Markov decision chains with finite state and action spaces, and a control policy is graded via the expected total-reward criterion associated to a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result that is obtained via a limit process using the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.


2007 ◽  
Vol 22 (1) ◽  
pp. 81-106 ◽  
Author(s):  
M. Vlasiou ◽  
U. Yechiali

We consider a polling system where a group of an infinite number of servers visits sequentially a set of queues. When visited, each queue is attended for a random time. Arrivals at each queue follow a Poisson process, and the service time of each individual customer is drawn from a general probability distribution function. Thus, each of the queues comprising the system is, in isolation, an M/G/∞-type queue. A job that is not completed during a visit will have a new service-time requirement sampled from the service-time distribution of the corresponding queue. To the best of our knowledge, this article is the first in which an M/G/∞-type polling system is analyzed. For this polling model, we derive the probability generating function and expected value of the queue lengths and the Laplace–Stieltjes transform and expected value of the sojourn time of a customer. Moreover, we identify the policy that maximizes the throughput of the system per cycle and conclude that under the Hamiltonian-tour approach, the optimal visiting order is independent of the number of customers present at the various queues at the start of the cycle.


2008 ◽  
Vol 2008 ◽  
pp. 1-13 ◽  
Author(s):  
N. Eva Wu ◽  
Matthew C. Ruschmann ◽  
Mark H. Linderman

Optimal state information-based control policy for a distributed database system subject to server failures is considered. Fault-tolerance is made possible by the partitioned architecture of the system and data redundancy therein. Control actions include restoration of lost data sets in a single server using redundant data sets in the remaining servers, routing of queries to intact servers, or overhaul of the entire system for renewal. Control policies are determined by solving Markov decision problems with cost criteria that penalize system unavailability and slow query response. Steady-state system availability and expected query response time of the controlled database are evaluated with the Markov model of the database. Robustness is addressed by introducing additional states into the database model to account for control action delays and decision errors. A robust control policy is solved for the Markov decision problem described by the augmented state model.

