Optimal routeing in two-queue polling systems

Journal of Applied Probability ◽

10.1017/jpr.2018.59 ◽

2018 ◽

Vol 55 (3) ◽

pp. 944-967 ◽

Cited By ~ 3

Author(s):

I. J. B. F. Adan ◽

V. G. Kulkarni ◽

N. Lee ◽

E. Lefeber

Keyword(s):

Nash Equilibria ◽

Control Policy ◽

Complete Information ◽

Polling Systems ◽

Polling System ◽

Switching Curve ◽

Optimal Policies ◽

Markov Decision ◽

Switchover Times ◽

Available Information

We consider a polling system with two queues, exhaustive service, no switchover times, and exponential service times with rate µ in each queue. The waiting cost depends on the position of the queue relative to the server: it costs a customer c per time unit to wait in the busy queue (where the server is) and d per time unit in the idle queue (where there is no server). Customers arrive according to a Poisson process with rate λ. We study the control problem of how arrivals should be routed to the two queues in order to minimize the expected waiting costs and characterize individually and socially optimal routeing policies under three scenarios of available information at decision epochs: no, partial, and complete information. In the complete information case, we develop a new iterative algorithm to determine individually optimal policies (which are symmetric Nash equilibria), and show that such policies can be described by a switching curve. We use Markov decision processes to compute the socially optimal policies. We observe numerically that the socially optimal policy is well approximated by a linear switching curve. We prove that the control policy described by this linear switching curve is indeed optimal for the fluid version of the two-queue polling system.
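The model in this abstract can be explored with a small event-driven simulation. The sketch below is illustrative only and is not the authors' algorithm. The function name `simulate_cost`, the policy signature, and the default parameter values are hypothetical. It also assumes details the abstract does not state: every customer present (including the one in service) accrues cost, and the server moves to the other queue as soon as its own queue is empty and the other is not.

```python
import random

def simulate_cost(policy, lam=0.5, mu=1.0, c=1.0, d=2.0,
                  horizon=100_000.0, seed=0):
    """Estimate the long-run cost rate of a routing policy (illustrative sketch).

    policy(busy, idle) returns "busy" or "idle" for an arriving customer.
    Assumption: all customers present pay c (busy queue) or d (idle queue).
    """
    rng = random.Random(seed)
    busy, idle = 0, 0          # queue lengths at the server and away from it
    t, cost = 0.0, 0.0
    while True:
        rate = lam + (mu if busy > 0 else 0.0)
        dt = rng.expovariate(rate)
        if t + dt >= horizon:
            cost += (c * busy + d * idle) * (horizon - t)
            break
        cost += (c * busy + d * idle) * dt
        t += dt
        if rng.random() < lam / rate:      # next event is an arrival
            if policy(busy, idle) == "busy":
                busy += 1
            else:
                idle += 1
        else:                              # next event is a service completion
            busy -= 1
        # Exhaustive service with zero switchover time (assumed switching rule).
        if busy == 0 and idle > 0:
            busy, idle = idle, 0
    return cost / horizon
```

Routing every arrival to the busy queue reduces the system to a single M/M/1 queue, which gives a simple sanity check: with λ = 0.5, µ = 1, and c = 1 the estimated cost rate should be close to ρ/(1 − ρ) = 1. A threshold rule such as `lambda b, i: "busy" if b <= i + 2 else "idle"` is one hypothetical way to try a switching-curve policy.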

TWO-CLASS ROUTING WITH ADMISSION CONTROL AND STRICT PRIORITIES

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964817000195 ◽

2017 ◽

Vol 32 (2) ◽

pp. 163-178 ◽

Cited By ~ 1

Author(s):

Kenneth C. Chong ◽

Shane G. Henderson ◽

Mark E. Lewis

Keyword(s):

Admission Control ◽

Numerical Study ◽

Reward Structure ◽

Long Run ◽

Switching Curve ◽

Optimal Policies ◽

Markov Decision ◽

Average Reward Criteria ◽

The Value Function ◽

Long Run Average Reward

We consider the problem of routing and admission control in a loss system featuring two classes of arriving jobs (high-priority and low-priority jobs) and two types of servers, in which decision-making for high-priority jobs is forced, and rewards influence the desirability of each of the four possible routing decisions. We seek a policy that maximizes expected long-run reward, under both the discounted reward and long-run average reward criteria, and formulate the problem as a Markov decision process. When the reward structure favors high-priority jobs, we demonstrate that there exists an optimal monotone switching curve policy with slope of at least −1. When the reward structure favors low-priority jobs, we demonstrate that the value function, in general, lacks structure, which complicates the search for structure in optimal policies. However, we identify conditions under which optimal policies can be characterized in greater detail. We also examine the performance of heuristic policies in a brief numerical study.

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
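For orientation, finite-horizon dynamic programming in the classical (non-quantum) setting can be sketched as backward induction. This is not the qMDP algorithm of the paper. The function name `backward_induction` and the array layout (`P[a][s][t]` for transition probabilities, `R[a][s]` for rewards) are assumptions made for illustration.

```python
def backward_induction(P, R, horizon):
    """Classical finite-horizon MDP solved by backward induction (sketch).

    P[a][s][t]: probability of moving from state s to t under action a.
    R[a][s]:    immediate reward for taking action a in state s.
    Returns (V, policy): V[s] is the optimal total reward from state s,
    and policy[k][s] is the optimal action at stage k.
    """
    n_actions = len(R)
    n_states = len(R[0])
    V = [0.0] * n_states          # terminal values
    policy = []
    for _ in range(horizon):      # sweep from the last stage to the first
        new_V, stage = [], []
        for s in range(n_states):
            q = [R[a][s] + sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            new_V.append(q[best])
            stage.append(best)
        V = new_V
        policy.insert(0, stage)   # earlier stages go to the front
    return V, policy
```

As a usage example, take two states where only state 1 pays a reward of 1, action 0 stays put, and action 1 swaps states. With a horizon of 3 the procedure moves from state 0 to state 1 at the first stage and then stays there.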

Polling Systems with Zero Switchover Times: A Heavy-Traffic Averaging Principle

The Annals of Applied Probability ◽

10.1214/aoap/1177004701 ◽

1995 ◽

Vol 5 (3) ◽

pp. 681-719 ◽

Cited By ~ 61

Author(s):

E. G. Coffman ◽

A. A. Puhalskii ◽

M. I. Reiman

Keyword(s):

Heavy Traffic ◽

Polling Systems ◽

Averaging Principle ◽

Switchover Times

A Markov Decision Process to Determine Optimal Policies in Moving Target

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security ◽

10.1145/3243734.3278489 ◽

2018 ◽

Cited By ~ 5

Author(s):

Jianjun Zheng ◽

Akbar Siami Namin

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Moving Target ◽

Optimal Policies ◽

Markov Decision

On the Set of Optimal Policies in Variance Penalized Markov Decision Chains

Operations Research Proceedings - Operations Research Proceedings 2003 ◽

10.1007/978-3-642-17022-5_51 ◽

2004 ◽

pp. 395-402

Author(s):

Karel Sladký ◽

Milan Sitař

Keyword(s):

Optimal Policies ◽

Markov Decision

The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains

Advances in Applied Probability ◽

10.1017/s0001867800027816 ◽

1997 ◽

Vol 29 (01) ◽

pp. 114-137

Author(s):

Linn I. Sennott

Keyword(s):

Discrete Time ◽

Average Cost ◽

Queueing Systems ◽

State Spaces ◽

Original Process ◽

Optimal Policies ◽

Finite State ◽

Markov Decision ◽

Optimal Average ◽

Infinite State

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.
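The idea of approximating an infinite-state problem by finite truncations can be illustrated on a toy discrete-time queue. The sketch below is an assumption-laden example, not the approximating sequence construction of the paper. The function name `average_cost_truncated`, the arrival probability, the two service rates, and their costs are all hypothetical, and the truncation simply blocks arrivals at level N. It uses relative value iteration to estimate the optimal average cost on states 0..N.

```python
def average_cost_truncated(N, p=0.3, rates=(0.4, 0.8), costs=(0.0, 0.5),
                           iters=3000):
    """Estimate the optimal average cost on the truncated state space 0..N.

    Each slot: choose a service rate q (paying its cost), a service completes
    with probability q if the queue is nonempty, then a customer arrives with
    probability p. Holding cost per slot equals the queue length s.
    Assumption: arrivals that would exceed N are blocked.
    """
    h = [0.0] * (N + 1)           # relative values, with h[0] pinned at 0
    g = 0.0
    for _ in range(iters):
        new_h = []
        for s in range(N + 1):
            best = float("inf")
            for q, c in zip(rates, costs):
                done = q if s > 0 else 0.0      # nothing to serve when empty
                total = s + c                   # holding plus service cost
                for dep, p_dep in ((1, done), (0, 1.0 - done)):
                    for arr, p_arr in ((1, p), (0, 1.0 - p)):
                        nxt = max(0, min(s - dep + arr, N))
                        total += p_dep * p_arr * h[nxt]
                best = min(best, total)
            new_h.append(best)
        g = new_h[0]              # gain estimate, since h[0] == 0
        h = [v - g for v in new_h]
    return g
```

Calling the function with increasing truncation levels, for example N = 20 and N = 40, should give nearly identical estimates when the queue is stable under the chosen parameters. That is the qualitative behaviour the abstract describes for its sequence of finite approximations.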

EXISTENCE OF OPTIMAL STATIONARY POLICIES IN FINITE DYNAMIC PROGRAMS WITH NONNEGATIVE REWARDS

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964801154082 ◽

2001 ◽

Vol 15 (4) ◽

pp. 557-564 ◽

Cited By ~ 1

Author(s):

Rolando Cavazos-Cadena ◽

Raúl Montes-de-Oca

Keyword(s):

Control Policy ◽

Stationary Policy ◽

Reward Function ◽

Total Reward ◽

Dynamic Programs ◽

Finite State ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Action Spaces ◽

Discounted Criterion

This article concerns Markov decision chains with finite state and action spaces, and a control policy is graded via the expected total-reward criterion associated to a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result that is obtained via a limit process using the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.

Average and Blackwell optimal policies in denumerable Markov decision chains

10.1109/cdc.1986.267436 ◽

1986 ◽

Author(s):

Arie Hordijk

Keyword(s):

Optimal Policies ◽

Markov Decision

M/G/∞ POLLING SYSTEMS WITH RANDOM VISIT TIMES

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964808000065 ◽

2007 ◽

Vol 22 (1) ◽

pp. 81-106 ◽

Cited By ~ 7

Author(s):

M. Vlasiou ◽

U. Yechiali

Keyword(s):

Service Time ◽

Probability Generating Function ◽

Expected Value ◽

Polling Systems ◽

Service Time Distribution ◽

Polling System ◽

Stieltjes Transform ◽

Polling Model ◽

Queue Lengths ◽

Number Of Customers

We consider a polling system where a group of an infinite number of servers visits sequentially a set of queues. When visited, each queue is attended for a random time. Arrivals at each queue follow a Poisson process, and the service time of each individual customer is drawn from a general probability distribution function. Thus, each of the queues comprising the system is, in isolation, an M/G/∞-type queue. A job that is not completed during a visit will have a new service-time requirement sampled from the service-time distribution of the corresponding queue. To the best of our knowledge, this article is the first in which an M/G/∞-type polling system is analyzed. For this polling model, we derive the probability generating function and expected value of the queue lengths and the Laplace–Stieltjes transform and expected value of the sojourn time of a customer. Moreover, we identify the policy that maximizes the throughput of the system per cycle and conclude that under the Hamiltonian-tour approach, the optimal visiting order is independent of the number of customers present at the various queues at the start of the cycle.

Fault-Tolerant Control of a Distributed Database System

Journal of Control Science and Engineering ◽

10.1155/2008/310652 ◽

2008 ◽

Vol 2008 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

N. Eva Wu ◽

Matthew C. Ruschmann ◽

Mark H. Linderman

Keyword(s):

Fault Tolerant ◽

Control Policy ◽

Distributed Database ◽

Database System ◽

Data Sets ◽

Single Server ◽

Markov Decision Problem ◽

Redundant Data ◽

Markov Decision ◽

Distributed Database System

An optimal state-information-based control policy for a distributed database system subject to server failures is considered. Fault tolerance is made possible by the partitioned architecture of the system and the data redundancy therein. Control actions include restoration of lost data sets in a single server using redundant data sets in the remaining servers, routing of queries to intact servers, or overhaul of the entire system for renewal. Control policies are determined by solving Markov decision problems with cost criteria that penalize system unavailability and slow query response. Steady-state system availability and expected query response time of the controlled database are evaluated with the Markov model of the database. Robustness is addressed by introducing additional states into the database model to account for control action delays and decision errors. A robust control policy is obtained by solving the Markov decision problem described by the augmented state model.
