Long-Run Average Reward
Recently Published Documents

Total documents: 15 (five years: 1)
H-index: 6 (five years: 0)

2021 · Vol 31 (3) · pp. 1-34
Author(s): Yuliya Butkova, Arnd Hartmanns, Holger Hermanns

Markov automata are a compositional modelling formalism with continuous stochastic time, discrete probabilities, and nondeterministic choices. In this article, we present extensions to Modest, an expressive high-level language with roots in process algebra, that allow large Markov automata models to be specified in a succinct, modular way. We illustrate the advantages of Modest over alternative languages. Model checking Markov automata models requires dedicated algorithms for time-bounded and long-run average reward properties. We describe and evaluate the state-of-the-art algorithms implemented in the mcsta model checker of the Modest Toolset. We find that mcsta improves the performance and scalability of Markov automata model checking compared to earlier and alternative tools. We explain a partial-exploration approach based on the BRTDP method, designed to mitigate the state space explosion problem of model checking, and experimentally evaluate its effectiveness. This problem can be avoided entirely by purely simulation-based techniques, but the nondeterminism in Markov automata hinders their straightforward application. We explain how lightweight scheduler sampling can make simulation possible, and provide a detailed evaluation of its usefulness on several benchmarks using the Modest Toolset's modes simulator.
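As a rough illustration of the lightweight scheduler sampling idea (a sketch under assumptions, not the Modest Toolset's modes API; continuous time is ignored for brevity), the Python below identifies a memoryless scheduler with a single 32-bit integer and resolves every nondeterministic choice by hashing that integer together with the current state, so sampling schedulers reduces to sampling integers. The dictionary-based model format is hypothetical.

import random
import zlib

def resolve(sched_id, state, actions):
    """Deterministically map (scheduler id, state) to one enabled action."""
    h = zlib.crc32(f"{sched_id}|{state}".encode())
    return actions[h % len(actions)]

def simulate(model, sched_id, horizon, rng):
    """One run under the scheduler; returns the average reward per step."""
    state, total = model["initial"], 0.0
    for _ in range(horizon):
        action = resolve(sched_id, state, model["actions"][state])
        total += model["reward"][(state, action)]
        succs, probs = zip(*model["delta"][(state, action)].items())
        state = rng.choices(succs, weights=probs)[0]   # probabilistic step
    return total / horizon

def sample_schedulers(model, n_sched=100, n_runs=30, horizon=10_000, seed=1):
    """Estimate the maximal long-run average reward over sampled schedulers."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_sched):
        sched_id = rng.getrandbits(32)                 # one scheduler = one integer
        est = sum(simulate(model, sched_id, horizon, rng)
                  for _ in range(n_runs)) / n_runs
        best = max(best, est)
    return best   # an underapproximation of the optimum (maximization case)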


2017 · Vol 32 (2) · pp. 163-178
Author(s): Kenneth C. Chong, Shane G. Henderson, Mark E. Lewis

We consider the problem of routing and admission control in a loss system featuring two classes of arriving jobs (high-priority and low-priority jobs) and two types of servers, in which decision-making for high-priority jobs is forced, and rewards influence the desirability of each of the four possible routing decisions. We seek a policy that maximizes expected long-run reward, under both the discounted reward and long-run average reward criteria, and formulate the problem as a Markov decision process. When the reward structure favors high-priority jobs, we demonstrate that there exists an optimal monotone switching curve policy with slope of at least −1. When the reward structure favors low-priority jobs, we demonstrate that the value function, in general, lacks structure, which complicates the search for structure in optimal policies. However, we identify conditions under which optimal policies can be characterized in greater detail. We also examine the performance of heuristic policies in a brief numerical study.
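The structural results above concern a specific loss system; as a generic, hedged illustration of computing a long-run average reward optimal policy for a finite unichain MDP, here is a sketch of relative value iteration (the arrays P and r are hypothetical inputs, not the paper's model).

import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=100_000):
    """
    P: shape (A, S, S), P[a, s, t] = transition probability under action a.
    r: shape (A, S),    r[a, s]   = one-step reward of action a in state s.
    Returns (approximate gain g, relative values h, greedy policy).
    """
    _, S = r.shape
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = r + P @ h                 # Q[a, s] = r[a, s] + E[h(next state)]
        h_new = Q.max(axis=0)
        g = h_new[0]                  # gain estimate, read off at state 0
        h_new = h_new - g             # keep values relative to state 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, Q.argmax(axis=0)     # greedy policy attains the gain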


Author(s): Pranav Ashok, Krishnendu Chatterjee, Przemysław Daca, Jan Křetínský, Tobias Meggendorfer

2003 · Vol 40 (1) · pp. 250-256
Author(s): Erol A. Peköz

We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
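To make the forgetting constraint concrete, here is a hedged sketch of one policy of the kind this model permits: pull the current arm m times, stay if the sample mean clears a threshold, and otherwise switch (and forget everything observed). This is an illustration only, not one of the policies analyzed in the paper.

import random

def average_reward(arms, m=20, threshold=0.5, pulls=1_000_000, seed=0):
    """Long-run average reward per pull of a test-then-stay policy."""
    rng = random.Random(seed)
    arm, total, t = 0, 0.0, 0
    while t < pulls:
        sample = [arms[arm](rng) for _ in range(m)]   # all we may remember
        total += sum(sample)
        t += m
        if sum(sample) / m < threshold:
            arm = (arm + 1) % len(arms)               # switch, and forget
    return total / t

# Example: two Bernoulli arms with means 0.3 and 0.7 (unknown to the policy).
arms = [lambda rng: float(rng.random() < 0.3),
        lambda rng: float(rng.random() < 0.7)]
print(average_reward(arms))   # close to 0.7: the rule mostly stays on the better arm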


2003 · Vol 17 (2) · pp. 251-265
Author(s): I.J.B.F. Adan, J.A.C. Resing, V.G. Kulkarni

Stochastic discretization is a technique of representing a continuous random variable as a random sum of i.i.d. exponential random variables. In this article, we apply this technique to study the limiting behavior of a stochastic fluid model. Specifically, we consider an infinite-capacity fluid buffer, where the net input of fluid is regulated by a finite-state irreducible continuous-time Markov chain. Most long-run performance characteristics for such a fluid system can be expressed as the long-run average reward for a suitably chosen reward structure. In this article, we use stochastic discretization of the fluid content process to efficiently determine the long-run average reward. This method transforms the continuous-state Markov process describing the fluid model into a discrete-state quasi-birth–death process. Hence, standard tools, such as the matrix-geometric approach, become available for the analysis of the fluid buffer. To demonstrate this approach, we analyze the output of a buffer processing fluid from K sources on a first-come first-served basis.
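As a hedged sketch of the matrix-geometric step: for a QBD with generator blocks A0 (level up), A1 (local), and A2 (level down), the rate matrix R is the minimal solution of A0 + R A1 + R^2 A2 = 0 and can be computed by the classical functional iteration below. The blocks are placeholders; in the paper they would come from the discretized fluid model.

import numpy as np

def qbd_rate_matrix(A0, A1, A2, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative solution R of A0 + R A1 + R^2 A2 = 0."""
    A1_inv = np.linalg.inv(A1)
    R = np.zeros_like(A0, dtype=float)
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("R iteration did not converge")

# Once R is known, the stationary level vectors decay geometrically,
# pi_k = pi_0 @ R^k, and long-run average rewards follow as weighted sums.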


2002 · Vol 39 (1) · pp. 20-37
Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides if the reward offered is sufficient to accept or reject the arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.
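To make the threshold form concrete, here is a hedged simulation sketch of a stationary threshold policy in a finite-capacity M/M/1/N queue with uniformly distributed offered rewards; the parameters are illustrative, and the model is time-homogeneous, unlike the time-varying setting of the paper.

import random

def simulate_threshold(lam, mu, N, thresholds, horizon=100_000.0, seed=0):
    """Estimated long-run average reward per unit time under thresholds."""
    rng = random.Random(seed)
    t, n, earned = 0.0, 0, 0.0
    while t < horizon:
        rate = lam + (mu if n > 0 else 0.0)       # competing exponentials
        t += rng.expovariate(rate)
        if rng.random() < lam / rate:             # next event is an arrival
            offer = rng.random()                  # offered reward ~ U(0, 1)
            if n < N and offer >= thresholds[n]:  # accept only above threshold
                n += 1
                earned += offer                   # reward paid on acceptance
        else:                                     # next event is a departure
            n -= 1
    return earned / t

# Monotone thresholds: demand more reward as the system fills up.
print(simulate_threshold(lam=1.0, mu=1.2, N=5,
                         thresholds=[0.1, 0.2, 0.35, 0.5, 0.7]))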


1999 · Vol 13 (3) · pp. 309-327
Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system in which each arriving customer offers a reward. A gatekeeper decides, based on the reward offered and the space remaining, whether each arriving customer should be accepted or rejected. The gatekeeper only receives the offered reward if the customer is accepted. A traditional objective function is to maximize the gain, that is, the long-run average reward. It is quite possible, however, to have several different gain optimal policies that behave quite differently. Bias and Blackwell optimality are more refined objective functions that can distinguish among multiple stationary, deterministic gain optimal policies. This paper focuses on describing the structure of stationary, deterministic optimal policies and on extending this optimality to distinguish among multiple gain optimal policies. We show that these policies are of trunk reservation form and must occur consecutively. We then prove that we can distinguish among these gain optimal policies using the bias or transient reward, and extend to Blackwell optimality.
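As a hedged sketch of how bias separates gain optimal policies: for a fixed unichain policy with transition matrix P and reward vector r, the gain is g = pi @ r for the stationary distribution pi, and the bias h solves (I - P) h = r - g with the normalization pi @ h = 0. The matrices below are generic placeholders, not the trunk reservation model from the paper.

import numpy as np

def gain_and_bias(P, r):
    """Gain g and bias h of a unichain policy with transitions P, rewards r."""
    n = len(r)
    # Stationary distribution: pi (I - P) = 0 with sum(pi) = 1.
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    g = pi @ r
    # Bias: (I - P) h = r - g, normalized by pi @ h = 0.
    B = np.vstack([np.eye(n) - P, pi])
    c = np.append(r - g, 0.0)
    h = np.linalg.lstsq(B, c, rcond=None)[0]
    return g, h

# Two policies with the same gain can still differ in bias; the one with the
# larger bias at the initial state earns more transient reward and is the
# bias-optimal choice among them.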

