Long-Run Average Reward
Recently Published Documents

Total documents: 15 (five years: 1)
H-index: 6 (five years: 0)

2021 · Vol 31 (3) · pp. 1-34
Author(s): Yuliya Butkova, Arnd Hartmanns, Holger Hermanns

Markov automata are a compositional modelling formalism with continuous stochastic time, discrete probabilities, and nondeterministic choices. In this article, we present extensions to Modest, an expressive high-level language with roots in process algebra, that allow large Markov automata models to be specified in a succinct, modular way. We illustrate the advantages of Modest over alternative languages. Model checking Markov automata models requires dedicated algorithms for time-bounded and long-run average reward properties. We describe and evaluate the state-of-the-art algorithms implemented in the mcsta model checker of the Modest Toolset. We find that mcsta improves the performance and scalability of Markov automata model checking compared to earlier and alternative tools. We explain a partial-exploration approach based on the BRTDP method, designed to mitigate the state space explosion problem of model checking, and experimentally evaluate its effectiveness. This problem can be avoided entirely by purely simulation-based techniques, but the nondeterminism in Markov automata hinders their straightforward application. We explain how lightweight scheduler sampling can make simulation possible, and provide a detailed evaluation of its usefulness on several benchmarks using the Modest Toolset's modes simulator.
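As a rough illustration of the lightweight scheduler sampling idea (a sketch under assumptions, not the Modest Toolset's modes API; continuous time is ignored for brevity), the Python below identifies a memoryless scheduler with a single 32-bit integer and resolves every nondeterministic choice by hashing that integer together with the current state, so sampling schedulers reduces to sampling integers. The dictionary-based model format is hypothetical.

import random
import zlib

def resolve(sched_id, state, actions):
    """Deterministically map (scheduler id, state) to one enabled action."""
    h = zlib.crc32(f"{sched_id}|{state}".encode())
    return actions[h % len(actions)]

def simulate(model, sched_id, horizon, rng):
    """One run under the scheduler; returns the average reward per step."""
    state, total = model["initial"], 0.0
    for _ in range(horizon):
        action = resolve(sched_id, state, model["actions"][state])
        total += model["reward"][(state, action)]
        succs, probs = zip(*model["delta"][(state, action)].items())
        state = rng.choices(succs, weights=probs)[0]   # probabilistic step
    return total / horizon

def sample_schedulers(model, n_sched=100, n_runs=30, horizon=10_000, seed=1):
    """Estimate the maximal long-run average reward over sampled schedulers."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_sched):
        sched_id = rng.getrandbits(32)                 # one scheduler = one integer
        est = sum(simulate(model, sched_id, horizon, rng)
                  for _ in range(n_runs)) / n_runs
        best = max(best, est)
    return best   # an underapproximation of the optimum (maximization case)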


2017 · Vol 32 (2) · pp. 163-178
Author(s): Kenneth C. Chong, Shane G. Henderson, Mark E. Lewis

We consider the problem of routing and admission control in a loss system featuring two classes of arriving jobs (high-priority and low-priority jobs) and two types of servers, in which decision-making for high-priority jobs is forced, and rewards influence the desirability of each of the four possible routing decisions. We seek a policy that maximizes expected long-run reward, under both the discounted reward and long-run average reward criteria, and formulate the problem as a Markov decision process. When the reward structure favors high-priority jobs, we demonstrate that there exists an optimal monotone switching curve policy with slope of at least −1. When the reward structure favors low-priority jobs, we demonstrate that the value function, in general, lacks structure, which complicates the search for structure in optimal policies. However, we identify conditions under which optimal policies can be characterized in greater detail. We also examine the performance of heuristic policies in a brief numerical study.
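The structural results above concern a specific loss system; as a generic, hedged illustration of computing a long-run average reward optimal policy for a finite unichain MDP, here is a sketch of relative value iteration (the arrays P and r are hypothetical inputs, not the paper's model).

import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=100_000):
    """
    P: shape (A, S, S), P[a, s, t] = transition probability under action a.
    r: shape (A, S),    r[a, s]   = one-step reward of action a in state s.
    Returns (approximate gain g, relative values h, greedy policy).
    """
    _, S = r.shape
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = r + P @ h                 # Q[a, s] = r[a, s] + E[h(next state)]
        h_new = Q.max(axis=0)
        g = h_new[0]                  # gain estimate, read off at state 0
        h_new = h_new - g             # keep values relative to state 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, Q.argmax(axis=0)     # greedy policy attains the gain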


Author(s): Pranav Ashok, Krishnendu Chatterjee, Przemysław Daca, Jan Křetínský, Tobias Meggendorfer

2003 · Vol 40 (1) · pp. 250-256
Author(s): Erol A. Peköz

We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
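To make the forgetting constraint concrete, here is a hedged sketch of one policy of the kind this model permits: pull the current arm m times, stay if the sample mean clears a threshold, and otherwise switch (and forget everything observed). This is an illustration only, not one of the policies analyzed in the paper.

import random

def average_reward(arms, m=20, threshold=0.5, pulls=1_000_000, seed=0):
    """Long-run average reward per pull of a test-then-stay policy."""
    rng = random.Random(seed)
    arm, total, t = 0, 0.0, 0
    while t < pulls:
        sample = [arms[arm](rng) for _ in range(m)]   # all we may remember
        total += sum(sample)
        t += m
        if sum(sample) / m < threshold:
            arm = (arm + 1) % len(arms)               # switch, and forget
    return total / t

# Example: two Bernoulli arms with means 0.3 and 0.7 (unknown to the policy).
arms = [lambda rng: float(rng.random() < 0.3),
        lambda rng: float(rng.random() < 0.7)]
print(average_reward(arms))   # close to 0.7: the rule mostly stays on the better arm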


2003 · Vol 17 (2) · pp. 251-265
Author(s): I.J.B.F. Adan, J.A.C. Resing, V.G. Kulkarni

Stochastic discretization is a technique of representing a continuous random variable as a random sum of i.i.d. exponential random variables. In this article, we apply this technique to study the limiting behavior of a stochastic fluid model. Specifically, we consider an infinite-capacity fluid buffer, where the net input of fluid is regulated by a finite-state irreducible continuous-time Markov chain. Most long-run performance characteristics for such a fluid system can be expressed as the long-run average reward for a suitably chosen reward structure. In this article, we use stochastic discretization of the fluid content process to efficiently determine the long-run average reward. This method transforms the continuous-state Markov process describing the fluid model into a discrete-state quasi-birth–death process. Hence, standard tools, such as the matrix-geometric approach, become available for the analysis of the fluid buffer. To demonstrate this approach, we analyze the output of a buffer processing fluid from K sources on a first-come first-served basis.
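As a hedged sketch of the matrix-geometric step: for a QBD with generator blocks A0 (level up), A1 (local), and A2 (level down), the rate matrix R is the minimal solution of A0 + R A1 + R^2 A2 = 0 and can be computed by the classical functional iteration below. The blocks are placeholders; in the paper they would come from the discretized fluid model.

import numpy as np

def qbd_rate_matrix(A0, A1, A2, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative solution R of A0 + R A1 + R^2 A2 = 0."""
    A1_inv = np.linalg.inv(A1)
    R = np.zeros_like(A0, dtype=float)
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("R iteration did not converge")

# Once R is known, the stationary level vectors decay geometrically,
# pi_k = pi_0 @ R^k, and long-run average rewards follow as weighted sums.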


2002 · Vol 39 (1) · pp. 20-37
Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides if the reward offered is sufficient to accept or reject the arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.
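To make the threshold form concrete, here is a hedged simulation sketch of a stationary threshold policy in a finite-capacity M/M/1/N queue with uniformly distributed offered rewards; the parameters are illustrative, and the model is time-homogeneous, unlike the time-varying setting of the paper.

import random

def simulate_threshold(lam, mu, N, thresholds, horizon=100_000.0, seed=0):
    """Estimated long-run average reward per unit time under thresholds."""
    rng = random.Random(seed)
    t, n, earned = 0.0, 0, 0.0
    while t < horizon:
        rate = lam + (mu if n > 0 else 0.0)       # competing exponentials
        t += rng.expovariate(rate)
        if rng.random() < lam / rate:             # next event is an arrival
            offer = rng.random()                  # offered reward ~ U(0, 1)
            if n < N and offer >= thresholds[n]:  # accept only above threshold
                n += 1
                earned += offer                   # reward paid on acceptance
        else:                                     # next event is a departure
            n -= 1
    return earned / t

# Monotone thresholds: demand more reward as the system fills up.
print(simulate_threshold(lam=1.0, mu=1.2, N=5,
                         thresholds=[0.1, 0.2, 0.35, 0.5, 0.7]))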


1999 · Vol 13 (3) · pp. 309-327
Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system in which each arriving customer offers a reward. A gatekeeper decides, based on the reward offered and the space remaining, whether each arriving customer should be accepted or rejected. The gatekeeper only receives the offered reward if the customer is accepted. A traditional objective function is to maximize the gain, that is, the long-run average reward. It is quite possible, however, to have several different gain optimal policies that behave quite differently. Bias and Blackwell optimality are more refined objective functions that can distinguish among multiple stationary, deterministic gain optimal policies. This paper focuses on describing the structure of stationary, deterministic optimal policies and on extending this optimality to distinguish among multiple gain optimal policies. We show that these policies are of trunk reservation form and must occur consecutively. We then prove that we can distinguish among these gain optimal policies using the bias or transient reward, and extend to Blackwell optimality.
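As a hedged sketch of how bias separates gain optimal policies: for a fixed unichain policy with transition matrix P and reward vector r, the gain is g = pi @ r for the stationary distribution pi, and the bias h solves (I - P) h = r - g with the normalization pi @ h = 0. The matrices below are generic placeholders, not the trunk reservation model from the paper.

import numpy as np

def gain_and_bias(P, r):
    """Gain g and bias h of a unichain policy with transitions P, rewards r."""
    n = len(r)
    # Stationary distribution: pi (I - P) = 0 with sum(pi) = 1.
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    g = pi @ r
    # Bias: (I - P) h = r - g, normalized by pi @ h = 0.
    B = np.vstack([np.eye(n) - P, pi])
    c = np.append(r - g, 0.0)
    h = np.linalg.lstsq(B, c, rcond=None)[0]
    return g, h

# Two policies with the same gain can still differ in bias; the one with the
# larger bias at the initial state earns more transient reward and is the
# bias-optimal choice among them.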

