Randomised allocation of treatments in sequential trials

1980 ◽  
Vol 12 (1) ◽  
pp. 174-182 ◽  
Author(s):  
John Bather

Given a finite number of different experiments with unknown probabilities p1, p2, ···, pk of success, the multi-armed bandit problem is concerned with maximising the expected number of successes in a sequence of trials. There are many policies which ensure that the proportion of successes converges to p = max (p1, p2, ···, pk), in the long run. This property is established for a class of decision procedures which rely on randomisation, at each stage, in selecting the experiment for the next trial. Further, it is suggested that some of these procedures might perform well over any finite sequence of trials.
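
As a rough illustration of the kind of randomised policy described above, the following Python sketch simulates Bernoulli experiments under a simple randomised rule. The rule itself (explore uniformly with probability 1/sqrt(t), otherwise play the arm with the best observed success rate) and the names `simulate_randomised_allocation`, `probs`, and `n_trials` are assumptions made for this example. It is not the class of procedures analysed in the paper.

```python
import random


def simulate_randomised_allocation(probs, n_trials, seed=0):
    """Return the proportion of successes under a hypothetical randomised rule.

    At trial t the rule picks an arm uniformly at random with probability
    1/sqrt(t) (or whenever some arm is still untried); otherwise it plays
    the arm with the highest observed success rate so far.
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k   # number of trials made on each arm
    wins = [0] * k    # number of successes observed on each arm
    total_wins = 0
    for t in range(1, n_trials + 1):
        explore = rng.random() < t ** -0.5
        if explore or 0 in pulls:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        success = rng.random() < probs[arm]
        pulls[arm] += 1
        wins[arm] += int(success)
        total_wins += int(success)
    return total_wins / n_trials
```

Under these assumptions, calling `simulate_randomised_allocation([0.2, 0.8], 20000)` should give a success proportion close to the larger probability, since the exploration probability shrinks as the number of trials grows.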


1965 ◽  
Vol 30 (1) ◽  
pp. 49-57 ◽  
Author(s):  
Hilary Putnam

The purpose of this paper is to present two groups of results which have turned out to have a surprisingly close interconnection. The first two results (Theorems 1 and 2) were inspired by the following question: we know what sets are “decidable”, namely the recursive sets (according to Church's Thesis). But what happens if we modify the notion of a decision procedure by (1) allowing the procedure to “change its mind” any finite number of times (in terms of Turing Machines: we visualize the machine as being given an integer (or an n-tuple of integers) as input. The machine then “prints out” a finite sequence of “yesses” and “nos”. The last “yes” or “no” is always to be the correct answer.); and (2) we give up the requirement that it be possible to tell (effectively) if the computation has terminated? I.e., if the machine has most recently printed “yes”, then we know that the integer put in as input must be in the set unless the machine is going to change its mind; but we have no procedure for telling whether the machine will change its mind or not. The sets for which there exist decision procedures in this widened sense are decidable by “empirical” means, for if we always “posit” that the most recently generated answer is correct, we will make a finite number of mistakes, but we will eventually get the correct answer. (Note, however, that even if we have gotten to the correct answer (the end of the finite sequence) we are never sure that we have the correct answer.)


1990 ◽  
Vol 4 (4) ◽  
pp. 447-460 ◽  
Author(s):  
Costas Courcoubetis ◽  
Richard Weber

Items of various types arrive at a bin-packing facility according to random processes and are to be combined with other readily available items of different types and packed into bins using one of a number of possible packings. One might think of a manufacturing context in which randomly arriving subassemblies are to be combined with subassemblies from an existing inventory to assemble a variety of finished products. Packing must be done on-line; that is, as each item arrives, it must be allocated to a bin whose configuration of packing is fixed. Moreover, it is required that the packing be managed in such a way that the readily available items are consumed at predescribed rates, corresponding perhaps to optimal rates for manufacturing these items. At any moment, some number of bins will be partially full. In practice, it is important that the packing be managed so that the expected number of partially full bins remains uniformly bounded in time. We present a necessary and sufficient condition for this goal to be realized and describe an algorithm to achieve it.


1988 ◽  
Vol 25 (03) ◽  
pp. 624-629
Author(s):  
Stephen Scheinberg

Consider an ‘experiment’ which can be repeated indefinitely often resulting in independent random outcomes. Fix attention on a finite number of possible (sets of) outcomes E1, E2, … and define W = W(N1, N2, …) to be the expected number of repetitions needed to ensure that E1 has occurred (at least) N1 times, E2 has occurred (at least) N2 times, etc. This article examines the asymptotic behavior of W as a function of the sum Σ_j N_j, as the latter grows without bound.
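
To make the quantity W concrete, here is a small Monte Carlo sketch in Python. It assumes, purely for simplicity, that the outcomes are disjoint with known probabilities, although the abstract allows general sets of outcomes. The function name `estimate_w` and its parameters are hypothetical and chosen only for this example.

```python
import random


def estimate_w(probs, targets, n_runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of repetitions until
    outcome j has occurred at least targets[j] times, for every j.

    Assumption: outcomes are disjoint with probabilities probs[j]; any
    leftover probability mass counts as "none of the tracked outcomes".
    """
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(n_runs):
        remaining = list(targets)                      # occurrences still needed
        outstanding = sum(1 for r in remaining if r > 0)
        steps = 0
        while outstanding:
            steps += 1
            u = rng.random()
            cumulative = 0.0
            for j, p in enumerate(probs):
                cumulative += p
                if u < cumulative:                     # outcome j occurred
                    if remaining[j] > 0:
                        remaining[j] -= 1
                        if remaining[j] == 0:
                            outstanding -= 1
                    break
        total_steps += steps
    return total_steps / n_runs
```

With a single certain outcome, `estimate_w([1.0], [5])` is exactly 5, and for one outcome of probability 1/2 required once the estimate should be near 2, the mean of a geometric distribution.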


2018 ◽  
Vol 55 (1) ◽  
pp. 318-324
Author(s):  
Maher Nouiehed ◽  
Sheldon M. Ross

We consider the Bernoulli bandit problem where one of the arms has win probability α and the others β, with the identity of the α arm specified by initial probabilities. With u = max(α, β), v = min(α, β), call an arm with win probability u a good arm. Whereas it is known that the strategy of always playing the arm with the largest probability of being a good arm maximizes the expected number of wins in the first n games for all n, we conjecture that it also stochastically maximizes the number of wins. That is, we conjecture that this strategy maximizes the probability of at least k wins in the first n games for all k, n. The conjecture is proven when k = 1, and k = n, and when there are only two arms and k = n - 1.
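
The strategy named above depends on tracking, for each arm, the probability that it is the α arm. A minimal Python sketch of that bookkeeping follows, assuming 0 < α, β < 1 and a list `q` of current probabilities that sums to 1. The helper names `update_posterior` and `choose_arm` are hypothetical and not taken from the paper.

```python
def update_posterior(q, arm, win, alpha, beta):
    """Bayes update of q[i] = P(arm i is the alpha arm) after one play.

    If the played arm is the alpha arm, the observation has likelihood
    alpha (win) or 1 - alpha (loss); under every other hypothesis the
    played arm is a beta arm, giving beta or 1 - beta.
    """
    like_if_alpha = alpha if win else 1.0 - alpha
    like_if_beta = beta if win else 1.0 - beta
    weights = [
        qi * (like_if_alpha if i == arm else like_if_beta)
        for i, qi in enumerate(q)
    ]
    z = sum(weights)  # positive when 0 < alpha, beta < 1
    return [w / z for w in weights]


def choose_arm(q, alpha, beta):
    """Pick the arm most likely to be a good arm (win probability u)."""
    if alpha >= beta:
        # The alpha arm is the good arm: play the most likely alpha arm.
        return max(range(len(q)), key=lambda i: q[i])
    # The beta arms are good: play the arm least likely to be the alpha arm.
    return min(range(len(q)), key=lambda i: q[i])
```

For example, starting from `q = [0.5, 0.5]` with α = 0.8 and β = 0.4, a win on arm 0 moves the probabilities to 2/3 and 1/3, so `choose_arm` keeps selecting arm 0.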


2007 ◽  
Vol 39 (04) ◽  
pp. 898-921 ◽  
Author(s):  
Idriss Maoui ◽  
Hayriye Ayhan ◽  
Robert D. Foley

We study a service facility modeled as a queueing system with finite or infinite capacity. Arriving customers enter if there is room in the facility and if they are willing to pay the price posted by the service provider. Customers belong to one of a finite number of classes that have different willingnesses-to-pay. Moreover, there is a penalty for congestion in the facility in the form of state-dependent holding costs. The service provider may advertise class-specific prices that may fluctuate over time. We show the existence of a unique optimal stationary pricing policy in a continuous and unbounded action space that maximizes the long-run average profit per unit time. We determine an expression for this policy under certain conditions. We also analyze the structure and the properties of this policy.


1990 ◽  
Vol 27 (2) ◽  
pp. 351-364 ◽  
Author(s):  
Rhonda Righter

In the classical sequential assignment problem as introduced by Derman et al. (1972) there are n workers who are to be assigned a finite number of sequentially arriving jobs. If a worker of value p is assigned a job of value x the return is px, where we interpret the return as the probability that the given worker correctly completes the given job. The job value is a random value that is observed upon arrival, and jobs must be assigned or rejected when they arrive. Each worker can only do one job. Derman et al. showed that when the objective is to maximize the expected return, i.e., the expected number of correctly completed jobs, the optimal policy is a simple threshold policy, which does not depend on the worker values. Their result was extended by Albright (1974) to allow job arrivals according to a Poisson process and a single random deadline for job completion (which is equivalent to discounting). Righter (1987) further extended the result to permit workers to have independent random deadlines for job completions. Here we show that when there are independent deadlines a simple threshold policy that is independent of the worker values stochastically maximizes the number of correctly completed jobs, and therefore maximizes the expected number of correctly completed jobs. We also show that there is no policy that stochastically maximizes the number of correctly completed jobs when there is a single deadline. However, when there is a single deadline and the objective is to maximize the probability that n jobs are done correctly by n workers, then the optimal policy is determined by a single threshold that is independent of n and of the worker values.


Author(s):  
Thomas Godland ◽  
Zakhar Kabluchko

We consider the simplices $$K_n^A=\{x\in {\mathbb {R}}^{n+1}:x_1\ge x_2\ge \cdots \ge x_{n+1},\,x_1-x_{n+1}\le 1,\,x_1+\cdots +x_{n+1}=0\}$$ and $$K_n^B=\{x\in {\mathbb {R}}^n:1\ge x_1\ge x_2\ge \cdots \ge x_n\ge 0\},$$ which are called the Schläfli orthoschemes of types A and B, respectively. We describe the tangent cones at their j-faces and compute explicitly the sums of the conic intrinsic volumes of these tangent cones at all j-faces of $K_n^A$ and $K_n^B$. This setting contains sums of external and internal angles of $K_n^A$ and $K_n^B$ as special cases. The sums are evaluated in terms of Stirling numbers of both kinds. We generalize these results to finite products of Schläfli orthoschemes of type A and B and, as a probabilistic consequence, derive formulas for the expected number of j-faces of the Minkowski sums of the convex hulls of a finite number of Gaussian random walks and random bridges. Furthermore, we evaluate the analogous angle sums for the tangent cones of Weyl chambers of types A and B and finite products thereof.


Author(s):  
Michael Scanlan

Emil Post was a pioneer in the theory of computation, which investigates the solution of problems by algorithmic methods. An algorithmic method is a finite set of precisely defined elementary directions for solving a problem in a finite number of steps. More specifically, Post was interested in the existence of algorithmic decision procedures that eventually give a yes or no answer to a problem. For instance, in his dissertation, Post introduced the truth-table method for deciding whether or not a formula of propositional logic is a tautology. Post developed a notion of ‘canonical systems’ which was intended to encompass any algorithmic procedure for symbol manipulation. Using this notion, Post partially anticipated, in unpublished work, the results of Gödel, Church and Turing in the 1930s. This showed that many problems in logic and mathematics are algorithmically unsolvable. Post’s ideas influenced later research in logic, computer theory, formal language theory and other areas.
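
The truth-table method mentioned above can be sketched in a few lines of Python: enumerate every assignment of truth values to the variables and check that the formula holds under each. The representation of a formula as a Python callable and the names `is_tautology` and `implies` are assumptions made for this illustration, not Post's own notation.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def is_tautology(formula, variables):
    """Decide tautology-hood by exhaustive truth table.

    `formula` is a callable taking one boolean keyword argument per name
    in `variables`; all 2**len(variables) assignments are checked.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(**assignment):
            return False  # found a falsifying row of the truth table
    return True
```

For instance, `is_tautology(lambda p: p or not p, ["p"])` holds, whereas the formula p → q fails on the row where p is true and q is false. The procedure always terminates because the table has finitely many rows, which is what makes it a decision procedure.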