Behavioral Bandits: Analyzing the Exploration Versus Exploitation Trade-Off in the Lab

2019
Author(s): Stanton Hudja, Daniel Woods

2020, Vol 68 (5), pp. 1585-1604
Author(s): Sajad Modaresi, Denis Sauré, Juan Pablo Vielma

When moving from the traditional to the combinatorial multiarmed bandit setting, addressing the classical exploration versus exploitation trade-off is a challenging task. In “Learning in Combinatorial Optimization: What and How to Explore,” Modaresi, Sauré, and Vielma show that the combinatorial setting has salient features that distinguish it from the traditional bandit. In particular, combinatorial structure induces correlation between the costs of different solutions, thus raising the questions of what parameters to estimate and how to collect and combine information. The authors answer such questions by developing a novel optimization problem called the lower-bound problem (LBP). They establish a fundamental limit on the asymptotic performance of any admissible policy and propose near-optimal LBP-based policies. Because LBP is likely intractable in practice, they propose policies that instead solve a proxy for LBP, which they call the optimality cover problem (OCP). They provide strong evidence of the practical tractability of OCP and illustrate the markedly superior performance of OCP-based policies numerically.


1982, Vol 14 (2), pp. 109-113
Author(s): Suleyman Tufekci

2012, Vol 11 (3), pp. 118-126
Author(s): Olive Emil Wetter, Jürgen Wegge, Klaus Jonas, Klaus-Helmut Schmidt

In most work contexts, several performance goals coexist, and conflicts and trade-offs between them can occur. Our paper is the first to contrast a dual goal for speed and accuracy with a single goal for speed on the same task. The Sternberg paradigm (Experiment 1, n = 57) and the d2 test (Experiment 2, n = 19) were used as performance tasks. In both experiments, speed measures and errors revealed that dual as well as single goals increase performance by enhancing memory scanning. However, the single speed goal triggered a speed-accuracy trade-off, favoring speed over accuracy, whereas this was not the case with the dual goal. In difficult trials, dual goals slowed scanning processes down again so that errors could be prevented. This new finding is particularly relevant for security domains, where both speed and accuracy have to be managed simultaneously.

