Behavioral Bandits: Analyzing the Exploration Versus Exploitation Trade-Off in the Lab

Learning in Combinatorial Optimization: What and How to Explore

Operations Research ◽ 10.1287/opre.2019.1926 ◽ 2020 ◽ Vol 68 (5) ◽ pp. 1585-1604

Author(s): Sajad Modaresi ◽ Denis Sauré ◽ Juan Pablo Vielma

Keyword(s): Combinatorial Optimization ◽ Optimization Problem ◽ Superior Performance ◽ Combinatorial Structure ◽ Asymptotic Performance ◽ Trade Off ◽ Fundamental Limit ◽ Cover Problem ◽ Combine Information ◽ Exploration Versus Exploitation

When moving from the traditional to the combinatorial multiarmed bandit setting, addressing the classical exploration versus exploitation trade-off is a challenging task. In “Learning in Combinatorial Optimization: What and How to Explore,” Modaresi, Sauré, and Vielma show that the combinatorial setting has salient features that distinguish it from the traditional bandit. In particular, combinatorial structure induces correlation between the costs of different solutions, thus raising the questions of what parameters to estimate and how to collect and combine information. The authors answer these questions by developing a novel optimization problem called the lower-bound problem (LBP). They establish a fundamental limit on the asymptotic performance of any admissible policy and propose near-optimal LBP-based policies. Because LBP is likely intractable in practice, they propose policies that instead solve a proxy for LBP, which they call the optimality cover problem (OCP). They provide strong evidence of the practical tractability of OCP and illustrate numerically the markedly superior performance of OCP-based policies.

Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms

Proceedings of the International Conference on Agents and Artificial Intelligence ◽ 10.5220/0005195500660077 ◽ 2015 ◽ Cited By ~ 2

Author(s): Madalina Drugan ◽ Bernard Manderick

Keyword(s): Infinite Horizon ◽ Trade Off ◽ Exploration Versus Exploitation

A Flow-Preserving Algorithm for the Time-Cost Trade-Off Problem

IIE Transactions ◽ 10.1080/05695558208974589 ◽ 1982 ◽ Vol 14 (2) ◽ pp. 109-113 ◽ Cited By ~ 4

Author(s): Suleyman Tufekci

Keyword(s): Time Cost ◽ Trade Off

Dual Goals for Speed and Accuracy on the Same Performance Task

Journal of Personnel Psychology ◽ 10.1027/1866-5888/a000063 ◽ 2012 ◽ Vol 11 (3) ◽ pp. 118-126 ◽ Cited By ~ 3

Author(s): Olive Emil Wetter ◽ Jürgen Wegge ◽ Klaus Jonas ◽ Klaus-Helmut Schmidt

Keyword(s): Memory Scanning ◽ Performance Task ◽ Performance Tasks ◽ Trade Off ◽ Test Experiment ◽ Trade Offs ◽ New Finding ◽ Sternberg Paradigm ◽ Speed Accuracy ◽ Speed And Accuracy

In most work contexts, several performance goals coexist, and conflicts between them and trade-offs can occur. Our paper is the first to contrast a dual goal for speed and accuracy with a single goal for speed on the same task. The Sternberg paradigm (Experiment 1, n = 57) and the d2 test (Experiment 2, n = 19) were used as performance tasks. Speed measures and errors revealed in both experiments that dual as well as single goals increase performance by enhancing memory scanning. However, the single speed goal triggered a speed-accuracy trade-off, favoring speed over accuracy, whereas this was not the case with the dual goal. In difficult trials, dual goals slowed down scanning processes again so that errors could be prevented. This new finding is particularly relevant for security domains, where both aspects have to be managed simultaneously.
