Joint Venture Breakup and the Exploration-Exploitation Trade-off

2009 ◽  
Author(s):  
Antoine Soubeyran ◽  
Raphael Soubeyran ◽  
Ngo Van Long


Author(s):  
Julian Berk ◽  
Sunil Gupta ◽  
Santu Rana ◽  
Svetha Venkatesh

In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
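
The idea of drawing the trade-off parameter from a distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of a gamma distribution, its `shape` and `scale` values, and the function names `ucb_score` and `randomised_ucb_choice` are assumptions made for the example, and the posterior means and standard deviations are taken as given rather than computed from a fitted Gaussian process.

```python
import math
import random


def ucb_score(mean, std, beta):
    """Standard UCB score: posterior mean plus sqrt(beta) times posterior std."""
    return mean + math.sqrt(beta) * std


def randomised_ucb_choice(means, stds, rng, shape=2.0, scale=1.0):
    """Pick the candidate with the highest UCB score for a freshly sampled beta.

    Assumption: beta is drawn from a gamma distribution; the abstract only
    says that it is sampled from "a distribution".
    """
    beta = rng.gammavariate(shape, scale)
    scores = [ucb_score(m, s, beta) for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical posterior summaries at three candidate points.
    means = [0.2, 0.5, 0.4]
    stds = [0.9, 0.1, 0.4]
    print(randomised_ucb_choice(means, stds, rng))
```

A large sampled `beta` favours candidates with high uncertainty (exploration), while a small one favours candidates with a high posterior mean (exploitation).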


2013 ◽  
Vol 10 (85) ◽  
pp. 20130352 ◽  
Author(s):  
Dimitri Volchenkov ◽  
Jonathan Helbach ◽  
Marko Tscherepanow ◽  
Sina Kühnel

Searching experiments conducted in different virtual environments over a gender-balanced group of people revealed a gender-irrelevant, scale-free spread of searching activity on large spatio-temporal scales. We proposed and analytically solved a simple statistical model of the coherent-noise type describing the exploration–exploitation trade-off in humans (‘should I stay’ or ‘should I go’). The model exhibits a variety of saltatory behaviours, ranging from Lévy flights occurring under uncertainty to Brownian walks performed by a treasure hunter confident of eventual success.


2021 ◽  
Author(s):  
Ketika Garg ◽  
Christopher T. Kello ◽  
Paul E Smaldino

Search requires balancing exploring for more options and exploiting the ones previously found. Individuals foraging in a group face another trade-off: whether to engage in social learning to exploit the solutions found by others or to search alone for unexplored solutions. Social learning can decrease the costs of finding new resources, but excessive social learning can reduce the exploration for new solutions. We study how these two trade-offs interact to influence search efficiency in a model of collective foraging under conditions of varying resource abundance, resource density, and group size. We modeled individual search strategies as Lévy walks, where a power-law exponent (μ) controlled the trade-off between exploitative and explorative movements in individual search. We modulated the trade-off between individual search and social learning using a selectivity parameter that determined how agents responded to social cues in terms of distance and likely opportunity costs. Our results show that social learning is favored in rich and clustered environments, but also that the benefits of exploiting social information are maximized by engaging in high levels of individual exploration. We show that selective use of social information can mitigate the disadvantages of excessive social learning, especially in larger groups and with limited individual exploration. Finally, we found that the optimal combination of individual exploration and social learning gave rise to trajectories with μ ≈ 2, which supports the general optimality of such patterns in search. Our work sheds light on the interplay between individual search and social learning, and has broader implications for collective search and problem-solving.
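
The role of the exponent μ in a Lévy walk can be illustrated with a short sketch. It assumes step lengths follow a pure power law p(l) ∝ l^(-μ) for l ≥ `l_min` with μ > 1, sampled by inverse-transform sampling; the names `levy_step`, `levy_walk` and `l_min` are hypothetical, and the social-learning component of the model is not represented.

```python
import math
import random


def levy_step(mu, rng, l_min=1.0):
    """Draw one step length from p(l) ~ l**(-mu), l >= l_min, by inverse transform."""
    if mu <= 1.0:
        raise ValueError("mu must be greater than 1 for a normalisable power law")
    u = rng.random()  # uniform in [0, 1)
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))


def levy_walk(n_steps, mu, rng, l_min=1.0):
    """Return a 2D path whose step lengths are power-law distributed
    and whose headings are uniformly random."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        length = levy_step(mu, rng, l_min)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    for mu in (1.5, 2.0, 3.0):
        steps = [levy_step(mu, rng) for _ in range(10000)]
        print(mu, max(steps))
```

Under this parameterisation, values of μ close to 1 produce frequent long relocations (more explorative movement), whereas larger values keep the walker near its current position (more exploitative movement).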


2020 ◽  
Author(s):  
M Dubois ◽  
A Bowler ◽  
ME Moses-Payne ◽  
J Habicht ◽  
N Steinbeis ◽  
...  

During childhood and adolescence, exploring the unknown is important to build a better model of the world. This means that youths have to regularly solve the exploration-exploitation trade-off, a dilemma in which adults are known to deploy a mixture of computationally light and heavy exploration strategies. In this developmental study, we investigated how youths (aged 8 to 17) performed an exploration task that allows us to dissociate these different exploration strategies. Using computational modelling, we demonstrate that tabula-rasa exploration, a computationally light exploration heuristic, is used to a higher degree in children and younger adolescents compared to older adolescents. Additionally, we show that this tabula-rasa exploration is more extensively used by youths with high attention-deficit/hyperactivity disorder (ADHD) traits. In the light of ongoing brain development, our findings show that children and younger adolescents use computationally less burdensome strategies, but that an excessive use thereof might be a risk for mental health conditions.
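
A computationally light heuristic of this kind can be sketched as an ε-greedy rule, in which a fraction of choices ignores everything learned so far. Treating tabula-rasa exploration as ε-greedy is an assumption made for this illustration; the abstract does not spell out the model, and the function name `epsilon_greedy` and its parameters are hypothetical.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Choose an option index.

    With probability epsilon, pick uniformly at random and ignore all
    learned values; otherwise pick the option with the highest value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    learned_values = [0.1, 0.7, 0.3]  # hypothetical value estimates
    choices = [epsilon_greedy(learned_values, 0.2, rng) for _ in range(20)]
    print(choices)
```

The rule needs no uncertainty estimates or planning, which is what makes it cheap compared with strategies that direct exploration toward informative options.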


2021 ◽  
Vol 39 (4) ◽  
pp. 1-29
Author(s):  
Shijun Li ◽  
Wenqiang Lei ◽  
Qingyun Wu ◽  
Xiangnan He ◽  
Peng Jiang ◽  
...  

Static recommendation methods like collaborative filtering suffer from an inherent limitation in performing real-time personalization for cold-start users. Online recommendation, e.g., the multi-armed bandit approach, addresses this limitation by interactively exploring user preference online and pursuing the exploration-exploitation (EE) trade-off. However, existing bandit-based methods model recommendation actions homogeneously. Specifically, they only consider the items as the arms and are incapable of handling the item attributes, which naturally provide interpretable information about the user’s current demands and can effectively filter out undesired items. In this work, we consider conversational recommendation for cold-start users, where a system can both ask a user about attributes and recommend items interactively. This important scenario was studied in a recent work [54]. However, it employs a hand-crafted function to decide when to ask about attributes or make recommendations. Such separate modeling of attributes and items makes the effectiveness of the system rely heavily on the choice of the hand-crafted function, thus introducing fragility to the system. To address this limitation, we seamlessly unify attributes and items in the same arm space and achieve their EE trade-offs automatically using the framework of Thompson Sampling. Our Conversational Thompson Sampling (ConTS) model holistically solves all questions in conversational recommendation by choosing the arm with the maximal reward to play. Extensive experiments on three benchmark datasets show that ConTS outperforms the state-of-the-art methods Conversational UCB (ConUCB) [54] and the Estimation-Action-Reflection model [27] in both success rate and average number of conversation turns.
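
The idea of placing attributes and items in one arm space and letting Thompson Sampling decide which to play can be sketched with a simple Beta-Bernoulli bandit. This is not the ConTS model itself, which the abstract does not specify in detail; the class name `BetaBernoulliTS`, the arm labels, and the binary reward are assumptions made for the example.

```python
import random


class BetaBernoulliTS:
    """Thompson Sampling over a single arm space with Beta(1, 1) priors."""

    def __init__(self, arms, rng):
        self.rng = rng
        self.alpha = {arm: 1.0 for arm in arms}  # 1 + observed successes
        self.beta = {arm: 1.0 for arm in arms}   # 1 + observed failures

    def select(self):
        """Sample a success rate for every arm and play the arm with the largest sample."""
        samples = {
            arm: self.rng.betavariate(self.alpha[arm], self.beta[arm])
            for arm in self.alpha
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        """Update the posterior of the played arm with a binary reward."""
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


if __name__ == "__main__":
    # Hypothetical arms: asking about an attribute and recommending an item
    # compete in the same arm space.
    arms = ["ask:genre", "ask:price", "recommend:item_1", "recommend:item_2"]
    bandit = BetaBernoulliTS(arms, random.Random(0))
    arm = bandit.select()
    bandit.update(arm, reward=1)
    print(arm, bandit.alpha[arm], bandit.beta[arm])
```

Because every arm is scored by a posterior sample, no separate hand-crafted rule is needed to decide between asking and recommending in this sketch.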


2017 ◽  
Author(s):  
George Velentzas ◽  
Costas Tzafestas ◽  
Mehdi Khamassi

Fast adaptation to changes in the environment requires both natural and artificial agents to be able to dynamically tune an exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choice of the action that subjectively appears as best at a given moment) relative to exploratory choices (i.e. testing other actions that now appear worse but may turn out promising later). The problem of finding an efficient exploration-exploitation trade-off has been well studied both in the Machine Learning and Computational Neuroscience fields. Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the former have proven that principles such as exploring actions that have not been tested for a long time can lead to performance closer to optimal (bounded regret). In parallel, research in the latter has investigated solutions such as progressively increasing exploitation in response to improvements in performance, transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty in order to gain information when performing these actions. In this work, we first try to bridge some of these different methods from the two research fields by rewriting their decision process with a common formalism. We then show numerical simulations of a hybrid algorithm combining bio-inspired meta-learning, a Kalman filter and exploration bonuses, compared to several state-of-the-art alternatives on a set of non-stationary stochastic multi-armed bandit tasks. While we find that different methods are appropriate in different scenarios, the hybrid algorithm displays a good combination of advantages from different methods and outperforms these methods in the studied scenarios.
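
Two of the ingredients named above, a Kalman filter per arm and an uncertainty-based exploration bonus, can be sketched as follows. This is not the authors' hybrid algorithm (the meta-learning component is omitted); the class name `KalmanBandit` and the parameters `obs_noise`, `process_noise`, `bonus` and `init_var` are assumptions made for the example.

```python
import math


class KalmanBandit:
    """Per-arm scalar Kalman filter with an uncertainty bonus at selection time."""

    def __init__(self, n_arms, obs_noise=1.0, process_noise=0.01,
                 bonus=1.0, init_var=1.0):
        self.obs_noise = obs_noise          # variance of reward observations
        self.process_noise = process_noise  # variance added per step (drift)
        self.bonus = bonus                  # weight of the exploration bonus
        self.mean = [0.0] * n_arms
        self.var = [init_var] * n_arms

    def select(self):
        """Play the arm maximising estimated mean plus bonus times std."""
        scores = [m + self.bonus * math.sqrt(v)
                  for m, v in zip(self.mean, self.var)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        """Inflate every arm's variance, then correct the played arm."""
        # Prediction step: any arm may have drifted, so uncertainty grows.
        self.var = [v + self.process_noise for v in self.var]
        # Correction step for the arm that was actually observed.
        gain = self.var[arm] / (self.var[arm] + self.obs_noise)
        self.mean[arm] += gain * (reward - self.mean[arm])
        self.var[arm] *= (1.0 - gain)


if __name__ == "__main__":
    bandit = KalmanBandit(n_arms=3)
    arm = bandit.select()
    bandit.update(arm, reward=1.0)
    print(arm, bandit.mean, bandit.var)
```

Because the variance of unplayed arms keeps growing, arms that have not been tested for a long time eventually earn a bonus large enough to be tried again, which matches the principle described for non-stationary tasks.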


Author(s):  
Hua Zhang ◽  
Youmin Xi

In previous studies on coordinating exploration-exploitation activities, much attention has been paid to network structures, while the roles played by actors’ strategic behavior have been largely ignored. In this paper, the authors extend March’s simulation model on parallel problem solving by adding structurally equivalent imitation. In this way, one can examine how the interaction of network structure with agent behavior affects the knowledge process and ultimately influences group performance. This simulation experiment suggests that under the condition of a regular network, the classical trade-off between exploration and exploitation will appear in the case of the preferentially attached network when agents adopt structural equivalence imitation. The whole organization would implicitly be divided into independent sub-groups that converge on different performance levels, leading the organization to a lower performance level. The authors also explore performance in the mixed organization and the management implications.

