scholarly journals Finding structure in multi-armed bandits

2018 ◽  
Author(s):  
Eric Schulz ◽  
Nicholas T. Franklin ◽  
Samuel J. Gershman

AbstractHow do humans search for rewards? This question is commonly studied using multi-armed bandit tasks, which require participants to trade off exploration and exploitation. Standard multi-armed bandits assume that each option has an independent reward distribution. However, learning about options independently is unrealistic, since in the real world options often share an underlying structure. We study a class of structured bandit tasks, which we use to probe how generalization guides exploration. In a structured multi-armed bandit, options have a correlation structure dictated by a latent function. We focus on bandits in which rewards are linear functions of an option’s spatial position. Across 5 experiments, we find evidence that participants utilize functional structure to guide their exploration, and also exhibit a learning-to-learn effect across rounds, becoming progressively faster at identifying the latent function. Our experiments rule out several heuristic explanations and show that the same findings obtain with non-linear functions. Comparing several models of learning and decision making, we find that the best model of human behavior in our tasks combines three computational mechanisms: (1) function learning, (2) clustering of reward distributions across rounds, and (3) uncertainty-guided exploration. Our results suggest that human reinforcement learning can utilize latent structure in sophisticated ways to improve efficiency.

2017 ◽  
Author(s):  
Eric Schulz ◽  
Charley M. Wu ◽  
Quentin J. M. Huys ◽  
Andreas Krause ◽  
Maarten Speekenbrink

AbstractHow do people pursue rewards in risky environments, where some outcomes should be avoided at all costs? We investigate how participant search for spatially correlated rewards in scenarios where one must avoid sampling rewards below a given threshold. This requires not only the balancing of exploration and exploitation, but also reasoning about how to avoid potentially risky areas of the search space. Within risky versions of the spatially correlated multi-armed bandit task, we show that participants’ behavior is aligned well with a Gaussian process function learning algorithm, which chooses points based on a safe optimization routine. Moreover, using leave-one-block-out cross-validation, we find that participants adapt their sampling behavior to the riskiness of the task, although the underlying function learning mechanism remains relatively unchanged. These results show that participants can adapt their search behavior to the adversity of the environment and enrich our understanding of adaptive behavior in the face of risk and uncertainty.


Biostatistics ◽  
2021 ◽  
Author(s):  
Theresa A Alexander ◽  
Rafael A Irizarry ◽  
Héctor Corrada Bravo

Summary High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data which explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points in the low-dimensional embedding and dissimilar objects are modeled by distant points with high probability. However, since t-SNE and UMAP are computationally expensive, often a PCA reduction is done before applying them which makes it sensitive to PCAs downfalls. Also, tSNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. The linear transformations of PCA are preferable to non-linear transformations provided by methods like t-SNE and UMAP for interpretable feature weights. Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.


2020 ◽  
Author(s):  
Robert C Wilson ◽  
Elizabeth Bonawitz ◽  
Vincent Costa ◽  
Becket Ebitz

Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information (`directed exploration') and the randomization of choice (`random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.


Author(s):  
Bicky Shakya ◽  
Xiaolin Xu ◽  
Mark Tehranipoor ◽  
Domenic Forte

Logic locking has recently been proposed as a solution for protecting gatelevel semiconductor intellectual property (IP). However, numerous attacks have been mounted on this technique, which either compromise the locking key or restore the original circuit functionality. SAT attacks leverage golden IC information to rule out all incorrect key classes, while bypass and removal attacks exploit the limited output corruptibility and/or structural traces of SAT-resistant locking schemes. In this paper, we propose a new lightweight locking technique: CAS-Lock (cascaded locking) which nullifies both SAT and bypass attacks, while simultaneously maintaining nontrivial output corruptibility. This property of CAS-Lock is in stark contrast to the well-accepted notion that there is an inherent trade-off between output corruptibility and SAT resistance. We theoretically and experimentally validate the SAT resistance of CAS-Lock, and show that it reduces the attack to brute-force, regardless of its construction. Further, we evaluate its resistance to recently proposed approximate SAT attacks (i.e., AppSAT). We also propose a modified version of CAS-Lock (mirrored CAS-Lock or M-CAS) to protect against removal attacks. M-CAS allows a trade-off evaluation between removal attack and SAT attack resiliency, while incurring minimal area overhead. We also show how M-CAS parameters such as the implemented Boolean function and selected key can be tuned by the designer so that a desired level of protection against all known attacks can be achieved.


Author(s):  
Hua Zhang ◽  
Youmin Xi

In previous studies on coordinating exploration-exploitation activities, much attention has been paid on network structures while the roles played by actors’ strategic behavior have been largely ignored. In this paper, the authors extend March’s simulation model on parallel problem solving by adding structurally equivalent imitation. In this way, one can examine how the interaction of network structure with agent behavior affects the knowledge process and finally influence group performance. This simulation experiment suggests that under the condition of regular network, the classical trade-off between exploration and exploitation will appear in the case of the preferentially attached network when agents adopt structure equivalence imitation. The whole organization implicitly would be divided into independent sub-groups that converge on different performance level and lead the organization to a lower performance level. The authors also explored the performance in the mixed organization and the management implication.


Sign in / Sign up

Export Citation Format

Share Document