scholarly journals An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information

2014 ◽  
Vol 51 ◽  
pp. 829-866 ◽  
Author(s):  
B. Bosansky ◽  
C. Kiekintveld ◽  
V. Lisy ◽  
M. Pechoucek

Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; (5) experimental analysis of our algorithm on several games, including an approximate version of the algorithm.

Author(s):  
Trevor Davis ◽  
Kevin Waugh ◽  
Michael Bowling

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zerosum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.


Author(s):  
Christian Kroer ◽  
Gabriele Farina ◽  
Tuomas Sandholm

Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms.Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games) where some probabilities are low while possibly heading toward zero in the limit, and exact equilibrium computation where the low probabilities are actually zero.


Author(s):  
Jiri Cermak ◽  
Branislav Bošanský ◽  
Viliam Lisý

We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces memory requirements for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with 7.10^5 information sets using an abstracted game with only 1.8% of information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.


Author(s):  
Chun Kai Ling ◽  
Fei Fang ◽  
J. Zico Kolter

With the recent advances in solving large, zero-sum extensive form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although a recent work provides a powerful differentiable end-to-end learning frameworks which embed a game solver within a deep-learning framework, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large scale problems, leading to poor practicality. In this paper, we address these limitations and propose a framework that is applicable for more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method which exploits the structure of extensive-form games, yielding significantly faster computation for both game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.


Games ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 2
Author(s):  
Valeria Zahoransky ◽  
Julian Gutierrez ◽  
Paul Harrenstein ◽  
Michael Wooldridge

We introduce a non-cooperative game model in which players’ decision nodes are partially ordered by a dependence relation, which directly captures informational dependencies in the game. In saying that a decision node v is dependent on decision nodes v1,…,vk, we mean that the information available to a strategy making a choice at v is precisely the choices that were made at v1,…,vk. Although partial order games are no more expressive than extensive form games of imperfect information (we show that any partial order game can be reduced to a strategically equivalent extensive form game of imperfect information, though possibly at the cost of an exponential blowup in the size of the game), they provide a more natural and compact representation for many strategic settings of interest. After introducing the game model, we investigate the relationship to extensive form games of imperfect information, the problem of computing Nash equilibria, and conditions that enable backwards induction in this new model.


Author(s):  
Andrea Celli ◽  
Alberto Marchesi ◽  
Gabriele Farina ◽  
Nicola Gatti

The existence of uncoupled no-regret learning dynamics converging to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and the presence of private information, correlation in extensive-form games possesses significantly different properties than in normal-form games. The extensive-form correlated equilibrium (EFCE) is the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device ({\em a.k.a.} mediator) must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to this additional complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics which provably converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play after T game repetitions is guaranteed to be a O(T^-1/2)-approximate EFCE with high probability, and an EFCE almost surely in the limit.


2020 ◽  
Vol 34 (02) ◽  
pp. 2318-2325
Author(s):  
Youzhi Zhang ◽  
Bo An

The study of finding the equilibrium for multiplayer games is challenging. This paper focuses on computing Team-Maxmin Equilibria (TMEs) in zero-sum multiplayer Extensive-Form Games (EFGs), which describes the optimal strategies for a team of players who share the same goal but they take actions independently against an adversary. TMEs can capture many realistic scenarios, including: 1) a team of players play against a target player in poker games; and 2) defense resources schedule and patrol independently in security games. However, the study of efficiently finding TMEs within any given accuracy in EFGs is almost completely unexplored. To fill this gap, we first study the inefficiency caused by computing the equilibrium where team players correlate their strategies and then transforming it into the mixed strategy profile of the team and show that this inefficiency can be arbitrarily large. Second, to efficiently solve the non-convex program for finding TMEs directly, we develop the Associated Recursive Asynchronous Multiparametric Disaggregation Technique (ARAMDT) to approximate multilinear terms in the program with two novel techniques: 1) an asynchronous precision method to reduce the number of constraints and variables for approximation by using different precision levels to approximate these terms; and 2) an associated constraint method to reduce the feasible solution space of the mixed-integer linear program resulting from ARAMDT by exploiting the relation between these terms. Third, we develop a novel iterative algorithm to efficiently compute TMEs within any given accuracy based on ARAMDT. Our algorithm is orders of magnitude faster than baselines in the experimental evaluation.


Sign in / Sign up

Export Citation Format

Share Document