Integral reinforcement learning for zero-sum two-player games

Author(s):  
Yuntao Han ◽  
Qibin Zhou ◽  
Fuqing Duan

Abstract: The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenging problems remain open, such as strategy uncertainty, searching the large game tree, and the reliance on large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversarial learning networks and can automatically produce supervised data, and KR-UCT can be used for searching the large game tree in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach the Nash equilibrium.
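A minimal sketch of the kernel-regression UCB idea behind KR-UCT, under the assumption of a 1-D continuous action space; the function names and the Gaussian kernel bandwidth are illustrative, not taken from the paper. Each candidate action's value estimate and visit count are smoothed over nearby actions, so information gathered for one shot generalizes to similar shots:

```python
import math

def gaussian_kernel(a, b, bandwidth=0.1):
    """Similarity between two continuous actions (1-D for simplicity)."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

def kr_ucb_scores(actions, values, visits, c=1.0):
    """Kernel-regression UCB scores: values and visit counts of each
    candidate are smoothed over neighboring actions before the usual
    exploration bonus is added."""
    total = sum(visits)
    scores = []
    for a in actions:
        w = [gaussian_kernel(a, b) for b in actions]
        wn = sum(wi * ni for wi, ni in zip(w, visits))  # smoothed visit count
        wv = sum(wi * ni * vi for wi, ni, vi in zip(w, visits, values)) / max(wn, 1e-9)
        scores.append(wv + c * math.sqrt(math.log(total + 1) / (wn + 1e-9)))
    return scores
```

With a narrow kernel, well-separated actions keep their own estimates; as the bandwidth grows, nearby actions increasingly share statistics, which is what makes UCT-style search feasible in a continuous action space.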


Automatica ◽  
2020 ◽  
Vol 112 ◽  
pp. 108672 ◽  
Author(s):  
Adedapo Odekunle ◽  
Weinan Gao ◽  
Masoud Davari ◽  
Zhong-Ping Jiang

Games ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 25
Author(s):  
Vincent Srihaput ◽  
Kaylee Craplewe ◽  
Benjamin James Dyson

Predictability is a hallmark of poor-quality decision-making during competition. One source of predictability is the strong association between current outcome and future action, as dictated by the reinforcement learning principles of win–stay and lose–shift. We tested the idea that predictability could be reduced during competition by weakening the associations between outcome and action. To do this, participants completed a competitive zero-sum game in which the opponent from the current trial was either replayed (opponent repeat), thereby strengthening the association, or replaced (opponent change) by a different competitor, thereby weakening the association. We observed that win–stay behavior was reduced during opponent change trials, but lose–shift behavior remained reliably predictable. Consistent with the group data, the number of individuals who exhibited predictable behavior following wins decreased for opponent change relative to opponent repeat trials. Our data show that future actions are more under internal control following positive relative to negative outcomes, and that externally breaking the bonds between outcome and action via opponent association also allows us to become less prone to exploitation.
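The win–stay/lose–shift rule described above can be sketched in a few lines; this is an illustrative toy (binary actions, matching-pennies style), not the authors' experimental task. It shows why a pure win–stay/lose–shift player is perfectly predictable to an opponent who knows the rule:

```python
def wsls_action(prev_action, prev_outcome):
    """Win-stay/lose-shift over binary actions 0/1: repeat the previous
    action after a win, switch to the alternative after a loss."""
    if prev_outcome == "win":
        return prev_action  # stay
    return 1 - prev_action  # shift

def predictability(history):
    """Fraction of trials on which the WSLS rule correctly predicts the
    player's next action, given a list of (action, outcome) pairs."""
    hits = 0
    for (a, o), (a_next, _) in zip(history, history[1:]):
        if wsls_action(a, o) == a_next:
            hits += 1
    return hits / max(len(history) - 1, 1)
```

A player whose choices follow the rule exactly scores a predictability of 1.0; weakening the outcome–action association (as in the opponent-change condition) would push this measure down toward chance.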


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Rui Zhang ◽  
Hui Xia ◽  
Chao Liu ◽  
Ruo-bing Jiang ◽  
Xiang-guo Cheng

The Internet of Things enables the leap from traditional industry to intelligent industry. However, it leaves edge devices more vulnerable to attackers while they process perceptual data in real time. To address this problem, we use a zero-sum game to model the interactions between attackers and edge devices and propose an anti-attack scheme based on deep reinforcement learning. First, we use the kNN-DTW algorithm to find samples similar to the current sample and apply the weighted moving mean method to calculate the mean and variance of those samples. Second, to solve the overestimation problem, we develop an optimal strategy algorithm that finds the optimal strategy for the edge devices. Experimental results show that the new scheme improves the payoff of attacked edge devices and decreases the payoff of attackers, thus forcing the attackers to give up the attack.
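A minimal sketch of the kNN-DTW step the abstract describes: dynamic time warping as the distance between time series, with a k-nearest-neighbor lookup on top. The function names are illustrative, not from the paper:

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric sequences,
    computed with the standard O(len(s) * len(t)) dynamic program."""
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def knn_dtw(query, samples, k=1):
    """Return the k stored samples closest to the query under DTW."""
    return sorted(samples, key=lambda s: dtw_distance(query, s))[:k]
```

Unlike Euclidean distance, DTW tolerates local time shifts, which is why it is a common choice for matching sensor traces from edge devices.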


2020 ◽  
Author(s):  
Vincent Srihaput ◽  
Kaylee Craplewe ◽  
Ben Dyson

Predictability is a hallmark of poor-quality decision-making during competition. One source of predictability is the strong association between current outcome and future action, as dictated by the reinforcement learning principles of win-stay and lose-shift. We tested the idea that predictability could be reduced during competition by weakening the associations between outcome and action. To do this, participants completed a competitive zero-sum game in which the opponent from the current trial was either replayed (opponent repeat), thereby strengthening the association, or replaced (opponent change) by a different competitor, thereby weakening the association. We observed that win-stay behaviour was reduced during opponent change trials but lose-shift behaviour remained reliably predictable. Consistent with the group data, the number of individuals who exhibited predictable behaviour following wins decreased for opponent change relative to opponent repeat trials. Our data show that future actions are more under internal control following positive relative to negative outcomes, and that externally breaking the bonds between outcome and action via opponent association also allows us to become less prone to exploitation.


2009 ◽  
Vol 72 (7-9) ◽  
pp. 1494-1507
Author(s):  
Benoît Frénay ◽  
Marco Saerens
