A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics

Journal of Artificial Intelligence Research ◽

10.1613/jair.2628 ◽

2008 ◽

Vol 33 ◽

pp. 521-549 ◽

Cited By ~ 33

Author(s):

S. Abdallah ◽

V. Lesser

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Nash Equilibria ◽

Learning Algorithm ◽

A Priori ◽

Linear Dynamics ◽

Learning Agents ◽

Multiagent Reinforcement Learning ◽

Non Linear ◽

Linear Nature

Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of the previously developed MARL algorithms assumed that agents had some knowledge of the underlying game (such as Nash equilibria), observed other agents' actions and the rewards they received, or both. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash Equilibrium (NE) in benchmark 2-player-2-action games with minimum knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash Equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently. An important aspect of understanding the behavior of a MARL algorithm is analyzing the dynamics of the algorithm: how the policies of multiple learning agents evolve over time as agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the MARL algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult because of the non-linear nature of WPL's dynamics, unlike previous MARL algorithms that had either linear or piece-wise-linear dynamics. Instead, we numerically solve the differential equations that describe WPL's dynamics and compare the solution to the dynamics of previous MARL algorithms.
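To make the convergence target concrete, the sketch below checks whether a pair of mixed strategies is a Nash equilibrium in a two-player-two-action game. It is not the WPL algorithm, which the abstract does not specify; the function names, the matching-pennies payoff matrices, and the tolerance are assumptions chosen for illustration. Checking only pure deviations is enough because a player's expected payoff is linear in its own mixed strategy.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def expected_payoff(payoff: Matrix, p: float, q: float) -> float:
    """Expected payoff when the row player picks action 0 with probability p
    and the column player picks action 0 with probability q."""
    row = (p, 1.0 - p)
    col = (q, 1.0 - q)
    return sum(row[i] * col[j] * payoff[i][j] for i in range(2) for j in range(2))


def is_nash(row_payoff: Matrix, col_payoff: Matrix,
            p: float, q: float, tol: float = 1e-9) -> bool:
    """True if neither player gains more than tol by switching to a pure action."""
    row_value = expected_payoff(row_payoff, p, q)
    col_value = expected_payoff(col_payoff, p, q)
    row_ok = all(expected_payoff(row_payoff, d, q) <= row_value + tol
                 for d in (0.0, 1.0))
    col_ok = all(expected_payoff(col_payoff, p, d) <= col_value + tol
                 for d in (0.0, 1.0))
    return row_ok and col_ok


# Matching pennies (illustrative payoffs): the row player wins on a match,
# the column player wins on a mismatch.
ROW = [[1, -1], [-1, 1]]
COL = [[-1, 1], [1, -1]]

print(is_nash(ROW, COL, 0.5, 0.5))  # uniform mixing by both players
print(is_nash(ROW, COL, 1.0, 1.0))  # both players always pick action 0
```

A learner that converges in this game should end up near the uniform profile, since any pure profile leaves one player with a profitable deviation.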


KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/317 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peng Zhang ◽

Jianye Hao ◽

Weixun Wang ◽

Hongyao Tang ◽

Yi Ma ◽

...

Keyword(s):

Reinforcement Learning ◽

Prior Knowledge ◽

Learning Process ◽

Learning Algorithm ◽

Fuzzy Rule ◽

Policy Network ◽

Human Knowledge ◽

Learning Agents ◽

The Common ◽

Low Performance

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the human learning process. When faced with a new task, humans naturally draw on common sense and prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.


Learning to Teach Reinforcement Learning Agents

Machine Learning and Knowledge Extraction ◽

10.3390/make1010002 ◽

2017 ◽

Vol 1 (1) ◽

pp. 21-42 ◽

Cited By ~ 9

Author(s):

Anestis Fachantidis ◽

Matthew Taylor ◽

Ioannis Vlahavas

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Relevant Literature ◽

Learning Approaches ◽

Training Time ◽

Factors Affecting ◽

Learning Agents ◽

Best Teachers ◽

Previous Learning ◽

Advice Quality

In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student’s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
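The coefficient of variation mentioned above is the standard deviation divided by the mean. The following is a minimal sketch of that statistic; the episode scores are invented for illustration, and the use of the population standard deviation (rather than the sample version) is an assumption, since the abstract does not say which one the authors use.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """Return the population standard deviation divided by the absolute mean."""
    mu = mean(scores)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(scores) / abs(mu)


# Hypothetical per-episode scores for two candidate teacher policies.
steady_teacher = [100, 100, 100, 100]
erratic_teacher = [50, 150, 250, 150]

print(coefficient_of_variation(steady_teacher))
print(coefficient_of_variation(erratic_teacher))
```

In this made-up data the erratic teacher has the higher mean score but also the higher CV, which is the kind of trade-off the statistic exposes when comparing candidate teachers.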


An Enhanced Model-Free Reinforcement Learning Algorithm to Solve Nash Equilibrium for Multi-Agent Cooperative Game Systems

IEEE Access ◽

10.1109/access.2020.3043806 ◽

2020 ◽

Vol 8 ◽

pp. 223743-223755

Author(s):

Yuannan Jiang ◽

Fuxiao Tan

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Cooperative Game ◽

Learning Algorithm ◽

Model Free ◽

Multi Agent ◽

Reinforcement Learning Algorithm


FMRQ—A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks

IEEE Transactions on Cybernetics ◽

10.1109/tcyb.2016.2544866 ◽

2017 ◽

Vol 47 (6) ◽

pp. 1367-1379 ◽

Cited By ~ 27

Author(s):

Zhen Zhang ◽

Dongbin Zhao ◽

Junwei Gao ◽

Dongqing Wang ◽

Yujie Dai

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Cooperative Tasks ◽

Multiagent Reinforcement Learning ◽

Reinforcement Learning Algorithm


BUILDING AN ARTIFICIAL STOCK MARKET POPULATED BY REINFORCEMENT‐LEARNING AGENTS

Journal of Business Economics and Management ◽

10.3846/1611-1699.2009.10.329-341 ◽

2009 ◽

Vol 10 (4) ◽

pp. 329-341 ◽

Cited By ~ 10

Author(s):

Aleksandras Vytautas Rutkauskas ◽

Tomas Ramanauskas

Keyword(s):

Reinforcement Learning ◽

Stock Market ◽

Learning Algorithm ◽

Self Regulation ◽

Market Model ◽

Emergent Properties ◽

Q Learning ◽

Evolutionary Selection ◽

Learning Agents ◽

Artificial Stock Market

In this paper we propose an artificial stock market model based on the interaction of heterogeneous agents whose forward-looking behaviour is driven by the reinforcement-learning algorithm combined with an evolutionary selection mechanism. We use the model to analyse market self-regulation abilities, market efficiency and the determinants of emergent properties of the financial market. Distinctive and novel features of the model include a strong emphasis on the economic content of individual decision-making, application of the Q-learning algorithm for driving individual behaviour, and a rich market setup. Along with that, a parallel version of the model is presented, which is mainly based on research into current changes in the market and on the search for newly emerged consistent patterns, and which has been repeatedly used in experiments searching for optimal decisions in various capital markets.


Research on Tensor-Based Cooperative and Competitive in Multi-Agent Reinforcement Learning

European Journal of Electrical Engineering and Computer Science ◽

10.24018/ejece.2020.4.6.262 ◽

2020 ◽

Vol 4 (6)

Author(s):

Tsega Weldu Araya ◽

Md Rashed Ibn Nawab ◽

A. P. Yuan Ling

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Data Representation ◽

Training Data ◽

Two Dimensional ◽

Multiple Agents ◽

Learning Agents ◽

Dimensional Array ◽

Multi Agent ◽

Agent Cooperation

As technology grows rapidly, the variety of information and the volume of work become harder to manage. Machine learning (ML) was developed to reduce this workload and the human labor it requires. Reinforcement learning (RL) is a recent advancement in ML research. Multi-agent reinforcement learning (MARL) is useful for training multiple agents in a shared environment. Previous studies focused on two-agent cooperation, with data held in a two-dimensional array (a matrix). The limitation of the two-dimensional array appears as the agents' training data grows: the growth creates storage drawbacks and data redundancy. Our first aim in this research is to improve an algorithm so that it can represent MARL training data in a tensor. In MARL, multiple agents work together to achieve a joint task. To share the training records and data of numerous agents, we need to collect the agents' previous cumulative experience in a tensor. Secondly, we will explore the agents' cooperation and competition, with local and global goals of agents in MARL. The local goal is the cooperation of agents within a group or team, where we use a student-and-teacher training model. The global goal is the competition between two opposing teams to acquire the reward. Each learning agent has its own Q table for storing its training data in an environment. The growth in the number of learning agents, their training experience in Q tables, and the requirement for representing multiple kinds of data become the most challenging issue. We introduce a tensor to store this data and resolve the data-representation challenges in multi-agent settings. Here the tensor is expressed as a three-dimensional array, although in general it is an N-way array, which is useful for representing and accessing large amounts of data. Finally, we will implement an algorithm in which three cooperative agents learn against an opposing team, using a tensor-based framework in the Q-learning algorithm. We will provide an algorithm that can store the training records and data of multiple agents. The tensor yields a smaller storage size than the matrix for the agents' training records. In addition, three-agent cooperation helps achieve the maximum reward.
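One way to picture the idea of keeping several agents' Q tables in a single three-way array is sketched below. This is an illustrative reading of the abstract, not the authors' implementation: the axis order (agent, state, action), the function names, and the learning-rate and discount values are all assumptions, and plain nested lists stand in for a tensor library.

```python
from typing import List

QTensor = List[List[List[float]]]


def make_q_tensor(n_agents: int, n_states: int, n_actions: int) -> QTensor:
    """Build a zero-initialised 3-way array indexed as q[agent][state][action]."""
    return [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_agents)]


def q_update(q: QTensor, agent: int, state: int, action: int,
             reward: float, next_state: int,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one standard tabular Q-learning update to a single agent's slice."""
    best_next = max(q[agent][next_state])
    td_target = reward + gamma * best_next
    q[agent][state][action] += alpha * (td_target - q[agent][state][action])
    return q[agent][state][action]


# Three cooperating agents, four states, two actions (sizes are hypothetical).
q = make_q_tensor(n_agents=3, n_states=4, n_actions=2)
q_update(q, agent=1, state=0, action=1, reward=1.0, next_state=2)

print(q[1][0][1])  # the entry that was just updated for agent 1
print(q[0][0][1])  # agent 0's slice is untouched
```

Holding every agent's values in one structure means a teammate's experience can be read by indexing along the agent axis rather than by passing separate tables around.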


A multiagent reinforcement learning algorithm to solve the maximum independent set problem

Multiagent and Grid Systems ◽

10.3233/mgs-200323 ◽

2020 ◽

Vol 16 (1) ◽

pp. 101-115

Author(s):

Mir Mohammad Alipour ◽

Mohsen Abdolhosseinzadeh

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Independent Set ◽

Maximum Independent Set ◽

Independent Set Problem ◽

Multiagent Reinforcement Learning ◽

Maximum Independent Set Problem ◽

Reinforcement Learning Algorithm


A new multiagent reinforcement learning algorithm to solve the symmetric traveling salesman problem

Multiagent and Grid Systems ◽

10.3233/mgs-150232 ◽

2015 ◽

Vol 11 (2) ◽

pp. 107-119 ◽

Cited By ~ 4

Author(s):

Mir Mohammad Alipour ◽

Seyed Naser Razavi

Keyword(s):

Reinforcement Learning ◽

Traveling Salesman Problem ◽

Learning Algorithm ◽

Traveling Salesman ◽

Multiagent Reinforcement Learning ◽

Symmetric Traveling Salesman Problem ◽

Reinforcement Learning Algorithm


A Novel Multiagent Reinforcement Learning Algorithm Combination with Quantum Computation

2006 6th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2006.1712835 ◽

2006

Author(s):

Xiangping Meng ◽

Yu Chen ◽

Yuzhen Pi ◽

Quande Yuan

Keyword(s):

Reinforcement Learning ◽

Quantum Computation ◽

Learning Algorithm ◽

Multiagent Reinforcement Learning ◽

Reinforcement Learning Algorithm


Opposition-Based Reinforcement Learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2006.p0578 ◽

2006 ◽

Vol 10 (4) ◽

pp. 578-585 ◽

Cited By ~ 116

Author(s):

Hamid R. Tizhoosh

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

A Priori ◽

Optimal Solution ◽

Machine Intelligence ◽

Training Data ◽

Superior Performance ◽

Major Drawback ◽

Opposition Based Learning ◽

Dimensional State Space

Reinforcement learning is a machine intelligence scheme for learning in highly dynamic, probabilistic environments. By interacting with the environment, reinforcement agents learn optimal control policies, especially in the absence of a priori knowledge and/or a sufficiently large amount of training data. Despite its advantages, however, reinforcement learning suffers from a major drawback: high computational cost, because convergence to an optimal solution usually requires that all states be visited frequently to ensure that the policy is reliable. This is not always possible, however, due to the complex, high-dimensional state space in many applications. This paper introduces opposition-based reinforcement learning, inspired by opposition-based learning, to speed up convergence. Considering opposite actions simultaneously enables individual states to be updated more than once, shortening exploration and expediting convergence. Three versions of the Q-learning algorithm are given as examples. Experimental results for grid world problems of different sizes demonstrate the superior performance of the proposed approach.
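The idea of updating an action and its opposite in the same step can be sketched as follows. This is a simplified illustration, not one of the paper's three Q-learning versions: it assumes a one-dimensional corridor where "left" and "right" are opposites, and it assumes the outcome of the opposite action can be obtained from a known deterministic model (the hypothetical `step` function). All names and parameter values are invented for the example.

```python
import random
from typing import List, Tuple

LEFT, RIGHT = 0, 1


def step(state: int, action: int, n_states: int) -> Tuple[int, float]:
    """Deterministic corridor model: reward 1.0 on reaching the right end."""
    nxt = max(0, state - 1) if action == LEFT else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward


def q_update(q: List[List[float]], s: int, a: int, r: float, s2: int,
             alpha: float, gamma: float) -> None:
    """Standard tabular Q-learning update for one (state, action) pair."""
    q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])


def train(n_states: int = 6, episodes: int = 200, max_steps: int = 100,
          alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.2,
          seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _episode in range(episodes):
        s = 0
        for _step in range(max_steps):
            if s == goal:
                break
            # Epsilon-greedy choice, breaking ties at random.
            if rng.random() < epsilon or q[s][LEFT] == q[s][RIGHT]:
                a = rng.choice((LEFT, RIGHT))
            else:
                a = LEFT if q[s][LEFT] > q[s][RIGHT] else RIGHT
            s2, r = step(s, a, n_states)
            q_update(q, s, a, r, s2, alpha, gamma)
            # Opposition step (assumption: the model supplies the outcome of
            # the opposite action, so the same state is updated a second time).
            opposite = 1 - a
            s_opp, r_opp = step(s, opposite, n_states)
            q_update(q, s, opposite, r_opp, s_opp, alpha, gamma)
            s = s2
    return q


q_values = train()
for state, (left, right) in enumerate(q_values[:-1]):
    print(state, round(left, 3), round(right, 3))
```

Each visit to a state refreshes both of its action values, so in this toy setting every visited state receives two updates per step instead of one.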
