Adaptive Design of Role Differentiation by Division of Reward Function in Multi-Agent Reinforcement Learning

2010 ◽  
Vol 3 (1) ◽  
pp. 26-34
Author(s):  
Tadahiro TANIGUCHI ◽  
Kazuma TABUCHI ◽  
Tetsuo SAWARAGI

Author(s):
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to operate in dynamic environments. A common approach is to use reinforcement learning, allowing an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system whose agents cooperate to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning and are loosely coupled by their reward functions, with each agent controlling a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed by applying a time-averaged reward to several sequential actions at once. The system was evaluated in a simulated melee combat scenario and was shown to improve its performance over time, with each agent learning to pick a specific battle strategy for each opponent it faced.
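To illustrate the delayed-reward handling described above, here is a minimal, hypothetical sketch (not the authors' implementation) of a tabular Q-learning agent that buffers its recent actions and, once a delayed battle reward arrives, applies the time-averaged reward to every buffered state-action pair. All class, method, and parameter names are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

class DelayedRewardQAgent:
    """Tabular Q-learning agent that spreads a time-averaged reward over recent actions.

    Hypothetical sketch: the agent buffers its last `window` (state, action) pairs and,
    when a delayed reward arrives, applies the averaged reward to every buffered pair.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, window=5):
        self.q = defaultdict(float)          # Q[(state, action)] -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.history = deque(maxlen=window)  # most recent (state, action) pairs

    def select_action(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def record(self, state, action):
        """Remember an executed (state, action) pair until a reward is observed."""
        self.history.append((state, action))

    def apply_delayed_reward(self, total_reward, next_state):
        """Distribute a time-averaged reward over all buffered actions at once."""
        if not self.history:
            return
        avg_reward = total_reward / len(self.history)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        for state, action in self.history:
            td_target = avg_reward + self.gamma * best_next
            td_error = td_target - self.q[(state, action)]
            self.q[(state, action)] += self.alpha * td_error
        self.history.clear()
```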


2019 ◽  
Author(s):  
Matthew Chalk ◽  
Gasper Tkacik ◽  
Olivier Marre

A central goal in systems neuroscience is to understand the functions performed by neural circuits. Previous top-down models addressed this question by comparing the behaviour of an ideal model circuit, optimised to perform a given function, with neural recordings. However, this requires guessing in advance what function is being performed, which may not be possible for many neural systems. To address this, we propose a new framework for optimising a recurrent network using multi-agent reinforcement learning (RL). In this framework, a reward function quantifies how desirable each state of the network is for performing a given function. Each neuron is treated as an ‘agent’, which optimises its responses so as to drive the network towards rewarded states. Three applications follow from this. First, one can use multi-agent RL algorithms to optimise a recurrent neural network to perform diverse functions (e.g. efficient sensory coding or motor control). Second, one could use inverse RL to infer the function of a recorded neural network from data. Third, the theory predicts how neural networks should adapt their dynamics to maintain the same function when the external environment or network structure changes. This could lead to theoretical predictions about how neural network dynamics adapt to deal with cell death and/or varying sensory stimulus statistics.
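As a rough illustration of treating each neuron as an RL ‘agent’ that drives the network towards rewarded states, the following is a minimal sketch, assuming binary neurons in a fixed recurrent network that each adjust only their own firing bias via a REINFORCE-style update on a shared, state-dependent reward. The network size, weights, reward, and learning rates are assumptions for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: a recurrent binary network in which every neuron acts as an
# "agent" that nudges its own firing bias so the whole network visits rewarded states.
n_neurons = 8
weights = rng.normal(scale=0.5, size=(n_neurons, n_neurons))  # fixed recurrent weights
bias = np.zeros(n_neurons)                                     # per-neuron policy parameter
target = rng.integers(0, 2, size=n_neurons)                    # assumed rewarded pattern

def reward(state):
    """Global reward: higher the closer the network state is to the target pattern."""
    return -float(np.sum((state - target) ** 2))

lr = 0.05
state = rng.integers(0, 2, size=n_neurons).astype(float)
baseline = 0.0
for step in range(2000):
    drive = weights @ state + bias
    p_fire = 1.0 / (1.0 + np.exp(-drive))                 # each neuron's firing probability
    new_state = (rng.random(n_neurons) < p_fire).astype(float)
    r = reward(new_state)
    # REINFORCE-style update: every neuron uses the shared reward as its own signal.
    bias += lr * (r - baseline) * (new_state - p_fire)
    baseline += 0.05 * (r - baseline)                      # running baseline reduces variance
    state = new_state
```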


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3081
Author(s):  
Krešimir Kušić ◽  
Edouard Ivanjko ◽  
Filip Vrbanić ◽  
Martin Gregurić ◽  
Ivana Dusparic

Prevailing variable speed limit (VSL) systems, an effective strategy for traffic control on motorways, have the disadvantage that they only work with static VSL zones. Under changing traffic conditions, VSL systems with static VSL zones may perform suboptimally. Therefore, adaptive design of VSL zones is required in traffic scenarios where congestion characteristics vary widely over space and time. To address this problem, we propose a novel distributed spatial-temporal multi-agent VSL (DWL-ST-VSL) approach capable of dynamically adjusting the length and position of VSL zones to complement the adjustment of speed limits in current VSL control systems. To model DWL-ST-VSL, distributed W-learning (DWL), a reinforcement learning (RL)-based algorithm for collaborative agent-based self-optimization toward multiple policies, is used. Each agent uses RL to learn local policies, thereby maximizing travel speed and eliminating congestion. In addition to local policies, through the concept of remote policies, agents learn how their actions affect their immediate neighbours and which policy or action is preferred in a given situation. To assess the impact of deploying additional agents in the control loop and of different cooperation levels on the control process, DWL-ST-VSL is evaluated in a four-agent configuration (DWL4-ST-VSL). This evaluation is done via SUMO microscopic simulations using collaborative agents controlling four segments upstream of the congestion in traffic scenarios with medium and high traffic loads. DWL also allows for heterogeneity in agents’ policies; cooperating agents in DWL4-ST-VSL implement two speed limit sets with different granularity. DWL4-ST-VSL outperforms all baselines (W-learning-based VSL and simple proportional speed control), which use static VSL zones. Finally, our experiments yield insights into this new concept of VSL control and may trigger further research on using advanced learning-based technology to design a new generation of adaptive traffic control systems that can operate in nonstationary environments and, more broadly, alongside emerging connected and autonomous vehicles.
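To make the W-learning mechanism behind DWL concrete, the following is a simplified, hypothetical sketch of a single agent holding one local and one remote policy: each policy nominates its greedy action along with a W-value, and losing policies learn how much obeying the winning policy cost them. The class structure, names, and parameters are illustrative assumptions rather than the paper's implementation.

```python
from collections import defaultdict

class WLearningAgent:
    """Simplified sketch of a W-learning agent with one local and one remote policy.

    Hypothetical structure: each policy keeps its own Q-table, and W-values estimate
    how much a policy loses when another policy's nominated action is executed.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.q = {"local": defaultdict(float), "remote": defaultdict(float)}
        self.w = {"local": defaultdict(float), "remote": defaultdict(float)}

    def nominate(self, state):
        """Each policy nominates its greedy action together with its current W-value;
        the action whose policy holds the highest W-value would be executed."""
        nominations = {}
        for name, q in self.q.items():
            best = max(self.actions, key=lambda a: q[(state, a)])
            nominations[name] = (best, self.w[name][state])
        return nominations

    def update(self, winner, state, action, rewards, next_state):
        """Q-update for every policy; W-update only for policies that did not win."""
        for name, q in self.q.items():
            best_next = max(q[(next_state, a)] for a in self.actions)
            target = rewards[name] + self.gamma * best_next
            q[(state, action)] += self.alpha * (target - q[(state, action)])
            if name != winner:
                # Losing policies learn how much obeying the winner cost them.
                expected = max(q[(state, a)] for a in self.actions)
                self.w[name][state] += self.alpha * ((expected - target) - self.w[name][state])
```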


2020 ◽  
Vol 34 (10) ◽  
pp. 13949-13950
Author(s):  
Wang Qisheng ◽  
Wang Qichao ◽  
Li Xiao

Exploration efficiency is a challenge for multi-agent reinforcement learning (MARL), as the policy learned by cooperative MARL depends on the interactions among agents. The less informative reward signal also restricts the learning speed of MARL compared with the informative labels available in supervised learning. This paper proposes a novel communication method that helps agents focus on different exploration subareas in order to accelerate exploration in MARL. We propose a predictive network that forecasts the reward of the current state-action pair and use the guidance learned by this network to modify the reward function. An improved prioritized experience replay is employed to help agents better exploit the different knowledge learned by individual agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.
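The reward-shaping idea can be sketched as follows: a small predictive network regresses toward observed rewards for state-action pairs, and its forecast is blended into the reward the agents learn from. This is a hypothetical PyTorch sketch; the architecture, blending rule, and coefficient beta are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Hypothetical sketch: a small network that forecasts the reward of a
    state-action pair; its prediction is blended into the reward agents learn from."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def shaped_reward(predictor, state, action, env_reward, beta=0.1):
    """Blend the environment reward with the predictor's guidance (assumed scheme)."""
    with torch.no_grad():
        guidance = predictor(state, action)
    return env_reward + beta * guidance.item()

def train_predictor(predictor, optimizer, states, actions, rewards):
    """Regress the predictor toward rewards observed in the (prioritized) replay buffer."""
    predictions = predictor(states, actions)
    loss = nn.functional.mse_loss(predictions, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```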


2021 ◽  
Vol 9 (10) ◽  
pp. 1056
Author(s):  
Chen Chen ◽  
Feng Ma ◽  
Xiaobin Xu ◽  
Yuwang Chen ◽  
Jin Wang

Ships are special machinery with large inertia and relatively weak driving forces. Simulating the manual operation of ships with artificial intelligence (AI) and machine learning techniques is becoming increasingly common, and avoiding collisions in crowded waters may be the most challenging task. This research proposes a cooperative collision avoidance approach for multiple ships using a multi-agent deep reinforcement learning (MADRL) algorithm. Specifically, each ship is modeled as an individual agent, controlled by a Deep Q-Network (DQN) method and described by a dedicated ship motion model. Each agent observes the state of itself and other ships as well as the surrounding environment. The agents then analyze the navigation situation and make motion decisions accordingly. In particular, specific reward function schemas are designed to simulate the degree of cooperation among agents. According to the International Regulations for Preventing Collisions at Sea (COLREGs), three typical simulation scenarios, head-on, overtaking, and crossing, are established to validate the proposed approach. With sufficient MADRL training, the ship agents were capable of avoiding collisions through cooperation in narrow, crowded waters. This method provides new insights for bionic modeling of ship operations, which is of important theoretical and practical significance.
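A reward schema of the kind described could look like the following hypothetical sketch for a single ship agent, combining goal progress, a cooperation-weighted share of the other ship's progress, a safety penalty, and a COLREGs-inspired give-way penalty. All terms, thresholds, and weights are illustrative assumptions, not the paper's actual reward function.

```python
def ship_agent_reward(own_dist_goal, other_dist_goal, dist_between,
                      give_way, collided, cooperation=0.5):
    """Hypothetical per-ship reward schema (an illustrative sketch, not the paper's).

    own_dist_goal / other_dist_goal: each ship's distance to its goal;
    dist_between: distance between the two ships; give_way: True when COLREGs make
    this ship the give-way vessel in the current encounter (head-on, overtaking,
    or crossing); cooperation in [0, 1] weights the other ship's progress.
    """
    if collided:
        return -100.0                           # hard penalty for any collision

    # Progress: own goal progress plus a cooperation-weighted share of the other's.
    reward = -0.01 * own_dist_goal - cooperation * 0.01 * other_dist_goal

    # Safety: penalize closing within an assumed safe passing distance.
    safe_distance = 0.5                         # nautical miles, assumed value
    if dist_between < safe_distance:
        reward -= 10.0 * (safe_distance - dist_between)

    # COLREGs-inspired term: the give-way vessel is penalized for lingering near the
    # stand-on vessel instead of altering course early.
    if give_way and dist_between < 2.0 * safe_distance:
        reward -= 1.0
    return reward
```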

