Adaptive Design of Role Differentiation by Division of Reward Function in Multi-Agent Reinforcement Learning

2010 ◽  
Vol 3 (1) ◽  
pp. 26-34
Author(s):  
Tadahiro TANIGUCHI ◽  
Kazuma TABUCHI ◽  
Tetsuo SAWARAGI

Author(s):
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to operate in dynamic environments. A common approach is to use reinforcement learning, allowing an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system whose agents cooperate to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning and are loosely coupled by their reward functions, with each agent controlling a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed by applying a time-averaged reward to several sequential actions at once. The system was evaluated in a simulated melee combat scenario and was shown to improve its performance over time, with each agent learning to pick a specific battle strategy for each opponent it faced.
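To illustrate the delayed-reward handling described above, here is a minimal, hypothetical sketch (not the authors' implementation) of a tabular Q-learning agent that buffers its recent actions and, once a delayed battle reward arrives, applies the time-averaged reward to every buffered state-action pair. All class, method, and parameter names are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

class DelayedRewardQAgent:
    """Tabular Q-learning agent that spreads a time-averaged reward over recent actions.

    Hypothetical sketch: the agent buffers its last `window` (state, action) pairs and,
    when a delayed reward arrives, applies the averaged reward to every buffered pair.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, window=5):
        self.q = defaultdict(float)          # Q[(state, action)] -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.history = deque(maxlen=window)  # most recent (state, action) pairs

    def select_action(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def record(self, state, action):
        """Remember an executed (state, action) pair until a reward is observed."""
        self.history.append((state, action))

    def apply_delayed_reward(self, total_reward, next_state):
        """Distribute a time-averaged reward over all buffered actions at once."""
        if not self.history:
            return
        avg_reward = total_reward / len(self.history)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        for state, action in self.history:
            td_target = avg_reward + self.gamma * best_next
            td_error = td_target - self.q[(state, action)]
            self.q[(state, action)] += self.alpha * td_error
        self.history.clear()
```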


2019 ◽  
Author(s):  
Matthew Chalk ◽  
Gasper Tkacik ◽  
Olivier Marre

A central goal in systems neuroscience is to understand the functions performed by neural circuits. Previous top-down models addressed this question by comparing the behaviour of an ideal model circuit, optimised to perform a given function, with neural recordings. However, this requires guessing in advance what function is being performed, which may not be possible for many neural systems. To address this, we propose a new framework for optimising a recurrent network using multi-agent reinforcement learning (RL). In this framework, a reward function quantifies how desirable each state of the network is for performing a given function. Each neuron is treated as an ‘agent’, which optimises its responses so as to drive the network towards rewarded states. Three applications follow from this. First, one can use multi-agent RL algorithms to optimise a recurrent neural network to perform diverse functions (e.g. efficient sensory coding or motor control). Second, one could use inverse RL to infer the function of a recorded neural network from data. Third, the theory predicts how neural networks should adapt their dynamics to maintain the same function when the external environment or network structure changes. This could lead to theoretical predictions about how neural network dynamics adapt to deal with cell death and/or varying sensory stimulus statistics.
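As a rough illustration of treating each neuron as an RL ‘agent’ that drives the network towards rewarded states, the following is a minimal sketch, assuming binary neurons in a fixed recurrent network that each adjust only their own firing bias via a REINFORCE-style update on a shared, state-dependent reward. The network size, weights, reward, and learning rates are assumptions for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: a recurrent binary network in which every neuron acts as an
# "agent" that nudges its own firing bias so the whole network visits rewarded states.
n_neurons = 8
weights = rng.normal(scale=0.5, size=(n_neurons, n_neurons))  # fixed recurrent weights
bias = np.zeros(n_neurons)                                     # per-neuron policy parameter
target = rng.integers(0, 2, size=n_neurons)                    # assumed rewarded pattern

def reward(state):
    """Global reward: higher the closer the network state is to the target pattern."""
    return -float(np.sum((state - target) ** 2))

lr = 0.05
state = rng.integers(0, 2, size=n_neurons).astype(float)
baseline = 0.0
for step in range(2000):
    drive = weights @ state + bias
    p_fire = 1.0 / (1.0 + np.exp(-drive))                 # each neuron's firing probability
    new_state = (rng.random(n_neurons) < p_fire).astype(float)
    r = reward(new_state)
    # REINFORCE-style update: every neuron uses the shared reward as its own signal.
    bias += lr * (r - baseline) * (new_state - p_fire)
    baseline += 0.05 * (r - baseline)                      # running baseline reduces variance
    state = new_state
```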


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3081
Author(s):  
Krešimir Kušić ◽  
Edouard Ivanjko ◽  
Filip Vrbanić ◽  
Martin Gregurić ◽  
Ivana Dusparic

Prevailing variable speed limit (VSL) systems, an effective strategy for traffic control on motorways, have the disadvantage that they only work with static VSL zones. Under changing traffic conditions, VSL systems with static VSL zones may perform suboptimally. Therefore, adaptive design of VSL zones is required in traffic scenarios where congestion characteristics vary widely over space and time. To address this problem, we propose a novel distributed spatial-temporal multi-agent VSL (DWL-ST-VSL) approach capable of dynamically adjusting the length and position of VSL zones to complement the adjustment of speed limits in current VSL control systems. To model DWL-ST-VSL, distributed W-learning (DWL), a reinforcement learning (RL)-based algorithm for collaborative agent-based self-optimization toward multiple policies, is used. Each agent uses RL to learn local policies, thereby maximizing travel speed and eliminating congestion. In addition to local policies, through the concept of remote policies, agents learn how their actions affect their immediate neighbours and which policy or action is preferred in a given situation. To assess the impact of deploying additional agents in the control loop and of different cooperation levels on the control process, DWL-ST-VSL is evaluated in a four-agent configuration (DWL4-ST-VSL). This evaluation is done via SUMO microscopic simulations using collaborative agents controlling four segments upstream of the congestion in traffic scenarios with medium and high traffic loads. DWL also allows for heterogeneity in agents’ policies; cooperating agents in DWL4-ST-VSL implement two speed limit sets with different granularity. DWL4-ST-VSL outperforms all baselines (W-learning-based VSL and simple proportional speed control), which use static VSL zones. Finally, our experiments yield insights into this new concept of VSL control and may trigger further research on using advanced learning-based technology to design a new generation of adaptive traffic control systems that can operate in nonstationary environments and, more broadly, alongside emerging connected and autonomous vehicles.
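To make the W-learning mechanism behind DWL concrete, the following is a simplified, hypothetical sketch of a single agent holding one local and one remote policy: each policy nominates its greedy action along with a W-value, and losing policies learn how much obeying the winning policy cost them. The class structure, names, and parameters are illustrative assumptions rather than the paper's implementation.

```python
from collections import defaultdict

class WLearningAgent:
    """Simplified sketch of a W-learning agent with one local and one remote policy.

    Hypothetical structure: each policy keeps its own Q-table, and W-values estimate
    how much a policy loses when another policy's nominated action is executed.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.q = {"local": defaultdict(float), "remote": defaultdict(float)}
        self.w = {"local": defaultdict(float), "remote": defaultdict(float)}

    def nominate(self, state):
        """Each policy nominates its greedy action together with its current W-value;
        the action whose policy holds the highest W-value would be executed."""
        nominations = {}
        for name, q in self.q.items():
            best = max(self.actions, key=lambda a: q[(state, a)])
            nominations[name] = (best, self.w[name][state])
        return nominations

    def update(self, winner, state, action, rewards, next_state):
        """Q-update for every policy; W-update only for policies that did not win."""
        for name, q in self.q.items():
            best_next = max(q[(next_state, a)] for a in self.actions)
            target = rewards[name] + self.gamma * best_next
            q[(state, action)] += self.alpha * (target - q[(state, action)])
            if name != winner:
                # Losing policies learn how much obeying the winner cost them.
                expected = max(q[(state, a)] for a in self.actions)
                self.w[name][state] += self.alpha * ((expected - target) - self.w[name][state])
```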


2020 ◽  
Vol 34 (10) ◽  
pp. 13949-13950
Author(s):  
Wang Qisheng ◽  
Wang Qichao ◽  
Li Xiao

Exploration efficiency is a challenge for multi-agent reinforcement learning (MARL), as the policy learned by cooperative MARL depends on the interactions among agents. The less informative reward signal also restricts the learning speed of MARL compared with the informative labels available in supervised learning. This paper proposes a novel communication method that helps agents focus on different exploration subareas in order to accelerate exploration in MARL. We propose a predictive network that forecasts the reward of the current state-action pair and use the guidance learned by this network to modify the reward function. An improved prioritized experience replay is employed to help agents better exploit the different knowledge learned by individual agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.
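The reward-shaping idea can be sketched as follows: a small predictive network regresses toward observed rewards for state-action pairs, and its forecast is blended into the reward the agents learn from. This is a hypothetical PyTorch sketch; the architecture, blending rule, and coefficient beta are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Hypothetical sketch: a small network that forecasts the reward of a
    state-action pair; its prediction is blended into the reward agents learn from."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def shaped_reward(predictor, state, action, env_reward, beta=0.1):
    """Blend the environment reward with the predictor's guidance (assumed scheme)."""
    with torch.no_grad():
        guidance = predictor(state, action)
    return env_reward + beta * guidance.item()

def train_predictor(predictor, optimizer, states, actions, rewards):
    """Regress the predictor toward rewards observed in the (prioritized) replay buffer."""
    predictions = predictor(states, actions)
    loss = nn.functional.mse_loss(predictions, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```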


2021 ◽  
Vol 9 (10) ◽  
pp. 1056
Author(s):  
Chen Chen ◽  
Feng Ma ◽  
Xiaobin Xu ◽  
Yuwang Chen ◽  
Jin Wang

Ships are special machinery with large inertia and relatively weak driving forces. Simulating the manual operation of ships with artificial intelligence (AI) and machine learning techniques is becoming increasingly common, and avoiding collisions in crowded waters may be the most challenging task. This research proposes a cooperative collision avoidance approach for multiple ships using a multi-agent deep reinforcement learning (MADRL) algorithm. Specifically, each ship is modeled as an individual agent, controlled by a Deep Q-Network (DQN) method and described by a dedicated ship motion model. Each agent observes the state of itself and other ships as well as the surrounding environment. The agents then analyze the navigation situation and make motion decisions accordingly. In particular, specific reward function schemas are designed to simulate the degree of cooperation among agents. According to the International Regulations for Preventing Collisions at Sea (COLREGs), three typical simulation scenarios, head-on, overtaking, and crossing, are established to validate the proposed approach. With sufficient MADRL training, the ship agents were capable of avoiding collisions through cooperation in narrow, crowded waters. This method provides new insights for bionic modeling of ship operations, which is of important theoretical and practical significance.
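A reward schema of the kind described could look like the following hypothetical sketch for a single ship agent, combining goal progress, a cooperation-weighted share of the other ship's progress, a safety penalty, and a COLREGs-inspired give-way penalty. All terms, thresholds, and weights are illustrative assumptions, not the paper's actual reward function.

```python
def ship_agent_reward(own_dist_goal, other_dist_goal, dist_between,
                      give_way, collided, cooperation=0.5):
    """Hypothetical per-ship reward schema (an illustrative sketch, not the paper's).

    own_dist_goal / other_dist_goal: each ship's distance to its goal;
    dist_between: distance between the two ships; give_way: True when COLREGs make
    this ship the give-way vessel in the current encounter (head-on, overtaking,
    or crossing); cooperation in [0, 1] weights the other ship's progress.
    """
    if collided:
        return -100.0                           # hard penalty for any collision

    # Progress: own goal progress plus a cooperation-weighted share of the other's.
    reward = -0.01 * own_dist_goal - cooperation * 0.01 * other_dist_goal

    # Safety: penalize closing within an assumed safe passing distance.
    safe_distance = 0.5                         # nautical miles, assumed value
    if dist_between < safe_distance:
        reward -= 10.0 * (safe_distance - dist_between)

    # COLREGs-inspired term: the give-way vessel is penalized for lingering near the
    # stand-on vessel instead of altering course early.
    if give_way and dist_between < 2.0 * safe_distance:
        reward -= 1.0
    return reward
```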

