Using Intelligent Multi-Agent Systems to Model and Foster Self-Regulated Learning: A Theoretically-Based Approach Using Markov Decision Process

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To address this, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and to an extended version of a common benchmark, where it generates policies whose values meet or exceed those obtained on equivalent discretized domains, without the need to find an adequate discretization.
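One common way to represent a decentralized policy is a finite-state controller per agent. The sketch below only illustrates how such a controller could consume continuous observations directly; it is not the framework from the abstract, whose policy representation is not described here. It assumes a scalar observation and hypothetical per-node cut points (`thresholds`) that route each observation to a successor node, so the cut points are node-level parameters rather than a fixed, domain-wide discretization. `ContinuousObsController` and its fields are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContinuousObsController:
    """Hypothetical finite-state controller for one agent of a Dec-POMDP.

    Each node emits a fixed action. After acting, the agent receives a scalar
    continuous observation; the interval it falls into, relative to the current
    node's sorted cut points (an assumed representation), selects the next node.
    """
    actions: Dict[int, str]             # node -> action emitted at that node
    thresholds: Dict[int, List[float]]  # node -> sorted cut points (assumed)
    successors: Dict[int, List[int]]    # node -> one successor per interval
    node: int = 0                       # current controller node

    def act(self) -> str:
        return self.actions[self.node]

    def observe(self, obs: float) -> None:
        cuts = self.thresholds[self.node]
        # bisect_right counts the cut points <= obs, which is the interval index.
        idx = bisect.bisect_right(cuts, obs)
        self.node = self.successors[self.node][idx]


# Illustrative controller: node 0 "searches" and branches on the observation.
controller = ContinuousObsController(
    actions={0: "search", 1: "tag-left", 2: "tag-right"},
    thresholds={0: [-0.5, 0.5], 1: [], 2: []},
    successors={0: [1, 0, 2], 1: [0], 2: [0]},
)
print(controller.act())   # search
controller.observe(-0.8)  # below -0.5 -> node 1
print(controller.act())   # tag-left
controller.observe(3.0)   # node 1 has no cut points -> back to node 0
print(controller.act())   # search
```

Because each agent holds its own controller and advances it only on its own observations, execution stays decentralized. Vector-valued observations would need a different routing rule per node (for example, a small decision tree); that extension is likewise an assumption.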


Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence · DOI 10.24963/ijcai.2019/65 · 2019 · Cited by ~2

Author(s): Yong Liu, Yujing Hu, Yang Gao, Yingfeng Chen, Changjie Fan

Keyword(s): Reinforcement Learning, Knowledge Transfer, Value Function, Single Agent, Multi Agent Systems, Agent Systems, Markov Decision, Dimensional State Space, Multi Agent, Function Transfer

Many real-world problems, such as robot control and soccer games, are naturally modeled as sparse-interaction multi-agent systems. Reusing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous work relies on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and is not suitable for problems with high-dimensional state spaces. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining MDP similarity based on the N-step return (NSR) values of an MDP. We then propose two knowledge transfer methods based on deep neural networks: direct value function transfer and NSR-based value function transfer. We conduct experiments in an image-based grid world, the multi-agent particle environment (MPE), and the Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning while also achieving better asymptotic performance.
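For reference, the usual N-step return truncates the discounted reward sum after N steps and bootstraps from a value estimate: G_t^(N) = r_{t+1} + γ r_{t+2} + ... + γ^(N-1) r_{t+N} + γ^N V(s_{t+N}). The minimal Python sketch below computes this for every step of one episode. It assumes the standard definition (the abstract does not spell out the paper's exact variant) and stops at the returns, since the abstract does not say how NSR values are turned into an MDP similarity score. The function name `n_step_returns` and its array conventions are assumptions.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], values: Sequence[float],
                   n: int, gamma: float) -> List[float]:
    """Return the N-step return G_t^(n) for every time step t of one episode.

    rewards[t] is the reward received after acting at step t (length T).
    values[t] is a value estimate V(s_t) for t = 0..T (length T + 1);
    values[T] should be 0 if the episode terminated.
    Windows that run past the end are truncated and bootstrap from values[T].
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("need one value estimate per state, including the last")
    returns = []
    for t in range(T):
        end = min(t + n, T)
        g = values[end]  # bootstrap from V(s_{t+n}), or from the final state
        # Fold rewards back-to-front: g <- r + gamma * g.
        for k in reversed(range(t, end)):
            g = rewards[k] + gamma * g
        returns.append(g)
    return returns


# Three rewards of 1, gamma = 0.5, 2-step returns, terminal value 0.
print(n_step_returns([1.0, 1.0, 1.0], [4.0, 4.0, 4.0, 0.0], n=2, gamma=0.5))
# [2.5, 1.5, 1.0]
```

Folding rewards back-to-front avoids computing powers of γ explicitly; with n equal to the episode length and a terminal value of 0, the result reduces to the ordinary Monte Carlo return.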


A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems

Machine Learning and Knowledge Extraction · DOI 10.3390/make1020035 · 2019 · Vol 1 (2) · pp. 590-610

Author(s): Zohreh Akbari, Rainer Unland

Keyword(s): Decision Making, Reinforcement Learning, Single Agent, Sequential Decision Making, Multi Agent Systems, Sequential Decision, Agent Systems, Novel Approach, Markov Decision, Multi Agent

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenario and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems. In the latter, the agents either have individual goals and decision-making capabilities and are influenced by other agents’ decisions, or behave as a swarm that collaboratively learns a single objective. Many studies have been conducted in this area; however, a close look at the available swarm RL algorithms reveals the areas that still require attention. Most studies focus on homogeneous swarms, and so far, systems introduced as Heterogeneous Swarms (HetSs) include only a few, i.e., two or three, sub-swarms of homogeneous agents, which either deal with a specific sub-problem of the general problem according to their capabilities or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents that were originally designed to solve different problems, and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. Specifically, the affinity between two agents, which measures how compatible they are for working together on a specific sub-problem, is used to design a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.


Multi-agent Web Service Composition using Partially Observable Markov Decision Process

Proceedings of the International Conference on Advances in Information Communication Technology & Computing - AICTC '16 · DOI 10.1145/2979779.2979849 · 2016

Author(s): Joel Christian, Mohammed Hussain Bohara

Keyword(s): Web Service, Markov Decision Process, Service Composition, Decision Process, Web Service Composition, Markov Decision, Multi Agent, Partially Observable Markov, Partially Observable


Interval-Based Markov Decision Processes for Regulating Interactions Between Two Agents in Multi-agent Systems

Applied Parallel Computing. State of the Art in Scientific Computing - Lecture Notes in Computer Science · DOI 10.1007/11558958_12 · 2006 · pp. 102-111 · Cited by ~3

Author(s): Graçaliz P. Dimuro, Antônio C. R. Costa

Keyword(s): Markov Decision Processes, Decision Processes, Multi Agent Systems, Agent Systems, Two Agents, Markov Decision, Multi Agent


Cooperation in Adaptive Multi-Agent Systems through System of Systems modeling

DOI 10.29007/kqfk · 2018

Author(s): Teddy Bouziat, Valérie Camps, Stéphanie Combettes

Keyword(s): Decision Process, Transportation Problem, System Of Systems, Generic Model, Multi Agent System, Multi Agent Systems, Systems Of Systems, Agent Systems, Proposed Model, Multi Agent

This paper addresses the modeling and design of Systems of Systems (SoS), as well as cooperation between multi-agent systems. It presents and illustrates a new generic model for formally describing SoS. This model is then used to study inter-AMAS (Adaptive Multi-Agent System) cooperation. Each AMAS, reified as a component system of an SoS, uses a cooperative decision process to interact with other AMAS and collectively give rise to a relevant overall function at the SoS level. The proposed model and the inter-AMAS study are instantiated on a simulated resource transportation problem.


Online learning for Markov decision processes applied to multi-agent systems

2017 IEEE 56th Annual Conference on Decision and Control (CDC) · DOI 10.1109/cdc.2017.8263879 · 2017

Author(s): Mahmoud El Chamie, Behcet Acikmese, Mehran Mesbahi

Keyword(s): Online Learning, Markov Decision Processes, Decision Processes, Multi Agent Systems, Agent Systems, Markov Decision, Multi Agent


Markov Decision Process Based Multi-agent System Applied to Aeroengine Maintenance Policy Optimization

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery · DOI 10.1109/fskd.2008.427 · 2008 · Cited by ~3

Author(s): Jianrong Wang, Shouming Hou, Yingying Su, Jianwei Du, Wanshan Wang

Keyword(s): Markov Decision Process, Decision Process, Multi Agent System, Maintenance Policy, Agent System, Markov Decision, Multi Agent, Policy Optimization


The Convergence of a Cooperation Markov Decision Process System

Entropy · DOI 10.3390/e22090955 · 2020 · Vol 22 (9) · pp. 955

Author(s): Xiaoling Mo, Daoyun Xu, Zufeng Fu

Keyword(s): Markov Decision Process, Decision Process, Optimal Strategy, Value Function, Single Agent, Two Agents, Markov Decision, Process System, Multi Agent, Game Environment

In a standard Markov decision process system, only the learning evolution of a single agent is considered. However, restricting attention to a single agent is limiting in many problems, and more and more applications involve multiple agents. Multi-agent settings can be of two types: cooperative environments and game environments. This paper therefore introduces a Cooperation Markov Decision Process (CMDP) system with two agents, which is suited to the learning evolution of cooperative decisions between two agents. It is further shown that the value function in the CMDP system also converges, and that the limit is independent of the choice of initial value function. The paper presents an algorithm for finding the optimal strategy pair (πk0, πk1) in the CMDP system, whose fundamental task is to find an optimal strategy pair and form an evolutionary system CMDP(πk0, πk1). Finally, an example is given to support the theoretical results.
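A value function that converges to the same limit whatever it is initialized to is what one expects from the discounted Bellman optimality operator, which is a γ-contraction. As an illustration only (not the paper's CMDP algorithm, whose update rule the abstract does not give), the sketch below runs value iteration over the joint actions of two cooperating agents that share a reward and both observe the state, then reads off a pair of per-agent greedy strategies. The array layout `P[s, a0, a1, s2]`, `R[s, a0, a1]`, the function name, and the random test problem are assumptions.

```python
import numpy as np


def joint_value_iteration(P, R, gamma, v0, tol=1e-10, max_iter=10_000):
    """Value iteration for a two-agent cooperative MDP with a shared reward.

    P[s, a0, a1, s2]: probability of moving from s to s2 under joint action (a0, a1).
    R[s, a0, a1]:     shared immediate reward (assumed layout).
    Returns the value vector and the greedy per-state action of each agent.
    """
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        # Q[s, a0, a1] = R[s, a0, a1] + gamma * sum_s2 P[s, a0, a1, s2] * v[s2]
        q = R + gamma * (P @ v)
        v_new = q.max(axis=(1, 2))  # value of the best joint action in each state
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    q = R + gamma * (P @ v)
    best = q.reshape(q.shape[0], -1).argmax(axis=1)  # flat index of best joint action
    pi0, pi1 = np.unravel_index(best, q.shape[1:])   # split into agent 0 / agent 1 actions
    return v, pi0, pi1


# Random 3-state problem with 2 actions per agent (illustrative data only).
rng = np.random.default_rng(0)
P = rng.random((3, 2, 2, 3))
P /= P.sum(axis=-1, keepdims=True)  # normalize each row into a distribution
R = rng.random((3, 2, 2))

v_a, _, _ = joint_value_iteration(P, R, 0.9, v0=np.zeros(3))
v_b, _, _ = joint_value_iteration(P, R, 0.9, v0=np.full(3, 100.0))
print(np.allclose(v_a, v_b, atol=1e-6))  # True: same limit from different starts
```

Maximizing over the joint action keeps the example short, but it assumes a centralized planner; if each agent computed the argmax separately, ties would have to be broken consistently for the two strategies to form the same joint action.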
