Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention

Technologies ◽

10.3390/technologies9030044 ◽

2021 ◽

Vol 9 (3) ◽

pp. 44

Author(s):

Delali Kwasi Dake ◽

James Dzisi Gadze ◽

Griffith Selorm Klogo ◽

Henry Nunoo-Mensah

Keyword(s):

Reinforcement Learning ◽

Mobile Network ◽

Complex Data ◽

State Action ◽

Routing Optimization ◽

Optimization Task ◽

Delay Jitter ◽

Markov Decision ◽

IoT Devices ◽

Network Metrics

The fast emergence of IoT devices and their accompanying big and complex data has necessitated a shift from the traditional networking architecture to software-defined networks (SDNs) in recent times. Routing optimization and DDoS protection in the network have become a necessity for mobile network operators in maintaining a good QoS and QoE for customers. Inspired by the recent advancement in Machine Learning and Deep Reinforcement Learning (DRL), we propose a novel MADDPG-integrated multi-agent framework in SDN for efficient multipath routing optimization and malicious DDoS traffic detection and prevention in the network. The two MARL agents cooperate within the same environment to accomplish the network optimization task in a shorter time. The state, action, and reward of the proposed framework were further modelled mathematically using the Markov Decision Process (MDP) and later integrated into the MADDPG algorithm. We compared the proposed MADDPG-based framework to DDPG on the network metrics delay, jitter, packet loss rate, bandwidth usage, and intrusion detection. The results show a significant improvement in network metrics with the two agents.

Cloud Load Balancing and Reinforcement Learning

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch011 ◽

2018 ◽

pp. 266-291

Author(s):

Abdelghafour Harraz ◽

Mostapha Zbakh

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Load Balancing ◽

Decision Process ◽

Cloud System ◽

Human Intervention ◽

Q Learning ◽

State Action ◽

Learning Techniques ◽

Markov Decision

Artificial Intelligence makes it possible to build engines that explore and learn environments and then derive policies that control them in real time with no human intervention. Through its Reinforcement Learning techniques, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q Learning, to name a few, it can be applied to systems that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing so that load can be dispatched dynamically to a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
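To make the difference between two of the techniques named in this abstract concrete, here is a minimal tabular sketch of the standard Q-learning and SARSA updates. It is not taken from the chapter: the load levels used as states, the server names used as actions, and the values of `alpha` and `gamma` are all hypothetical choices for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


# Hypothetical toy setup: states are coarse load levels, actions are servers.
actions = ["server_a", "server_b"]

q_table = defaultdict(float)
q_table[("high", "server_b")] = 1.0
q_learning_update(q_table, "low", "server_a", reward=-0.2,
                  next_state="high", actions=actions)

sarsa_table = defaultdict(float)
sarsa_table[("high", "server_b")] = 1.0
sarsa_update(sarsa_table, "low", "server_a", reward=-0.2,
             next_state="high", next_action="server_a")
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action the agent actually selected.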

Reinforcement Learning for Optimizing Driving Policies on Cruising Taxis Services

Sustainability ◽

10.3390/su12218883 ◽

2020 ◽

Vol 12 (21) ◽

pp. 8883

Author(s):

Kun Jin ◽

Wei Wang ◽

Xuedong Hua ◽

Wei Zhou

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

State Action ◽

Future Reward ◽

Long Run ◽

Markov Decision ◽

Action Value ◽

Data Expansion ◽

Taking Action ◽

The Value Function

As a key element of urban transportation, taxi services provide significant convenience and comfort for residents’ travel. However, in practice they have not been very efficient. Previous researchers mainly aimed to optimize policies through order dispatch on ride-hailing services, which cannot be applied to cruising taxi services. This paper developed a reinforcement learning (RL) framework to optimize driving policies on cruising taxi services. Firstly, we formulated the drivers’ behaviours as a Markov decision process (MDP), considering the long-run influences of taking an action. The RL framework, using dynamic programming and data expansion, was employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and then quantify the expected future reward at a particular state. By utilizing historical order data in Chengdu, we analysed the function value’s spatial distribution and demonstrated how the model could optimize the driving policies. Finally, a realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the results verified that the new model performs better in increasing total revenue and answer rate and in decreasing waiting time, with relative percentages of 4.8%, 6.2% and −27.27% at most.

Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation

Information ◽

10.3390/info10110341 ◽

2019 ◽

Vol 10 (11) ◽

pp. 341 ◽

Cited By ~ 2

Author(s):

Hu ◽

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Transfer Learning ◽

Action Function ◽

State Action ◽

Markov Decision ◽

Decision Making System ◽

Multi Agent ◽

Function Approximator ◽

Multi Robot

Multi-Robot Confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have been able to produce considerably complex levels in the context of the robot confrontation system when the agents are facing multiple opponents. Meanwhile, the current confrontation decision-making system suffers from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement for a robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision-making process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function. Then, a fuzzy logic method is used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.

Access Control in NB-IoT Networks: A Deep Reinforcement Learning Strategy

Information ◽

10.3390/info11110541 ◽

2020 ◽

Vol 11 (11) ◽

pp. 541

Author(s):

Yassine Hadjadj-Aoul ◽

Soraya Ait-Chellouche

Keyword(s):

Reinforcement Learning ◽

Access Control ◽

Learning Strategy ◽

Random Access ◽

Accurate Information ◽

High Concentration ◽

Access Problem ◽

Learning Techniques ◽

Markov Decision ◽

IoT Devices

The Internet of Things (IoT) is a key enabler of the digital mutation of our society. Driven by various services and applications, Machine Type Communications (MTC) will become an integral part of our daily life over the next few years. Meeting the ITU-T requirements in terms of density, battery longevity, coverage, price, and supported mechanisms and functionalities, Cellular IoT, and particularly Narrowband-IoT (NB-IoT), is identified as a promising candidate to handle massive MTC accesses. However, this massive connectivity would pose a huge challenge for network operators in terms of scalability. Indeed, the connection to the network in cellular IoT passes through a random access procedure, and a high concentration of IoT devices would very quickly lead to a bottleneck. This procedure therefore needs to be enhanced, as the connectivity would be considerable. With this in mind, we propose in this paper to apply the access class barring (ACB) mechanism to regulate the number of devices competing for access. In order to derive the blocking factor, we formulated the access problem as a Markov decision process that we were able to solve using one of the most advanced deep reinforcement learning techniques. The evaluation of the proposed access control, through simulations, shows the effectiveness of our approach compared to existing approaches such as the adaptive one and the Proportional Integral Derivative (PID) controller. Indeed, it manages to keep the proportion of access attempts close to the optimum, despite the lack of accurate information on the number of access attempts.
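To illustrate why a barring factor has an optimum at all, the following sketch uses a simplified textbook contention model, which is an assumption and not the model of the paper: each of N backlogged devices attempts access with probability p (the barring factor) and picks one of M preambles uniformly at random, and an attempt succeeds only if no other device picks the same preamble. Under this model the expected number of successes is N·p·(1 − p/M)^(N−1), which is maximized at p = min(1, M/N). The function names are hypothetical.

```python
def expected_successes(n_devices, n_preambles, barring_factor):
    """Expected number of collision-free attempts in one random access slot.

    Simplified model (an assumption): every device passes the barring check
    with probability `barring_factor` and then picks a preamble uniformly.
    """
    # Probability that one given other device lands on the same preamble.
    p_same_preamble = barring_factor / n_preambles
    return (n_devices * barring_factor
            * (1.0 - p_same_preamble) ** (n_devices - 1))


def optimal_barring_factor(n_devices, n_preambles):
    """Barring factor that maximizes expected_successes in this model."""
    return min(1.0, n_preambles / n_devices)


# Example: 200 backlogged devices competing for 50 preambles.
best_p = optimal_barring_factor(200, 50)
best_rate = expected_successes(200, 50, best_p)
unbarred_rate = expected_successes(200, 50, 1.0)
```

The closed form needs the number of backlogged devices N, which the network does not observe directly. That missing information is the gap the abstract says its learning-based controller has to work around.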

Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3070584 ◽

2021 ◽

pp. 1-13

Author(s):

Long Chen ◽

Bin Hu ◽

Zhi-Hong Guan ◽

Lian Zhao ◽

Xuemin Shen

Keyword(s):

Reinforcement Learning ◽

Multipath Routing ◽

Routing Optimization

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

A Comparative Study of AI-Based Intrusion Detection Techniques in Critical Infrastructures

ACM Transactions on Internet Technology ◽

10.1145/3406093 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1-22

Author(s):

Safa Otoum ◽

Burak Kantarci ◽

Hussein Mouftah

Keyword(s):

Reinforcement Learning ◽

Intrusion Detection ◽

Comparative Study ◽

Performance Metrics ◽

Action Learning ◽

Smart Devices ◽

Critical Infrastructures ◽

State Action ◽

Detection Techniques ◽

Depth Analysis

Volunteer computing, which uses Internet-connected devices (laptops, PCs, smart devices, etc.) whose owners volunteer them as storage and computing power resources, has become an essential mechanism for resource management in numerous applications. The growth of the volume and variety of data traffic on the Internet leads to concerns about the robustness of cyberphysical systems, especially for critical infrastructures. Therefore, the implementation of an efficient Intrusion Detection System for gathering such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning and reinforcement learning solutions to recognise intrusive behavior in the collected traffic. We evaluate the proposed mechanisms by using KDD’99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and Q-learning based IDS (Q-IDS), to detect malicious behaviors. We also present the performance of different reinforcement learning techniques such as State-Action-Reward-State-Action Learning (SARSA) and the Temporal Difference learning (TD). Through simulations, we show that Q-IDS performs with detection rate while SARSA-IDS and TD-IDS perform at the order of .

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
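For orientation, the classical counterpart of the finite-horizon dynamic programming mentioned in this abstract is backward induction. The sketch below covers ordinary (non-quantum) MDPs only and is not the qMDP algorithm of the paper. The dictionary layout for `transitions` and `rewards` and the two-state example are hypothetical.

```python
def backward_induction(transitions, rewards, horizon):
    """Finite-horizon dynamic programming for a classical MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward. Returns the optimal value with
    `horizon` steps to go and a list `policy` where policy[t][s] is the
    optimal action in state s at time step t.
    """
    states = list(transitions)
    value = {s: 0.0 for s in states}  # value with zero steps to go
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a, outcomes in transitions[s].items():
                q = rewards[s][a] + sum(p * value[s2] for p, s2 in outcomes)
                if q > best_q:
                    best_action, best_q = a, q
            new_value[s], step_policy[s] = best_q, best_action
        value = new_value
        policy.insert(0, step_policy)  # earlier time steps go to the front
    return value, policy


# Hypothetical two-state example: staying in s0 pays 1 per step, moving to
# s1 pays nothing now, and staying in s1 pays 2 per step.
transitions = {
    "s0": {"stay": [(1.0, "s0")], "move": [(1.0, "s1")]},
    "s1": {"stay": [(1.0, "s1")]},
}
rewards = {
    "s0": {"stay": 1.0, "move": 0.0},
    "s1": {"stay": 2.0},
}
value, policy = backward_induction(transitions, rewards, horizon=3)
```

With three steps remaining it pays to move out of `s0` immediately, while on the last step staying is better. This shows why finite-horizon policies generally depend on the time step.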
