Access Control in NB-IoT Networks: A Deep Reinforcement Learning Strategy

Yassine Hadjadj-Aoul; Soraya Ait-Chellouche

doi:10.3390/info11110541

Information ◽

10.3390/info11110541 ◽

2020 ◽

Vol 11 (11) ◽

pp. 541

Author(s):

Yassine Hadjadj-Aoul ◽

Soraya Ait-Chellouche

Keyword(s):

Reinforcement Learning ◽

Access Control ◽

Learning Strategy ◽

Random Access ◽

Accurate Information ◽

High Concentration ◽

Access Problem ◽

Learning Techniques ◽

Markov Decision ◽

Iot Devices

The Internet of Things (IoT) is a key enabler of the digital transformation of our society. Driven by various services and applications, Machine Type Communications (MTC) will become an integral part of our daily life over the next few years. Because it meets the ITU-T requirements in terms of density, battery longevity, coverage, price, and supported mechanisms and functionalities, Cellular IoT, and particularly Narrowband-IoT (NB-IoT), is identified as a promising candidate to handle massive MTC accesses. However, this massive connectivity poses a major scalability challenge for network operators. Indeed, devices connect to a cellular IoT network through a random access procedure, and a high concentration of IoT devices would very quickly lead to a bottleneck. This procedure therefore needs to be enhanced to cope with the expected number of connections. With this in mind, we propose in this paper to apply the access class barring (ACB) mechanism to regulate the number of devices competing for access. To derive the blocking factor, we formulate the access problem as a Markov decision process, which we solve using one of the most advanced deep reinforcement learning techniques. The evaluation of the proposed access control through simulations shows the effectiveness of our approach compared with existing approaches such as the adaptive one and the Proportional Integral Derivative (PID) controller. Indeed, it manages to keep the proportion of access attempts close to the optimum, despite the lack of accurate information on the number of access attempts.
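To make the ACB idea in this abstract concrete, the following is a minimal sketch of one contention round, not the authors' simulator or their learning agent. It assumes the usual textbook model: each backlogged device draws a uniform random number and attempts access only if the draw falls below the barring factor, then picks a preamble uniformly at random, and an attempt succeeds only when no other device picked the same preamble. The function names, the preamble count, and the assumption that the backlog is known exactly are all hypothetical; the abstract stresses that accurate backlog information is not available in practice.

```python
import random


def ideal_barring_factor(n_backlogged, n_preambles):
    """Barring factor that maximizes expected successes when the backlog is known.

    Under the model above, expected successes are N*p*(1 - p/M)**(N-1) for
    N devices and M preambles, which is maximized at p = M/N (capped at 1).
    """
    if n_backlogged <= 0:
        return 1.0
    return min(1.0, n_preambles / n_backlogged)


def acb_round(n_backlogged, barring_factor, n_preambles, rng):
    """Simulate one random access opportunity under ACB.

    Returns (attempts, successes): devices that passed the barring check,
    and devices whose chosen preamble was not picked by anyone else.
    """
    # ACB check: a device proceeds only if its uniform draw is below the factor.
    attempts = sum(1 for _ in range(n_backlogged) if rng.random() < barring_factor)

    # Each admitted device picks one preamble uniformly at random.
    picks = {}
    for _ in range(attempts):
        preamble = rng.randrange(n_preambles)
        picks[preamble] = picks.get(preamble, 0) + 1

    # A preamble chosen by exactly one device yields a successful access.
    successes = sum(1 for count in picks.values() if count == 1)
    return attempts, successes


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed, for repeatability only
    backlog, preambles = 500, 54     # illustrative values, not from the paper
    p = ideal_barring_factor(backlog, preambles)
    print(p, acb_round(backlog, p, preambles, rng))
```

A controller such as the one described in the abstract would have to choose the barring factor without observing the backlog directly; the closed-form factor here only serves as a reference point for what "close to the optimum" means under these assumptions.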

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
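For readers unfamiliar with finite-horizon dynamic programming, the sketch below shows backward induction on a classical (non-quantum) MDP. It is not the qMDP algorithm from the paper; it only illustrates the classical technique that the paper extends. The data layout (`P[a][s][s2]` for transition probabilities, `R[s][a]` for rewards) and the two-state example are assumptions made for illustration.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming for a classical MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a,
    R[s][a] is the immediate reward, and the terminal value is zero.
    Returns (values, policy), where values[t][s] is the optimal expected
    return from state s at step t and policy[t][s] is an optimal action.
    """
    n_actions = len(P)
    n_states = len(R)
    values = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]

    # Sweep backwards from the last decision step to the first.
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            best_action, best_value = 0, float("-inf")
            for a in range(n_actions):
                expected_next = sum(
                    P[a][s][s2] * values[t + 1][s2] for s2 in range(n_states)
                )
                q = R[s][a] + expected_next
                if q > best_value:  # ties keep the lowest-numbered action
                    best_action, best_value = a, q
            values[t][s] = best_value
            policy[t][s] = best_action
    return values, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
# Only staying in state 1 pays a reward of 1.
P_EXAMPLE = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R_EXAMPLE = [
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
]

if __name__ == "__main__":
    values, policy = backward_induction(P_EXAMPLE, R_EXAMPLE, horizon=3)
    print(values[0], policy[0])
```

With a horizon of three steps, the best plan from state 0 is to switch once and then stay, which is why the optimal action depends on both the state and the number of steps remaining.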

Cloud Load Balancing and Reinforcement Learning

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch011 ◽

2018 ◽

pp. 266-291

Author(s):

Abdelghafour Harraz ◽

Mostapha Zbakh

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Load Balancing ◽

Decision Process ◽

Cloud System ◽

Human Intervention ◽

Q Learning ◽

State Action ◽

Learning Techniques ◽

Markov Decision

Artificial Intelligence makes it possible to create engines that explore and learn environments and then derive policies to control them in real time with no human intervention. Through its Reinforcement Learning component, using methods such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q Learning, to name a few, it can be applied to systems that can be modelled as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing in order to dispatch load dynamically to a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
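The two temporal-difference methods named in this abstract differ only in how they form the learning target, which a short sketch can show. This is a generic tabular illustration, not the chapter's load-balancing engine. The table layout `Q[state][action]` and the numeric values are assumptions, and mapping states to server loads or actions to dispatch targets is left open.

```python
def q_learning_update(Q, s, a, reward, s_next, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in the next state."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = reward + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action table; values chosen for easy arithmetic.
    Q1 = [[0.0, 0.0], [1.0, 2.0]]
    Q2 = [[0.0, 0.0], [1.0, 2.0]]
    print(q_learning_update(Q1, 0, 0, 1.0, 1, alpha=0.5, gamma=0.5))
    print(sarsa_update(Q2, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.5))
```

When the next action is the greedy one, both updates coincide. They diverge when the behaviour policy explores, as in the example, where SARSA bootstraps from the lower-valued action.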

Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques

International Journal of General Systems ◽

10.1080/03081079.2014.883387 ◽

2014 ◽

Vol 43 (6) ◽

pp. 649-669 ◽

Cited By ~ 7

Author(s):

Abhijit Gosavi

Keyword(s):

Dynamic Programming ◽

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Learning Techniques ◽

Markov Decision

Towards effectively feature graph-based IoT botnet detection via reinforcement learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210699 ◽

2021 ◽

pp. 1-14

Author(s):

Quoc-Dung Ngo ◽

Huy-Trung Nguyen ◽

Le-Cuong Nguyen

Keyword(s):

Reinforcement Learning ◽

Source Code ◽

Training Dataset ◽

Future Research ◽

Botnet Detection ◽

Extraction Mechanism ◽

Time Consumption ◽

Learning Techniques ◽

Security Challenges ◽

Iot Devices

Over the last decade, due to exponential growth in IoT devices and weak security mechanisms, the IoT is now facing more security challenges than ever before, especially botnet malware. There are many security solutions for detecting botnet malware on IoT devices. However, detecting IoT botnet malware, particularly multi-architecture botnets, is challenging. This paper proposes a graph-structured feature extraction mechanism integrated with reinforcement learning techniques for multi-architecture IoT botnet detection. We then evaluate the proposed approach using a dataset of 22849 samples, including actual IoT botnet malware, and achieve a detection rate of 98.03% with low time consumption. The proposed approach also achieves reliable results, at 96.69%, in detecting new IoT botnets (with a new processor architecture) that do not appear in the training dataset. To promote future research in the field, we share relevant datasets and source code.

Model-Free Reinforcement Learning for Branching Markov Decision Processes

Computer Aided Verification - Lecture Notes in Computer Science ◽

10.1007/978-3-030-81688-9_30 ◽

2021 ◽

pp. 651-673

Author(s):

Ernst Moritz Hahn ◽

Mateo Perez ◽

Sven Schewe ◽

Fabio Somenzi ◽

Ashutosh Trivedi ◽

...

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Markov Decision Processes ◽

Control Strategy ◽

Natural Extension ◽

Decision Processes ◽

Optimal Control Strategy ◽

Model Free ◽

Learning Techniques ◽

Markov Decision

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.

An Overview of Inverse Reinforcement Learning Techniques

Intelligent Environments 2021 - Ambient Intelligence and Smart Environments ◽

10.3233/aise210097 ◽

2021 ◽

Author(s):

Syed Ihtesham Hussain Shah ◽

Giuseppe De Pietro

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Decision Process ◽

Autonomous Agents ◽

Theoretical Background ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Learning Techniques ◽

Markov Decision ◽

Potential Use

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter these problems, an extension of the RL problem named Inverse Reinforcement Learning (IRL) is introduced, where the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents capable of modeling others without compromising performance of the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the field of Machine Learning (ML) and Artificial Intelligence (AI). We present a brief comparison between different variants of IRL in this article.

Deep Dyna-Reinforcement Learning Based on Random Access Control in LEO Satellite IoT Networks

IEEE Internet of Things Journal ◽

10.1109/jiot.2021.3112907 ◽

2021 ◽

pp. 1-1

Author(s):

Xiangnan Liu ◽

Haijun Zhang ◽

Keping Long ◽

Arumugam Nallanathan ◽

Victor C. M. Leung

Keyword(s):

Reinforcement Learning ◽

Access Control ◽

Random Access ◽

Leo Satellite

Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention

Technologies ◽

10.3390/technologies9030044 ◽

2021 ◽

Vol 9 (3) ◽

pp. 44

Author(s):

Delali Kwasi Dake ◽

James Dzisi Gadze ◽

Griffith Selorm Klogo ◽

Henry Nunoo-Mensah

Keyword(s):

Reinforcement Learning ◽

Mobile Network ◽

Complex Data ◽

State Action ◽

Routing Optimization ◽

Optimization Task ◽

Delay Jitter ◽

Markov Decision ◽

Iot Devices ◽

Network Metrics

The fast emergence of IoT devices and their accompanying big and complex data has necessitated a shift from the traditional networking architecture to software-defined networks (SDNs) in recent times. Routing optimization and DDoS protection in the network have become a necessity for mobile network operators in maintaining a good QoS and QoE for customers. Inspired by recent advances in Machine Learning and Deep Reinforcement Learning (DRL), we propose a novel MADDPG-integrated multi-agent framework in SDN for efficient multipath routing optimization and malicious DDoS traffic detection and prevention in the network. The two MARL agents cooperate within the same environment to accomplish the network optimization task within a shorter time. The state, action, and reward of the proposed framework were further modelled mathematically using the Markov Decision Process (MDP) and later integrated into the MADDPG algorithm. We compared the proposed MADDPG-based framework to DDPG on the following network metrics: delay, jitter, packet loss rate, bandwidth usage, and intrusion detection. The results show a significant improvement in network metrics with the two agents.

Reinforcement Learning for Efficient Network Penetration Testing

Information ◽

10.3390/info11010006 ◽

2019 ◽

Vol 11 (1) ◽

pp. 6 ◽

Cited By ~ 3

Author(s):

Mohamed C. Ghanem ◽

Thomas M. Chen

Keyword(s):

Reinforcement Learning ◽

Computer Network ◽

Complex Problem ◽

Machine Learning Techniques ◽

Testing System ◽

Penetration Testing ◽

Learning Module ◽

Learning Techniques ◽

Partially Observed ◽

Markov Decision

Penetration testing (also known as pentesting or PT) is a common practice for actively assessing the defenses of a computer network by planning and executing all possible attacks to discover and exploit existing vulnerabilities. Current penetration testing methods are increasingly becoming non-standard, composite and resource-consuming despite the use of evolving tools. In this paper, we propose and evaluate an AI-based pentesting system which makes use of machine learning techniques, namely reinforcement learning (RL), to learn and reproduce average and complex pentesting activities. The proposed system, named Intelligent Automated Penetration Testing System (IAPTS), consists of a module that integrates with industrial PT frameworks to enable them to capture information, learn from experience, and reproduce tests in future similar testing cases. IAPTS aims to save human resources while producing much-enhanced results in terms of time consumption, reliability and frequency of testing. IAPTS takes the approach of modeling PT environments and tasks as a partially observed Markov decision process (POMDP) problem, which is solved by a POMDP solver. Although the scope of this paper is limited to PT planning for network infrastructures and not the entire practice, the obtained results support the hypothesis that RL can enhance PT beyond the capabilities of any human PT expert in terms of time consumed, covered attacking vectors, accuracy and reliability of the outputs. In addition, this work tackles the complex problem of expertise capturing and re-use by allowing the IAPTS learning module to store and re-use PT policies in the same way that a human PT expert would learn but in a more efficient way.
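Because a POMDP agent cannot observe the true state, it maintains a belief, a probability distribution over states, and revises it after each action and observation. The sketch below shows that standard Bayesian belief update in isolation. It is not IAPTS code and does not reproduce the paper's model; the array layouts (`T[a][s][s2]` for transitions, `O[a][s2][o]` for observation likelihoods) and the two-state "patched / vulnerable" reading of the example are hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Bayesian belief update for a discrete POMDP.

    belief[s]      : current probability of being in state s
    T[a][s][s2]    : probability of reaching s2 from s under action a
    O[a][s2][o]    : probability of observing o after action a lands in s2
    Returns the normalized posterior belief over next states.
    """
    n_states = len(belief)
    unnormalized = []
    for s2 in range(n_states):
        # Prediction step: probability of reaching s2 before seeing the observation.
        predicted = sum(T[action][s][s2] * belief[s] for s in range(n_states))
        # Correction step: weight by how likely the observation is in s2.
        unnormalized.append(O[action][s2][observation] * predicted)

    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [value / total for value in unnormalized]


# Hypothetical example: state 0 = "patched", state 1 = "vulnerable".
# The single action is a scan that does not change the state.
T_EXAMPLE = [[[1.0, 0.0], [0.0, 1.0]]]
# Observation 0 = "scan reports clean", observation 1 = "scan reports a finding".
O_EXAMPLE = [[[0.8, 0.2], [0.3, 0.7]]]

if __name__ == "__main__":
    prior = [0.5, 0.5]
    print(belief_update(prior, 0, 0, T_EXAMPLE, O_EXAMPLE))
```

A POMDP solver plans over beliefs like these rather than over raw states, which is what allows a policy to account for uncertainty about the target.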

UAV Autonomous Tracking and Landing Based on Deep Reinforcement Learning Strategy

Sensors ◽

10.3390/s20195630 ◽

2020 ◽

Vol 20 (19) ◽

pp. 5630

Author(s):

Jingyi Xie ◽

Xiaodong Peng ◽

Haijiao Wang ◽

Wenlong Niu ◽

Xiao Zheng

Keyword(s):

Reinforcement Learning ◽

Learning Strategy ◽

Control Method ◽

Heuristic Rules ◽

Learning Framework ◽

Model Free ◽

Simulation Engine ◽

Markov Decision ◽

Moving Platform ◽

Partially Observable

Unmanned aerial vehicle (UAV) autonomous tracking and landing is playing an increasingly important role in military and civil applications. In particular, machine learning has been successfully introduced to robotics-related tasks. A novel UAV autonomous tracking and landing approach based on a deep reinforcement learning strategy is presented in this paper, with the aim of dealing with the UAV motion control problem in an unpredictable and harsh environment. Instead of building a prior model and inferring the landing actions based on heuristic rules, a model-free method based on a partially observable Markov decision process (POMDP) is proposed. In the POMDP model, the UAV automatically learns the landing maneuver by an end-to-end neural network, which combines the Deep Deterministic Policy Gradients (DDPG) algorithm and heuristic rules. A Modular Open Robots Simulation Engine (MORSE)-based reinforcement learning framework is designed and validated with a continuous UAV tracking and landing task on a randomly moving platform under high sensor noise and intermittent measurements. The simulation results show that when the platform moves along different trajectories, the average landing success rate of the proposed algorithm is about 10% higher than that of the Proportional-Integral-Derivative (PID) method. As an indirect result, a state-of-the-art deep reinforcement learning-based UAV control method is validated, where the UAV can learn the optimal strategy for continuous autonomous landing and perform properly in a simulation environment.
