Reinforcement Learning Rebirth, Techniques, Challenges, and Resolutions

Wasswa Shafik; Mojtaba Matinkhah; Parisa Etemadinejad; Mammann Nur Sanda

doi:10.30630/joiv.4.3.376

Reinforcement Learning Rebirth, Techniques, Challenges, and Resolutions

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.4.3.376 ◽

2020 ◽

Vol 4 (3) ◽

Author(s):

Wasswa Shafik ◽

Mojtaba Matinkhah ◽

Parisa Etemadinejad ◽

Mammann Nur Sanda

Keyword(s):

Reinforcement Learning ◽

Electronic Devices ◽

Learning Automata ◽

The Internet ◽

Q Learning ◽

Hands On ◽

Learning Technique ◽

Markov Decision ◽

Artificial Neural Network (ANN) ◽

The Internet Of Things

Reinforcement learning (RL) is a promising and increasingly prominent research area in the Internet of Things (IoT), media, and social sensing computing, where it addresses a broad range of relevant tasks by making sequential decisions under deterministic and stochastic dynamics. The IoT extends connectivity to physical devices, such as electronic devices that interconnect with one another over the Internet and can be remotely monitored and controlled. In this paper, we present an in-depth survey of RL techniques in IoT systems, focusing on the main known RL techniques such as artificial neural networks (ANN), Q-learning, Markov decision processes (MDP), and learning automata (LA). The study examines and analyses each learning technique with a focus on challenges, model performance, and the similarities and differences among the most closely related state-of-the-art models proposed for the IoT. The results can serve as a foundation for designing and implementing models that address the bottlenecks assessed here, together with an evaluation of the most popular hands-on uses of current reinforcement learning methods.

Personalized project recommendations: using reinforcement learning

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1619-6 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 1

Author(s):

Faxin Qi ◽

Xiangrong Tong ◽

Lei Yu ◽

Yingjie Wang

Keyword(s):

Reinforcement Learning ◽

User Behavior ◽

Collaborative Work ◽

Recursive Least Squares ◽

The Internet ◽

Dynamic Impact ◽

RLS Algorithm ◽

Trust Value ◽

Q Learning ◽

Actual Evaluation

Abstract: With the development of the Internet and the progress of human-centered computing (HCC), man-machine collaborative work has become increasingly popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static, and so cannot respond in a timely manner to dynamic changes in user trust and preferences. In fact, after a recommendation is received, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the process by which trust between users changes, this paper proposes a trust boost method using reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user's trust. In addition, a reinforcement learning method, Deep Q-Learning (DQN), is studied to simulate the process of learning a user's preferences and boosting the trust value. Experiments indicate that the method, applied to recommendation systems, can respond quickly to changes in users' preferences. Compared with other methods, it achieves better recommendation accuracy.
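As a rough illustration of how a recursive least squares update can track a relationship online, the following minimal Python sketch fits a single weight from streaming samples. The scalar model (trust change approximated as weight times evaluation difference), the class name `ScalarRLS`, the forgetting factor, and the sample data are all assumptions made for illustration; the abstract does not specify the paper's actual formulation.

```python
class ScalarRLS:
    """Recursive least squares for a one-weight model y ~ w * x (illustrative only)."""

    def __init__(self, forgetting=1.0, initial_covariance=1000.0):
        self.w = 0.0                  # current weight estimate
        self.p = initial_covariance   # inverse-correlation term (scalar "covariance")
        self.lam = forgetting         # forgetting factor; 1.0 weights all samples equally

    def update(self, x, y):
        # Gain decides how strongly the new sample corrects the estimate.
        gain = self.p * x / (self.lam + x * self.p * x)
        error = y - self.w * x        # prediction error before the update
        self.w += gain * error
        self.p = (self.p - gain * x * self.p) / self.lam
        return self.w


# Hypothetical data: each evaluation difference shifts trust by twice its size.
rls = ScalarRLS()
for diff in [1.0, 2.0, 3.0, 4.0, 5.0]:
    estimate = rls.update(diff, 2.0 * diff)
```

Setting `forgetting` below 1.0 discounts older samples, which is the usual way to let such an estimator follow a drifting relationship.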

Cloud Load Balancing and Reinforcement Learning

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises ◽

10.4018/978-1-5225-3038-1.ch011 ◽

2018 ◽

pp. 266-291

Author(s):

Abdelghafour Harraz ◽

Mostapha Zbakh

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Load Balancing ◽

Decision Process ◽

Cloud System ◽

Human Intervention ◽

Q Learning ◽

State Action ◽

Learning Techniques ◽

Markov Decision

Artificial intelligence makes it possible to build engines that explore and learn environments and then derive policies to control them in real time without human intervention. Through reinforcement learning techniques such as temporal-difference methods, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to systems that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, so that load can be dispatched dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
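To make the difference between the two temporal-difference methods named above concrete, here is a minimal tabular sketch of the standard Q-learning and SARSA update rules. The state names, server names, reward values, and default step size and discount are hypothetical and chosen only for illustration; the chapter's own design is not described in the abstract.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical load-dispatch setting: the action is the server that receives the request.
ACTIONS = ["server_a", "server_b"]


def make_table():
    q = defaultdict(float)            # unseen (state, action) pairs default to 0.0
    q[("low_load", "server_a")] = 1.0
    q[("low_load", "server_b")] = 0.5
    return q


q_off = make_table()
q_value = q_learning_update(q_off, "high_load", "server_a", -1.0, "low_load", ACTIONS)

q_on = make_table()
sarsa_value = sarsa_update(q_on, "high_load", "server_a", -1.0, "low_load", "server_b")
```

The two results differ only because Q-learning bootstraps from the greedy next action (`server_a`), while SARSA bootstraps from the next action that was actually selected (`server_b`).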

Deep Reinforcement Learning by Balancing Offline Monte Carlo and Online Temporal Difference Use Based on Environment Experiences

Symmetry ◽

10.3390/sym12101685 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1685 ◽

Cited By ~ 1

Author(s):

Chayoung Kim

Keyword(s):

Monte Carlo ◽

Reinforcement Learning ◽

Real Time ◽

Temporal Difference ◽

Q Learning ◽

State Action ◽

Proposed Model ◽

Reward Functions ◽

And Performance ◽

The Internet Of Things

Owing to the complexity of training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted on an online basis without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents are competently trained in real-world applications. The proposed model considers combinations of basic RL algorithms with online and offline use based on the empirical balance of bias and variance. We therefore exploited the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), in the context of DRL. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrated that, for a simple control task, the balance between online and offline use without on- and off-policy methods shows satisfactory results. However, in complex tasks, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance in a deep Q-network.
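The bias and variance trade-off between Monte Carlo and temporal-difference targets can be sketched in a few lines of Python. The linear blend controlled by `beta` below is an assumption made purely for illustration and is not taken from the paper, whose balancing scheme is not detailed in the abstract; the function names and sample numbers are likewise hypothetical.

```python
def mc_return(rewards, gamma=0.99):
    """Monte Carlo target: discounted sum of all rewards to the end of the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward, next_value, gamma=0.99):
    """One-step TD target: immediate reward plus discounted value estimate."""
    return reward + gamma * next_value


def blended_target(rewards, next_value, beta=0.5, gamma=0.99):
    """Illustrative mix: beta=1.0 is pure MC (low bias, high variance),
    beta=0.0 is pure one-step TD (higher bias, lower variance)."""
    mc = mc_return(rewards, gamma)
    td = td_target(rewards[0], next_value, gamma)
    return beta * mc + (1.0 - beta) * td


# Hypothetical episode tail: three rewards of 1, and a value estimate of 2.0 for the next state.
episode_rewards = [1.0, 1.0, 1.0]
mc = mc_return(episode_rewards, gamma=0.5)
td = td_target(episode_rewards[0], 2.0, gamma=0.5)
mixed = blended_target(episode_rewards, 2.0, beta=0.5, gamma=0.5)
```

The MC target needs the full episode before it can be computed, which is why it is described as offline, whereas the TD target is available after a single step.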

An efficient route planning model for mobile agents on the internet of things using Markov decision process

Ad Hoc Networks ◽

10.1016/j.adhoc.2019.102053 ◽

2020 ◽

Vol 98 ◽

pp. 102053 ◽

Cited By ~ 4

Author(s):

Shamim Yousefi ◽

Farnaz Derakhshan ◽

Hadis Karimipour ◽

Hadi S. Aghdasi

Keyword(s):

Internet Of Things ◽

Markov Decision Process ◽

Decision Process ◽

Mobile Agents ◽

Route Planning ◽

The Internet ◽

Planning Model ◽

Markov Decision ◽

Efficient Route ◽

The Internet Of Things

Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

IEEE Access ◽

10.1109/access.2018.2885950 ◽

2019 ◽

Vol 7 ◽

pp. 2782-2798 ◽

Cited By ~ 9

Author(s):

Lucileide M. D. Da Silva ◽

Matheus F. Torquato ◽

Marcelo A. C. Fernandes

Keyword(s):

Reinforcement Learning ◽

Parallel Implementation ◽

Q Learning ◽

Learning Technique

Challenges and prospects of 3D micro-supercapacitors for powering the internet of things

Energy & Environmental Science ◽

10.1039/c8ee02029a ◽

2019 ◽

Vol 12 (1) ◽

pp. 96-115 ◽

Cited By ~ 88

Author(s):

Christophe Lethien ◽

Jean Le Bideau ◽

Thierry Brousse

Keyword(s):

Energy Storage ◽

Solid State ◽

Internet Of Things ◽

High Performance ◽

Storage Systems ◽

Electronic Devices ◽

Electrochemical Energy Storage ◽

The Internet ◽

The Internet Of Things ◽

Internet Of Thing

The fabrication of miniaturized electrochemical energy storage systems is essential for the development of future electronic devices for Internet of Things applications. This paper reviews current micro-supercapacitor technologies and defines guidelines for producing high-performance micro-devices, with a special focus on 3D designs and on the fabrication of solid-state miniaturized devices to solve the packaging issue.

Using Deep Reinforcement Learning to Improve Sensor Selection in the Internet of Things

IEEE Access ◽

10.1109/access.2020.2994600 ◽

2020 ◽

Vol 8 ◽

pp. 95208-95222

Author(s):

Hootan Rashtian ◽

Sathish Gopalakrishnan

Keyword(s):

Reinforcement Learning ◽

Internet Of Things ◽

Sensor Selection ◽

The Internet ◽

The Internet Of Things

Reinforcement Learning Applied to a Differential Game

Adaptive Behavior ◽

10.1177/105971239500400102 ◽

1995 ◽

Vol 4 (1) ◽

pp. 3-28 ◽

Cited By ~ 15

Author(s):

Mance E. Harmon ◽

Leemon C. Baird ◽

A. Harry Klopf

Keyword(s):

Reinforcement Learning ◽

Differential Game ◽

Learning Algorithm ◽

Learning System ◽

Test Bed ◽

Linear Quadratic ◽

Time Step ◽

Q Learning ◽

Step Duration ◽

Markov Decision

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning are compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating also is demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.

Information Processing and Key Technology Based on Internet of Thing Architecture for Intelligent Refrigerator

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.278-280.2012 ◽

2013 ◽

Vol 278-280 ◽

pp. 2012-2015

Author(s):

Lian Shi Lin ◽

Qing Hu ◽

Yu Ping Qui

Keyword(s):

Internet Of Things ◽

Large Scale ◽

Electronic Devices ◽

The Internet ◽

Reference Architecture ◽

Virtual Networks ◽

Key Technology ◽

Quantification Method ◽

The Internet Of Things ◽

Internet Of Thing

The Internet of Things is a large-scale virtual network of massive numbers of electronic devices, including RFID tags, sensors, and actuators, interconnected over the Internet. To address the key technologies of an intelligent refrigerator built on an Internet of Things architecture, the paper discusses the definition, characteristics, and reference architecture of such a refrigerator. It focuses on analyzing the definition of the refrigerator's information space, the information quantification method, and the main problems and corresponding solutions for the key Internet of Things technologies of mobile platform equipment.

Deep Reinforcement Learning for Cybersecurity Applications

International Journal of Computer Science and Mobile Computing ◽

10.47760/ijcsmc.2021.v10i12.005 ◽

2021 ◽

Vol 10 (12) ◽

pp. 32-38

Author(s):

Alex Mathew

Keyword(s):

Reinforcement Learning ◽

Internet Of Things ◽

Rapid Growth ◽

Smart Devices ◽

The Internet ◽

Complex Problems ◽

The Internet Of Things ◽

Security Concern

Over the last decade, the number of devices connected to the internet for various Internet of Things (IoT) applications has grown rapidly. The increase in these smart devices poses a great security concern in the IoT ecosystem, which must be protected from the resulting threats. Cybersecurity professionals have proposed reinforcement learning (RL) as a source of the security tools needed to protect IoT systems, since it can interact with the environment and learn how to detect threats. This paper presents a comprehensive study of cybersecurity threats to IoT system applications. RL algorithms are also presented to help understand attacks on the IoT. Reinforcement learning is widely employed in cybersecurity because it can learn from its own experience by exploring and exploiting an unknown ecosystem, which enables it to solve many complex problems. The capabilities of RL in dealing with cybercrime challenges are also explored in this paper.
