Reinforcement Learning Rebirth, Techniques, Challenges, and Resolutions

Author(s):  
Wasswa Shafik ◽  
Mojtaba Matinkhah ◽  
Parisa Etemadinejad ◽  
Mammann Nur Sanda

Reinforcement learning (RL) is a promising research area that has become prominent in the Internet of Things (IoT), media, and social sensing computing, where it addresses a broad range of relevant tasks by making decisions sequentially under deterministic and stochastic dynamics. The IoT extends connectivity to physical devices, such as networked electronic devices that interconnect with one another over the Internet and can be remotely monitored and controlled. In this paper, we comprehensively survey RL techniques in IoT systems, focusing on the main known techniques: artificial neural networks (ANN), Q-learning, Markov decision processes (MDP), and learning automata (LA). The study examines and analyses each learning technique with a focus on challenges, model performance, and the similarities and differences among the most closely related state-of-the-art models proposed for the IoT. The results can serve as a foundation for model design and implementation based on the bottlenecks assessed here, together with an evaluation of the most widely used practical applications of current reinforcement learning methods.

Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), man-machine collaborative work has become more and more popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static, so they cannot respond in a timely manner to dynamic changes in user trust and preferences. In fact, after a user receives a recommendation, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the process by which trust between users changes, this paper proposes a trust boost method based on reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user's trust. In addition, a reinforcement learning method, deep Q-learning (DQN), is studied to simulate the process of learning a user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, responds quickly to changes in users' preferences. Compared with other methods, our method achieves better recommendation accuracy.
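The abstract names recursive least squares (RLS) as the tool for learning how evaluation differences affect trust, but it does not give the model. As a minimal sketch only, and not the authors' formulation, the scalar RLS update below fits a single hypothetical coefficient `w` in an assumed linear relation `trust_change ≈ w * evaluation_difference`. The class name, the forgetting factor, the initial covariance, and the sample data are all illustrative assumptions.

```python
class ScalarRLS:
    """Scalar recursive least squares for the assumed model y ~ w * x."""

    def __init__(self, forgetting=1.0, initial_covariance=1000.0):
        self.w = 0.0                   # current coefficient estimate
        self.p = initial_covariance    # large value = low confidence in w
        self.lam = forgetting          # 1.0 weights all samples equally

    def update(self, x, y):
        """Fold one (x, y) observation into the estimate and return it."""
        gain = self.p * x / (self.lam + x * self.p * x)
        error = y - self.w * x         # prediction error before the update
        self.w += gain * error
        self.p = (self.p - gain * x * self.p) / self.lam
        return self.w


# Hypothetical data: trust changes by half of each evaluation difference.
samples = [(1.0, 0.5), (2.0, 1.0), (-1.0, -0.5), (0.5, 0.25), (3.0, 1.5)]
rls = ScalarRLS()
for evaluation_difference, trust_change in samples:
    rls.update(evaluation_difference, trust_change)

print(round(rls.w, 3))
```

Setting the forgetting factor below 1.0 discounts older samples, which is the usual way RLS tracks a relation that drifts over time.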


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to build engines that explore and learn environments and then derive policies to control them in real time without human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to systems that can be modeled as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, so that load can be dispatched dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement learning based engine in a cloud system.
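The two control methods named above differ in their bootstrap target: SARSA is on-policy and uses the value of the action actually taken next, whereas Q-learning is off-policy and uses the best next action. The sketch below shows both tabular updates on a hypothetical two-server dispatch problem. The state labels, reward, and hyperparameters are assumptions made for illustration and are not taken from the chapter.

```python
def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Off-policy update: bootstrap from the best action in next_state."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def make_table():
    # Hypothetical states describe which server is busier; actions 0 and 1
    # mean "dispatch the next request to server 0" or "to server 1".
    return {"server0_busier": [0.0, 0.0], "server1_busier": [1.0, 2.0]}


q_off = make_table()
q_learning_update(q_off, "server0_busier", 1, reward=1.0,
                  next_state="server1_busier", alpha=0.5, gamma=0.9)

q_on = make_table()
sarsa_update(q_on, "server0_busier", 1, reward=1.0,
             next_state="server1_busier", next_action=0,
             alpha=0.5, gamma=0.9)

print(q_off["server0_busier"][1], q_on["server0_busier"][1])
```

Starting from the same table, the Q-learning estimate ends up higher because it assumes the best follow-up action, while SARSA reflects the follow-up action that was actually selected.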


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity involved in training an agent in a real-time environment, e.g., using the Internet of Things (IoT), reinforcement learning (RL) using a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted on an online basis without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents are competently trained in real-world applications. The proposed model considers combinations of basic RL algorithms with online and offline use, based on the empirical balance of bias and variance. We therefore exploited the balance between the offline Monte Carlo (MC) technique and the online temporal difference (TD) technique, with an on-policy method (state-action-reward-state-action, SARSA) and an off-policy method (Q-learning), in terms of DRL. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrated that, for a simple control task, the balance between online and offline use without the on-policy and off-policy methods shows satisfactory results. However, in complex tasks, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
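One standard way to move between the one-step TD target and the full Monte Carlo return is the n-step return, which sums n observed rewards and then bootstraps from a value estimate. The sketch below is offered only to illustrate that bias and variance trade-off. It is not the combination scheme proposed in the paper, and the function name and toy numbers are assumptions.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Return the n-step target for time step t of one finished episode.

    rewards[k] is the reward received after acting in state k, and
    values[k] is the current value estimate of state k. The terminal
    state has value zero, so no bootstrap term is added once the sum
    reaches the end of the episode.
    """
    episode_length = len(rewards)
    end = min(t + n, episode_length)
    target = 0.0
    for k in range(t, end):
        target += gamma ** (k - t) * rewards[k]
    if end < episode_length:
        target += gamma ** (end - t) * values[end]
    return target


# Toy episode with three steps (hypothetical numbers).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.0, 1.5]

td_target = n_step_return(rewards, values, t=0, n=1, gamma=0.9)
two_step = n_step_return(rewards, values, t=0, n=2, gamma=0.9)
mc_return = n_step_return(rewards, values, t=0, n=3, gamma=0.9)
print(td_target, two_step, mc_return)
```

With n = 1 the target leans on the value estimate (more bias, less variance). With n equal to the episode length it is the pure Monte Carlo return (no bootstrap bias, more variance).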


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 2782-2798 ◽  
Author(s):  
Lucileide M. D. Da Silva ◽  
Matheus F. Torquato ◽  
Marcelo A. C. Fernandes

2019 ◽  
Vol 12 (1) ◽  
pp. 96-115 ◽  
Author(s):  
Christophe Lethien ◽  
Jean Le Bideau ◽  
Thierry Brousse

The fabrication of miniaturized electrochemical energy storage systems is essential for the development of future electronic devices for Internet of Things applications. This paper reviews current micro-supercapacitor technologies and defines guidelines for producing high-performance micro-devices, with a special focus on 3D designs and on the fabrication of solid-state miniaturized devices to solve the packaging issue.


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning are compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating also is demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
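A commonly cited intuition for the time-step result, which the abstract itself does not spell out, is that in a discretized continuous-time problem the Q-values of different actions in the same state differ by an amount roughly proportional to the step size. The differences that drive action selection therefore shrink toward zero as the step shrinks. The sketch below illustrates this on a hypothetical one-dimensional system with an assumed value function. Dividing the gap by the step size, in the spirit of advantage-style scaling, keeps it at a usable magnitude. The dynamics, reward rate, value function, and discount are all illustrative assumptions, not the paper's differential game.

```python
GAMMA = 0.9  # assumed discount per unit of time


def assumed_value(x):
    """Hypothetical state-value function used only for illustration."""
    return -x * x


def q_value(x, u, dt):
    """One-step lookahead Q for x' = x + u*dt and reward rate -(x^2 + u^2)."""
    reward = -(x * x + u * u) * dt
    next_x = x + u * dt
    return reward + (GAMMA ** dt) * assumed_value(next_x)


def action_gap(x, dt):
    """Difference in Q between the actions u = -1 and u = 0."""
    return q_value(x, -1.0, dt) - q_value(x, 0.0, dt)


for dt in (0.1, 0.01, 0.001):
    gap = action_gap(1.0, dt)
    # The raw gap shrinks with dt; the gap divided by dt stays of order one.
    print(dt, gap, gap / dt)
```

In this toy setting, a function approximator that has to represent Q directly must resolve ever smaller differences between actions as `dt` decreases, while the rescaled quantity stays close to a fixed size.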


2013 ◽  
Vol 278-280 ◽  
pp. 2012-2015
Author(s):  
Lian Shi Lin ◽  
Qing Hu ◽  
Yu Ping Qui

The Internet of Things is a large-scale virtual network of massive numbers of electronic devices, including RFID tags, sensors, and actuators, interconnected over the Internet. To address the key technologies of an intelligent refrigerator built on an Internet of Things architecture, the paper discusses the definition, characteristics, and reference architecture of such a refrigerator. It focuses on the definition of the refrigerator's information space, the method for quantifying information, and the main problems, with corresponding solutions, in the key Internet of Things technologies for mobile platform equipment.


Author(s):  
Alex Mathew

The number of devices connected to the Internet for various Internet of Things (IoT) applications has grown rapidly over the last decade. The increase in these smart devices poses a significant security concern for the IoT ecosystem, which must be protected from the resulting threats. Cybersecurity professionals have proposed reinforcement learning (RL) as a source of the security tools needed to protect IoT systems, since it can interact with the environment and learn to detect threats. This paper presents a comprehensive study of cybersecurity threats to IoT system applications. RL algorithms are also presented as a means of understanding attacks on the IoT. Reinforcement learning is widely employed in cybersecurity because it learns from its own experience by exploring and exploiting an unknown environment, which enables it to solve many complex problems. The paper also examines the capabilities of RL in dealing with cybercrime challenges.

