Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

IEEE Access, 2019, Vol 7, pp. 2782-2798
Author(s):  
Lucileide M. D. Da Silva ◽  
Matheus F. Torquato ◽  
Marcelo A. C. Fernandes
Telecom, 2021, Vol 2 (3), pp. 255-270
Author(s):  
Saeid Pourroostaei Ardakani ◽  
Ali Cheshmehzangi

UAV path planning for remote sensing aims to find the best-fitted routes to complete a data collection mission. UAVs plan routes and move along them to remotely collect environmental data from particular target zones using sensory devices such as cameras. Route planning may employ machine learning techniques to autonomously find or select cost-effective, best-fitted routes and achieve optimized results, including minimized data collection delay, reduced UAV power consumption, decreased flight distance, and a maximized number of collected data samples. This paper uses a reinforcement learning technique (location- and energy-aware Q-learning) to plan UAV routes for remote sensing in smart farms. With this approach, the UAV avoids moving heuristically or blindly through a farm; instead, it balances exploration and exploitation to survey the farm and find the shortest, most cost-effective paths to target locations with interesting data samples to collect. According to the simulation results, the Q-learning technique increases data collection robustness and reduces UAV resource consumption (e.g., power), traversed path length, and remote sensing latency compared with two well-known benchmarks, IEMF and TBID, especially when the target locations in a farm are dense and crowded.
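To make the core mechanism concrete, the following is a minimal tabular Q-learning sketch for route planning on a toy grid. The grid size, actions, step costs, and the single target cell are illustrative assumptions, not the paper's actual farm model or its location/energy-aware reward.

```python
import random

# Hyperparameters (illustrative values, not the paper's)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
SIZE = 4                                  # 4x4 grid of farm cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GOAL = (3, 3)                             # target location with data to collect

def step(state, action):
    r, c = state
    dr, dc = action
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    # -1 per move models energy/latency cost; +10 on reaching the target
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            # Epsilon-greedy action selection (exploration vs. exploitation)
            a = (rng.randrange(len(ACTIONS)) if rng.random() < EPS
                 else max(range(len(ACTIONS)), key=lambda i: Q[s][i]))
            s2, r, done = step(s, ACTIONS[a])
            # Standard Q-learning update
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, following the greedy policy from the start cell traces the shortest path to the target, which is the behavior the abstract describes at farm scale.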


Author(s):  
Masaya Nakata ◽  
Tomoki Hamagami

The XCS classifier system is an evolutionary rule-based learning technique powered by a Q-learning-like learning mechanism. It employs a global deletion scheme that deletes rules from the entire population, covering all state-action pairs. However, the optimality of this scheme remains unclear owing to the lack of intensive analysis. We introduce two deletion schemes: 1) local deletion, which is applied to the subset of rules covering each state (a match set), and 2) stronger local deletion, which is applied to the more specific subset covering each state-action pair (an action set). The aim of this paper is to reveal how these three deletion schemes affect the performance of XCS. Our analysis shows that the local deletion schemes promote the elimination of inaccurate rules compared with the global deletion scheme. However, the stronger local deletion scheme occasionally deletes a good rule. We further show that the two local deletion schemes greatly improve the performance of XCS on a set of noisy maze problems. Although the localization strength of the proposed deletion schemes requires careful consideration, they can be more adequate for XCS than the original global deletion scheme.
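The three deletion scopes can be contrasted in a short sketch. The `'#'`-wildcard rule encoding follows standard XCS conventions; the toy rules, fitness values, and the fitness-based victim selection are illustrative assumptions (XCS implementations typically use a more elaborate deletion vote).

```python
def matches(condition, state):
    # A rule condition matches a state if every position is '#' or equal
    return all(c in ('#', s) for c, s in zip(condition, state))

def deletion_candidates(population, state, action, scope):
    if scope == 'global':                      # all rules in the population
        return population
    match_set = [r for r in population if matches(r['cond'], state)]
    if scope == 'local':                       # rules covering this state
        return match_set
    # stronger local: rules covering this exact state-action pair
    return [r for r in match_set if r['act'] == action]

def delete_weakest(population, state, action, scope):
    pool = deletion_candidates(population, state, action, scope)
    victim = min(pool, key=lambda r: r['fitness'])
    population.remove(victim)
    return victim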


Author(s):  
Wasswa Shafik ◽  
Mojtaba Matinkhah ◽  
Parisa Etemadinejad ◽  
Mammann Nur Sanda

Reinforcement learning (RL) is a promising research area that is now well known in the Internet of Things (IoT), where media and social sensing computing address broad and pertinent tasks by making decisions sequentially through deterministic and stochastic evolutions. The IoT extends world connectivity to physical devices, such as networks of electronic devices, which interconnect with one another over the Internet and can be remotely supervised and controlled. In this paper, we comprehensively survey and assess RL techniques in IoT systems, focusing on the main known techniques such as artificial neural networks (ANN), Q-learning, Markov Decision Processes (MDP), and Learning Automata (LA). This study examines and analyses each learning technique, focusing on challenges, model performance, and the similarities and differences among IoT applications, together with the most closely correlated state-of-the-art models. The results can serve as a foundation for designing and implementing models based on the bottlenecks assessed here, along with an evaluation of the most popular hands-on uses of current reinforcement learning methods.


Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), human-machine collaborative work has become more and more common. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static and thus cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after a user receives a recommendation, there is a difference between the actual and expected evaluations, and this difference is correlated with the trust value. Based on the dynamics of trust and the process by which trust between users changes, this paper proposes a trust-boosting method based on reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user's trust. In addition, a reinforcement learning method, Deep Q-Learning (DQN), is studied to simulate the process of learning a user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, responds quickly to changes in user preferences. Compared with other methods, it achieves better recommendation accuracy.
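As a reference for the RLS component, here is a minimal scalar recursive least squares sketch in the spirit of learning how an evaluation difference drives trust change. The linear model (trust change ≈ w × evaluation difference), the forgetting factor, and the initialization are illustrative assumptions, not the paper's actual formulation.

```python
class ScalarRLS:
    """Scalar recursive least squares with exponential forgetting."""

    def __init__(self, lam=0.99, p0=1000.0):
        self.w = 0.0       # estimated coefficient (e.g., trust sensitivity)
        self.p = p0        # inverse correlation; large p0 = high initial uncertainty
        self.lam = lam     # forgetting factor: weights recent samples more

    def update(self, x, y):
        # x: evaluation difference, y: observed trust change (assumed signals)
        k = (self.p * x) / (self.lam + x * self.p * x)   # gain
        err = y - self.w * x                             # a-priori error
        self.w += k * err                                # correct the estimate
        self.p = (self.p - k * x * self.p) / self.lam    # update uncertainty
        return err
```

Because RLS is recursive, each new (difference, trust-change) observation refines the estimate in O(1) time, which is what makes it suitable for tracking a dynamic trust signal online.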


Author(s):  
Jun Long ◽  
Yueyi Luo ◽  
Xiaoyu Zhu ◽  
Entao Luo ◽  
Mingfeng Huang

With the development of the Internet of Things (IoT) and mobile edge computing (MEC), more and more sensing devices are being deployed in smart cities. These devices generate various kinds of tasks that need to be sent to the cloud for processing. Usually, the sensing devices are not equipped with wireless modules, because that is neither economical nor energy efficient, so finding a way to offload their tasks is a challenging problem. Fortunately, many vehicles move around a city and can communicate with sensing devices in an effective, low-cost way. In this paper, we propose a computation offloading scheme that uses mobile vehicles in an IoT-edge-cloud network. Sensing devices generate tasks and transmit them to vehicles, which then decide whether to compute each task locally in the vehicle, on an MEC server, or at the cloud center. The offloading decision is based on a utility function of energy consumption and transmission delay, and a deep reinforcement learning technique is adopted to make the decisions. Our method makes full use of existing infrastructure to offload the tasks of sensing devices, and the experimental results show that our solution achieves the maximum reward and decreases delay.
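The decision signal can be illustrated without the learning machinery: a utility combining transmission delay and energy cost, evaluated for each processing option. All rates, powers, task sizes, and the weighting are illustrative assumptions, and the exhaustive argmin stands in for the paper's learned DQN policy.

```python
def utility(delay_s, energy_j, w=0.5):
    # Weighted cost of one offloading choice: lower is better.
    # w trades off delay against energy (assumed value).
    return w * delay_s + (1 - w) * energy_j

def choose_target(task_bits, options):
    # options: name -> (uplink_bps, compute_bps, tx_power_w);
    # uplink_bps of 0 means no transmission (compute in the vehicle).
    best, best_cost = None, float('inf')
    for name, (up, comp, power) in options.items():
        tx_delay = task_bits / up if up else 0.0
        delay = tx_delay + task_bits / comp   # transmit + process
        energy = power * tx_delay             # radio energy during transmit
        cost = utility(delay, energy)
        if cost < best_cost:
            best, best_cost = name, cost
    return best, best_cost
```

In the paper's setting, a DQN would learn to approximate this choice from experience rather than enumerate it, which pays off when the state (queues, channel quality, vehicle position) is too rich to evaluate directly.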


Minerals, 2021, Vol 11 (6), pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeatedly running this simulator with a reward function that assigns a score to each dispatching decision generates sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be made quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
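The experience-generation loop described above can be sketched briefly: a simulated dispatching environment scores each decision with a reward and stores transitions for later deep Q-learning. The toy queue-based simulator, the queue-length reward, and the random policy are illustrative assumptions, not the paper's discrete event model.

```python
import random
from collections import deque

def collect_experience(episodes, steps, n_shovels, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=10000)              # replay buffer of transitions
    for _ in range(episodes):
        # State: number of trucks queued at each shovel (toy stand-in)
        queues = [rng.randint(0, 5) for _ in range(n_shovels)]
        for _ in range(steps):
            state = tuple(queues)
            action = rng.randrange(n_shovels)  # dispatch a truck to a shovel
            reward = -queues[action]           # penalize joining a long queue
            queues[action] += 1                # the truck joins that queue
            buffer.append((state, action, reward, tuple(queues)))
    return buffer
```

A DQN trained on minibatches sampled from such a buffer is what lets the dispatcher answer a new request quickly: the expensive simulation happens offline, and inference is a single forward pass.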


Aerospace, 2021, Vol 8 (4), pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks over a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due dates. In doing so, the number of checks is reduced and fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL for solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model on these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
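A reward shaped for this objective could look like the following sketch: highest when a check lands on its due date, decaying linearly for earlier dates within the allowed interval, and penalized outside it. The linear shape and the specific values are illustrative assumptions, not the paper's actual reward function.

```python
def check_reward(scheduled_day, due_day, earliest_day):
    """Reward for scheduling one hangar check (toy shaping)."""
    if not (earliest_day <= scheduled_day <= due_day):
        return -1.0                                   # outside the allowed interval
    span = max(due_day - earliest_day, 1)
    # 0.0 at the earliest allowed day, rising to 1.0 exactly on the due date
    return (scheduled_day - earliest_day) / span
```

Rewarding late-but-valid scheduling is what drives the reduction in total checks: pushing each check toward its due date stretches the interval between successive checks over the horizon.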


Entropy, 2021, Vol 23 (6), pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to the policy. In an unknown environment, formulating rules to help UAVs choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by our algorithm than under the classic Q-learning algorithm in the agricultural plant protection environment. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that our algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
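The core idea of similar state matching can be sketched as follows: when a state has no Q-table entry, act using the values of the most similar previously seen state instead of an arbitrary default. The element-wise similarity measure and the toy feature-tuple states are assumptions; the paper's actual matching criterion may differ.

```python
def similarity(s1, s2):
    # Count matching positions between two feature tuples (Hamming-style)
    return sum(a == b for a, b in zip(s1, s2))

def best_action(Q, state, n_actions):
    """Greedy action; falls back to the most similar known state."""
    if state in Q:
        values = Q[state]
    elif Q:
        nearest = max(Q, key=lambda s: similarity(s, state))
        values = Q[nearest]            # reuse values of the closest state
    else:
        return 0                       # nothing learned yet: default action
    return max(range(n_actions), key=lambda i: values[i])
```

Compared with classic Q-learning, which treats an unseen state as a blank slate, this fallback lets learned experience generalize across nearby states, which is the source of the higher probability of optimal action choice claimed in the abstract.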

