A Foremost-Policy Reinforcement Learning Based ART2 Neural Network and Its Learning Algorithm

Author(s):  
Jian Fan ◽  
Gengfeng Wu

2021 ◽
Vol 10 (1) ◽  
pp. 21
Author(s):  
Omar Nassef ◽  
Toktam Mahmoodi ◽  
Foivos Michelinakis ◽  
Kashif Mahmood ◽  
Ahmed Elmokashfi

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT (NB-IoT) user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered by a Configuration Advocate to improve energy consumption, delay, throughput, or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted alongside machine and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the deep neural network in predicting intermediary environmental states; they also show the superior performance of the genetic reinforcement learning algorithm at performance optimisation.
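The genetic search the abstract describes can be sketched as follows. This is a minimal, illustrative implementation only: the configuration fields (`tx_power`, `repetitions`, `bandwidth`), their ranges, and the analytic fitness model standing in for the framework's learned predictors are all assumptions, not the paper's actual parameter set.

```python
import random

random.seed(7)  # reproducible toy run

def fitness(cfg, w_energy=0.5, w_delay=0.3, w_tput=0.2):
    # Toy stand-in for the learned state predictors, so the example is
    # self-contained: penalise energy and delay, reward throughput.
    energy = cfg["tx_power"] * cfg["repetitions"]
    delay = cfg["repetitions"] / cfg["bandwidth"]
    throughput = cfg["bandwidth"] / cfg["repetitions"]
    return -w_energy * energy - w_delay * delay + w_tput * throughput

def random_cfg():
    # Hypothetical NB-IoT-style knobs with illustrative value ranges.
    return {"tx_power": random.randint(1, 23),
            "repetitions": random.choice([1, 2, 4, 8]),
            "bandwidth": random.choice([15, 45, 90, 180])}

def evolve(generations=30, pop_size=20, mut_rate=0.2):
    pop = [random_cfg() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            child = {k: random.choice([a[k], b[k]]) for k in a}  # crossover
            if random.random() < mut_rate:        # mutate one field
                key = random.choice(list(child))
                child[key] = random_cfg()[key]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting the fitness weights would be chosen per device and application, which is how one suggestion can trade energy against delay or throughput.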


1994 ◽  
Vol 6 (2) ◽  
pp. 215-219 ◽  
Author(s):  
Gerald Tesauro

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a “raw” description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
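The TD(λ) rule underlying TD-Gammon can be shown in its simplest tabular form. TD-Gammon applies this same update through a neural network's gradients rather than a lookup table; the toy three-state chain below is purely illustrative.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.7):
    """Tabular TD(lambda) with accumulating eligibility traces (Sutton 1988)."""
    V = [0.0] * n_states
    for episode in episodes:            # episode: list of (state, reward, next_state)
        e = [0.0] * n_states            # eligibility traces, reset each episode
        for (s, r, s_next) in episode:
            v_next = V[s_next] if s_next is not None else 0.0
            delta = r + gamma * v_next - V[s]   # TD error
            e[s] += 1.0                         # mark current state as eligible
            for i in range(n_states):
                V[i] += alpha * delta * e[i]    # credit all eligible states
                e[i] *= gamma * lam             # decay traces
    return V

# Toy chain 0 -> 1 -> terminal, reward 1 on the final step; after repeated
# play both predecessor states learn a value near the true return of 1.
episode = [(0, 0.0, 1), (1, 1.0, None)]
V = td_lambda([episode] * 100, n_states=3)
```

The trace decay `gamma * lam` is what lets the reward on the final move propagate credit backward to earlier positions, which is essential when the only nonzero reward arrives at the end of a game.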


2020 ◽  
Vol 2020 (4) ◽  
pp. 43-54
Author(s):  
S.V. Khoroshylov ◽  
M.O. Redka

The aim of the article is to approximate optimal relative control of an underactuated spacecraft using reinforcement learning and to study the influence of various factors on the quality of such a solution. In the course of this study, methods of theoretical mechanics, control theory, stability theory, machine learning, and computer modeling were used. The problem of in-plane spacecraft relative control using only control actions applied tangentially to the orbit is considered. This approach makes it possible to reduce the propellant consumption of reactive actuators and to simplify the architecture of the control system. However, in some cases, methods of the classical control theory do not allow one to obtain acceptable results. In this regard, the possibility of solving this problem by reinforcement learning methods has been investigated, which allows designers to find control algorithms close to optimal ones as a result of interactions of the control system with the plant using a reinforcement signal characterizing the quality of control actions. The well-known quadratic criterion is used as a reinforcement signal, which makes it possible to take into account both the accuracy requirements and the control costs. A search for control actions based on reinforcement learning is made using the policy iteration algorithm. This algorithm is implemented using the actor–critic architecture. Various representations of the actor for control law implementation and the critic for obtaining value function estimates using neural network approximators are considered. It is shown that the optimal control approximation accuracy depends on a number of features, namely, an appropriate structure of the approximators, the neural network parameter updating method, and the learning algorithm parameters. The investigated approach makes it possible to solve the considered class of control problems for controllers of different structures. Moreover, the approach allows the control system to refine its control algorithms during the spacecraft operation.
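The actor–critic policy iteration with a quadratic reinforcement signal can be sketched on a scalar linear plant. This is a hedged illustration, not the article's spacecraft model: the critic fits a quadratic value V(x) = p·x² by temporal differences (policy evaluation), and the actor is a linear gain u = −k·x improved greedily against that critic (policy improvement). Plant constants and learning rates are arbitrary choices.

```python
def policy_iteration(a=0.9, b=0.5, q=1.0, ru=0.1,
                     iters=50, episodes=20, steps=20, alpha=0.05):
    """Actor-critic policy iteration for x_{k+1} = a*x + b*u with
    quadratic stage cost q*x^2 + ru*u^2 (illustrative scalar plant)."""
    p, k = 0.0, 0.0                      # critic parameter, actor gain
    for _ in range(iters):
        # Critic: evaluate the current gain k by TD on simulated rollouts.
        for _ in range(episodes):
            x = 1.0
            for _ in range(steps):
                u = -k * x
                x_next = a * x + b * u
                cost = q * x * x + ru * u * u          # quadratic reinforcement
                delta = cost + p * x_next * x_next - p * x * x
                p += alpha * delta * x * x             # gradient of V w.r.t. p
                x = x_next
        # Actor: greedy improvement of the linear gain against the critic.
        k = (a * b * p) / (ru + b * b * p)
    return p, k

p, k = policy_iteration()
```

For any positive critic estimate p the improved closed loop a − b·k is stable here, which mirrors the article's point that iterating evaluation and improvement drives the controller toward the optimal (LQR-like) gain.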


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yulong Luo

The gas turbine is widely used because of its fast start-up and shutdown, low pollution, and high thermal efficiency. However, its high-temperature, high-pressure, high-speed working environment makes it prone to failure, and traditional intelligent gas-path fault diagnosis schemes suffer from poor control performance and low scheduling accuracy. This study investigates the application of neural networks and a reinforcement learning algorithm to intelligent gas-path fault diagnosis of the gas turbine. Accurate control of fault-diagnosis planning is achieved across gas-path fault diagnosis, daily maintenance, service-condition monitoring, power utilization rate, and other aspects of the gas turbine. The reinforcement learning model enables intelligent diagnosis and recording of gas-path faults, supporting diversified analysis and intelligent diagnosis schemes. Through neural network algorithms and deep learning technology, whole-process monitoring of the gas turbine is realized, reducing its failure rate during operation. The experimental results show that, compared with the thermal fault diagnosis method and the thermal-imaging fault diagnosis method of the electric percussion drill, the reinforcement-learning-based gas-path fault diagnosis model can quantify, convert, and process data during real-time transmission, offering higher control accuracy and faster response speed and effectively improving diagnostic efficiency and accuracy.
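The supervised diagnosis component the abstract mentions can be illustrated with a minimal classifier. This sketch is an assumption-laden toy, not the study's model: it trains a logistic regression on synthetic gas-path sensor features (temperature and pressure ratios), where faults are assumed to shift both ratios upward.

```python
import math
import random

random.seed(0)  # deterministic synthetic data

def sample(faulty):
    # Hypothetical sensor readings: a fault shifts both ratios by +0.3.
    base = 1.3 if faulty else 1.0
    return [base + random.gauss(0, 0.05), base + random.gauss(0, 0.05)]

data = ([(sample(False), 0) for _ in range(100)] +
        [(sample(True), 1) for _ in range(100)])
random.shuffle(data)

# Train a two-feature logistic regression by stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):
    for x, y in data:
        z = w[0] * x[0] + w[1] * x[1] + b
        p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability of "fault"
        g = p - y                        # gradient of the log loss
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

accuracy = sum(predict(x) == y for x, y in data) / len(data)
```

A deep network, as used in the study, replaces the single linear layer but follows the same train-on-labeled-fault-data pattern.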

