Relative control of an underactuated spacecraft using reinforcement learning

2020 ◽  
Vol 2020 (4) ◽  
pp. 43-54
Author(s):  
S.V. Khoroshylov ◽  
M.O. Redka

The aim of this article is to approximate optimal relative control of an underactuated spacecraft using reinforcement learning and to study the influence of various factors on the quality of such a solution. Methods of theoretical mechanics, control theory, stability theory, machine learning, and computer modeling were used. The problem of in-plane spacecraft relative control using only control actions applied tangentially to the orbit is considered. This approach makes it possible to reduce the propellant consumption of reactive actuators and to simplify the architecture of the control system. However, in some cases, methods of classical control theory do not yield acceptable results. The possibility of solving this problem with reinforcement learning has therefore been investigated: the control system interacts with the plant and receives a reinforcement signal characterizing the quality of its control actions, allowing designers to find control algorithms close to optimal ones. The well-known quadratic criterion is used as the reinforcement signal, which makes it possible to account for both accuracy requirements and control costs. The search for control actions is performed with the policy iteration algorithm, implemented using the actor-critic architecture. Various neural-network representations of the actor (implementing the control law) and the critic (estimating the value function) are considered. It is shown that the accuracy of the optimal control approximation depends on several factors, namely, an appropriate structure of the approximators, the neural network parameter updating method, and the learning algorithm parameters. The investigated approach makes it possible to solve this class of control problems for controllers of different structures. Moreover, it allows the control system to refine its control algorithms during spacecraft operation.
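The actor-critic scheme described in this abstract can be sketched in a few lines. The snippet below is a minimal illustration only: it uses a scalar linear plant, a linear actor, and a quadratic critic (all of which are assumptions for illustration, not the authors' spacecraft model) to show how a quadratic reinforcement signal drives the critic and actor updates.

```python
import numpy as np

# Minimal actor-critic sketch with a quadratic reinforcement signal.
# The plant (a, b) and all hyperparameters are illustrative assumptions.
a, b = 0.9, 0.1          # discrete-time plant: x' = a*x + b*u
q, r = 1.0, 0.1          # quadratic cost weights (accuracy vs. control effort)
gamma = 0.95             # discount factor

k = 0.0                  # actor: linear state feedback u = -k*x
p = 0.0                  # critic: quadratic cost-to-go estimate V(x) = p*x^2
alpha_c, alpha_a = 0.05, 0.01   # critic / actor learning rates

rng = np.random.default_rng(0)
for episode in range(2000):
    x = rng.uniform(-1.0, 1.0)
    for _ in range(20):
        u = -k * x + 0.05 * rng.standard_normal()   # exploration noise
        cost = q * x**2 + r * u**2                  # reinforcement signal
        x_next = a * x + b * u
        # TD error of the estimated cost-to-go
        delta = cost + gamma * p * x_next**2 - p * x**2
        p += alpha_c * delta * x**2                 # critic update
        # actor: step k down the gradient of the one-step cost-to-go
        dJ_dk = 2 * r * u * (-x) + gamma * 2 * p * x_next * (b * (-x))
        k -= alpha_a * dJ_dk
        x = x_next
print(f"learned feedback gain k = {k:.3f}")
```

Under these assumptions the learned gain settles near the LQR solution for the toy plant; the same actor-critic loop structure carries over when the linear actor and quadratic critic are replaced by neural network approximators.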

2021 ◽  
pp. 201-205
Author(s):  
С.А. Гордин ◽  
И.В. Зайченко ◽  
К.Д. Хряпенко ◽  
В.В. Бажеряну

The article discusses improving the accuracy and quality of control of the drive of network pumps in ship thermal installations, as part of the ship's heating system, by using an adaptive automatic control system. When classical control systems based on PID controllers are used to regulate electric motor power so as to maintain a given pressure in the heat supply system, sharply varying thermal loads can drive the system out of adjustment, because thermal expansion of the coolant creates additional pressure in the thermal installation. To ensure reliable, trouble-free operation of ship thermal installations under such loads, the authors consider an adaptive control system for the electric drive power. The article describes a control scheme in which the PID controller coefficients are adapted by a neural network (a neural network optimizer). The optimizer was applied as an add-on layer above the PID controller in the power control loop of a network pump within a ship's thermal installation. The dependence of the control system characteristics on the structure and parameters of the modified criteria for control accuracy and quality is examined. Adapting the control parameters makes it possible to reach the desired operating point with lower power consumption while maintaining reliability, and eliminates maladjustment of the control system under sharply varying thermal loads.
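One common way to realise such a neural-network optimizer is a single-neuron layer whose weights act as the PID gains and are adapted online from the tracking error. The sketch below assumes a first-order pressure plant and a simple error-driven adaptation rule; both are illustrative stand-ins, not the authors' marine installation model.

```python
import numpy as np

# Single-neuron adaptive PID sketch: the neuron's weights play the role
# of Kp, Ki, Kd and are nudged online to reduce the tracking error.
dt, tau, gain = 0.1, 2.0, 1.0        # assumed plant: tau*dp/dt = -p + gain*u
setpoint = 1.0                        # desired pressure (normalised)

w = np.array([0.5, 0.1, 0.05])       # adaptive weights acting as Kp, Ki, Kd
eta = 0.02                            # adaptation rate

p = 0.0                               # measured pressure
e_prev, e_int = 0.0, 0.0
for step in range(600):
    e = setpoint - p
    e_int += e * dt
    e_der = (e - e_prev) / dt
    x = np.array([e, e_int, e_der])   # PID regressor
    u = float(w @ x)                  # PID law with adaptive gains
    # adaptation rule: move each gain to reduce the squared error
    # (the plant gain sign is assumed known and positive)
    w += eta * e * x * dt
    w = np.clip(w, 0.0, 10.0)         # keep gains bounded and non-negative
    p += dt / tau * (-p + gain * u)   # plant update
    e_prev = e
print(f"final tracking error {setpoint - p:.4f}, gains {w.round(3)}")
```

The integral channel removes the steady-state pressure error while the adaptation trims the gains, which is the qualitative behaviour the abstract attributes to the neural network optimizer.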


2021 ◽  
Vol 54 (3-4) ◽  
pp. 417-428
Author(s):  
Yanyan Dai ◽  
KiDong Lee ◽  
SukGyu Lee

The rotary inverted pendulum is a basic benchmark in nonlinear control systems with many real applications. Without a deep understanding of control theory, it is difficult to control a rotary inverted pendulum platform using classical control engineering models, as shown in section 2.1. This paper therefore controls the platform without classical control theory, by training and testing a reinforcement learning algorithm. Despite many recent achievements in reinforcement learning (RL), there is little research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test a deep reinforcement learning algorithm from simulation to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) with prioritized experience replay, requiring no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it smoothly. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay reduces the overestimation of Q-values and decreases the training time. Finally, the paper presents experimental results comparing classical control theory with different reinforcement learning algorithms.
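The key difference between the DQN and DDQN targets mentioned above can be shown directly. In this sketch the Q-values are placeholder arrays rather than network outputs; the 21-action space matches the abstract, while the reward and discount values are assumptions.

```python
import numpy as np

# DQN vs. Double DQN (DDQN) bootstrap targets for one transition.
rng = np.random.default_rng(1)
n_actions = 21            # discrete swing-up/balance actions, as in the paper
gamma = 0.99
reward = 1.0              # placeholder reward for the transition

q_online = rng.normal(size=n_actions)   # online network's Q(s', .)
q_target = rng.normal(size=n_actions)   # target network's Q(s', .)

# DQN: max over the target network -- systematically overestimates
dqn_target = reward + gamma * q_target.max()

# DDQN: the online network SELECTS the action, the target network
# EVALUATES it, which removes the maximisation bias
a_star = int(q_online.argmax())
ddqn_target = reward + gamma * q_target[a_star]

# the DDQN target can never exceed the DQN one for the same transition
assert ddqn_target <= dqn_target
print(f"DQN target {dqn_target:.3f}  vs  DDQN target {ddqn_target:.3f}")
```

Decoupling action selection from action evaluation is exactly the mechanism the abstract credits with removing Q-value overestimation; prioritized experience replay is an orthogonal addition to how transitions are sampled.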


2021 ◽  
Vol 10 (1) ◽  
pp. 21
Author(s):  
Omar Nassef ◽  
Toktam Mahmoodi ◽  
Foivos Michelinakis ◽  
Kashif Mahmood ◽  
Ahmed Elmokashfi

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered by a Configuration Advocate to improve energy consumption, delay, throughput, or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted, together with machine and deep learning algorithms, to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the deep neural network in predicting intermediary environmental states, and show the superior performance of the genetic reinforcement learning algorithm for performance optimisation.


Author(s):  
Damien Ernst ◽  
Mevludin Glavic ◽  
Pierre Geurts ◽  
Louis Wehenkel

In this paper we explain how to design intelligent agents that process the information acquired from interaction with a system to learn a good control policy, and we show how the methodology can be applied to control devices aimed at damping electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem, and the information acquired from interaction with the system is a set of samples, each composed of four elements: a state, the action taken in this state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. We then present a more complex case study on a four-machine power system, where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) aimed at damping power system oscillations.
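The batch scheme described here, learning a Q-function from stored four-tuples by mimicking value iteration, can be illustrated on a toy problem. The 5-state chain below is an assumption standing in for the power system benchmark; the structure of the sweep over the sample set is the point.

```python
import numpy as np

# Fitted-Q-iteration sketch over a batch of four-tuples
# (state, action, reward, next_state). A toy 5-state chain with a
# rewarding terminal end stands in for the power system.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

# collect interaction samples: action 1 moves right, action 0 moves left,
# reward 1 whenever the rightmost state is reached
samples = []
for _ in range(500):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    samples.append((s, a, r, s_next))

# value-iteration-style sweeps over the fixed batch of samples
Q = np.zeros((n_states, n_actions))
for _ in range(50):
    Q_new = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    for s, a, r, s_next in samples:
        Q_new[s, a] += r + gamma * Q[s_next].max()   # bootstrap target
        counts[s, a] += 1
    Q = np.where(counts > 0, Q_new / np.maximum(counts, 1), Q)

policy = Q.argmax(axis=1)
print(policy)   # greedy action per state
```

In the paper the tabular averaging step is replaced by a supervised regressor fitted to the bootstrap targets, but the iteration over the fixed sample set is the same.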


2000 ◽  
Author(s):  
Magdy Mohamed Abdelhameed ◽  
Sabri Cetinkunt

Abstract The cerebellar model articulation controller (CMAC) is a useful neural network learning technique. It was developed two decades ago but still lacks an adequate learning algorithm, especially when used in a hybrid-type controller. This work introduces a simulation study examining the performance of a hybrid-type control system based on the conventional CMAC learning algorithm; the study showed that the resulting control system is unstable. A new adaptive learning algorithm for a CMAC-based hybrid-type controller is then proposed. The main features of the proposed learning algorithm, as well as the effects of its newly introduced parameters, are studied extensively via simulation case studies. The simulation results show that the proposed learning algorithm is robust in stabilizing the control system and preserves all the known advantages of the CMAC neural network. Part II of this work is dedicated to validating the effectiveness of the proposed CMAC learning algorithm experimentally.
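For reference, the conventional CMAC learning rule the abstract refers to is a least-mean-squares update spread across the cells activated by overlapping tilings of the input space. The sketch below approximates a sine function with such a scheme; the tiling sizes, learning rate, and target function are illustrative assumptions.

```python
import numpy as np

# Minimal CMAC function approximator with the conventional LMS update.
n_tilings, n_tiles = 8, 16
w = np.zeros((n_tilings, n_tiles + 1))   # one weight table per tiling
beta = 0.3                               # learning rate, shared by active cells

def active_cells(x):
    """One active cell per tiling for input x in [0, 1)."""
    cells = []
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)    # shifted, overlapping tilings
        cells.append((t, int((x + offset) * n_tiles)))
    return cells

def predict(x):
    return sum(w[t, i] for t, i in active_cells(x))

rng = np.random.default_rng(0)
target = lambda x: np.sin(2 * np.pi * x)
for _ in range(5000):
    x = rng.uniform(0.0, 1.0)
    err = target(x) - predict(x)
    for t, i in active_cells(x):          # conventional CMAC (LMS) update:
        w[t, i] += beta * err / n_tilings  # the correction is split evenly
mse = np.mean([(target(x) - predict(x))**2 for x in np.linspace(0, 0.99, 50)])
print(f"approximation MSE: {mse:.5f}")
```

The local, sparse updates are what make CMAC fast; they are also why its stability inside a closed-loop hybrid controller depends so strongly on the learning rule, which is the issue the paper addresses.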


Author(s):  
Amro Shafik ◽  
Magdy Abdelhameed ◽  
Ahmed Kassem

Electrohydraulic servo systems have a wide range of applications in today's automated industry. However, they still suffer from several nonlinearities, such as deadband in electrohydraulic valves, hysteresis, and stick-slip friction in valves and cylinders. In addition, hydraulic system parameters are uncertain because of temperature changes during operation. This paper addresses these problems by designing an intelligent control system able to deal with the system nonlinearities and parameter uncertainties using a fast, online learning algorithm. A novel hybrid control system based on the cerebellar model articulation controller (CMAC) neural network is presented. The proposed controller is composed of two parallel controllers. The first is a conventional proportional-velocity (PV) servo-type controller, used to reduce the large initial error of the closed-loop system. The second is a CMAC neural network, used as an intelligent controller to overcome the nonlinear characteristics of the electrohydraulic system. A fourth-order model of the electrohydraulic system is introduced, and the PV controller parameters are tuned to optimal values. Simulation and experimental results show good tracking performance with the proposed controller. The controller demonstrates its robustness in two working environments: first with different added inertia loads, and second with noisy input signals.


1994 ◽  
Vol 6 (2) ◽  
pp. 215-219 ◽  
Author(s):  
Gerald Tesauro

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a “raw” description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
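The TD(λ) rule underlying TD-Gammon combines a temporal-difference error with an eligibility trace over past states, so the final game outcome propagates credit back through the whole sequence of positions. The toy episode below, with a linear value function and a single terminal "win" reward, is a minimal illustration of the rule, not the backgammon network itself.

```python
import numpy as np

# TD(lambda) weight update with an accumulating eligibility trace
# (Sutton 1988). A linear value function over one-hot features stands
# in for TD-Gammon's neural network.
n_features, alpha, lam, gamma = 4, 0.1, 0.7, 1.0

w = np.zeros(n_features)          # value-function weights
z = np.zeros(n_features)          # eligibility trace

def features(state):              # toy "raw board description"
    return np.eye(n_features)[state]

# one toy episode: states 0..3, terminal reward 1.0 (a "win")
trajectory = [0, 1, 2, 3]
for k, s in enumerate(trajectory):
    x = features(s)
    v = w @ x
    if k + 1 < len(trajectory):
        v_next = w @ features(trajectory[k + 1])
        reward = 0.0
    else:
        v_next = 0.0              # terminal state
        reward = 1.0              # game outcome is the only reward
    delta = reward + gamma * v_next - v   # TD error
    z = gamma * lam * z + x               # accumulate eligibility
    w += alpha * delta * z                # TD(lambda) weight update
print(w)   # earlier states receive geometrically decaying credit
```

After this single episode the weights of earlier states are smaller by successive factors of λ, which is how self-play alone, with no intermediate rewards, can shape the evaluation of every position in the game.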

