Using process data to generate an optimal control policy via apprenticeship and reinforcement learning

AIChE Journal ◽  
2021 ◽  
Author(s):  
Max R. Mowbray ◽  
Robin Smith ◽  
Ehecatl A. Del Rio‐Chanona ◽  
Dongda Zhang


2013 ◽
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Bo Dong ◽  
Yuanchun Li

A novel decentralized reinforcement-learning robust optimal tracking control theory for time-varying constrained reconfigurable modular robots, based on an actor-critic-identifier (ACI) structure and the state-action value function (Q-function), is presented to solve the continuous-time nonlinear optimal control problem for strongly coupled, uncertain robotic systems. The dynamics of the time-varying constrained reconfigurable modular robot are described as a synthesis of interconnected subsystems, and a continuous-time state equation and Q-function are designed in this paper. Combining the ACI structure with an RBF network, the global uncertainty of each subsystem and the HJB (Hamilton-Jacobi-Bellman) equation are estimated: the critic NN approximates the optimal Q-function, the action NN approximates the optimal control policy, and the identifier, together with the RBF-NN used to update the ACI-NN weights, identifies the global uncertainty. On this basis, a novel decentralized robust optimal tracking controller for each subsystem is proposed, so that the subsystem tracks the desired trajectory and the tracking error converges to zero in finite time. The stability of the ACI structure and of the robust optimal tracking controller is confirmed by Lyapunov theory. Finally, comparative simulation examples illustrate the effectiveness of the proposed ACI structure and decentralized control theory.
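To make the critic/actor roles concrete, the following is a minimal discrete-time sketch of an actor-critic pair with RBF features driven by the Bellman residual, which plays the role of the HJB residual above. The toy error dynamics, gains, and feature layout are assumptions for illustration, not the authors' continuous-time ACI scheme or their reconfigurable-robot model.

```python
import numpy as np

# Minimal actor-critic sketch with RBF features on a 1-D tracking error.
# All dynamics, gains, and feature choices are illustrative assumptions.
rng = np.random.default_rng(0)
centers = np.linspace(-2.0, 2.0, 9)               # RBF centers over the error space

def rbf(e, width=0.5):
    """Gaussian RBF feature vector for a scalar tracking error."""
    return np.exp(-((e - centers) ** 2) / (2 * width ** 2))

w_critic = np.zeros(len(centers))                 # V(e) ~= w_critic @ rbf(e)
w_actor = np.zeros(len(centers))                  # u(e) ~= w_actor @ rbf(e)
alpha_c, alpha_a, gamma = 0.1, 0.02, 0.95

e = 1.0                                           # initial tracking error
for step in range(5000):
    phi = rbf(e)
    noise = 0.1 * rng.standard_normal()           # exploration
    u = w_actor @ phi + noise
    e_next = 0.9 * e + 0.1 * u                    # toy error dynamics (assumed)
    cost = e ** 2 + 0.1 * u ** 2                  # quadratic tracking cost
    # TD error: the discrete-time counterpart of the HJB/Bellman residual
    delta = cost + gamma * (w_critic @ rbf(e_next)) - w_critic @ phi
    w_critic += alpha_c * delta * phi             # critic: fit the cost-to-go
    w_actor -= alpha_a * delta * noise * phi      # actor: penalize costly exploration
    e = e_next

print(f"final tracking error: {e:.4f}")
```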


2021 ◽  
Author(s):  
Xinglong Zhang ◽  
Yaoqian Peng ◽  
Biao Luo ◽  
Wei Pan ◽  
Xin Xu ◽  
...  

Recently, barrier function-based safe reinforcement learning (RL) with an actor-critic structure for continuous control tasks has received increasing attention. It remains challenging to learn a near-optimal control policy with safety and convergence guarantees, and few works have addressed safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that guarantees control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and to guide the policy to update safely. Theoretical results on stability and robustness are proven, and the convergence of the actor-critic learning algorithm is analyzed. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path-following and collision-avoidance problem for two real-world intelligent vehicles: a differential-drive vehicle and an Ackermann-drive one are used to verify the offline deployment performance and the online learning performance, respectively. Our approach shows an impressive sim-to-real transfer capability and satisfactory online control performance in the experiments.
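A rough sketch of the multi-step safety-risk idea follows: roll the current policy forward under the model and accumulate a barrier penalty on the time-varying constraint margin, so that a large value flags an imminent violation. The log-barrier shaping, the toy constraint schedule, and all names below are assumptions for exposition, not the paper's exact policy structure.

```python
import numpy as np

# Illustrative multi-step safety-risk evaluation with a log barrier.
# The constraint schedule, dynamics, and policy are toy assumptions.

def barrier(h):
    """Log-barrier on a constraint margin h > 0 (safe when positive)."""
    return -np.log(np.maximum(h, 1e-6))

def constraint_margin(x, t):
    # Time-varying box constraint |x| <= b(t) (assumed schedule).
    b = 2.0 - 0.5 * np.sin(0.1 * t)
    return b - abs(x)

def multi_step_safety_risk(x0, policy, dynamics, t0, horizon=10):
    """Roll the candidate policy forward under the model and accumulate
    barrier cost over the lookahead horizon; a large value signals that
    the policy is about to violate the time-varying constraints."""
    x, risk = x0, 0.0
    for k in range(horizon):
        u = policy(x)
        x = dynamics(x, u)
        risk += barrier(constraint_margin(x, t0 + k + 1))
    return risk

# Toy instantiation
policy = lambda x: -0.5 * x                  # candidate linear policy (assumed)
dynamics = lambda x, u: 0.95 * x + 0.1 * u   # toy scalar model (assumed)
print(multi_step_safety_risk(1.5, policy, dynamics, t0=0))
```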


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1359 ◽  
Author(s):  
Hyun-Kyo Lim ◽  
Ju-Bong Kim ◽  
Joo-Seong Heo ◽  
Youn-Hee Han

Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond standard computing devices. In this paper, we let multiple reinforcement learning agents learn optimal control policies on their own IoT devices, which are of the same type but have slightly different dynamics. For such devices, there is no guarantee that an agent which interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. We could therefore apply independent reinforcement learning to each IoT device individually, but this is costly and time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own IoT device, shares its learning experience (i.e., the gradient of the loss function) with the others and transfers mature policy model parameters to them, accelerating their learning. We incorporate the actor-critic proximal policy optimization (Actor-Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme effectively facilitates the learning process for multiple IoT devices and that learning is faster when more agents are involved.
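As a rough illustration of the gradient-sharing idea, the sketch below lets several agents compute local gradients against a shared policy and applies the averaged gradient to the shared parameters. The flat-parameter representation, the per-device loss landscapes, and the averaging rule are illustrative assumptions; the paper's Actor-Critic PPO update and transfer procedure are not reproduced here.

```python
import numpy as np

# Minimal federated gradient-sharing sketch (assumed setup, not the paper's).
N_AGENTS, DIM = 4, 8
params = np.zeros(DIM)                        # shared policy parameters

def local_gradient(agent_id, theta):
    """Stand-in for one agent's PPO gradient on its own device: each device
    gets a slightly different quadratic optimum (assumed), mimicking
    same-type devices with slightly different dynamics."""
    rng = np.random.default_rng(agent_id)
    target = rng.standard_normal(DIM) * 0.1
    return theta - target                     # gradient of 0.5 * ||theta - target||^2

for round_ in range(100):
    # Each agent computes a gradient against the current shared parameters...
    grads = [local_gradient(i, params) for i in range(N_AGENTS)]
    # ...and the shared model is updated with the averaged gradient.
    params -= 0.1 * np.mean(grads, axis=0)

print("shared parameters after federation:", np.round(params, 3))
```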


Author(s):  
Andrea Pesare ◽  
Michele Palladino ◽  
Maurizio Falcone

In this paper, we deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge an agent has of the current system is represented by a probability distribution π on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to take into account the increasing experience the agent obtains while exploring the environment, approximating the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the “average” linear quadratic optimal control problem with respect to a certain π converges to the optimal control of the linear quadratic problem governed by the actual underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms, where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
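The flavor of this convergence can be sketched numerically: solve the LQR problem for a surrogate of the π-averaged system as the uncertainty shrinks, and compare the resulting gain with the gain for the true system. Here the mean matrix under a Gaussian π is used as a crude stand-in for the full averaged problem; all matrices and noise levels below are assumptions, and the paper's result concerns the exact averaged problem, not this simplification.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy illustration: LQR gain of a pi-averaged system vs. the true system.
rng = np.random.default_rng(1)
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])   # true (unknown) dynamics (assumed)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

def lqr_gain(A):
    P = solve_continuous_are(A, B, Q, R)        # Riccati solution
    return np.linalg.solve(R, B.T @ P)          # K = R^{-1} B^T P

# pi: Gaussian samples around A_true with shrinking spread (more experience)
for sigma in (0.5, 0.1, 0.01):
    A_samples = [A_true + sigma * rng.standard_normal((2, 2)) for _ in range(200)]
    A_avg = np.mean(A_samples, axis=0)          # mean system as a surrogate
    gap = np.linalg.norm(lqr_gain(A_avg) - lqr_gain(A_true))
    print(f"sigma={sigma:>5}: ||K_avg - K_true|| = {gap:.4f}")
```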

