Introduction to optimal control, adaptive control and reinforcement learning

Abstract: In this paper, we deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we suppose that the agent's knowledge of the current system is represented by a probability distribution $$\pi$$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to account for the experience the agent gains while exploring the environment, so that it approximates the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the “average” linear quadratic optimal control problem with respect to a given $$\pi$$ converges to the optimal control of the linear quadratic problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms in which prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
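The idea of an “average” linear quadratic problem can be illustrated with a minimal sketch. It is not the paper's method or its numerical test: it assumes a scalar, discrete-time, finite-horizon system x_{k+1} = a x_k + b u_k with cost sum_k x_k^2 + r u_k^2, an unknown coefficient a, and a distribution $$\pi$$ taken to be a Gaussian centred on the true value. The names `rollout_matrices`, `averaged_lq_control`, and all numerical values are hypothetical. The open-loop control minimises the sample mean of the cost over draws from $$\pi$$, and the loop shrinks the spread of $$\pi$$ to mimic increasing knowledge.

```python
import numpy as np

def rollout_matrices(a, b, horizon):
    """Return (phi, G) such that [x_1..x_N] = phi * x0 + G @ u
    for the assumed scalar system x_{k+1} = a x_k + b u_k."""
    phi = a ** np.arange(1, horizon + 1)      # free response of x_1..x_N
    G = np.zeros((horizon, horizon))
    for k in range(horizon):                  # row k describes x_{k+1}
        for j in range(k + 1):
            G[k, j] = a ** (k - j) * b        # effect of u_j on x_{k+1}
    return phi, G

def averaged_lq_control(a_samples, b, x0, horizon, r=1.0):
    """Open-loop control minimising the sample mean over a_samples of
    sum_k x_k^2 + r * u_k^2 (a quadratic in u, solved in closed form)."""
    H = np.zeros((horizon, horizon))
    g = np.zeros(horizon)
    for a in a_samples:
        phi, G = rollout_matrices(a, b, horizon)
        H += G.T @ G
        g += G.T @ phi
    H = H / len(a_samples) + r * np.eye(horizon)
    g = g / len(a_samples) * x0
    # First-order condition: H u = -g
    return -np.linalg.solve(H, g)

rng = np.random.default_rng(0)
a_true, b, x0, N = 1.2, 1.0, 1.0, 10          # hypothetical values

# Control for the exactly known system (pi is a point mass at a_true).
u_true = averaged_lq_control([a_true], b, x0, N)

# Shrink the spread of pi and record the distance to u_true.
errors = []
for sigma in (0.3, 0.1, 0.01):
    samples = a_true + sigma * rng.standard_normal(2000)
    u_avg = averaged_lq_control(samples, b, x0, N)
    errors.append(np.linalg.norm(u_avg - u_true))
```

In this toy setting, the entries of `errors` can be inspected to see how the averaged control approaches the control for the true system as the distribution concentrates. The convergence result stated in the abstract is for the authors' own setting, which this sketch does not reproduce.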

Data-driven dynamic multi-objective optimal control: A Hamiltonian-inequality driven satisficing reinforcement learning approach

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2020.12.2275 ◽

2020 ◽

Vol 53 (2) ◽

pp. 8070-8075

Author(s):

Majid Mazouchi ◽

Yongliang Yang ◽

Hamidreza Modares

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Data Driven ◽

Learning Approach ◽

Multi Objective

Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3045087 ◽

2021 ◽

pp. 1-10

Author(s):

Tao Bian ◽

Zhong-Ping Jiang

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Nonlinear Systems ◽

Continuous Time ◽

Value Iteration ◽

Adaptive Optimal Control ◽

A Value

Hierarchical Terrain-Aware Control for Quadrupedal Locomotion by Combining Deep Reinforcement Learning and Optimal Control

10.1109/iros51168.2021.9636738 ◽

2021 ◽

Author(s):

Qingfeng Yao ◽

Jilong Wang ◽

Donglin Wang ◽

Shuyu Yang ◽

Hongyin Zhang ◽

...

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Quadrupedal Locomotion

Adaptive Control of a Marine Vessel Based on Reinforcement Learning

2018 37th Chinese Control Conference (CCC) ◽

10.23919/chicc.2018.8482656 ◽

2018 ◽

Cited By ~ 1

Author(s):

Zhao Yin ◽

Wei He ◽

Changyin Sun ◽

Guang Li ◽

Chenguang Yang

Keyword(s):

Adaptive Control ◽

Reinforcement Learning

Reinforcement Learning-Based Approximate Optimal Control for Attitude Reorientation Under State Constraints

IEEE Transactions on Control Systems Technology ◽

10.1109/tcst.2020.3007401 ◽

2020 ◽

pp. 1-10

Author(s):

Hongyang Dong ◽

Xiaowei Zhao ◽

Haoyang Yang

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

State Constraints

Online Optimal Control of Robotic Systems with Single Critic NN-Based Reinforcement Learning

Complexity ◽

10.1155/2021/8839391 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Xiaoyi Long ◽

Zheng He ◽

Zhongyuan Wang

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Tracking Control ◽

Learning Algorithm ◽

Tracking Error ◽

Adaptive Dynamic Programming ◽

Robotic Systems ◽

Control Synthesis ◽

Optimal Tracking ◽

Optimal Tracking Control

This paper proposes an online solution for the optimal tracking control of robotic systems based on a single-critic neural network (NN) reinforcement learning (RL) method. To this end, we rewrite the robotic system model in state-space form, which facilitates the synthesis of the optimal tracking control. To maintain the tracking response, a steady-state control is designed, and an adaptive optimal tracking control is then used to ensure that the tracking error converges in an optimal sense. To solve the resulting optimal control problem within the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton-Jacobi-Bellman (HJB) equation are both formulated. An online RL algorithm is then developed to solve the HJB equation using a critic NN with an online learning algorithm. Simulation results are given to verify the effectiveness of the proposed method.

Prefrontal solution to the bias-variance tradeoff during reinforcement learning

10.1101/2020.12.23.424258 ◽

2020 ◽

Author(s):

Dongjae Kim ◽

Jaeseung Jeong ◽

Sang Wan Lee

Keyword(s):

Adaptive Control ◽

Reinforcement Learning ◽

Prediction Error ◽

Brain Regions ◽

Decision Task ◽

Prediction Errors ◽

Model Based ◽

Model Free ◽

Bias Variance ◽

The Brain

Abstract: The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy’s performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that the baseline correction for prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.

Reinforcement Learning Based Optimal Control of Linear Singularly Perturbed Systems

IEEE Transactions on Circuits & Systems II Express Briefs ◽

10.1109/tcsii.2021.3105652 ◽

2021 ◽

pp. 1-1

Author(s):

Jianguo Zhao ◽

Chunyu Yang ◽

Weinan Gao

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Singularly Perturbed ◽

Singularly Perturbed Systems ◽

Perturbed Systems

Optimal Control of Nonlinear Time-Delay Systems with Input Constraints Using Reinforcement Learning

Neural Computing for Advanced Applications - Communications in Computer and Information Science ◽

10.1007/978-981-15-7670-6_28 ◽

2020 ◽

pp. 332-344

Author(s):

Jing Zhu ◽

Peng Zhang ◽

Yijing Hou

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Time Delay ◽

Delay Systems ◽

Time Delay Systems ◽

Input Constraints
