Reliability-Based Reinforcement Learning Under Uncertainty

Author(s):  
Zequn Wang ◽  
Narendra Patwardhan

Abstract: Despite numerous advances, reinforcement learning remains far from widespread acceptance for autonomous controller design compared to classical methods, owing to its limited ability to effectively tackle uncertainty. The reliance on an absolute or deterministic reward as the metric for the optimization process renders reinforcement learning highly susceptible to changes in problem dynamics. We introduce a novel framework that effectively quantifies the uncertainty in the design space and induces robustness in controllers by switching to a reliability-based optimization routine. A model-based approach is used to improve the data efficiency of the method while predicting the system dynamics. We prove the stability of the learned neuro-controllers in both static and dynamic environments on classical reinforcement learning tasks such as cart-pole balancing and the inverted pendulum.
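The core move described here, replacing a deterministic return with a reliability metric, can be sketched roughly as follows. The Monte Carlo interface, the Gaussian pole-mass perturbation, and the return threshold are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def reliability_objective(policy_return_fn, n_samples=500, target=195.0, seed=0):
    """Estimate a policy's reliability: P(return >= target) under sampled
    environment-parameter uncertainty (hypothetical interface)."""
    rng = np.random.default_rng(seed)
    # Assumed uncertain dynamics parameter, e.g. pole mass ~ N(0.1, 0.02).
    masses = rng.normal(0.1, 0.02, size=n_samples)
    returns = np.array([policy_return_fn(m) for m in masses])
    return float((returns >= target).mean())

# Toy surrogate: the policy's return degrades as the mass drifts from nominal.
rel = reliability_objective(lambda m: 200.0 - 400.0 * abs(m - 0.1))
```

Optimizing this probability, rather than the nominal return, is what makes the resulting controller insensitive to shifts in the problem dynamics.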

2013 ◽  
Vol 631-632 ◽  
pp. 1342-1347
Author(s):  
Xu Cao ◽  
Nian Feng Li ◽  
Hua Xun Zhang

Owing to its high-order, unstable, multivariable, nonlinear, and strongly coupled characteristics, robust stability is an important indicator of the inverted pendulum system. In this paper, a robust LQR controller for the inverted pendulum system is designed. The simulation and experimental results show that the stability of the robust LQR controller is better than that of the original LQR controller. When the system departs from equilibrium for any reason, it returns to the equilibrium state without depleting energy, and all state components approach equilibrium.
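A minimal LQR design for a linearized inverted pendulum on a cart can be sketched as below. The physical constants and weight matrices are illustrative tuning choices, not values from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized cart-pole about the upright equilibrium; state = [x, x_dot,
# theta, theta_dot], input = horizontal force. Numbers are illustrative.
g, M, m, l = 9.81, 1.0, 0.1, 0.5
A = np.array([[0, 1, 0, 0],
              [0, 0, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, 0, (M + m) * g / (M * l), 0]], dtype=float)
B = np.array([[0], [1 / M], [0], [-1 / (M * l)]], dtype=float)

Q = np.diag([10.0, 1.0, 100.0, 1.0])  # state weights (tuning choice)
R = np.array([[1.0]])                 # control-effort weight

P = solve_continuous_are(A, B, Q, R)  # algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)       # optimal gain, u = -K x

closed_loop = A - B @ K               # stabilized closed-loop dynamics
```

The robust variant in the paper additionally shapes the design so stability margins survive parameter deviations; the plain LQR above is only the baseline it improves on.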


Author(s):  
Tung-Long Vuong ◽  
Do-Van Nguyen ◽  
Tai-Long Nguyen ◽  
Cong-Minh Bui ◽  
Hai-Dang Kieu ◽  
...  

In multitask reinforcement learning, tasks often have sub-tasks that share the same solution, even though the overall tasks are different. If the shared portions could be effectively identified, the learning process could be improved, since all the samples between tasks in the shared space could be used. In this paper, we propose a Sharing Experience Framework (SEF) for the simultaneous training of multiple tasks. In SEF, a confidence-sharing agent uses task-specific rewards from the environment to identify similar parts that should be shared across tasks and defines those parts as shared regions between tasks. The shared regions are expected to guide task-policies in sharing their experience during the learning process. The experiments highlight that our framework improves the performance and stability of learning task-policies and can help task-policies avoid local optima.
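One simple way to picture a shared region is as the set of states where two tasks' value estimates agree. The tabular Q-functions and the agreement tolerance below are a hypothetical stand-in for the paper's confidence-sharing agent, not its actual mechanism.

```python
import numpy as np

def shared_region(q_a, q_b, state_ids, tol=0.1):
    """Sketch: mark a state as shared between two tasks when both tabular
    Q-functions prefer the same action with similar best-action values."""
    shared = []
    for s in state_ids:
        same_action = np.argmax(q_a[s]) == np.argmax(q_b[s])
        value_gap = abs(np.max(q_a[s]) - np.max(q_b[s]))
        if same_action and value_gap < tol:
            shared.append(s)
    return shared
```

Experience collected in such states could then be replayed by every task whose policy agrees there, which is the sample-reuse benefit the abstract describes.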


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Jian Sun ◽  
Jie Li

The large scale, time variation, and diversification of physically coupled networked infrastructures such as power grids and transportation systems complicate their controller design, implementation, and expansion. To tackle these challenges, we propose an online distributed reinforcement learning control algorithm with a one-layer neural network for each subsystem (or agent) to adapt to variations in the networked infrastructure. Each controller includes a critic network and an action network, which approximate the strategy utility function and the desired control law, respectively. To avoid a large number of trials and to improve stability, the training of the action network introduces supervised learning mechanisms into the reduction of the long-term cost. The stability of the control system with the learning algorithm is analyzed, and upper bounds on the tracking error and the neural network weights are estimated. The effectiveness of the proposed controller is illustrated in simulation; the results indicate stability under communication delay and disturbances as well.
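The blend of a reinforcement signal with a supervised term in the action-network update might look as follows. The one-layer linear form, the teacher control law, and the blending weight are assumptions for illustration; the paper's exact update rule is not reproduced here.

```python
import numpy as np

def action_net_update(w, x, u_rl_grad, u_teacher, lr=0.01, beta=0.5):
    """Sketch (assumed form): one-layer action network u = w @ x, updated by a
    blend of the RL cost gradient and a supervised pull toward a known
    stabilizing control law, trading trial count for stability."""
    u = w @ x
    supervised_grad = np.outer(u - u_teacher, x)  # gradient of 0.5*||u - u_teacher||^2
    return w - lr * ((1 - beta) * u_rl_grad + beta * supervised_grad)
```

With beta near 1 early in training, the agent imitates the teacher and avoids destabilizing exploration; annealing beta toward 0 hands control back to the long-term-cost objective.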


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 471
Author(s):  
Jai Hoon Park ◽  
Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space, which involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. By evolving only the robotic structure and performing behavioral optimization with a separate training algorithm, the size of the search space is reduced significantly compared with evolving both the structure and the behavior simultaneously. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative reward of a candidate structure in reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in experiments with an actual modular robotics kit.
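The outer GA loop with an RL-derived fitness can be written as a short skeleton. All names here are illustrative: `train_and_score` stands for "train the candidate with RL and return its mean cumulative reward", which the paper uses as the fitness.

```python
import random

def evolve_structures(init_pop, mutate, train_and_score, generations=10, elite=2):
    """Skeleton of the described nested optimization: a GA evolves structures
    (outer loop) while each candidate's fitness is the score an RL trainer
    achieves on it (inner loop, hidden inside train_and_score)."""
    pop = list(init_pop)
    for _ in range(generations):
        ranked = sorted(pop, key=train_and_score, reverse=True)
        parents = ranked[:elite]                      # keep the best structures
        children = [mutate(random.choice(parents))    # mutate elites to refill
                    for _ in range(len(pop) - elite)]
        pop = parents + children
    return max(pop, key=train_and_score)
```

Because fitness is "reward after training", the search favors structures that are easy to learn good behavior on, not merely structures that score well untrained.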


Author(s):  
Gokhan Demirkiran ◽  
Ozcan Erdener ◽  
Onay Akpinar ◽  
Pelin Demirtas ◽  
M. Yagiz Arik ◽  
...  

2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter Morales ◽  
Rajmonda Sulo Caceres ◽  
Tina Eliassi-Rad

Abstract: Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
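Selective harvesting as sequential decision-making reduces to: at each step, probe one frontier vertex of the observed subgraph. The greedy driver below is an illustrative baseline interface, with `score` standing in for the learned NAC policy; it is not the paper's algorithm.

```python
def harvest(graph, attr, start, budget, score):
    """Sketch of selective harvesting: repeatedly probe the boundary vertex a
    scoring policy ranks highest, counting vertices with the target attribute.
    graph: adjacency dict; attr: vertex -> 0/1 target attribute."""
    observed, frontier, collected = {start}, set(graph[start]), 0
    for _ in range(budget):
        if not frontier:
            break
        v = max(frontier, key=lambda u: score(u, observed))  # policy's pick
        frontier |= set(graph[v]) - observed - {v}           # reveal neighbors
        frontier.discard(v)
        observed.add(v)
        collected += attr[v]
    return collected
```

Swapping `score` from a hand-crafted heuristic to a value function learned offline over an embedding of the observed subgraph is exactly the substitution the NAC framework makes.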


2021 ◽  
Vol 54 (3-4) ◽  
pp. 417-428
Author(s):  
Yanyan Dai ◽  
KiDong Lee ◽  
SukGyu Lee

In real applications, the rotary inverted pendulum system is a basic model in nonlinear control. Without a deep understanding of control, it is difficult to control a rotary inverted pendulum platform using classic control engineering models, as shown in Section 2.1. Therefore, this paper controls the platform without classic control theory, by training and testing a reinforcement learning algorithm. Many recent achievements in reinforcement learning (RL) have become possible, but research on quickly testing high-frequency RL algorithms in a real hardware environment is lacking. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test the deep reinforcement learning algorithm from simulation to real hardware implementation. The Double Deep Q-Network (DDQN) with prioritized experience replay, which requires no deep understanding of classical control engineering, is used to implement the agent. For the real experiment, to swing up the rotary inverted pendulum and make the pendulum move smoothly, we define 21 actions for swinging up and balancing the pendulum. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay removes the overestimation of the Q value and decreases the training time. Finally, this paper presents experimental results comparing classic control theory with different reinforcement learning algorithms.
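The Double DQN target, the piece that removes DQN's Q-value overestimation, is standard and can be shown compactly. The tabular arrays below are toy stand-ins for the networks; the decoupling of action *selection* (online net) from action *evaluation* (target net) is the actual mechanism.

```python
import numpy as np

def ddqn_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    """Double DQN bootstrap target: the online network selects the next
    action, the target network evaluates it, avoiding the single-network
    max operator that biases plain DQN upward."""
    next_actions = np.argmax(q_online[next_states], axis=1)  # select
    next_values = q_target[next_states, next_actions]        # evaluate
    return rewards + gamma * (1.0 - dones) * next_values
```

Prioritized experience replay then samples transitions in proportion to their TD error against this target, which is what the paper credits for the reduced training time.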


Processes ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 823
Author(s):  
Wen-Jer Chang ◽  
Yu-Wei Lin ◽  
Yann-Horng Lin ◽  
Chin-Lin Pen ◽  
Ming-Hsuan Tsai

In many practical systems, stochastic behaviors usually occur and need to be considered in the controller design. To ensure system performance under the effect of stochastic behaviors, the required control effort may grow beyond the capacity of practical actuators. Therefore, the actuator saturation problem must also be considered in the controller design. The type-2 Takagi-Sugeno (T-S) fuzzy model can describe parameter uncertainties more completely than the type-1 T-S fuzzy model for a class of nonlinear systems. A fuzzy controller design method is proposed in this paper based on the Interval Type-2 (IT2) T-S fuzzy model for stochastic nonlinear systems subject to actuator saturation. The stability analysis and corresponding sufficient conditions for the IT2 T-S fuzzy model are developed using Lyapunov theory. By transforming the stability and control problem into a Linear Matrix Inequality (LMI) problem, the proposed fuzzy control problem can be solved by a convex optimization algorithm. Finally, a nonlinear ship steering system is considered in simulations to verify the feasibility and efficiency of the proposed fuzzy controller design method.
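The Lyapunov condition underlying such LMI formulations can be checked numerically for a single linear subsystem. The closed-loop matrix below is a toy stand-in for one fuzzy-rule subsystem, not from the paper; the full IT2 design couples many such conditions into one LMI problem.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Toy stand-in for one fuzzy-rule closed-loop subsystem A_cl = A - B K.
# Quadratic stability asks for P > 0 with A_cl^T P + P A_cl < 0, the
# building block that the paper's LMI conditions generalize.
A_cl = np.array([[0.0, 1.0],
                 [-2.0, -3.0]])
Q = np.eye(2)
# solve_continuous_lyapunov(a, q) solves a X + X a^T = q; with a = A_cl^T
# this yields P satisfying A_cl^T P + P A_cl = -Q.
P = solve_continuous_lyapunov(A_cl.T, -Q)
```

A positive definite solution P certifies that V(x) = x^T P x decays along trajectories, which is exactly what the sufficient conditions in the paper assert rule-by-rule, with extra terms for saturation and stochastic effects.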


2013 ◽  
Vol 2013 ◽  
pp. 1-15
Author(s):  
Wei Xu ◽  
Ke Zhao ◽  
Yatao Li ◽  
Peitao Cheng

This paper addresses a functional representation based on the event model. In the event model, the ontology is defined based on the theory of propositional logic to describe the connotation of the event, and the variant is defined based on the theories of domain relational calculus and set theory to express the variation range of the event, i.e., the alterable part of the event under the constraints of the ontology. Function is an important concept in conceptual design and has both connotation and extension. A functional representation is proposed based on the event model. The ontology of the event is used to describe the connotation of the function and to reflect its stability. The variant of the event is used to represent the extension and to capture the variability of the function. The extension of the function is its range of change under the constraints of the connotation. The proposed functional representation divides the function into an immutable part and an alterable part, facilitating the expansion of the design space. A functional reasoning model based on the event model is also put forward to support function reasoning on computers. Finally, a simple case validates the feasibility of the model.
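One possible reading of the immutable/alterable split is a record with a fixed ontology and a variant that bounds the admissible variation. The class, field names, and sample function below are purely illustrative, not the paper's formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """Illustrative reading of the event model: an immutable ontology (the
    connotation, propositions that always hold) and a variant (the
    extension, the range over which the event may vary)."""
    ontology: frozenset                           # fixed propositions
    variant: dict = field(default_factory=dict)   # attribute -> allowed values

    def admits(self, instance: dict) -> bool:
        # An instance realizes the event if every varying attribute stays
        # inside the range the variant permits.
        return all(instance.get(k) in v for k, v in self.variant.items())

# A hypothetical "transfer material" function: the connotation is fixed,
# while the realization may vary in medium and direction.
transfer = Event(ontology=frozenset({"moves_material"}),
                 variant={"medium": {"belt", "screw"},
                          "direction": {"up", "down"}})
```

Keeping the ontology immutable while enumerating the variant is what lets a reasoner expand the design space without losing the function's identity.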

