Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning

Sensors, 2020, Vol. 20 (16), pp. 4468. Author(s): Ao Xi, Chao Chen

In this work, we introduce a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is treated as an external disturbance to the robot. The platform has two rotational degrees of freedom, pitch and roll. The state space comprises the position of the center of pressure and the joint angles and joint velocities of both legs. The action space consists of the joint angles of the ankles, knees, and hips; by incorporating inverse kinematics, its dimensionality is significantly reduced. A model-based system estimator is then employed during the offline training procedure to estimate the dynamics model of the system using novel hierarchical Gaussian processes and to provide initial control inputs, after which the reduced action space of each joint is obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(λ) is introduced to fine-tune the initial control inputs, yielding the optimal control input for each joint at any state. The proposed scheme not only avoids the distribution-mismatch problem but also improves sample efficiency. Simulation results show that the proposed hybrid reinforcement learning mechanism enables the NAO robot to balance on an oscillating platform across different frequencies and magnitudes, and both control performance and robustness were maintained throughout the experiments.
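A minimal sketch of the two-stage idea follows (not the authors' implementation): a Gaussian-process dynamics model is fitted from logged transitions, then used to propose an initial action from a reduced candidate set by minimizing the predicted one-step cost of reaching the desired stable state. The toy 2-D "tilt" dynamics and all parameter values are stand-ins for the robot, and the subsequent DQN(λ) fine-tuning stage is omitted.

```python
# Sketch: GP dynamics model + model-based initial action proposal.
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    # hypothetical stand-in: damped tilt dynamics driven by action a
    return 0.9 * s + 0.1 * np.array([a, -a]) + 0.01 * rng.standard_normal(2)

# --- collect offline transitions ---
S, A, S_next = [], [], []
s = np.zeros(2)
for _ in range(200):
    a = rng.uniform(-1, 1)
    s2 = true_dynamics(s, a)
    S.append(s); A.append([a]); S_next.append(s2)
    s = s2
X = np.hstack([np.array(S), np.array(A)])   # inputs (s, a)
Y = np.array(S_next)                         # targets s'

# --- GP regression with an RBF kernel (one GP per output dim) ---
def rbf(A_, B_, ell=1.0, sf=1.0):
    d = ((A_[:, None, :] - B_[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d / ell**2)

K = rbf(X, X) + 1e-4 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)                # K^{-1} Y

def gp_predict(s, a):
    x = np.hstack([s, a])[None, :]
    return rbf(x, X) @ alpha                 # posterior mean of s'

# --- model-based initial action: minimize predicted one-step cost ---
s_star = np.zeros(2)                         # desired stable state
candidates = np.linspace(-1, 1, 41)          # reduced action set
def initial_action(s):
    costs = [np.sum((gp_predict(s, np.array([a]))[0] - s_star) ** 2)
             for a in candidates]
    return candidates[int(np.argmin(costs))]

print("initial action at s=(0.3,-0.2):",
      initial_action(np.array([0.3, -0.2])))
```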

2021, Vol. 4. Author(s): Marina Dorokhova, Christophe Ballif, Nicolas Wyrsch

In the past few years, the importance of electric mobility has increased in response to growing concerns about climate change. However, limited cruising range and sparse charging infrastructure could restrain a massive deployment of electric vehicles (EVs). To mitigate the problem, the need for optimal route-planning algorithms has emerged. In this paper, we propose a mathematical formulation of the EV-specific routing problem in a graph-theoretical context that incorporates the ability of EVs to recuperate energy. Furthermore, we consider the possibility of recharging en route at intermediary charging stations. As a solution method, we present an off-policy, model-free reinforcement learning approach that aims to generate energy-feasible paths for an EV from source to target. The algorithm was implemented and tested in a case study of a road network in Switzerland. The training procedure has low computational and memory demands and is suitable for online applications. The results demonstrate the algorithm's capability to make recharging decisions and produce the desired energy-feasible paths.
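The sketch below illustrates the flavor of such an off-policy, model-free approach on a hypothetical four-node graph (not the paper's Swiss road network or its exact formulation): Q-learning runs over (node, battery-level) states, edges may carry negative energy costs to model recuperation, and a charging node offers an extra "charge" action.

```python
# Sketch: Q-learning for energy-feasible EV routing on a toy graph.
import random

random.seed(0)

# directed graph: node -> list of (next_node, energy_cost_units)
graph = {0: [(1, 2), (2, 1)],
         1: [(3, -1)],          # downhill edge: recuperation
         2: [(3, 3)],
         3: []}
charge_nodes = {2}
B_MAX, SOURCE, TARGET = 4, 0, 3

def actions(node, b):
    acts = [(n, e) for n, e in graph[node] if b - e >= 0]  # feasible moves
    if node in charge_nodes and b < B_MAX:
        acts.append(("charge", -1))                        # gain 1 unit
    return acts

Q = {}  # (node, battery, action) -> value
def q(s, a): return Q.get((s[0], s[1], a), 0.0)

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(3000):
    node, b = SOURCE, B_MAX
    while node != TARGET:
        acts = actions(node, b)
        if not acts:
            break                                   # stranded: dead end
        if random.random() < eps:
            a = random.choice(acts)
        else:
            a = max(acts, key=lambda x: q((node, b), x))
        nxt = node if a[0] == "charge" else a[0]
        b2 = min(B_MAX, b - a[1])
        r = 10.0 if nxt == TARGET else -1.0         # step penalty
        best = max((q((nxt, b2), x) for x in actions(nxt, b2)), default=0.0)
        target = r + (0.0 if nxt == TARGET else gamma * best)
        Q[(node, b, a)] = q((node, b), a) + alpha * (target - q((node, b), a))
        node, b = nxt, b2

# greedy rollout of the learned policy
node, b, path = SOURCE, B_MAX, [SOURCE]
while node != TARGET:
    a = max(actions(node, b), key=lambda x: q((node, b), x))
    node = node if a[0] == "charge" else a[0]
    b = min(B_MAX, b - a[1])
    path.append(node)
print("energy-feasible path:", path, "remaining battery:", b)
```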


2021, Vol. 2021, pp. 1-12. Author(s): Qi Yongqiang, Yang Hailan, Rong Dan, Ke Yi, Lu Dongchen, ...

This paper proposes a goal-directed locomotion method for a snake-shaped robot in a complex 3D environment based on path-integral reinforcement learning. The method uses a model-free online Q-learning algorithm to evaluate action strategies and optimizes decision-making through repeated "exploration-learning-utilization" cycles, enabling the snake-shaped robot to perform goal-directed locomotion in the complex 3D environment. Suitable locomotion control parameters, such as joint angles and screw-drive velocities, are learned by path-integral reinforcement learning, and the learned parameters were successfully transferred to the snake-shaped robot. Simulation results show that the planned path avoids all obstacles and reaches the destination smoothly and swiftly.
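For illustration, here is a minimal path-integral-style (PI²-flavored) parameter update, not the paper's implementation: control parameters, standing in for joint angles and screw-drive velocities, are perturbed, each rollout is scored by a hypothetical cost, and low-cost perturbations dominate the exponentially weighted update.

```python
# Sketch: path-integral (PI^2-style) policy-parameter improvement.
import numpy as np

rng = np.random.default_rng(1)

def rollout_cost(theta):
    # hypothetical cost: distance of a toy 2-link "gait" endpoint
    # from a goal, plus a small effort penalty
    goal = np.array([1.0, 0.5])
    end = np.array([np.cos(theta[0]) + np.cos(theta[0] + theta[1]),
                    np.sin(theta[0]) + np.sin(theta[0] + theta[1])])
    return np.sum((end - goal) ** 2) + 0.01 * np.sum(theta ** 2)

theta = np.zeros(2)           # initial control parameters
K, sigma, lam = 32, 0.3, 0.1  # rollouts, noise scale, temperature

for it in range(60):
    eps = sigma * rng.standard_normal((K, 2))    # parameter perturbations
    costs = np.array([rollout_cost(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)     # exponential weighting
    w /= w.sum()
    theta = theta + w @ eps                      # cost-weighted update

print("learned parameters:", theta, "cost:", rollout_cost(theta))
```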


2014, Vol. 11 (03), pp. 1450024. Author(s): Paweł Wawrzyński

In this paper, a control system for humanoid robot walking is approximately optimized by means of reinforcement learning. Given is an 18-DOF humanoid whose gait is based on replaying a simple trajectory, which is translated into a reactive policy. A neural network whose input represents the robot state learns to produce output that additively modifies the initial control. The learning algorithm applied is actor–critic with experience replay. Within 50 min of learning, the slow initial gait turns into dexterous, fast walking. No model of the robot dynamics is engaged. The methodology is generic and can be applied to optimize control systems for diverse robots of comparable complexity.
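A rough sketch of the additive-correction scheme follows, with linear function approximators standing in for the paper's neural network and a toy 1-D tracking task standing in for the gait: the action is the replayed base control plus a learned correction, trained by an actor–critic with experience replay. All dynamics, features, and rates below are illustrative assumptions.

```python
# Sketch: actor-critic with experience replay over an additive correction.
import numpy as np

rng = np.random.default_rng(2)
gamma, sig = 0.95, 0.2

def features(s):                  # simple hand-crafted state features
    p, v, t = s
    return np.array([1.0, p, v, np.sin(t), p * v])

w_actor = np.zeros(5)             # linear correction policy (mean)
w_critic = np.zeros(5)            # linear state-value estimate

def base_control(t):              # replayed nominal trajectory
    return np.sin(t)

def step(s, a):                   # toy point-mass tracking dynamics
    p, v, t = s
    v2 = 0.9 * v + 0.1 * a
    p2 = p + 0.1 * v2
    r = -(p2 - np.sin(t)) ** 2    # reward: track the reference
    return np.array([p2, v2, t + 0.1]), r

buf = []
s = np.array([0.0, 0.0, 0.0])
for i in range(4000):
    phi = features(s)
    noise = sig * rng.standard_normal()
    a = np.clip(base_control(s[2]) + w_actor @ phi + noise, -2.0, 2.0)
    s2, r = step(s, a)
    buf.append((phi, noise, r, features(s2)))
    s = s2 if i % 40 else np.array([0.0, 0.0, 0.0])  # short episodes
    for j in rng.integers(0, len(buf), size=4):      # replay minibatch
        phi_b, n_b, r_b, phi2_b = buf[j]
        delta = np.clip(r_b + gamma * w_critic @ phi2_b
                        - w_critic @ phi_b, -5.0, 5.0)
        w_critic += 0.01 * delta * phi_b             # TD(0) critic
        w_actor += 0.001 * delta * n_b * phi_b       # likelihood-ratio actor

print("learned correction weights:", w_actor)
```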


2019. Author(s): Leor M Hackel, Jeffrey Jordan Berg, Björn Lindström, David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
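For concreteness, the sketch below shows the hybrid valuation commonly fit to such sequential paradigms (not necessarily the authors' exact model; all parameter values are hypothetical): a model-free TD value and a model-based value, computed by planning through a learned transition model, are mixed with a weight w before a softmax choice among advisors.

```python
# Sketch: hybrid model-free / model-based valuation with softmax choice.
import numpy as np

rng = np.random.default_rng(3)
alpha, w, beta = 0.3, 0.6, 3.0   # learning rate, MB weight, inverse temp

n_advisors, n_stocks = 2, 2
Q_mf = np.zeros(n_advisors)               # model-free advisor values
T = np.full((n_advisors, n_stocks), 0.5)  # learned advisor->stock model
R = np.zeros(n_stocks)                    # learned stock payoffs
true_T = np.array([[0.8, 0.2], [0.2, 0.8]])
true_R = np.array([1.0, 0.2])             # stock 0 pays more on average

for trial in range(500):
    Q_mb = T @ R                          # plan through the learned model
    Q = w * Q_mb + (1 - w) * Q_mf         # hybrid valuation
    p = np.exp(beta * Q) / np.exp(beta * Q).sum()
    a = rng.choice(n_advisors, p=p)       # softmax choice of advisor
    stock = rng.choice(n_stocks, p=true_T[a])
    r = true_R[stock] + 0.1 * rng.standard_normal()
    Q_mf[a] += alpha * (r - Q_mf[a])                  # model-free TD update
    T[a] += alpha * (np.eye(n_stocks)[stock] - T[a])  # transition update
    R[stock] += alpha * (r - R[stock])                # payoff update

print("hybrid values:", w * (T @ R) + (1 - w) * Q_mf)
```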

