Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning

Sensors, 2020, Vol. 20 (16), pp. 4468. Author(s): Ao Xi, Chao Chen

In this work, we introduce a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is treated as an external disturbance to the robot. The platform has two rotational degrees of freedom, pitch and roll. The state space comprises the position of the center of pressure and the joint angles and joint velocities of both legs. The action space consists of the joint angles of the ankles, knees, and hips; by incorporating inverse kinematics, its dimensionality is significantly reduced. A model-based system estimator is then employed during the offline training procedure to estimate the dynamics model of the system using novel hierarchical Gaussian processes and to provide initial control inputs, after which the reduced action space of each joint is obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(λ) is introduced to fine-tune the initial control inputs, yielding the optimal control input for each joint at any state. The proposed scheme not only avoids the distribution-mismatch problem but also improves sample efficiency. Simulation results show that the proposed hybrid reinforcement learning mechanism enables the NAO robot to balance on an oscillating platform across different frequencies and magnitudes, and both control performance and robustness were maintained throughout the experiments.
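A minimal sketch of the two-stage idea follows (not the authors' implementation): a Gaussian-process dynamics model is fitted from logged transitions, then used to propose an initial action from a reduced candidate set by minimizing the predicted one-step cost of reaching the desired stable state. The toy 2-D "tilt" dynamics and all parameter values are stand-ins for the robot, and the subsequent DQN(λ) fine-tuning stage is omitted.

```python
# Sketch: GP dynamics model + model-based initial action proposal.
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    # hypothetical stand-in: damped tilt dynamics driven by action a
    return 0.9 * s + 0.1 * np.array([a, -a]) + 0.01 * rng.standard_normal(2)

# --- collect offline transitions ---
S, A, S_next = [], [], []
s = np.zeros(2)
for _ in range(200):
    a = rng.uniform(-1, 1)
    s2 = true_dynamics(s, a)
    S.append(s); A.append([a]); S_next.append(s2)
    s = s2
X = np.hstack([np.array(S), np.array(A)])   # inputs (s, a)
Y = np.array(S_next)                         # targets s'

# --- GP regression with an RBF kernel (one GP per output dim) ---
def rbf(A_, B_, ell=1.0, sf=1.0):
    d = ((A_[:, None, :] - B_[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d / ell**2)

K = rbf(X, X) + 1e-4 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)                # K^{-1} Y

def gp_predict(s, a):
    x = np.hstack([s, a])[None, :]
    return rbf(x, X) @ alpha                 # posterior mean of s'

# --- model-based initial action: minimize predicted one-step cost ---
s_star = np.zeros(2)                         # desired stable state
candidates = np.linspace(-1, 1, 41)          # reduced action set
def initial_action(s):
    costs = [np.sum((gp_predict(s, np.array([a]))[0] - s_star) ** 2)
             for a in candidates]
    return candidates[int(np.argmin(costs))]

print("initial action at s=(0.3,-0.2):",
      initial_action(np.array([0.3, -0.2])))
```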

2021, Vol. 4. Author(s): Marina Dorokhova, Christophe Ballif, Nicolas Wyrsch

In the past few years, the importance of electric mobility has increased in response to growing concerns about climate change. However, limited cruising range and sparse charging infrastructure could restrain a massive deployment of electric vehicles (EVs). To mitigate the problem, the need for optimal route-planning algorithms has emerged. In this paper, we propose a mathematical formulation of the EV-specific routing problem in a graph-theoretical context that incorporates the ability of EVs to recuperate energy. Furthermore, we consider the possibility of recharging en route at intermediary charging stations. As a solution method, we present an off-policy, model-free reinforcement learning approach that aims to generate energy-feasible paths for an EV from source to target. The algorithm was implemented and tested in a case study of a road network in Switzerland. The training procedure has low computational and memory demands and is suitable for online applications. The results demonstrate the algorithm's capability to make recharging decisions and produce the desired energy-feasible paths.
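The sketch below illustrates the flavor of such an off-policy, model-free approach on a hypothetical four-node graph (not the paper's Swiss road network or its exact formulation): Q-learning runs over (node, battery-level) states, edges may carry negative energy costs to model recuperation, and a charging node offers an extra "charge" action.

```python
# Sketch: Q-learning for energy-feasible EV routing on a toy graph.
import random

random.seed(0)

# directed graph: node -> list of (next_node, energy_cost_units)
graph = {0: [(1, 2), (2, 1)],
         1: [(3, -1)],          # downhill edge: recuperation
         2: [(3, 3)],
         3: []}
charge_nodes = {2}
B_MAX, SOURCE, TARGET = 4, 0, 3

def actions(node, b):
    acts = [(n, e) for n, e in graph[node] if b - e >= 0]  # feasible moves
    if node in charge_nodes and b < B_MAX:
        acts.append(("charge", -1))                        # gain 1 unit
    return acts

Q = {}  # (node, battery, action) -> value
def q(s, a): return Q.get((s[0], s[1], a), 0.0)

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(3000):
    node, b = SOURCE, B_MAX
    while node != TARGET:
        acts = actions(node, b)
        if not acts:
            break                                   # stranded: dead end
        if random.random() < eps:
            a = random.choice(acts)
        else:
            a = max(acts, key=lambda x: q((node, b), x))
        nxt = node if a[0] == "charge" else a[0]
        b2 = min(B_MAX, b - a[1])
        r = 10.0 if nxt == TARGET else -1.0         # step penalty
        best = max((q((nxt, b2), x) for x in actions(nxt, b2)), default=0.0)
        target = r + (0.0 if nxt == TARGET else gamma * best)
        Q[(node, b, a)] = q((node, b), a) + alpha * (target - q((node, b), a))
        node, b = nxt, b2

# greedy rollout of the learned policy
node, b, path = SOURCE, B_MAX, [SOURCE]
while node != TARGET:
    a = max(actions(node, b), key=lambda x: q((node, b), x))
    node = node if a[0] == "charge" else a[0]
    b = min(B_MAX, b - a[1])
    path.append(node)
print("energy-feasible path:", path, "remaining battery:", b)
```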


2021, Vol. 2021, pp. 1-12. Author(s): Qi Yongqiang, Yang Hailan, Rong Dan, Ke Yi, Lu Dongchen, ...

This paper proposes a goal-directed locomotion method for a snake-shaped robot in a complex 3D environment based on path-integral reinforcement learning. The method uses a model-free online Q-learning algorithm to evaluate action strategies and optimizes decision-making through repeated "exploration-learning-utilization" cycles, enabling the snake-shaped robot to perform goal-directed locomotion in the complex 3D environment. Suitable locomotion control parameters, such as joint angles and screw-drive velocities, are learned by path-integral reinforcement learning, and the learned parameters were successfully transferred to the snake-shaped robot. Simulation results show that the planned path avoids all obstacles and reaches the destination smoothly and swiftly.
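For illustration, here is a minimal path-integral-style (PI²-flavored) parameter update, not the paper's implementation: control parameters, standing in for joint angles and screw-drive velocities, are perturbed, each rollout is scored by a hypothetical cost, and low-cost perturbations dominate the exponentially weighted update.

```python
# Sketch: path-integral (PI^2-style) policy-parameter improvement.
import numpy as np

rng = np.random.default_rng(1)

def rollout_cost(theta):
    # hypothetical cost: distance of a toy 2-link "gait" endpoint
    # from a goal, plus a small effort penalty
    goal = np.array([1.0, 0.5])
    end = np.array([np.cos(theta[0]) + np.cos(theta[0] + theta[1]),
                    np.sin(theta[0]) + np.sin(theta[0] + theta[1])])
    return np.sum((end - goal) ** 2) + 0.01 * np.sum(theta ** 2)

theta = np.zeros(2)           # initial control parameters
K, sigma, lam = 32, 0.3, 0.1  # rollouts, noise scale, temperature

for it in range(60):
    eps = sigma * rng.standard_normal((K, 2))    # parameter perturbations
    costs = np.array([rollout_cost(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)     # exponential weighting
    w /= w.sum()
    theta = theta + w @ eps                      # cost-weighted update

print("learned parameters:", theta, "cost:", rollout_cost(theta))
```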


2014, Vol. 11 (03), pp. 1450024. Author(s): Paweł Wawrzyński

In this paper, a control system for humanoid robot walking is approximately optimized by means of reinforcement learning. Given is an 18-DOF humanoid whose gait is based on replaying a simple trajectory, which is translated into a reactive policy. A neural network whose input represents the robot state learns to produce output that additively modifies the initial control. The learning algorithm applied is actor–critic with experience replay. Within 50 min of learning, the slow initial gait turns into dexterous, fast walking. No model of the robot dynamics is engaged. The methodology is generic and can be applied to optimize control systems for diverse robots of comparable complexity.
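A rough sketch of the additive-correction scheme follows, with linear function approximators standing in for the paper's neural network and a toy 1-D tracking task standing in for the gait: the action is the replayed base control plus a learned correction, trained by an actor–critic with experience replay. All dynamics, features, and rates below are illustrative assumptions.

```python
# Sketch: actor-critic with experience replay over an additive correction.
import numpy as np

rng = np.random.default_rng(2)
gamma, sig = 0.95, 0.2

def features(s):                  # simple hand-crafted state features
    p, v, t = s
    return np.array([1.0, p, v, np.sin(t), p * v])

w_actor = np.zeros(5)             # linear correction policy (mean)
w_critic = np.zeros(5)            # linear state-value estimate

def base_control(t):              # replayed nominal trajectory
    return np.sin(t)

def step(s, a):                   # toy point-mass tracking dynamics
    p, v, t = s
    v2 = 0.9 * v + 0.1 * a
    p2 = p + 0.1 * v2
    r = -(p2 - np.sin(t)) ** 2    # reward: track the reference
    return np.array([p2, v2, t + 0.1]), r

buf = []
s = np.array([0.0, 0.0, 0.0])
for i in range(4000):
    phi = features(s)
    noise = sig * rng.standard_normal()
    a = np.clip(base_control(s[2]) + w_actor @ phi + noise, -2.0, 2.0)
    s2, r = step(s, a)
    buf.append((phi, noise, r, features(s2)))
    s = s2 if i % 40 else np.array([0.0, 0.0, 0.0])  # short episodes
    for j in rng.integers(0, len(buf), size=4):      # replay minibatch
        phi_b, n_b, r_b, phi2_b = buf[j]
        delta = np.clip(r_b + gamma * w_critic @ phi2_b
                        - w_critic @ phi_b, -5.0, 5.0)
        w_critic += 0.01 * delta * phi_b             # TD(0) critic
        w_actor += 0.001 * delta * n_b * phi_b       # likelihood-ratio actor

print("learned correction weights:", w_actor)
```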


2019. Author(s): Leor M Hackel, Jeffrey Jordan Berg, Björn Lindström, David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
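For concreteness, the sketch below shows the hybrid valuation commonly fit to such sequential paradigms (not necessarily the authors' exact model; all parameter values are hypothetical): a model-free TD value and a model-based value, computed by planning through a learned transition model, are mixed with a weight w before a softmax choice among advisors.

```python
# Sketch: hybrid model-free / model-based valuation with softmax choice.
import numpy as np

rng = np.random.default_rng(3)
alpha, w, beta = 0.3, 0.6, 3.0   # learning rate, MB weight, inverse temp

n_advisors, n_stocks = 2, 2
Q_mf = np.zeros(n_advisors)               # model-free advisor values
T = np.full((n_advisors, n_stocks), 0.5)  # learned advisor->stock model
R = np.zeros(n_stocks)                    # learned stock payoffs
true_T = np.array([[0.8, 0.2], [0.2, 0.8]])
true_R = np.array([1.0, 0.2])             # stock 0 pays more on average

for trial in range(500):
    Q_mb = T @ R                          # plan through the learned model
    Q = w * Q_mb + (1 - w) * Q_mf         # hybrid valuation
    p = np.exp(beta * Q) / np.exp(beta * Q).sum()
    a = rng.choice(n_advisors, p=p)       # softmax choice of advisor
    stock = rng.choice(n_stocks, p=true_T[a])
    r = true_R[stock] + 0.1 * rng.standard_normal()
    Q_mf[a] += alpha * (r - Q_mf[a])                  # model-free TD update
    T[a] += alpha * (np.eye(n_stocks)[stock] - T[a])  # transition update
    R[stock] += alpha * (r - R[stock])                # payoff update

print("hybrid values:", w * (T @ R) + (1 - w) * Q_mf)
```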

