Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay

2020 ◽  
Vol 10 (2) ◽  
pp. 575 ◽  
Author(s):  
MyeongSeop Kim ◽  
Dong-Ki Han ◽  
Jae-Han Park ◽  
Jung-Su Kim

In order to enhance the performance of robot systems in the manufacturing industry, it is essential to develop motion and task planning algorithms. In particular, motion plans should be generated automatically so that robots can cope with various working environments. Although the probabilistic roadmap (PRM) method provides feasible paths when the start and goal positions of a robot manipulator are given, the resulting paths are not always smooth, which can degrade the performance of the robot system. This paper proposes a motion planning algorithm for robot manipulators based on the twin delayed deep deterministic policy gradient (TD3), a reinforcement learning algorithm tailored to Markov decision processes (MDPs) with continuous actions. Because path planning for a robot manipulator is an MDP with sparse rewards, hindsight experience replay (HER) is incorporated into TD3 to improve sample efficiency. The proposed algorithm is applied to 2-DOF and 3-DOF manipulators, and the resulting paths are shown to be smoother and shorter than those produced by PRM.
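
A minimal sketch of the hindsight relabeling idea the abstract describes, assuming a goal-conditioned transition format and a hypothetical sparse reward function; the TD3 networks themselves are omitted:

```python
import random
from collections import deque

import numpy as np

class HERBuffer:
    """Replay buffer with 'final'-strategy hindsight relabeling."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, reward_fn):
        # episode: list of (state, action, next_state, goal) tuples
        achieved_goal = episode[-1][2]  # treat the last reached state as a goal
        for state, action, next_state, goal in episode:
            # original transition with the (sparse) task reward
            self.buffer.append((state, action, next_state, goal,
                                reward_fn(next_state, goal)))
            # hindsight transition: pretend the reached state was the goal,
            # turning a failed episode into a useful learning signal
            self.buffer.append((state, action, next_state, achieved_goal,
                                reward_fn(next_state, achieved_goal)))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def sparse_reward(next_state, goal, tol=0.05):
    # hypothetical sparse reward: 0 on reaching the goal region, -1 otherwise
    return 0.0 if np.linalg.norm(np.asarray(next_state) - np.asarray(goal)) < tol else -1.0
```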

2020 ◽  
Vol 17 (4) ◽  
pp. 172988142093685
Author(s):  
Zhaolong Gao ◽  
Rongyu Tang ◽  
Luyao Chen ◽  
Qiang Huang ◽  
Jiping He

Grasping with a prosthetic hand in real life can be a difficult task. Amputee users are often capable of planning the reaching trajectory and selecting the grasp location, but they struggle with precise finger movements, such as adapting the fingers to the surface of an object without applying excessive force. It is more efficient to leave that part to machine autonomy. To combine the intention and planning ability of users with robotic control, shared control is introduced, in which user inputs and robot control methods are combined to achieve a goal. The shared control problem can be formulated as a partially observable Markov decision process (POMDP). To find the optimal control policy, we adopt an adaptive dynamic programming and reinforcement learning-based control algorithm, the deep deterministic policy gradient (DDPG) combined with hindsight experience replay (HER). We extend the algorithm with a prediction layer built on the reparameterization technique. The system was tested in a modified simulation environment for its ability to follow the user's intention while keeping the contact force within a safe boundary.
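
The abstract mentions a prediction layer built with the reparameterization technique; below is a minimal PyTorch sketch of such a head, producing a differentiable sample of a predicted quantity. The dimensions, layer shapes, and what the latent represents are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Illustrative prediction head using the reparameterization trick:
    outputs a differentiable sample from a learned Gaussian."""

    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(obs_dim, latent_dim)
        self.log_std = nn.Linear(obs_dim, latent_dim)

    def forward(self, obs):
        mu = self.mu(obs)
        std = self.log_std(obs).clamp(-5, 2).exp()
        eps = torch.randn_like(std)   # noise drawn outside the graph
        return mu + std * eps         # gradients flow through mu and std
```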


Author(s):  
Jessica Leu ◽  
Masayoshi Tomizuka

Real-time, safe, and stable motion planning in co-robot systems involving dynamic human-robot interaction (HRI) remains challenging due to the time-varying nature of the problem. One of the biggest challenges is to guarantee closed-loop stability of the planning algorithm in dynamic environments. Typically, this can be addressed if there exists a perfect predictor that precisely predicts the future motions of the obstacles; unfortunately, a perfect predictor is impossible to achieve. In the HRI environments considered in this paper, human workers and other robots are the obstacles to the ego robot. We discuss necessary conditions for the closed-loop stability of a planning problem using the framework of model predictive control (MPC). We conclude that the predictor must detect changes in the obstacles' movement mode within a time-delay allowance, and that the MPC must have a sufficiently long prediction horizon and a proper cost function. These conditions give the MPC an uncertainty tolerance for closed-loop stability while still avoiding collision when the obstacles' movement falls outside the tolerance. The closed-loop performance is also investigated using a notion of M-convergence, which guarantees finite local convergence (at least M steps ahead) of the open-loop trajectories toward the closed-loop trajectory. With this notion, we verify the performance of the proposed MPC with stability-enhanced prediction through simulations and experiments. With the proposed method, the robot copes better with dynamic environments and the closed-loop cost is reduced.
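
To make the receding-horizon structure concrete, here is a toy sketch of an MPC step for a 2-D point robot with a predicted moving obstacle. The constant-velocity predictor, the soft-penalty avoidance term, and all weights are assumptions for illustration, not the paper's stability-enhanced predictor or cost:

```python
import numpy as np
from scipy.optimize import minimize

H, DT = 10, 0.1  # prediction horizon length and time step

def predict_obstacle(p_obs, v_obs):
    # naive constant-velocity prediction over the horizon
    return np.array([p_obs + v_obs * DT * (k + 1) for k in range(H)])

def mpc_step(x, goal, p_obs, v_obs):
    obs_traj = predict_obstacle(p_obs, v_obs)

    def cost(u_flat):
        u = u_flat.reshape(H, 2)
        xs, c = x.copy(), 0.0
        for k in range(H):
            xs = xs + u[k] * DT                                  # single-integrator model
            c += np.sum((xs - goal) ** 2)                        # tracking cost
            c += 5.0 / (np.sum((xs - obs_traj[k]) ** 2) + 1e-3)  # soft obstacle penalty
            c += 0.1 * np.sum(u[k] ** 2)                         # control effort
        return c

    res = minimize(cost, np.zeros(2 * H), method="L-BFGS-B")
    return res.x[:2]  # apply only the first control, then re-plan (receding horizon)
```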


Robotica ◽  
2018 ◽  
Vol 36 (5) ◽  
pp. 655-675 ◽  
Author(s):  
Dongsheng Guo ◽  
Kene Li ◽  
Bolin Liao

SUMMARY: This study proposes and investigates a new type of bi-criteria minimization (BCM) for the motion planning and control of redundant robot manipulators, addressing the discontinuity problem of the infinity-norm acceleration minimization (INAM) scheme and guaranteeing that the final joint velocity of the motion is approximately zero. The new scheme combines the minimum weighted velocity norm (MWVN) and INAM criteria and is therefore called the MWVN-INAM-BCM scheme. Joint-angle, joint-velocity, and joint-acceleration limits are incorporated into its formulation. The proposed MWVN-INAM-BCM scheme is reformulated as a quadratic programming problem solved at the joint-acceleration level. Simulation results based on the PUMA560 robot manipulator validate the efficacy and applicability of the proposed scheme in robotic redundancy resolution. In addition, its physical realizability is verified in a practical application on a six-link planar robot manipulator.
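
A generic sketch of acceleration-level redundancy resolution posed as a quadratic program, in the spirit the summary describes; the weighting matrix, random Jacobian, and box limits are placeholders, not the paper's exact MWVN-INAM-BCM formulation:

```python
import cvxpy as cp
import numpy as np

n = 6                                   # joint count (e.g., PUMA560)
J = np.random.randn(3, n)               # Jacobian at the current configuration
Jdot_qd = np.random.randn(3)            # the J_dot @ q_dot term
xdd_ref = np.random.randn(3)            # desired end-effector acceleration
W = np.eye(n)                           # weighting matrix (placeholder)
qdd_max = 5.0 * np.ones(n)              # joint-acceleration limits

qdd = cp.Variable(n)
objective = cp.Minimize(cp.quad_form(qdd, W))        # weighted-norm criterion
constraints = [J @ qdd == xdd_ref - Jdot_qd,         # end-effector tracking
               cp.abs(qdd) <= qdd_max]               # acceleration limits
cp.Problem(objective, constraints).solve()
print(qdd.value)
```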


2013 ◽  
Vol 479-480 ◽  
pp. 729-736 ◽  
Author(s):  
Chih Jer Lin ◽  
Chii Ruey Lin ◽  
Shen Kai Yu ◽  
Cheng Chin Han

The purpose of this study is to design a redundant robot that performs a specific task and tracks a specific trajectory. Although the Moore-Penrose pseudo-inverse is commonly used to solve the inverse kinematics problem, it cannot handle the singularities that arise when the Jacobian matrix of the redundant robot loses full rank. Thus, a fuzzy motion planning algorithm is proposed to solve the inverse kinematics in the presence of singularities. Using the fuzzy motion planning mapping method, the positions of a five-axis robot manipulator are obtained, and the errors at the singular points are approximately zero. The results show that the fuzzy inverse kinematic mapping method is robust to singular points when the tracking path passes through them.
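
For reference, a common baseline remedy for the rank-deficiency problem the abstract mentions is damped least-squares, sketched below; this is not the authors' fuzzy mapping method, just the standard alternative to the plain pseudo-inverse:

```python
import numpy as np

def dls_ik_step(J, dx, damping=0.01):
    """Damped least-squares inverse kinematics step.

    Unlike the plain Moore-Penrose pseudo-inverse, the damping term keeps
    the joint update bounded when the Jacobian J loses rank near a
    singularity, trading a small tracking error for numerical stability.
    """
    JJt = J @ J.T
    return J.T @ np.linalg.solve(JJt + damping**2 * np.eye(JJt.shape[0]), dx)
```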


2021 ◽  
Vol 18 (4) ◽  
pp. 172988142110192
Author(s):  
Ben Zhang ◽  
Denglin Zhu

Innovative applications in rapidly evolving domains such as robotic navigation and autonomous (driverless) vehicles rely on motion planning systems that meet shortest-path and obstacle-avoidance requirements. This article proposes a novel path planning algorithm based on jump point search and Bezier curves. The proposed algorithm consists of two main steps. In the front end, an improved heuristic function based on distance and direction reduces the search cost, and redundant turning points are trimmed. In the back end, a novel trajectory generation method based on Bezier curves and straight-line segments is proposed. Our experimental results indicate that the proposed algorithm provides a complete motion planning solution from the front end to the back end, realizing an optimal trajectory from the initial point to the target point for robot navigation.
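
A minimal sketch of the Bezier-curve smoothing step in the back end, assuming the pruned turning points serve as control points; the curve degree and the splicing with straight-line segments are the paper's own details and are not reproduced here:

```python
import numpy as np

def bezier(control_points, n_samples=50):
    """Evaluate a Bezier curve via De Casteljau's algorithm.

    control_points: (k, 2) array of waypoints, e.g. the turning points
    kept after jump-point-search pruning.
    """
    pts = np.asarray(control_points, dtype=float)
    curve = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = pts.copy()
        while len(p) > 1:                       # repeated linear interpolation
            p = (1 - t) * p[:-1] + t * p[1:]
        curve.append(p[0])
    return np.array(curve)

# usage: smooth three turning points into a continuous trajectory
path = bezier([[0, 0], [1, 2], [3, 2]])
```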


2021 ◽  
Vol 9 (3) ◽  
pp. 252
Author(s):  
Yushan Sun ◽  
Xiaokun Luo ◽  
Xiangrui Ran ◽  
Guocheng Zhang

This research addresses the safe navigation of autonomous underwater vehicles (AUVs) in the deep ocean, a complex and changeable environment with various seamounts. When navigating in the deep sea, an AUV encounters many underwater canyons whose hard valley walls seriously threaten its safety. To enable safe driving of AUVs in underwater canyons and to explore the potential of autonomous obstacle avoidance in uncertain environments, an improved AUV path planning algorithm based on the deep deterministic policy gradient (DDPG) algorithm is proposed in this work. The method is an end-to-end path planning algorithm that optimizes the policy directly: it takes sensor information as input and outputs the driving speed and yaw angle. The planner reaches the predetermined target point while avoiding large-scale static obstacles, such as the valley walls of the simulated underwater canyon, as well as sudden small-scale dynamic obstacles, such as marine life and other vehicles. To handle the multi-objective structure of obstacle-avoiding path planning, the reward function is designed in a modular fashion and combined with the artificial potential field method to provide continuous rewards. This research also proposes a new algorithm, the deep SumTree deterministic policy gradient (SumTree-DDPG), which improves the random storage and sampling strategy of the DDPG experience buffer: experience samples are classified by importance and stored in a SumTree structure, high-quality samples are drawn preferentially, and the convergence of the model is thereby accelerated. Finally, an underwater canyon simulation environment was written in Python, and a deep reinforcement learning simulation platform was built on a high-performance computer to train the AUV. Simulations verified that the proposed path planning method can guide the under-actuated underwater robot to the target without colliding with any obstacles, and that, in comparison with DDPG, the improved SumTree-DDPG planner achieves better stability, total training reward, and robustness.
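
A sketch of the SumTree data structure that underlies priority-proportional sampling of the kind SumTree-DDPG uses in place of uniform replay; the buffer layout here is an illustrative simplification, not the paper's implementation:

```python
import numpy as np

class SumTree:
    """Binary sum-tree: each internal node stores the sum of its children,
    so a sample can be drawn proportionally to priority in O(log n)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)  # internal nodes hold partial sums
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, sample):
        idx = self.write + self.capacity - 1     # leaf index for this slot
        self.data[self.write] = sample
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change upward
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        """Walk down to the leaf covering cumulative sum s in [0, tree[0])."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity + 1]
```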


2021 ◽  
Vol 11 (6) ◽  
pp. 803
Author(s):  
Jie Chai ◽  
Xiaogang Ruan ◽  
Jing Huang

Neurophysiological studies have shown that the hippocampus, striatum, and prefrontal cortex play different roles in animal navigation, but it remains unclear how these structures work together. In this paper, we establish a navigation learning model based on the hippocampal-striatal circuit (NLM-HS), which offers a possible explanation of the navigation mechanism in the animal brain. The hippocampal model generates a cognitive map of the environment and performs goal-directed navigation by using a place-cell sequence planning algorithm. The striatal model performs reward-related habitual navigation by using the classic temporal difference (TD) learning algorithm. Because the two models may produce inconsistent behavioral decisions, the prefrontal cortex model chooses the most appropriate strategy by using a strategy arbitration mechanism. The cognition and learning mechanism of the NLM-HS works in two stages, exploration and navigation. First, the agent uses the hippocampal model to construct a cognitive map of the unknown environment. Then, the agent uses the strategy arbitration mechanism in the prefrontal cortex model to decide which strategy to choose. To test the validity of the NLM-HS, the classical Tolman detour experiment was reproduced. The results show that the NLM-HS not only makes agents exhibit environmental cognition and navigation behavior similar to animals, but also yields faster behavioral decisions and better adaptivity than the hippocampal or striatal model alone.
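
For concreteness, the classic temporal difference learning rule the abstract assigns to the striatal model, in tabular TD(0) form; the state-space size and the learning parameters below are illustrative, not the paper's values:

```python
import numpy as np

n_states, alpha, gamma = 25, 0.1, 0.95   # grid size, learning rate, discount
V = np.zeros(n_states)                   # state-value table

def td_update(s, reward, s_next):
    # move V(s) toward the bootstrapped target r + gamma * V(s')
    V[s] += alpha * (reward + gamma * V[s_next] - V[s])
```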

