Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay

2020 ◽  
Vol 10 (2) ◽  
pp. 575 ◽  
Author(s):  
MyeongSeop Kim ◽  
Dong-Ki Han ◽  
Jae-Han Park ◽  
Jung-Su Kim

In order to enhance the performance of robot systems in the manufacturing industry, it is essential to develop motion and task planning algorithms. In particular, motion plans should be generated automatically so that robots can cope with various working environments. Although the probabilistic roadmap (PRM) method provides feasible paths when the start and goal positions of a robot manipulator are given, the resulting paths are not always smooth, which can degrade the performance of the robot system. This paper proposes a motion planning algorithm for robot manipulators based on the twin delayed deep deterministic policy gradient (TD3), a reinforcement learning algorithm tailored to Markov decision processes (MDPs) with continuous actions. Because path planning for a robot manipulator is an MDP with sparse rewards, hindsight experience replay (HER) is incorporated into TD3 to improve sample efficiency. The proposed algorithm is applied to 2-DOF and 3-DOF manipulators, and the resulting paths are shown to be smoother and shorter than those produced by PRM.
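
A minimal sketch of the hindsight relabeling idea the abstract describes, assuming a goal-conditioned transition format and a hypothetical sparse reward function; the TD3 networks themselves are omitted:

```python
import random
from collections import deque

import numpy as np

class HERBuffer:
    """Replay buffer with 'final'-strategy hindsight relabeling."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, reward_fn):
        # episode: list of (state, action, next_state, goal) tuples
        achieved_goal = episode[-1][2]  # treat the last reached state as a goal
        for state, action, next_state, goal in episode:
            # original transition with the (sparse) task reward
            self.buffer.append((state, action, next_state, goal,
                                reward_fn(next_state, goal)))
            # hindsight transition: pretend the reached state was the goal,
            # turning a failed episode into a useful learning signal
            self.buffer.append((state, action, next_state, achieved_goal,
                                reward_fn(next_state, achieved_goal)))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def sparse_reward(next_state, goal, tol=0.05):
    # hypothetical sparse reward: 0 on reaching the goal region, -1 otherwise
    return 0.0 if np.linalg.norm(np.asarray(next_state) - np.asarray(goal)) < tol else -1.0
```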

2020 ◽  
Vol 17 (4) ◽  
pp. 172988142093685
Author(s):  
Zhaolong Gao ◽  
Rongyu Tang ◽  
Luyao Chen ◽  
Qiang Huang ◽  
Jiping He

Grasping with a prosthetic hand in real life can be a difficult task. Amputee users are often capable of planning the reaching trajectory and selecting the grasp location, but they struggle with precise finger movements, such as adapting the fingers to the surface of an object without applying excessive force. It is more efficient to leave that part to machine autonomy. To combine the intention and planning ability of users with robotic control, shared control is introduced, in which user inputs and robot control methods are combined to achieve a goal. The shared control problem can be formulated as a partially observable Markov decision process (POMDP). To find the optimal control policy, we adopt an adaptive dynamic programming and reinforcement learning-based control algorithm, the deep deterministic policy gradient (DDPG) combined with hindsight experience replay (HER). We extend the algorithm with a prediction layer built on the reparameterization technique. The system was tested in a modified simulation environment for its ability to follow the user's intention while keeping the contact force within a safe boundary.
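
The abstract mentions a prediction layer built with the reparameterization technique; below is a minimal PyTorch sketch of such a head, producing a differentiable sample of a predicted quantity. The dimensions, layer shapes, and what the latent represents are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Illustrative prediction head using the reparameterization trick:
    outputs a differentiable sample from a learned Gaussian."""

    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(obs_dim, latent_dim)
        self.log_std = nn.Linear(obs_dim, latent_dim)

    def forward(self, obs):
        mu = self.mu(obs)
        std = self.log_std(obs).clamp(-5, 2).exp()
        eps = torch.randn_like(std)   # noise drawn outside the graph
        return mu + std * eps         # gradients flow through mu and std
```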


Author(s):  
Jessica Leu ◽  
Masayoshi Tomizuka

Real-time, safe, and stable motion planning in co-robot systems involving dynamic human-robot interaction (HRI) remains challenging due to the time-varying nature of the problem. One of the biggest challenges is to guarantee closed-loop stability of the planning algorithm in dynamic environments. Typically, this can be addressed if there exists a perfect predictor that precisely predicts the future motions of the obstacles; unfortunately, a perfect predictor is impossible to achieve. In the HRI environments considered in this paper, human workers and other robots are the obstacles to the ego robot. We discuss necessary conditions for the closed-loop stability of a planning problem using the framework of model predictive control (MPC). We conclude that the predictor must detect changes in the obstacles' movement mode within a time-delay allowance, and that the MPC must have a sufficiently long prediction horizon and a proper cost function. These conditions give the MPC an uncertainty tolerance for closed-loop stability while still avoiding collision when the obstacles' movement falls outside the tolerance. The closed-loop performance is also investigated using a notion of M-convergence, which guarantees finite local convergence (at least M steps ahead) of the open-loop trajectories toward the closed-loop trajectory. With this notion, we verify the performance of the proposed MPC with stability-enhanced prediction through simulations and experiments. With the proposed method, the robot copes better with dynamic environments and the closed-loop cost is reduced.
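
To make the receding-horizon structure concrete, here is a toy sketch of an MPC step for a 2-D point robot with a predicted moving obstacle. The constant-velocity predictor, the soft-penalty avoidance term, and all weights are assumptions for illustration, not the paper's stability-enhanced predictor or cost:

```python
import numpy as np
from scipy.optimize import minimize

H, DT = 10, 0.1  # prediction horizon length and time step

def predict_obstacle(p_obs, v_obs):
    # naive constant-velocity prediction over the horizon
    return np.array([p_obs + v_obs * DT * (k + 1) for k in range(H)])

def mpc_step(x, goal, p_obs, v_obs):
    obs_traj = predict_obstacle(p_obs, v_obs)

    def cost(u_flat):
        u = u_flat.reshape(H, 2)
        xs, c = x.copy(), 0.0
        for k in range(H):
            xs = xs + u[k] * DT                                  # single-integrator model
            c += np.sum((xs - goal) ** 2)                        # tracking cost
            c += 5.0 / (np.sum((xs - obs_traj[k]) ** 2) + 1e-3)  # soft obstacle penalty
            c += 0.1 * np.sum(u[k] ** 2)                         # control effort
        return c

    res = minimize(cost, np.zeros(2 * H), method="L-BFGS-B")
    return res.x[:2]  # apply only the first control, then re-plan (receding horizon)
```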


Robotica ◽  
2018 ◽  
Vol 36 (5) ◽  
pp. 655-675 ◽  
Author(s):  
Dongsheng Guo ◽  
Kene Li ◽  
Bolin Liao

SUMMARY: This study proposes and investigates a new type of bi-criteria minimization (BCM) for the motion planning and control of redundant robot manipulators, addressing the discontinuity problem of the infinity-norm acceleration minimization (INAM) scheme and guaranteeing that the final joint velocity of the motion is approximately zero. The new scheme combines the minimum weighted velocity norm (MWVN) and INAM criteria and is therefore called the MWVN-INAM-BCM scheme. Joint-angle, joint-velocity, and joint-acceleration limits are incorporated into its formulation. The proposed MWVN-INAM-BCM scheme is reformulated as a quadratic programming problem solved at the joint-acceleration level. Simulation results based on the PUMA560 robot manipulator validate the efficacy and applicability of the proposed scheme in robotic redundancy resolution. In addition, its physical realizability is verified in a practical application on a six-link planar robot manipulator.
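
A generic sketch of acceleration-level redundancy resolution posed as a quadratic program, in the spirit the summary describes; the weighting matrix, random Jacobian, and box limits are placeholders, not the paper's exact MWVN-INAM-BCM formulation:

```python
import cvxpy as cp
import numpy as np

n = 6                                   # joint count (e.g., PUMA560)
J = np.random.randn(3, n)               # Jacobian at the current configuration
Jdot_qd = np.random.randn(3)            # the J_dot @ q_dot term
xdd_ref = np.random.randn(3)            # desired end-effector acceleration
W = np.eye(n)                           # weighting matrix (placeholder)
qdd_max = 5.0 * np.ones(n)              # joint-acceleration limits

qdd = cp.Variable(n)
objective = cp.Minimize(cp.quad_form(qdd, W))        # weighted-norm criterion
constraints = [J @ qdd == xdd_ref - Jdot_qd,         # end-effector tracking
               cp.abs(qdd) <= qdd_max]               # acceleration limits
cp.Problem(objective, constraints).solve()
print(qdd.value)
```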


2013 ◽  
Vol 479-480 ◽  
pp. 729-736 ◽  
Author(s):  
Chih Jer Lin ◽  
Chii Ruey Lin ◽  
Shen Kai Yu ◽  
Cheng Chin Han

The purpose of this study is to design a redundant robot that performs a specific task and tracks a specific trajectory. Although the Moore-Penrose pseudo-inverse is commonly used to solve the inverse kinematics problem, it cannot handle the singularities that arise when the Jacobian matrix of the redundant robot loses full rank. Thus, a fuzzy motion planning algorithm is proposed to solve the inverse kinematics in the presence of singularities. Using the fuzzy motion planning mapping method, the positions of a five-axis robot manipulator are obtained, and the errors at the singular points are approximately zero. The results show that the fuzzy inverse kinematic mapping method is robust to singular points when the tracking path passes through them.
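
For reference, a common baseline remedy for the rank-deficiency problem the abstract mentions is damped least-squares, sketched below; this is not the authors' fuzzy mapping method, just the standard alternative to the plain pseudo-inverse:

```python
import numpy as np

def dls_ik_step(J, dx, damping=0.01):
    """Damped least-squares inverse kinematics step.

    Unlike the plain Moore-Penrose pseudo-inverse, the damping term keeps
    the joint update bounded when the Jacobian J loses rank near a
    singularity, trading a small tracking error for numerical stability.
    """
    JJt = J @ J.T
    return J.T @ np.linalg.solve(JJt + damping**2 * np.eye(JJt.shape[0]), dx)
```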


2021 ◽  
Vol 18 (4) ◽  
pp. 172988142110192
Author(s):  
Ben Zhang ◽  
Denglin Zhu

Innovative applications in rapidly evolving domains such as robotic navigation and autonomous (driverless) vehicles rely on motion planning systems that meet shortest-path and obstacle-avoidance requirements. This article proposes a novel path planning algorithm based on jump point search and Bezier curves. The proposed algorithm consists of two main steps. In the front end, an improved heuristic function based on distance and direction reduces the search cost, and redundant turning points are trimmed. In the back end, a novel trajectory generation method based on Bezier curves and straight-line segments is proposed. Our experimental results indicate that the proposed algorithm provides a complete motion planning solution from the front end to the back end, realizing an optimal trajectory from the initial point to the target point for robot navigation.
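
A minimal sketch of the Bezier-curve smoothing step in the back end, assuming the pruned turning points serve as control points; the curve degree and the splicing with straight-line segments are the paper's own details and are not reproduced here:

```python
import numpy as np

def bezier(control_points, n_samples=50):
    """Evaluate a Bezier curve via De Casteljau's algorithm.

    control_points: (k, 2) array of waypoints, e.g. the turning points
    kept after jump-point-search pruning.
    """
    pts = np.asarray(control_points, dtype=float)
    curve = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = pts.copy()
        while len(p) > 1:                       # repeated linear interpolation
            p = (1 - t) * p[:-1] + t * p[1:]
        curve.append(p[0])
    return np.array(curve)

# usage: smooth three turning points into a continuous trajectory
path = bezier([[0, 0], [1, 2], [3, 2]])
```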


2021 ◽  
Vol 9 (3) ◽  
pp. 252
Author(s):  
Yushan Sun ◽  
Xiaokun Luo ◽  
Xiangrui Ran ◽  
Guocheng Zhang

This research addresses the safe navigation of autonomous underwater vehicles (AUVs) in the deep ocean, a complex and changeable environment with various seamounts. When navigating in the deep sea, an AUV encounters many underwater canyons whose hard valley walls seriously threaten its safety. To enable safe driving of AUVs in underwater canyons and to explore the potential of autonomous obstacle avoidance in uncertain environments, an improved AUV path planning algorithm based on the deep deterministic policy gradient (DDPG) algorithm is proposed in this work. The method is an end-to-end path planning algorithm that optimizes the policy directly: it takes sensor information as input and outputs the driving speed and yaw angle. The planner reaches the predetermined target point while avoiding large-scale static obstacles, such as the valley walls of the simulated underwater canyon, as well as sudden small-scale dynamic obstacles, such as marine life and other vehicles. To handle the multi-objective structure of obstacle-avoiding path planning, the reward function is designed in a modular fashion and combined with the artificial potential field method to provide continuous rewards. This research also proposes a new algorithm, the deep SumTree deterministic policy gradient (SumTree-DDPG), which improves the random storage and sampling strategy of the DDPG experience buffer: experience samples are classified by importance and stored in a SumTree structure, high-quality samples are drawn preferentially, and the convergence of the model is thereby accelerated. Finally, an underwater canyon simulation environment was written in Python, and a deep reinforcement learning simulation platform was built on a high-performance computer to train the AUV. Simulations verified that the proposed path planning method can guide the under-actuated underwater robot to the target without colliding with any obstacles, and that, in comparison with DDPG, the improved SumTree-DDPG planner achieves better stability, total training reward, and robustness.
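
A sketch of the SumTree data structure that underlies priority-proportional sampling of the kind SumTree-DDPG uses in place of uniform replay; the buffer layout here is an illustrative simplification, not the paper's implementation:

```python
import numpy as np

class SumTree:
    """Binary sum-tree: each internal node stores the sum of its children,
    so a sample can be drawn proportionally to priority in O(log n)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)  # internal nodes hold partial sums
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, sample):
        idx = self.write + self.capacity - 1     # leaf index for this slot
        self.data[self.write] = sample
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change upward
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        """Walk down to the leaf covering cumulative sum s in [0, tree[0])."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity + 1]
```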


2021 ◽  
Vol 11 (6) ◽  
pp. 803
Author(s):  
Jie Chai ◽  
Xiaogang Ruan ◽  
Jing Huang

Neurophysiological studies have shown that the hippocampus, striatum, and prefrontal cortex play different roles in animal navigation, but it remains unclear how these structures work together. In this paper, we establish a navigation learning model based on the hippocampal-striatal circuit (NLM-HS), which offers a possible explanation of the navigation mechanism in the animal brain. The hippocampal model generates a cognitive map of the environment and performs goal-directed navigation by using a place-cell sequence planning algorithm. The striatal model performs reward-related habitual navigation by using the classic temporal difference (TD) learning algorithm. Because the two models may produce inconsistent behavioral decisions, the prefrontal cortex model chooses the most appropriate strategy by using a strategy arbitration mechanism. The cognition and learning mechanism of the NLM-HS works in two stages, exploration and navigation. First, the agent uses the hippocampal model to construct a cognitive map of the unknown environment. Then, the agent uses the strategy arbitration mechanism in the prefrontal cortex model to decide which strategy to choose. To test the validity of the NLM-HS, the classical Tolman detour experiment was reproduced. The results show that the NLM-HS not only makes agents exhibit environmental cognition and navigation behavior similar to animals, but also yields faster behavioral decisions and better adaptivity than the hippocampal or striatal model alone.
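
For concreteness, the classic temporal difference learning rule the abstract assigns to the striatal model, in tabular TD(0) form; the state-space size and the learning parameters below are illustrative, not the paper's values:

```python
import numpy as np

n_states, alpha, gamma = 25, 0.1, 0.95   # grid size, learning rate, discount
V = np.zeros(n_states)                   # state-value table

def td_update(s, reward, s_next):
    # move V(s) toward the bootstrapped target r + gamma * V(s')
    V[s] += alpha * (reward + gamma * V[s_next] - V[s])
```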

