An end-to-end reinforcement learning method for automated guided vehicle path planning

Author(s):  
Yu Sun ◽  
Haisheng Li


Author(s):
Zhuo Wang ◽  
Shiwei Zhang ◽  
Xiaoning Feng ◽  
Yancheng Sui

Environmental adaptability has long been a problem in path planning for autonomous underwater vehicles. Although reinforcement learning can improve environmental adaptability, multi-behavior coupling slows its convergence, making it difficult for an autonomous underwater vehicle to avoid moving obstacles. This article proposes a multi-behavior critic reinforcement learning algorithm for autonomous underwater vehicle path planning that overcomes the oscillating amplitudes and low learning efficiency common in the early training stages of traditional actor–critic algorithms. Behavior critic reinforcement learning assesses the actions of the actor from several perspectives, such as energy saving and security, and combines these aspects into an overall evaluation of the actor. In this article, the policy gradient method is selected for the actor and the value function method for the critic. Both are approximated by backpropagation neural networks whose parameters are updated with gradient descent. Simulation results show that the method can optimize learning in the environment and improve learning efficiency, meeting the real-time and adaptability requirements of dynamic obstacle avoidance for autonomous underwater vehicles.
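The actor–critic scheme the abstract describes can be sketched in miniature. The following is an illustrative one-step actor–critic on a toy 1-D "corridor" task (a hypothetical stand-in for the AUV environment): a softmax policy updated by the policy gradient plays the actor, a TD(0) state-value function plays the critic. Tabular parameters are used here in place of the paper's backpropagation networks, purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2               # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))  # actor parameters
w = np.zeros(n_states)                   # critic parameters
alpha_a, alpha_c, gamma = 0.1, 0.2, 0.95

def policy(s):
    # softmax over action preferences for state s
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

for episode in range(500):
    s = 0
    while s < n_states - 1:              # goal is the rightmost state
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else -0.01
        v_next = 0.0 if s2 == n_states - 1 else w[s2]
        delta = r + gamma * v_next - w[s]   # TD error (critic's evaluation)
        w[s] += alpha_c * delta             # critic update (value function)
        grad = -p
        grad[a] += 1.0                      # d log pi(a|s) / d theta[s]
        theta[s] += alpha_a * delta * grad  # actor update (policy gradient)
        s = s2

greedy = [int(np.argmax(policy(s))) for s in range(n_states - 1)]
print(greedy)
```

After training, the learned policy should prefer moving toward the goal in every state; the paper's multi-behavior variant would replace the single TD error with a combined evaluation over several criteria (e.g. energy saving and security).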


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yunmei Yuan ◽  
Hongyu Li ◽  
Lili Ji

Nowadays, finding optimal vehicle routes through online path planning is one of the main problems the logistics industry needs to solve. Because of uncertainty in the transportation system, especially in the last-mile delivery of small packages, computing logistics vehicle routes has become more complex than before. Most existing solutions rely on heuristic algorithms rather than newer techniques such as machine learning; such solutions not only require many constraints to be specified but also demand substantial computation time in logistics networks with high demand density. To design uncertain logistics transportation routes with minimum time, this paper proposes a new optimization strategy based on deep reinforcement learning that converts uncertain online logistics routing problems into vehicle path planning problems and designs an embedded pointer network to obtain the optimal solution. Because solving the neural network takes a long time, training its parameters on supervised data is unrealistic; this article instead trains the parameters with an unsupervised method. Since parameter training happens offline, the strategy avoids high latency. The simulations show that the proposed strategy effectively solves the uncertain logistics scheduling problem within limited computing time and significantly outperforms other strategies. Compared with traditional mathematical methods, the proposed algorithm reduces the driving distance by 60.71%. The paper also studies the impact of some key parameters on the performance of the approach.
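The core decoding step of a pointer network of the kind referenced above can be illustrated as follows: attention scores over encoder embeddings "point" to the next customer node to visit, with already-routed nodes masked out. All shapes, weight names, and values here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                               # embedding dimension (illustrative)
nodes = rng.normal(size=(5, d))     # encoder outputs, one per customer
query = rng.normal(size=d)          # decoder state after previous visits

# additive-attention parameters (randomly initialized for the sketch)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)

visited = {0, 2}                    # customers already on the route

# pointer scores: v^T tanh(W1 e_i + W2 q) for each node embedding e_i
scores = np.tanh(nodes @ W1 + query @ W2) @ v
for i in visited:
    scores[i] = -np.inf             # mask visited nodes before softmax

probs = np.exp(scores - scores.max())
probs /= probs.sum()                # distribution over remaining nodes
next_node = int(np.argmax(probs))
```

In an unsupervised (REINFORCE-style) setup such as the one the paper alludes to, these probabilities would be sampled during training and the negative route length used as the reward signal.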


2010 ◽  
Vol 44-47 ◽  
pp. 2116-2120
Author(s):  
Liang Tong

Because of its highly nonlinear, strongly coupled, and variable-structure dynamics, the robot manipulator is difficult to control effectively with conventional control theory. This paper proposes a new multi-agent reinforcement learning method based on a Kohonen network and applies it to path planning for robot manipulators in a multi-agent environment. Simulation experiments show the validity of the method.
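The Kohonen network underlying the method can be sketched with a single self-organizing-map update step: the best-matching unit and its grid neighbors move toward the input vector. Grid size, learning rate, and neighborhood width here are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
grid = rng.uniform(size=(4, 4, 2))   # 4x4 Kohonen map, 2-D weight vectors
x = np.array([0.9, 0.1])             # one input sample
eta, sigma = 0.5, 1.0                # learning rate, neighborhood width

# best-matching unit: the cell whose weights are closest to x
dists = np.linalg.norm(grid - x, axis=2)
wi, wj = np.unravel_index(np.argmin(dists), dists.shape)
d_before = dists[wi, wj]

# Gaussian neighborhood: nearby cells are also pulled toward x,
# which is what gives the map its topology-preserving behavior
for i in range(4):
    for j in range(4):
        h = np.exp(-((i - wi) ** 2 + (j - wj) ** 2) / (2 * sigma ** 2))
        grid[i, j] += eta * h * (x - grid[i, j])
```

Repeating this step over many samples with decaying `eta` and `sigma` organizes the map; in the multi-agent setting, such a map can serve to discretize the continuous state space seen by each learning agent.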


Author(s):  
Jean Phelipe De Oliveira Lima ◽  
Raimundo Correa de Oliveira ◽  
Cleinaldo de Almeida Costa

Autonomous vehicle path planning aims to allow safe and rapid movement in an environment without human interference. Recently, reinforcement learning methods have been used to solve this problem with satisfactory results. This work applies deep reinforcement learning to path planning for autonomous vehicles through trajectory simulation, defining routes that offer greater safety (no collisions) and shorter displacement between two points. A method for creating simulation environments was developed to analyze the performance of the proposed models under circumstances of varying difficulty. The decision-making strategy was based on multilayer perceptron artificial neural networks, with parameters and hyperparameters determined by a grid search. The models were evaluated through the reward charts resulting from their learning process, in two phases: an isolated evaluation, in which models were placed in the environment without prior knowledge, and an incremental evaluation, in which models were placed in unknown environments carrying intelligence accumulated under other conditions. The results are competitive with state-of-the-art works and highlight the adaptive character of the models, which, when introduced into environments with prior knowledge, can reduce convergence time by up to 89.47% compared with related works.
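The hyperparameter grid search mentioned above follows a standard pattern that can be sketched briefly: enumerate every combination of candidate values and keep the best-scoring configuration. The candidate values and the scoring function here are placeholders, not the paper's actual search space (where the score would be, e.g., mean episode reward after training).

```python
import itertools

# Candidate hyperparameters (illustrative values only)
grid = {
    "hidden_units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3],
    "gamma": [0.9, 0.99],
}

def evaluate(cfg):
    # Placeholder for "train the MLP agent, return mean episode reward".
    # This dummy score simply peaks at hidden_units=64 and low lr.
    return -abs(cfg["hidden_units"] - 64) - 100 * cfg["learning_rate"]

best_cfg, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

Grid search is exhaustive, so its cost grows multiplicatively with each hyperparameter added; with only a few candidates per axis, as assumed here, that remains tractable.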


Author(s):  
Milton Calderón ◽  
Esperanza Camargo Casallas

Mobile robots are devices experiencing a great boom, given the possibilities their capabilities offer, particularly autonomous robots that do not require an operator to perform their functions. To consolidate this autonomy, a path planning system is needed that yields a viable, and as far as possible optimal, route. This study develops a reactive two-dimensional path planning method using neural networks trained by reinforcement learning. The complexity of the scenario between the initial and final points stems from warning zones and forbidden obstacle zones, and the experiments compare different neural network architectures, each acting as the agent of the reinforcement learning algorithm, namely DQN and DDQN. The best results are obtained with DDQN training, which reaches the objective in 89% of the validation episodes, although the DQN method proves 15.63% faster in its successful cases. This work was carried out within the DIGITI research group of the Universidad Distrital Francisco José de Caldas.
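The difference between the two agents compared above comes down to how the bootstrap target is formed for each transition; the illustrative Q-value rows below are arbitrary, but the two target formulas are the standard DQN and Double DQN (DDQN) definitions.

```python
import numpy as np

gamma, r = 0.99, 1.0                    # discount factor, observed reward
q_online = np.array([0.2, 0.9, 0.4])    # online network Q(s', .)
q_target = np.array([0.3, 0.5, 0.8])    # target network Q(s', .)

# DQN: the target network both selects and evaluates the next action,
# which tends to overestimate action values
y_dqn = r + gamma * q_target.max()

# DDQN: the online network selects the action, the target network
# evaluates it, decoupling selection from evaluation
a_star = int(np.argmax(q_online))
y_ddqn = r + gamma * q_target[a_star]
```

Because the selected action's target-network value is at most the target network's maximum, the DDQN target never exceeds the DQN target for the same transition, which is the mechanism behind its reduced overestimation bias.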


2009 ◽  
Vol 129 (7) ◽  
pp. 1253-1263
Author(s):  
Toru Eguchi ◽  
Takaaki Sekiai ◽  
Akihiro Yamada ◽  
Satoru Shimizu ◽  
Masayuki Fukai
