Application of Deep Reinforcement Learning Algorithm in Uncertain Logistics Transportation Scheduling

Computational Intelligence and Neuroscience ◽

10.1155/2021/5672227 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yunmei Yuan ◽

Hongyu Li ◽

Lili Ji

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

New Technologies ◽

Learning Algorithm ◽

Computing Time ◽

Optimal Solution ◽

Optimization Strategy ◽

Logistics Industry ◽

Vehicle Path ◽

The Impact

Finding optimal vehicle routes through online path planning is one of the main problems the logistics industry needs to solve. Uncertainty in the transportation system, especially in the last-mile delivery of small packages, makes logistics vehicle routing more complex to compute. Most existing solutions rely on heuristic algorithms and make little use of newer techniques such as machine learning. Such solutions require many constraints to be set and long computation times in logistics networks with high demand density. To design minimum-time transportation paths under uncertainty, this paper proposes an optimization strategy based on deep reinforcement learning that converts the uncertain online logistics routing problem into a vehicle path planning problem and designs an embedded pointer network to obtain the optimal solution. Given the long solution time involved, training the parameters on supervised data is unrealistic, so the paper trains them with an unsupervised method. Because parameter training is performed offline, the strategy avoids high delay. The simulations indicate that the proposed strategy solves the uncertain logistics scheduling problem effectively within limited computing time and performs significantly better than the other strategies tested. Compared with traditional mathematical procedures, the proposed algorithm reduces driving distance by 60.71%. The paper also studies the impact of several key parameters on performance.
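
The abstract does not describe the embedded pointer network in detail. As a rough illustration of the general idea only, a pointer-style decoding step scores every candidate node against a query vector, masks nodes already visited, and normalizes the rest into a probability distribution. The dot-product scoring, the function name `pointer_step`, and the toy embeddings below are assumptions made for this sketch, not the authors' design.

```python
import math

def pointer_step(query, node_embeddings, visited):
    """One pointer-style decoding step: a masked softmax over candidate nodes.

    Assumption: scores are plain dot products between the query and each
    node embedding; a trained pointer network would use learned attention.
    """
    scores = []
    for index, embedding in enumerate(node_embeddings):
        if index in visited:
            # Visited nodes are masked out so they can never be selected again.
            scores.append(float("-inf"))
        else:
            scores.append(sum(q * e for q, e in zip(query, embedding)))

    highest = max(scores)
    if highest == float("-inf"):
        raise ValueError("all nodes have already been visited")

    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - highest) for score in scores]
    total = sum(weights)
    return [weight / total for weight in weights]

# Toy example: node 0 is already on the route, so it gets probability 0.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probabilities = pointer_step([1.0, 0.0], nodes, visited={0})
next_node = max(range(len(nodes)), key=probabilities.__getitem__)
```

Choosing the most probable unvisited node at each step gives a greedy route; sampling from the distribution instead is the usual choice during training.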

Exploring optimal control of epidemic spread using reinforcement learning

Scientific Reports ◽

10.1038/s41598-020-79147-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Abu Quwsar Ohi ◽

M. F. Mridha ◽

Muhammad Mostafa Monowar ◽

Md. Abdul Hamid

Keyword(s):

Reinforcement Learning ◽

Economic Crisis ◽

Transmission Rate ◽

Optimal Solution ◽

Short Length ◽

Reproduction Rate ◽

Economic Factors ◽

Epidemic Spread ◽

The Impact ◽

Individual Human

A pandemic is the global outbreak of a disease with a high transmission rate. The impact of a pandemic can be lessened by restricting the movement of the population. However, one of its accompanying consequences is an economic crisis. In this article, we demonstrate what actions an agent (trained using reinforcement learning) may take in different possible pandemic scenarios, depending on the spread of the disease and economic factors. To train the agent, we design a virtual pandemic scenario closely related to the present COVID-19 crisis. We then apply reinforcement learning, a branch of artificial intelligence that deals with how an individual (human/machine) should interact with an environment (real/virtual) to achieve a desired goal. Finally, we demonstrate what optimal actions the agent performs to reduce the spread of the disease while considering the economic factors. In our experiment, we let the agent find an optimal solution without providing any prior knowledge. After training, we observed that the agent places a long lockdown to reduce the first surge of the disease. Furthermore, the agent places a combination of cyclic lockdowns and short lockdowns to halt the resurgence of the disease. Analyzing the agent's actions, we find that the agent decides on movement restrictions based not only on the size of the infectious population but also on the reproduction rate of the disease. The agent's estimates and policy may improve human strategies for placing lockdowns so that an economic crisis may be avoided while an infectious disease is mitigated.

An end-to-end reinforcement learning method for automated guided vehicle path planning

International Symposium on Artificial Intelligence and Robotics 2020 ◽

10.1117/12.2579792 ◽

2020 ◽

Author(s):

Yu Sun ◽

Haisheng Li

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Automated Guided Vehicle ◽

Learning Method ◽

End To End ◽

Vehicle Path

Path Planning Collision Avoidance using Reinforcement Learning

10.48011/asba.v2i1.1597 ◽

2020 ◽

Author(s):

Josias G. Batista ◽

Felipe J. S. Vasconcelos ◽

Kaio M. Ramos ◽

Darielson A. Souza ◽

José L. N. Silva

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Production Process ◽

Collision Avoidance ◽

Production Systems ◽

Learning Algorithm ◽

Computational Cost ◽

Trajectory Generation ◽

Industrial Robots ◽

Q Learning

Industrial robots have become more widespread over the years and have made production systems increasingly efficient, which creates a need for efficient trajectory generation algorithms that optimize and, where possible, generate collision-free trajectories without interrupting the production process. This work presents the use of Reinforcement Learning (RL), based on the Q-Learning algorithm, for the trajectory generation of a robotic manipulator, and compares its use with and without the manipulator's kinematic constraints in generating collision-free trajectories. Simulation results are presented on the efficiency of the algorithm and its use in trajectory generation, together with a comparison of the computational cost of using constraints.
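
To make the Q-Learning idea concrete, here is a minimal tabular sketch on a small grid with obstacle cells. It is not the manipulator setup from the paper: the grid world, the reward values (-10 for a collision, -1 per move, +10 at the goal), and the names `train_q_learning` and `greedy_path` are all assumptions chosen for illustration.

```python
import random

def train_q_learning(grid, start, goal, episodes=2000,
                     alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a grid; cells marked '#' are obstacles."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    q = {(r, c): [0.0] * 4 for r in range(rows) for c in range(cols)}

    def step(state, action):
        r = state[0] + moves[action][0]
        c = state[1] + moves[action][1]
        # Leaving the grid or entering an obstacle is a collision:
        # the agent stays where it is and receives a penalty.
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == "#":
            return state, -10.0, False
        if (r, c) == goal:
            return (r, c), 10.0, True
        return (r, c), -1.0, False

    for _ in range(episodes):
        state = start
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                action = max(range(4), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q, step

def greedy_path(q, step, start, max_steps=50):
    """Follow the learned greedy policy from the start cell."""
    path, state = [start], start
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path

grid = ["..#",
        "...",
        "#.."]
q_table, step_fn = train_q_learning(grid, start=(0, 0), goal=(2, 2))
path = greedy_path(q_table, step_fn, start=(0, 0))
```

The collision penalty plays the role that collision checking plays in a manipulator workspace: trajectories that touch an obstacle are discouraged through the reward rather than forbidden outright.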

A Review of Mobile Robot Path Planning Based on Deep Reinforcement Learning Algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/2138/1/012011 ◽

2021 ◽

Vol 2138 (1) ◽

pp. 012011

Author(s):

Yanwei Zhao ◽

Yinong Zhang ◽

Shuying Wang

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Path Planning ◽

Mobile Robot ◽

Video Game ◽

Autonomous Navigation ◽

Learning Algorithm ◽

Basic Knowledge ◽

Target Point ◽

Reinforcement Learning Algorithm

Path planning refers to a mobile robot using its onboard sensors to obtain information about the surrounding environment and its own state, so that it can avoid obstacles and move towards the target point. Deep reinforcement learning, which combines reinforcement learning and deep learning and is mainly used for perception and decision-making problems, has become an important research branch of artificial intelligence. This paper first introduces the basic knowledge of deep learning and reinforcement learning. It then describes the research status of deep reinforcement learning algorithms based on value functions and policy gradients in path planning, along with applications of deep reinforcement learning in computer games, video games and autonomous navigation. Finally, it gives a brief summary and outlook on the algorithms and applications of deep reinforcement learning.

Risk-Sensitive Reinforcement Learning Applied to Control under Constraints

Journal of Artificial Intelligence Research ◽

10.1613/jair.1666 ◽

2005 ◽

Vol 24 ◽

pp. 81-108 ◽

Cited By ~ 65

Author(s):

P. Geibel ◽

F. Wysotzki

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Learning Algorithm ◽

Optimal Solution ◽

Feed Tank ◽

Model Free ◽

Constrained Problem ◽

Risk Sensitive ◽

Markov Decision ◽

The Value Function

In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
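
The weighting idea can be illustrated for a single state. The sketch below scores each action by a weighted value minus its estimated risk and lowers the weight until the chosen action's risk falls under the threshold. The halving schedule, the function names, and the numbers are assumptions made for this illustration; the abstract does not specify how the weight is adapted.

```python
def weighted_greedy_action(q_values, risk_values, weight):
    """Pick the action maximizing weight * Q(s, a) - risk(s, a)."""
    scores = [weight * q - risk for q, risk in zip(q_values, risk_values)]
    return max(range(len(scores)), key=scores.__getitem__)

def adapt_weight(q_values, risk_values, threshold,
                 weight=1.0, shrink=0.5, min_weight=1e-6):
    """Shrink the weight until the greedy action is feasible.

    Assumption: a simple geometric schedule, used here only to show the
    trade-off between value and risk.
    """
    while weight > min_weight:
        action = weighted_greedy_action(q_values, risk_values, weight)
        if risk_values[action] <= threshold:
            return weight, action
        weight *= shrink
    # No positive weight was feasible: fall back to the least risky action.
    safest = min(range(len(risk_values)), key=risk_values.__getitem__)
    return 0.0, safest

# Hypothetical estimates for three actions in one state.
q_values = [5.0, 3.0, 1.0]      # estimated returns
risk_values = [0.4, 0.1, 0.0]   # estimated probability of reaching an error state
weight, action = adapt_weight(q_values, risk_values, threshold=0.15)
```

With these numbers the highest-value action is too risky, and the weight drops until the second action, which satisfies the risk threshold, becomes the greedy choice.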

Autonomous underwater vehicle path planning based on actor-multi-critic reinforcement learning

Proceedings of the Institution of Mechanical Engineers Part I Journal of Systems and Control Engineering ◽

10.1177/0959651820937085 ◽

2020 ◽

pp. 095965182093708

Author(s):

Zhuo Wang ◽

Shiwei Zhang ◽

Xiaoning Feng ◽

Yancheng Sui

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Value Function ◽

Autonomous Underwater Vehicle ◽

Autonomous Underwater Vehicles ◽

Underwater Vehicle ◽

Learning Efficiency ◽

Environmental Adaptability ◽

Vehicle Path ◽

The Value Function

Environmental adaptability is a persistent problem in path planning for autonomous underwater vehicles. Although reinforcement learning can improve environmental adaptability, multi-behavior coupling slows its convergence, which makes it difficult for an autonomous underwater vehicle to avoid moving obstacles. This article proposes a multi-behavior critic reinforcement learning algorithm for autonomous underwater vehicle path planning to overcome the oscillating amplitudes and low learning efficiency in the early stages of training that are common in traditional actor–critic algorithms. Behavior critic reinforcement learning assesses the actor's actions from perspectives such as energy saving and security, and combines these into an overall evaluation of the actor. The policy gradient method is used for the actor and the value function method for the critic. Both are approximated by a backpropagation neural network whose parameters are updated by gradient descent. The simulation results show that the method can optimize learning in the environment and improve learning efficiency, meeting the real-time and adaptability requirements of dynamic obstacle avoidance for autonomous underwater vehicles.

Research on task offloading based on deep reinforcement learning in mobile edge environment

MATEC Web of Conferences ◽

10.1051/matecconf/202030903026 ◽

2020 ◽

Vol 309 ◽

pp. 03026

Author(s):

Xia Gao ◽

Fangqin Xu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Edge Computing ◽

Internet Technology ◽

Good Effect ◽

Mobile Edge Computing ◽

Data Set ◽

The Impact ◽

Task Offloading ◽

Reinforcement Learning Algorithm

With the rapid development of Internet technology and mobile terminals, users' demand for high-speed networks is increasing. Mobile edge computing proposes a distributed caching approach to deal with the impact of massive data traffic on communication networks, in order to reduce network latency and improve user service quality. In this paper, a deep reinforcement learning algorithm is proposed to solve the task offloading problem of multi-service nodes. Experiments are carried out with the iFogSim simulation platform and the Google Cluster Trace data set. The results show that the task offloading strategy based on the DDQN algorithm performs well on energy consumption and cost, which supports the application prospects of deep reinforcement learning in mobile edge computing.
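
Assuming DDQN here denotes Double DQN, the key difference from a standard DQN lies in how the learning target is computed: the online network selects the next action and the target network evaluates it. The sketch below shows only that target computation with plain lists standing in for network outputs; the function names and numbers are illustrative assumptions, not taken from the paper.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]

# Hypothetical Q-value estimates for the next state (three offloading choices).
next_q_online = [0.2, 0.9, 0.5]
next_q_target = [1.0, 0.4, 2.0]

standard = dqn_target(1.0, next_q_target, gamma=0.5)
double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5)
```

Decoupling selection from evaluation lowers the target whenever the target network overestimates an action the online network does not prefer, which is the overestimation bias Double DQN is meant to reduce.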

Heuristic Q-learning based on experience replay for three-dimensional path planning of the unmanned aerial vehicle

Science Progress ◽

10.1177/0036850419879024 ◽

2019 ◽

Vol 103 (1) ◽

pp. 003685041987902 ◽

Cited By ~ 2

Author(s):

Ronglei Xie ◽

Zhijun Meng ◽

Yaoming Zhou ◽

Yunpeng Ma ◽

Zhe Wu

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Unmanned Aerial Vehicle ◽

Learning Algorithm ◽

Three Dimensional ◽

Convergence Speed ◽

Average Reward ◽

Heuristic Function ◽

Experience Replay ◽

Aerial Vehicle

Existing reinforcement learning algorithms have difficulty converging in three-dimensional path planning for unmanned aerial vehicles because the state space is very large. To address this, the article proposes a reinforcement learning algorithm based on a heuristic function and an experience replay mechanism that uses the maximum average reward value. Knowledge of track performance is used to construct a heuristic function that guides the unmanned aerial vehicle's action selection and reduces useless exploration. The experience replay mechanism based on maximum average reward increases the utilization rate of excellent samples and the convergence speed of the algorithm. The simulation results show that the proposed three-dimensional path planning algorithm has good learning efficiency, and that convergence speed and training performance are significantly improved.

Research on Reinforcement Learning Algorithm for Path Planning of Multiple Mobile Robots

Journal of Physics Conference Series ◽

10.1088/1742-6596/1915/4/042022 ◽

2021 ◽

Vol 1915 (4) ◽

pp. 042022

Author(s):

Ya Xu

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Mobile Robots ◽

Learning Algorithm ◽

Multiple Mobile Robots ◽

Reinforcement Learning Algorithm

Unmanned Aerial Vehicle Path Planning Algorithm Based on Deep Reinforcement Learning in Large-Scale and Dynamic Environments

IEEE Access ◽

10.1109/access.2021.3057485 ◽

2021 ◽

Vol 9 ◽

pp. 24884-24900

Author(s):

Ronglei Xie ◽

Zhijun Meng ◽

Lifeng Wang ◽

Haochen Li ◽

Kaipeng Wang ◽

...

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Unmanned Aerial Vehicle ◽

Large Scale ◽

Dynamic Environments ◽

Planning Algorithm ◽

Aerial Vehicle ◽

Vehicle Path ◽

Path Planning Algorithm
