Proximal Policy Optimization Through a Deep Reinforcement Learning Framework for Multiple Autonomous Vehicles at a Non-Signalized Intersection

2020 ◽  
Vol 10 (16) ◽  
pp. 5722 ◽  
Author(s):  
Duy Quang Tran ◽  
Sang-Hoon Bae

Advanced deep reinforcement learning shows promise as an approach to addressing continuous control tasks, especially in mixed-autonomy traffic. In this study, we present a deep reinforcement-learning-based model that considers the effectiveness of leading autonomous vehicles in mixed-autonomy traffic at a non-signalized intersection. This model integrates the Flow framework, the Simulation of Urban MObility (SUMO) simulator, and a reinforcement learning library. We also propose a set of proximal policy optimization hyperparameters to obtain reliable simulation performance. First, the leading autonomous vehicles at the non-signalized intersection are considered with varying autonomous vehicle penetration rates that range from 10% to 100% in 10% increments. Second, the proximal policy optimization hyperparameters are input into the multilayer perceptron algorithm for the leading autonomous vehicle experiment. Finally, the superiority of the proposed model is evaluated using all human-driven vehicle and leading human-driven vehicle experiments. We demonstrate that full-autonomy traffic can improve the average speed and delay time by factors of 1.38 and 2.55, respectively, compared with all human-driven vehicle experiments. The benefits of our proposed method grow as the autonomous vehicle penetration rate increases. Additionally, the leading autonomous vehicle experiment can be used to dissipate the stop-and-go waves at a non-signalized intersection.
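The penetration-rate sweep described in this abstract (10% to 100% in 10% steps) can be sketched as below. This is only an illustration: the fleet size of 20 vehicles and the helper name `av_counts` are assumptions, not values or code taken from the paper.

```python
def av_counts(total_vehicles, rates_percent=range(10, 101, 10)):
    """Map each penetration rate (in percent) to a number of autonomous vehicles.

    total_vehicles is a hypothetical fleet size; the abstract does not state one.
    """
    return {rate: round(total_vehicles * rate / 100) for rate in rates_percent}


if __name__ == "__main__":
    fleet = 20  # assumed fleet size, for illustration only
    for rate, n_av in av_counts(fleet).items():
        print(f"{rate:3d}% penetration: {n_av} autonomous, {fleet - n_av} human-driven")
```

Each entry of the returned mapping would correspond to one simulation run in such a sweep.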

2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of the autonomous vehicle. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect the urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. First, we feed a set of hyperparameters into our deep reinforcement learning agents. Second, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Third, the advantage of leading autonomous vehicles is evaluated using entire manual vehicle and leading manual vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that full-automation traffic increased the average speed by a factor of 1.27 compared with the entire manual vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.
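The two proximal policy optimization variants compared in this abstract differ in how they keep the new policy close to the old one. A minimal sketch of both surrogate objectives follows, using the standard PPO formulation. The clip range of 0.2 and the 1.5 and 2 factors in the penalty update are commonly used defaults assumed here for illustration; they are not the hyperparameters proposed by the authors.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


def kl_penalized_surrogate(ratios, advantages, kl, beta):
    """Mean of r * A minus beta times the mean KL divergence (old to new policy)."""
    mean_term = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return mean_term - beta * kl


def adapt_beta(beta, kl, kl_target):
    """Adjust the penalty weight after a policy update, based on the measured KL."""
    if kl < kl_target / 1.5:
        return beta / 2.0  # update was too conservative: relax the penalty
    if kl > kl_target * 1.5:
        return beta * 2.0  # update moved too far: strengthen the penalty
    return beta
```

Here `ratios` are the probability ratios between the new and old policy for each sampled action, and `advantages` are the corresponding advantage estimates.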


Author(s):  
I-Ming Chen ◽  
Ching-Yao Chan

Path tracking is an essential task for autonomous vehicles (AV), for which controllers are designed to issue commands so that the AV will follow the planned path properly to ensure operational safety, comfort, and efficiency. While solving the time-varying nonlinear vehicle dynamic problem is still challenging today, deep neural network (NN) methods, with their capability to deal with nonlinear systems, provide an alternative approach to tackle the difficulties. This study explores the potential of using deep reinforcement learning (DRL) for vehicle control and applies it to the path tracking task. In this study, proximal policy optimization (PPO) is selected as the DRL algorithm and is combined with the conventional pure pursuit (PP) method to structure the vehicle controller architecture. The PP method is used to generate a baseline steering control command, and the PPO is used to derive a correction command to mitigate the inaccuracy associated with the baseline from PP. The blend of the two controllers makes the overall operation more robust and adaptive and attains the optimality to improve tracking performance. In this paper, the structure, settings and training process of the PPO are described. Simulation experiments are carried out based on the proposed methodology, and the results show that the path tracking capability in a low-speed driving condition is significantly enhanced.
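The controller structure described in this abstract, a pure pursuit baseline plus a learned correction, can be sketched as follows. The pure pursuit formula is the standard bicycle-model form; the additive blend, the steering limit of 0.5 rad, and the function names are assumptions for illustration, and the correction value stands in for the output of a trained PPO policy.

```python
import math


def pure_pursuit_steering(wheelbase_m, alpha_rad, lookahead_m):
    """Standard pure pursuit law: delta = atan(2 * L * sin(alpha) / l_d).

    alpha_rad is the angle between the vehicle heading and the look-ahead point.
    """
    return math.atan2(2.0 * wheelbase_m * math.sin(alpha_rad), lookahead_m)


def blended_steering(baseline_rad, correction_rad, max_steer_rad=0.5):
    """Add a learned correction to the baseline and clamp to the steering limit.

    The additive blend and the limit are assumptions, not the paper's settings.
    """
    total = baseline_rad + correction_rad
    return max(-max_steer_rad, min(total, max_steer_rad))


if __name__ == "__main__":
    baseline = pure_pursuit_steering(wheelbase_m=2.5, alpha_rad=0.1, lookahead_m=5.0)
    correction = -0.01  # placeholder for the PPO policy output
    print(blended_steering(baseline, correction))
```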


Author(s):  
Óscar Pérez-Gil ◽  
Rafael Barea ◽  
Elena López-Guillén ◽  
Luis M. Bergasa ◽  
Carlos Gómez-Huélamo ◽  
...  

Nowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is one of them. This paper proposes using algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare their results. The aim of this work is to obtain a trained model, by applying a DRL algorithm, that is capable of sending control commands to the vehicle so that it navigates properly and efficiently along a given route. In addition, for each of the algorithms, several agents are presented as a solution, each using different data sources to produce the vehicle control commands. For this purpose, the open-source simulator CARLA is used, providing the system with the ability to perform a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results show that both DQN and DDPG reach the goal, but DDPG performs better. DDPG produces trajectories very similar to those of a classic controller such as LQR. In both cases, the RMSE is lower than 0.1 m when following trajectories ranging from 180 to 700 m. Finally, conclusions and future work are discussed.
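The RMSE figure quoted in this abstract summarizes how far the driven trajectory deviates from the reference route. A minimal sketch of the computation is shown below, assuming the deviation is available as a list of signed lateral errors in meters; how the authors measured the error is not stated here, so the input format is an assumption.

```python
import math


def rmse(errors_m):
    """Root-mean-square error of a sequence of signed errors (in meters)."""
    if not errors_m:
        raise ValueError("errors_m must contain at least one sample")
    return math.sqrt(sum(e * e for e in errors_m) / len(errors_m))


if __name__ == "__main__":
    lateral_errors = [0.05, -0.08, 0.02, -0.04]  # made-up samples, not paper data
    print(f"RMSE = {rmse(lateral_errors):.3f} m")
```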


Author(s):  
Rui Li ◽  
Weitian Wang ◽  
Yi Chen ◽  
Srivatsan Srinivasan ◽  
Venkat N. Krovi

Fully automatic parking (FAP) is a key step towards the age of autonomous vehicles. Motivated by the contribution of human vision to human parking, in this paper we propose a computer-vision-based FAP method for autonomous vehicles. Based on the input images from a rear camera on the vehicle, a convolutional neural network (CNN) is trained to automatically output the steering and velocity commands for vehicle control. The CNN is trained with the Caffe deep learning framework. A 1/10th-scale autonomous vehicle research platform (1/10-SAVRP), which is configured with a vehicle controller unit, an automated driving processor, and a rear camera, is used to demonstrate the parking maneuver. The experimental results suggest that the proposed approach enables the vehicle to park independently, without human input, in different driving settings.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2032
Author(s):  
Sampo Kuutti ◽  
Richard Bowden ◽  
Saber Fallah

The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning based approach to autonomous vehicle longitudinal control, where the rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent to meaningful states and actions, this weak supervision improves the convergence during training and enhances the safety of the final trained policy. This rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained model parameters, and show that the weak supervision consistently improves the safety of exploration, speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or sub-optimal, the safety cages can enable a model to learn a safe driving policy even when the model could not be trained to drive through reinforcement learning alone.
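The idea of a rule-based safety cage that both overrides unsafe actions and gives the agent a weak supervision signal can be sketched as follows. The abstract does not specify the rules, so the time-headway check, the thresholds, and the reward penalty below are all assumptions chosen for illustration.

```python
def apply_safety_cage(agent_accel, gap_m, ego_speed_mps,
                      min_headway_s=1.0, brake_accel=-3.0):
    """Override the agent's acceleration when the time headway is too short.

    Returns (safe_accel, intervened). The headway rule and both thresholds
    are hypothetical, not the rules used in the paper.
    """
    if ego_speed_mps > 0.0:
        headway_s = gap_m / ego_speed_mps
        if headway_s < min_headway_s:
            # Brake at least as hard as the cage demands.
            return min(agent_accel, brake_accel), True
    return agent_accel, False


def supervised_reward(base_reward, intervened, penalty=1.0):
    """Penalize steps where the cage had to intervene (assumed form of weak supervision)."""
    return base_reward - penalty if intervened else base_reward
```

Because the override is a plain rule, it can be inspected and validated independently of the learned policy.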


10.29007/dkzb ◽  
2018 ◽  
Author(s):  
Nishant Kheterpal ◽  
Kanaad Parvate ◽  
Cathy Wu ◽  
Aboudy Kreidieh ◽  
Eugene Vinitsky ◽  
...  

We detail the motivation and design decisions underpinning Flow, a computational framework integrating SUMO with the deep reinforcement learning libraries rllab and RLlib, allowing researchers to apply deep reinforcement learning (RL) methods to traffic scenarios, and permitting vehicle and infrastructure control in highly varied traffic environments. Users of Flow can rapidly design a wide variety of traffic scenarios in SUMO, enabling the development of controllers for autonomous vehicles and intelligent infrastructure across a broad range of settings. Flow facilitates the use of policy optimization algorithms to train controllers that can optimize for highly customizable traffic metrics, such as traffic flow or system-wide average velocity. Training reinforcement learning agents using such methods requires a massive amount of data, thus simulator reliability and scalability were major challenges in the development of Flow. A contribution of this work is a variety of practical techniques for overcoming such challenges with SUMO, including parallelizing policy rollouts, smart exception and collision handling, and leveraging subscriptions to reduce computational overhead. To demonstrate the resulting performance and reliability of Flow, we introduce the canonical single-lane ring road benchmark and briefly discuss prior work regarding that task. We then pose a more complex and challenging multi-lane setting and present a trained controller for a single vehicle that stabilizes the system. Flow is an open-source tool and available online at https://github.com/cathywu/flow.


Author(s):  
Xiaoteng Ma ◽  
Xiaohang Tang ◽  
Li Xia ◽  
Jun Yang ◽  
Qianchuan Zhao

Most reinforcement learning algorithms optimize the discounted criterion, which helps accelerate convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks, such as finance-related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. First, we develop a unified trust region theory with discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived with the Perturbation Analysis (PA) theory. Second, we propose a practical algorithm named Average Policy Optimization (APO), which improves the value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, our work is the first to study the trust region approach with the average criterion, and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than the discounted PPO, which demonstrates the effectiveness of our approach.
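The difference between the two criteria named in this abstract can be seen on a single finite reward sequence. The sketch below uses finite-horizon stand-ins for both quantities; the long-run average criterion is formally a limit over an infinite horizon, so the empirical mean here is only an illustrative approximation, and the function names are not from the paper.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, computed backwards for numerical simplicity."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_reward(rewards):
    """Empirical mean reward, a finite-horizon stand-in for the long-run average."""
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    rewards = [0.0, 0.0, 10.0]  # a reward that arrives late
    print(discounted_return(rewards, gamma=0.5))  # late reward is down-weighted
    print(average_reward(rewards))                # every step counts equally
```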


Author(s):  
Hongbo Gao ◽  
Guanya Shi ◽  
Kelong Wang ◽  
Guotao Xie ◽  
Yuchao Liu

Purpose: Over the past decades, significant research effort has been dedicated to the development of autonomous vehicles. The decision-making system, which is responsible for driving safety, is one of the most important technologies for autonomous vehicles. The purpose of this study is to use a reinforcement learning method combined with car-following data from a driving simulator to obtain an explainable car-following learning algorithm and establish an anthropomorphic car-following model. Design/methodology/approach: This paper proposes a car-following method based on reinforcement learning for autonomous vehicle decision-making. An approximator is used to approximate the value function by determining the state space, action space and state transition relationship. A gradient descent method is used to solve for the parameters. Findings: Car-following behavior for certain driving styles is initially achieved through the simulation of step conditions. The results provide initial evidence that the reinforcement learning system is more adaptive to car following and that it has a degree of explainability and stability, based on the explicit calculation of R. Originality/value: The simulation results show that the car-following method based on reinforcement learning for autonomous vehicle decision-making realizes reliable car-following decision-making and has the advantages of simple samples, a small amount of data, a simple algorithm and good robustness.


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2120
Author(s):  
Ying Ji ◽  
Jianhui Wang ◽  
Jiacan Xu ◽  
Donglin Li

The proliferation of distributed renewable energy resources (RESs) poses major challenges to the operation of microgrids due to uncertainty. Traditional online scheduling approaches that rely on accurate forecasts become difficult to implement as the share of uncertain RESs increases. Although several data-driven methods have been proposed recently to overcome the challenge, they generally suffer from a scalability issue due to their limited ability to optimize high-dimensional continuous control variables. To address these issues, we propose a data-driven online scheduling method for microgrid energy optimization based on continuous-control deep reinforcement learning (DRL). We formulate the online scheduling problem as a Markov decision process (MDP). The objective is to minimize the operating cost of the microgrid considering the uncertainty of RES generation, load demand, and electricity prices. To learn the optimal scheduling strategy, a Gated Recurrent Unit (GRU)-based network is designed to extract temporal features of uncertainty and generate the optimal scheduling decisions in an end-to-end manner. To optimize the policy with high-dimensional and continuous actions, proximal policy optimization (PPO) is employed to train the neural network-based policy in a data-driven fashion. The proposed method does not require any forecasting information on the uncertainty or prior knowledge of the physical model of the microgrid. Simulation results using realistic power system data from the California Independent System Operator (CAISO) demonstrate the effectiveness of the proposed method.

