PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function

Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5626
Author(s):  
Jie Chen ◽  
Tao Wu ◽  
Meiping Shi ◽  
Wei Jiang

Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) for realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive manner. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, called PORF-DDPG in this paper. PORF consists of two parts: the first is a pre-defined typical reward function on the system state; the second, the main contribution of this paper, is modeled as a Deep Neural Network (DNN) that represents the driving-adjustment intention of the human observer. The DNN-based reward model is progressively learned from front-view images via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.
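To make the reward composition concrete, the following is a minimal PyTorch sketch of a PORF-style reward: a pre-defined state-based term plus a DNN term computed from the front-view image. The network shape and interfaces are assumptions for illustration, not the paper’s actual architecture.

```python
import torch
import torch.nn as nn

class HumanAdjustmentReward(nn.Module):
    """Hypothetical DNN reward term: maps a front-view image to a scalar
    adjustment approximating the human observer's driving-adjustment intention."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, image):                    # image: (B, 3, H, W)
        x = self.features(image).flatten(1)      # (B, 32)
        return self.head(x).squeeze(-1)          # (B,) scalar adjustments

def porf_reward(state_reward: torch.Tensor, adj_model: nn.Module,
                image: torch.Tensor) -> torch.Tensor:
    """PORF-style combined reward: pre-defined state-based term plus the
    progressively learned human-adjustment term."""
    with torch.no_grad():                        # the reward net is not updated here
        adjustment = adj_model(image)
    return state_reward + adjustment
```

In the paper’s scheme, the adjustment network would be progressively updated from the human observer’s supervision and intervention signals while DDPG optimizes the policy against the combined reward.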

2021 ◽  
Author(s):  
Abhishek Gupta

In this thesis, we propose an environment perception framework for autonomous driving using deep reinforcement learning (DRL) that exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. Unlike existing techniques, the proposed technique takes the learning loss into account under both deterministic and stochastic policy gradients. We apply DRL to object detection and safe navigation while enhancing a self-driving vehicle’s ability to discern meaningful information from surrounding data. For efficient environmental perception and object detection, various Q-learning based methods have been proposed in the literature. Unlike other works, this thesis proposes a collaborative deterministic and stochastic policy gradient based on DRL. Our technique combines a variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC) to adequately train a self-driving vehicle. In this work, we focus on uninterrupted and reasonably safe autonomous driving without colliding with an obstacle or steering off the track. We propose a collaborative framework that utilizes the best features of VAE, DDPG, and SAC and models autonomous driving as a partly stochastic and partly deterministic policy gradient problem in continuous action and state spaces. To ensure that the vehicle traverses the road over a considerable period of time, we employ a reward-penalty scheme in which a higher negative penalty is associated with an unfavourable action and a comparatively lower positive reward is awarded for favourable actions. We also examine the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.
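As an illustration of the asymmetric reward-penalty scheme described above, here is a minimal sketch; the magnitudes are invented for illustration and are not taken from the thesis.

```python
def reward(collision: bool, off_track: bool, progress: float) -> float:
    """Asymmetric reward-penalty scheme with invented magnitudes: a large
    negative penalty for unfavourable actions (collision, leaving the track)
    and a comparatively small positive reward for forward progress."""
    if collision or off_track:
        return -10.0                     # high penalty for unfavourable actions
    return 1.0 * max(progress, 0.0)      # lower reward for favourable progress
```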


Author(s):  
Hao Ji ◽  
Yan Jin

Self-organizing systems (SOS) are developed to perform complex tasks in unforeseen situations with adaptability. Predefining rules for self-organizing agents can be challenging, especially in tasks with high complexity and changing environments. Our previous work introduced a multiagent reinforcement learning (RL) model as a design approach to solving the rule generation problem of SOS, and a deep multiagent RL algorithm was devised to train agents to acquire the task and self-organizing knowledge. However, the simulation was based on one specific task environment; the sensitivity of SOS to reward functions and the systematic evaluation of SOS designed with multiagent RL remained open issues. In this paper, we introduced a rotation reward function to regulate agent behaviors during training and tested different weights of this reward on SOS performance in two case studies: box-pushing and T-shape assembly. Additionally, we proposed three metrics to evaluate the SOS: learning stability, quality of learned knowledge, and scalability. Results show that, depending on the type of task, designers may choose appropriate weights of the rotation reward to obtain the full potential of the agents’ learning capability. Good learning stability and quality of knowledge can be achieved within an optimal range of team sizes, and scaling up to larger team sizes performs better than scaling down.
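The following is a minimal sketch of how a weighted rotation reward could shape the training signal; the functional form and weight are illustrative assumptions rather than the paper’s exact formulation.

```python
def shaped_reward(task_reward: float, heading_change: float, w_rot: float) -> float:
    """Illustrative shaping with a weighted rotation term: the task reward
    (e.g., progress pushing the box toward the goal) is combined with a
    penalty on rotation, scaled by the tunable weight w_rot studied per task."""
    rotation_term = -abs(heading_change)   # discourage excessive turning
    return task_reward + w_rot * rotation_term
```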


Author(s):  
Yalda Rahmati ◽  
Mohammadreza Khajeh Hosseini ◽  
Alireza Talebpour ◽  
Benjamin Swain ◽  
Christopher Nelson

Despite numerous studies on general human–robot interactions, in the context of transportation, automated vehicle (AV)–human driver interaction is not a well-studied subject. These vehicles have fundamentally different decision-making logic compared with human drivers, and driving interactions between AVs and humans can potentially change traffic flow dynamics. Accordingly, through an experimental study, this paper investigates whether there is a difference between human–human and human–AV interactions on the road. The study focuses on car-following behavior; several car-following experiments were conducted utilizing Texas A&M University’s automated Chevy Bolt. Utilizing the NGSIM US-101 dataset, two scenarios for a platoon of three vehicles were considered. In both scenarios, the leader of the platoon follows a series of speed profiles extracted from the NGSIM dataset, while the second vehicle is either another human-driven vehicle (scenario A) or an AV (scenario B). Data were collected from the third vehicle in the platoon to characterize the changes in driving behavior when following an AV. A data-driven and a model-based approach were used to identify possible changes in driving behavior from scenario A to scenario B. The findings suggest there is a statistically significant difference between human drivers’ behavior in these two scenarios and that human drivers felt more comfortable following the AV. Simulation results also revealed the importance of capturing these changes in human behavior in microscopic simulation models of mixed driving environments.


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4703
Author(s):  
Yookhyun Yoon ◽  
Taeyeon Kim ◽  
Ho Lee ◽  
Jahnghyon Park

For driving safely and comfortably, long-term trajectory prediction of surrounding vehicles is essential for autonomous vehicles. To handle the uncertain nature of trajectory prediction, deep-learning-based approaches have been proposed previously. An on-road vehicle must obey road geometry, i.e., it should run within the constraint of the road shape. Herein, we present a novel road-aware trajectory prediction method that leverages high-definition maps together with a deep learning network. We developed a data-efficient learning framework comprising a trajectory prediction network in the curvilinear coordinate system of the road and a lane assignment for the surrounding vehicles. We then proposed a novel output-constrained sequence-to-sequence trajectory prediction network to incorporate the structural constraints of the road. Our method uses these structural constraints as prior knowledge for the prediction network: they serve not only as an input to the trajectory prediction network but also as part of the constrained loss function of the maneuver recognition network. Accordingly, the proposed method can predict feasible and realistic driver intentions and trajectories. Our method was evaluated using a real traffic dataset, and the results show that it is data-efficient and can predict reasonable trajectories at merging sections.
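As background for the curvilinear formulation, the sketch below (a hypothetical helper, not the authors’ code) projects a Cartesian position onto an HD-map lane centerline to obtain road-aligned coordinates (s, d).

```python
import numpy as np

def to_curvilinear(position, centerline):
    """Hypothetical helper: convert a Cartesian position (x, y) into
    road-aligned curvilinear coordinates (s, d), where s is arc length along
    the lane centerline and d is the signed lateral offset.
    `centerline` is an (N, 2) polyline taken from an HD map."""
    pts = np.asarray(centerline, dtype=float)
    seg = np.diff(pts, axis=0)                           # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    cum_s = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at vertices
    p = np.asarray(position, dtype=float)

    best = (np.inf, 0.0, 0.0)                            # (distance, s, d)
    for i in range(len(seg)):
        t = np.clip(np.dot(p - pts[i], seg[i]) / (seg_len[i] ** 2 + 1e-12), 0.0, 1.0)
        proj = pts[i] + t * seg[i]                       # closest point on segment
        diff = p - proj
        dist = np.linalg.norm(diff)
        if dist < best[0]:
            side = np.sign(seg[i, 0] * diff[1] - seg[i, 1] * diff[0])  # left (+) / right (-)
            best = (dist, cum_s[i] + t * seg_len[i], side * dist)
    return best[1], best[2]                              # (s, d)
```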


2015 ◽  
Vol 27 (6) ◽  
pp. 660-670 ◽  
Author(s):  
Udara Eshan Manawadu ◽  
Masaaki Ishikawa ◽  
Mitsuhiro Kamezaki ◽  
Shigeki Sugano ◽  
...  

[Figure: Driving simulator]
Intelligent passenger vehicles with autonomous capabilities will be commonplace on our roads in the near future. These vehicles will reshape the existing relationship between the driver and vehicle. Therefore, to create a new type of rewarding relationship, it is important to analyze when drivers prefer autonomous vehicles to manually-driven (conventional) vehicles. This paper documents a driving simulator-based study conducted to identify the preferences and individual driving experiences of novice and experienced drivers of autonomous and conventional vehicles under different traffic and road conditions. We first developed a simplified driving simulator that could connect to different driver-vehicle interfaces (DVI). We then created virtual environments consisting of scenarios and events that drivers encounter in real-world driving, and we implemented fully autonomous driving. We then conducted experiments to clarify how the autonomous driving experience differed for the two groups. The results showed that experienced drivers opt for conventional driving overall, mainly due to the flexibility and driving pleasure it offers, while novices tend to prefer autonomous driving due to its inherent ease and safety. A further analysis indicated that drivers preferred to use both autonomous and conventional driving methods interchangeably, depending on the road and traffic conditions.


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3318 ◽  
Author(s):  
Carlos Martínez ◽  
Felipe Jiménez

Autonomous driving is undergoing rapid development, and its implementation is expected to bring many benefits. Autonomous cars must deal with tasks at different levels. Although some of these are currently solved, and perception systems provide a fairly accurate and complete description of the environment, high-level decisions remain hard to obtain in challenging scenarios. Moreover, they must comply with safety, reliability and predictability requirements, road user acceptance, and comfort specifications. This paper presents a path planning algorithm based on potential fields. The potential models are adjusted so that their behavior suits the environment and the dynamics of the vehicle and can handle almost any unexpected scenario. The response of the system considers the road characteristics (e.g., maximum speed and lane line curvature) and the presence of obstacles and other users. The algorithm has been tested on an automated vehicle equipped with a GPS receiver, an inertial measurement unit and a computer vision system in real environments with satisfactory results.
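A minimal potential-field sketch follows, using the classic attractive/repulsive formulation with illustrative gains; the paper’s adjusted potential models additionally account for vehicle dynamics and road characteristics.

```python
import numpy as np

def potential_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=10.0):
    """Classic attractive/repulsive potential field with illustrative gains:
    the negative gradient pulls toward the goal and pushes away from any
    obstacle closer than the influence distance d0."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                  # attractive term
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # repulsion grows rapidly as the obstacle is approached
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

# One planning step: advance a short distance along the descent direction.
direction = potential_force(pos=[0.0, 0.0], goal=[50.0, 0.0], obstacles=[[20.0, 1.0]])
```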


Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is one of the key parts of research on longitudinal autonomous driving, and considering the behavior of human drivers when designing autonomous driving decision-making strategies is a current research hotspot. In longitudinal autonomous driving decision-making, traditional rule-based strategies are difficult to apply to complex scenarios. Current decision-making methods that use reinforcement learning and deep reinforcement learning construct reward functions around safety, comfort, and economy, but the resulting decision strategies still differ considerably from those of human drivers. Addressing these problems, this paper uses the driver’s behavior data to design the reward function of the deep reinforcement learning algorithm through BP neural network fitting, and uses the deep reinforcement learning DQN and DDPG algorithms to establish two driver-like longitudinal autonomous driving decision-making models. Simulation experiments compare the decision-making effect of the two models with the driver curve. The results show that both algorithms can realize driver-like decision-making and that the DDPG algorithm is more consistent with human driver behavior, and thus performs better, than the DQN algorithm.
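To illustrate fitting a reward function to driver behavior data with a BP (fully connected, backpropagation-trained) network, here is a minimal PyTorch sketch; the input features, target signal, and architecture are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical BP network that regresses a reward signal from recorded driver
# data. The input features (e.g., ego speed, relative speed, gap) and the
# regression target are illustrative assumptions.
reward_net = nn.Sequential(
    nn.Linear(3, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def fit_step(states: torch.Tensor, targets: torch.Tensor) -> float:
    """One supervised step; the fitted reward_net(state) would then serve as
    the reward inside the DQN/DDPG training loop."""
    optimizer.zero_grad()
    loss = loss_fn(reward_net(states), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```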


2021 ◽  
Vol 11 (15) ◽  
pp. 6685
Author(s):  
Dongyeon Yu ◽  
Chanho Park ◽  
Hoseung Choi ◽  
Donggyu Kim ◽  
Sung-Ho Hwang

According to SAE J3016, autonomous driving can be divided into six levels, with partially automated driving possible from level three up. A partially or highly automated vehicle can encounter situations involving total system failure; here, we studied a strategy for safe takeover in such situations. A human-in-the-loop simulator, driver-vehicle interface, and driver monitoring system were developed, and takeover experiments were performed using various driving scenarios and realistic autonomous driving situations. The experiments allowed us to draw the following conclusions. The visual–auditory–haptic complex alarm effectively delivered warnings and correlated clearly with users’ subjective preferences. There were scenario types in which the system had to enter minimum risk maneuvers or emergency maneuvers immediately, without requesting a takeover. Lastly, the risk of accidents can be reduced by a driver monitoring system that prevents the driver from becoming completely immersed in non-driving-related tasks. From these results, we proposed a safe takeover strategy that provides meaningful guidance for the development of autonomous vehicles. Considering users’ subjective questionnaire evaluations, the strategy is expected to improve the acceptance and adoption of autonomous vehicles.


Sensors ◽  
2019 ◽  
Vol 19 (21) ◽  
pp. 4711 ◽  
Author(s):  
Kewei Wang ◽  
Fuwu Yan ◽  
Bin Zou ◽  
Luqi Tang ◽  
Quan Yuan ◽  
...  

Deep convolutional neural networks have led the trend in vision-based road detection; however, obtaining the full road area despite occlusion from monocular vision remains challenging due to the dynamic scenes in autonomous driving. Inferring the occluded road area requires a comprehensive understanding of the geometry and the semantics of the visible scene. To this end, we create a small but effective dataset based on the KITTI dataset, named KITTI-OFRS (KITTI occlusion-free road segmentation), and propose a lightweight, efficient, fully convolutional neural network called OFRSNet (occlusion-free road segmentation network) that learns to predict occluded portions of the road in the semantic domain by looking around foreground objects and at the visible road layout. In particular, a global context module is used to build the down-sampling and joint context up-sampling blocks in our network, which improves its performance. Moreover, a spatially-weighted cross-entropy loss is designed to significantly increase the accuracy of this task. Extensive experiments on different datasets verify the effectiveness of the proposed approach, and comparisons with current state-of-the-art methods show that it outperforms the baseline models by achieving a better trade-off between accuracy and runtime, making our approach applicable to autonomous vehicles in real time.
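One plausible form of a spatially-weighted cross-entropy loss is sketched below in PyTorch; the weight map here is an assumption (e.g., emphasizing occluded road pixels), as the paper’s exact weighting scheme is not reproduced.

```python
import torch
import torch.nn.functional as F

def spatially_weighted_ce(logits, target, weight_map):
    """Sketch of a spatially-weighted cross-entropy: a per-pixel weight map
    scales the standard per-pixel loss before averaging.
    logits: (B, C, H, W); target: (B, H, W) long; weight_map: (B, H, W)."""
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    return (weight_map * per_pixel).sum() / weight_map.sum().clamp(min=1e-8)
```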

