Policy-Gradient and Actor-Critic Based State Representation Learning for Safe Driving of Autonomous Vehicles

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 5991 ◽  
Author(s):  
Abhishek Gupta ◽  
Ahmed Shaharyar Khwaja ◽  
Alagan Anpalagan ◽  
Ling Guan ◽  
Bala Venkatesh

In this paper, we propose an environment perception framework for autonomous driving using state representation learning (SRL). Unlike existing Q-learning based methods for efficient environment perception and object detection, our proposed method takes the learning loss into account under deterministic as well as stochastic policy gradient. Through a combination of variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC), we focus on uninterrupted and reasonably safe autonomous driving without steering off the track for a considerable driving distance. Our proposed technique exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. To ensure the effectiveness of the scheme over a sustained period of time, we employ a reward-penalty based system where a negative reward is associated with an unfavourable action and a positive reward is awarded for favourable actions. Simulation results on the DonKey simulator show the effectiveness of our proposed method by examining the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.

2021 ◽  
Author(s):  
Abhishek Gupta

In this thesis, we propose an environment perception framework for autonomous driving using deep reinforcement learning (DRL) that exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. Unlike existing techniques, our proposed technique takes the learning loss into account under deterministic as well as stochastic policy gradient. We apply DRL to object detection and safe navigation while enhancing a self-driving vehicle’s ability to discern meaningful information from surrounding data. For efficient environmental perception and object detection, various Q-learning based methods have been proposed in the literature. Unlike other works, this thesis proposes a collaborative deterministic as well as stochastic policy gradient based on DRL. Our technique is a combination of variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC) that adequately trains a self-driving vehicle. In this work, we focus on uninterrupted and reasonably safe autonomous driving without colliding with an obstacle or steering off the track. We propose a collaborative framework that utilizes the best features of VAE, DDPG, and SAC and models autonomous driving as a partly stochastic and partly deterministic policy gradient problem in continuous action and state spaces. To ensure that the vehicle traverses the road over a considerable period of time, we employ a reward-penalty based system where a larger penalty is associated with an unfavourable action and a comparatively smaller positive reward is awarded for favourable actions. We also examine the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.
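The asymmetric reward-penalty idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's actual reward function: the names `step_reward` and `cumulative_reward`, the bonus of 1.0, and the penalty of -10.0 are assumptions chosen only to show a penalty whose magnitude exceeds the per-step reward.

```python
def step_reward(off_track: bool, collided: bool,
                step_bonus: float = 1.0, penalty: float = -10.0) -> float:
    """Hypothetical per-step reward.

    A small positive bonus is given for every step the vehicle stays on the
    track; a penalty of larger magnitude is given for an unfavourable outcome.
    The constants are illustrative assumptions, not values from the thesis.
    """
    if off_track or collided:
        return penalty
    return step_bonus


def cumulative_reward(events) -> float:
    """Sum per-step rewards over one episode, ending at the first failure.

    `events` is an iterable of (off_track, collided) pairs, one per step.
    """
    total = 0.0
    for off_track, collided in events:
        reward = step_reward(off_track, collided)
        total += reward
        if reward < 0:
            break  # the episode terminates on the first unfavourable action
    return total
```

With these assumed constants, a single failure cancels ten good steps, which is one way to bias an agent toward staying on the road for a long time.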



2021 ◽  
Vol 11 (5) ◽  
pp. 2305
Author(s):  
Yongsoon Choi ◽  
Seryong Baek ◽  
Cheonho Kim ◽  
Junkyu Yoon ◽  
Seongkwan Mark Lee

As smart cities become a global topic, interest in smart mobility, the core of smart cities, is also growing. The technology that comes closest to general users is “autonomous driving”. In particular, the successful market entry and establishment of some private companies has shown that “autonomous driving” is not a technology of the future but an imminent reality. However, the safety of autonomous vehicles that rely on sensors instead of the driver’s five senses has been a focus of attention from the beginning and continues to be so. In this study, we attempted to address this concern. Using actual data from thirty traffic accidents, we reinterpreted each accident through simulation, assuming that an AEBS (Autonomous Emergency Braking System) had been installed to assist the driver, to see how the outcome changed. In the computer program PC-Crash, the accidents were first simulated using the Euro NCAP (New Car Assessment Program) AEBS test standards. Subsequently, the other AEBS variables were held fixed and the accidents were reinterpreted by changing only the angle of the radar detection sensor. The results confirmed that 27 of the thirty accidents could have been prevented with the AEBS. In addition, the crash avoidance rate gradually increased as the radar angle increased.
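Why a wider radar angle can change the outcome of a simulated accident can be seen in a small geometric sketch. The function `in_radar_fov`, the coordinate convention, and all numbers below are illustrative assumptions; they are not taken from the study, from PC-Crash, or from the Euro NCAP protocol.

```python
import math


def in_radar_fov(dx_m: float, dy_m: float,
                 fov_deg: float, max_range_m: float) -> bool:
    """Return True if a target lies inside a forward-facing radar cone.

    dx_m is the target's forward distance and dy_m its lateral offset, both
    in metres relative to the sensor. fov_deg is the full opening angle of
    the cone, so the target must be within fov_deg / 2 of straight ahead.
    """
    distance = math.hypot(dx_m, dy_m)
    if distance > max_range_m:
        return False
    bearing_deg = abs(math.degrees(math.atan2(dy_m, dx_m)))
    return bearing_deg <= fov_deg / 2.0
```

A target 10 m ahead and 5 m to the side sits at a bearing of about 26.6 degrees, so in this sketch it is missed by a 40-degree cone but detected by a 60-degree cone.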


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4158 ◽  
Author(s):  
Yichao Cai ◽  
Dachuan Li ◽  
Xiao Zhou ◽  
Xingang Mou

Environment perception is one of the major issues in autonomous driving systems. In particular, effective and robust drivable road region detection remains a challenge for autonomous vehicles on multi-lane roads, at intersections, and in unstructured road environments. In this paper, a computer vision and neural network-based drivable road region detection approach is proposed for fixed-route autonomous vehicles (e.g., shuttles, buses and other vehicles operating on fixed routes), using a vehicle-mounted camera, a route map and the real-time vehicle location. The key idea of the proposed approach is to fuse an image with its corresponding local route map to obtain the map-fusion image (MFI), in which the information from the image and the route map complement each other. The image information can be utilized in road regions with rich features, while the local route map acts as a critical heuristic that enables robust drivable road region detection in areas without clear lane markings or borders. A neural network model built on Convolutional Neural Networks (CNNs), namely FCN-VGG16, is utilized to extract the drivable road region from the fused MFI. The proposed approach is validated using real-world driving scenario videos captured by an industrial camera mounted on a test vehicle. Experiments demonstrate that the proposed approach outperforms the conventional approach, which uses non-fused images, in terms of detection accuracy and robustness, and that it is robust against poor illumination conditions and pavement appearance, as well as projection and map-fusion errors.
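The abstract does not say how the image and the route map are combined, so the following is only a hedged sketch of one plausible fusion step: blending a route mask, already projected into the image plane, into a grayscale image. The function `fuse_with_route_map`, the blending rule, and the use of nested lists instead of an array library are all assumptions made for illustration.

```python
def fuse_with_route_map(image, route_mask, alpha=0.5, highlight=255):
    """Blend a projected route mask into a grayscale image.

    image      -- 2D list of pixel intensities in 0..255
    route_mask -- 2D list of booleans with the same shape; True marks pixels
                  covered by the local route map after projection
    alpha      -- blending weight of the route highlight (assumed value)

    Pixels on the route are pulled toward `highlight`; all others are kept.
    """
    same_rows = len(image) == len(route_mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(image, route_mask)):
        raise ValueError("image and route mask must have the same shape")

    fused = []
    for img_row, mask_row in zip(image, route_mask):
        fused.append([
            int((1 - alpha) * px + alpha * highlight) if on_route else px
            for px, on_route in zip(img_row, mask_row)
        ])
    return fused
```

In a pipeline like the one described, the fused result would then be passed to the segmentation network in place of the raw camera frame.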


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 928
Author(s):  
Man Kiat Wong ◽  
Tee Connie ◽  
Michael Kah Ong Goh ◽  
Li Pei Wong ◽  
Pin Shen Teh ◽  
...  

Background: Autonomous vehicles are important in smart transportation. Although exciting progress has been made, it remains challenging to design a safety mechanism for autonomous vehicles that copes with uncertainties and obstacles that occur dynamically on the road. Collision detection and avoidance are indispensable for a reliable decision-making module in autonomous driving. Methods: This study presents a robust approach for forward collision warning using vision data for autonomous vehicles on Malaysian public roads. The proposed architecture combines environment perception and lane localization to define a safe driving region for the ego vehicle. If potential risks are detected in the safe driving region, a warning is triggered. The early warning is important to help avoid rear-end collisions. In addition, an adaptive lane localization method that considers the geometrical structure of the road is presented to deal with different road types. Results: The approach achieved a mean average precision (mAP) 0.5 of 0.14, an mAP 0.95 of 0.06979, and a recall of 0.6356. Conclusions: Experimental results have validated the effectiveness of the proposed approach under different lighting and environmental conditions.
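The warning rule described under Methods (raise a warning when a detected object falls inside the safe driving region) can be sketched as a point-in-polygon test. This is an assumption-laden illustration, not the authors' implementation: the polygon representation of the region, the use of the bottom-centre of each bounding box as the object's ground contact point, and the names `point_in_polygon` and `forward_collision_warning` are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) strictly inside the polygon?

    polygon is a list of (x, y) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def forward_collision_warning(boxes, safe_region):
    """Return True if any detected object stands inside the safe region.

    boxes holds (x_min, y_min, x_max, y_max) detections in image
    coordinates (y grows downward); the bottom-centre of each box is
    taken as the point where the object touches the road.
    """
    for x_min, _y_min, x_max, y_max in boxes:
        foot_x = (x_min + x_max) / 2.0
        if point_in_polygon(foot_x, y_max, safe_region):
            return True
    return False
```

For example, with a trapezoidal region `[(200, 480), (440, 480), (360, 300), (280, 300)]`, a box whose bottom-centre is at (320, 400) triggers the warning, while one at (35, 400) does not.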


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252754
Author(s):  
Nesma M. Ashraf ◽  
Reham R. Mostafa ◽  
Rasha H. Sakr ◽  
M. Z. Rashad

Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment without any prior knowledge of that environment. The choice of hyperparameters has a great impact on the overall learning process and on training time. Hyperparameters should be accurately estimated while training DRL algorithms, which is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and achieve the optimum control strategy in an autonomous driving control problem. DDPG is capable of handling complex environments with continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen for its ease of design and implementation. Using TORCS, the DDPG agent with optimized hyperparameters was compared with a DDPG agent with reference hyperparameters. The experimental results showed that optimizing the DDPG hyperparameters maximizes the total rewards over the testing episodes and maintains a stable driving policy.
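The search loop behind this kind of hyperparameter tuning can be sketched with a simplified Whale Optimization Algorithm. This is a hedged illustration under several assumptions: it uses one scalar coefficient per whale rather than per dimension, updates the best solution immediately, and minimises an arbitrary `objective` callable. In the paper's setting the objective would come from training and evaluating a DDPG agent, which is not reproduced here; the function name `whale_optimize` and all defaults are hypothetical.

```python
import math
import random


def whale_optimize(objective, bounds, n_whales=10, n_iters=30, seed=0, b=1.0):
    """Minimise `objective` over box `bounds` with a simplified WOA.

    bounds is a list of (low, high) pairs, one per hyperparameter.
    Returns (best_position, best_score, history), where history holds the
    best score found so far after each iteration.
    """
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_score = objective(best)
    history = [best_score]

    for t in range(n_iters):
        a = 2.0 * (1 - t / n_iters)  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral (bubble-net) move toward the best whale.
                l = rng.uniform(-1, 1)
                spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
                new = [abs(bi - xi) * spiral + bi for bi, xi in zip(best, x)]
            # Keep every candidate inside the search box.
            whales[i] = [max(lo, min(hi, v)) for v, (lo, hi) in zip(new, bounds)]
            score = objective(whales[i])
            if score < best_score:
                best, best_score = whales[i][:], score
        history.append(best_score)
    return best, best_score, history
```

A call such as `whale_optimize(lambda p: -evaluate_agent(*p), bounds)` would turn reward maximisation into minimisation, where `evaluate_agent` is a placeholder for whatever routine trains and scores the agent.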


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5053 ◽  
Author(s):  
Saba Arshad ◽  
Muhammad Sualeh ◽  
Dohyeong Kim ◽  
Dinh Van Nam ◽  
Gon-Woo Kim

In recent years, research and development of autonomous driving technology have gained much interest. Many autonomous driving frameworks have been developed in the past. However, building a safely operating, fully functional autonomous driving framework is still a challenge. Several accidents have occurred involving autonomous vehicles, including Tesla and Volvo XC90 vehicles, resulting in serious personal injuries and death. One of the major reasons is the increase in urbanization and mobility demands. The autonomous vehicle is expected to increase road safety while reducing road accidents that occur due to human errors. The accurate sensing of the environment and safe driving under various scenarios must be ensured to achieve the highest level of autonomy. This research presents Clothoid, a unified framework for fully autonomous vehicles that integrates the modules of HD mapping, localization, environmental perception, path planning, and control while considering safety, comfort, and scalability in the real traffic environment. The proposed framework enables obstacle avoidance, pedestrian safety, object detection, road blockage avoidance, path planning for single-lane and multi-lane routes, and safe driving of vehicles throughout the journey. The performance of each module has been validated in K-City under multiple scenarios where Clothoid has been driven safely from the starting point to the goal point. The vehicle was one of the top five to successfully finish the Hyundai autonomous vehicle challenge (AVC).


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5626
Author(s):  
Jie Chen ◽  
Tao Wu ◽  
Meiping Shi ◽  
Wei Jiang

Autonomous driving with artificial intelligence technology has been viewed as a promising way to put autonomous vehicles on the road in the near future. In recent years, considerable progress has been made with deep reinforcement learning (DRL) for realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined by experts. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior progressively. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first is a typical pre-defined reward function on the system state; the second is modeled as a Deep Neural Network (DNN) that represents the human observer's intention to adjust the driving, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as input, via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.
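The two-part reward structure described here can be sketched as a function that combines a pre-defined term on the state with a learned term on the front-view image. The abstract does not state how the parts are combined, so the weighted sum below is an assumption, as are the names `make_composite_reward`, `predefined`, `learned`, and `weight`. The learned part is represented by a plain callable standing in for the DNN.

```python
from typing import Callable, Sequence

State = Sequence[float]
Image = Sequence[Sequence[float]]


def make_composite_reward(
    predefined: Callable[[State], float],
    learned: Callable[[Image], float],
    weight: float = 1.0,
) -> Callable[[State, Image], float]:
    """Build a reward that adds a learned image-based term to a fixed one.

    predefined -- hand-written reward on the system state
    learned    -- stand-in for a trained model scoring the front-view image
    weight     -- assumed scaling of the learned term
    """
    def reward(state: State, image: Image) -> float:
        return predefined(state) + weight * learned(image)

    return reward


# Illustrative stand-ins: reward forward speed, and let the "model"
# express a human observer's disapproval as a constant negative score.
speed_reward = lambda state: state[0]
disapproval = lambda image: -1.0

porf_like = make_composite_reward(speed_reward, disapproval, weight=0.5)
```

Keeping the learned term behind a callable means it can be retrained as new human feedback arrives without changing the code that consumes the reward.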

