A Semi-Supervised Approach to Monocular Depth Estimation, Depth Refinement, and Semantic Segmentation of Driving Scenes using a Siamese Triple Decoder Architecture

John Paul Tan Yusiong; Prospero Clara Naval

doi:10.31449/inf.v44i4.3018

SemanticDepth: Fusing Semantic Segmentation and Monocular Depth Estimation for Enabling Autonomous Driving in Roads without Lane Lines

Sensors

10.3390/s19143224

2019

Vol 19 (14)

pp. 3224

Cited By ~ 4

Author(s):

Pablo R. Palafox

Johannes Betz

Felix Nobis

Konstantin Riedl

Markus Lienkamp

Keyword(s):

Semantic Segmentation

Depth Estimation

Autonomous Driving

Warning Systems

The Road

Lane Departure

Rgb Images

Monocular Depth

On The Road

The City

Typically, lane departure warning systems rely on lane lines being present on the road. However, in many scenarios, e.g., secondary roads or some streets in cities, lane lines are either not present or not sufficiently well signaled. In this work, we present a vision-based method to locate a vehicle within the road when no lane lines are present using only RGB images as input. To this end, we propose to fuse together the outputs of a semantic segmentation and a monocular depth estimation architecture to reconstruct locally a semantic 3D point cloud of the viewed scene. We only retain points belonging to the road and, additionally, to any kind of fences or walls that might be present right at the sides of the road. We then compute the width of the road at a certain point on the planned trajectory and, additionally, what we denote as the fence-to-fence distance. Our system is suited to any kind of motoring scenario and is especially useful when lane lines are not present on the road or do not signal the path correctly. The additional fence-to-fence distance computation is complementary to the road’s width estimation. We quantitatively test our method on a set of images featuring streets of the city of Munich that contain a road-fence structure, so as to compare our two proposed variants, namely the road’s width and the fence-to-fence distance computation. In addition, we also validate our system qualitatively on the Stuttgart sequence of the publicly available Cityscapes dataset, where no fences or walls are present at the sides of the road, thus demonstrating that our system can be deployed in a standard city-like environment. For the benefit of the community, we make our software open source.
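The geometric core described above, turning a depth map plus a per-pixel class map into 3D points and measuring lateral extents, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' released software: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the class IDs (assumed here to follow Cityscapes train IDs), the hypothetical helpers `backproject` and `lateral_extent`, and the choice to measure "width" as the lateral spread of road (or fence/wall) points inside a thin depth slice are all illustrative.

```python
import numpy as np

# Hypothetical class IDs, assumed to follow Cityscapes train IDs
# (road = 0, wall = 3, fence = 4); the abstract does not fix a label set.
ROAD_IDS = (0,)
FENCE_WALL_IDS = (3, 4)


def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW metric depth map to HxWx3 camera-frame points
    (x right, y down, z forward) with an assumed pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # column and row indices, each HxW
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def lateral_extent(points, labels, class_ids, z_target, tol=0.5):
    """Width (max x - min x, in metres) spanned by points of the given classes
    whose forward distance z lies within z_target +/- tol.
    Returns None if no such points exist."""
    in_class = np.isin(labels, class_ids)
    in_slice = np.abs(points[..., 2] - z_target) <= tol
    xs = points[..., 0][in_class & in_slice]
    if xs.size == 0:
        return None
    return float(xs.max() - xs.min())


# Toy scene: a flat 4x8 depth map at 10 m, road in columns 1-6,
# a wall in column 0 and a fence in column 7.
depth = np.full((4, 8), 10.0)
labels = np.full((4, 8), 255)
labels[:, 1:7] = 0
labels[:, 0] = 3
labels[:, 7] = 4
points = backproject(depth, fx=10.0, fy=10.0, cx=3.5, cy=1.5)
road_width = lateral_extent(points, labels, ROAD_IDS, z_target=10.0)
fence_to_fence = lateral_extent(points, labels, FENCE_WALL_IDS, z_target=10.0)
```

In this toy scene the fence-to-fence distance (7 m) exceeds the road width (5 m) because the barriers sit just outside the road pixels; widths are measured between pixel sample positions, so no half-pixel margin is added.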

Monocular Depth Estimation Using Encoder-Decoder Architecture and Transfer Learning from Single RGB Image

2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)

10.1109/upcon50219.2020.9376365

2020

Author(s):

Hritam Basak

Sagnik Ghosal

Mainak Sarkar

Mayukhmali Das

Soham Chattopadhyay

Keyword(s):

Transfer Learning

Depth Estimation

Decoder Architecture

Monocular Depth

Rgb Image

SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches

Sensors

10.3390/s21165476

2021

Vol 21 (16)

pp. 5476

Author(s):

Rui Wang

Jialing Zou

James Zhiqing Wen

Keyword(s):

Network Architecture

Semantic Segmentation

Depth Estimation

Semantic Feature

Semantic Features

Feature Maps

Task Learning

Vision Sensors

Monocular Depth

Public Datasets

Monocular depth estimation based on unsupervised learning has attracted great attention due to the rising demand for lightweight monocular vision sensors. Inspired by multi-task learning, researchers have used semantic information to improve monocular depth estimation models. However, multi-task learning is still limited by its need for multiple types of annotation, and, as far as we know, there are scarcely any large public datasets that provide all the necessary information. Therefore, we propose a novel network architecture, the Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN), which extracts multi-resolution depth features and semantic features, merges them, and feeds them into the decoder, with the goal of predicting depth with the support of semantics. Instead of using loss functions to relate semantics and depth, SFA-MDEN fuses the semantic and depth feature maps to predict monocular depth. As a result, two accessible datasets on similar topics, one for depth estimation and one for semantic segmentation, can meet SFA-MDEN's training-set requirements. We explored the performance of the proposed SFA-MDEN in experiments on different datasets, including KITTI, Make3D, and our own dataset BHDE-v1. The experimental results demonstrate that SFA-MDEN achieves competitive accuracy and generalization capacity compared to state-of-the-art methods.
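To make "merged and fed into the decoder" concrete, the sketch below fuses depth-branch and semantic-branch feature maps resolution by resolution. The abstract says only that the features are merged; channel concatenation followed by a 1x1 convolution and ReLU is one common merge operator and is an assumption here, as are the helper names `conv1x1` and `fuse_features`, the channel counts, and the random weights.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a C_in x H x W map; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def fuse_features(depth_feats, sem_feats, weights):
    """Fuse depth-branch and semantic-branch feature maps level by level.

    Each list holds one C x H x W array per resolution; maps at the same
    index must share H and W. Fusion here is channel concatenation followed
    by a 1x1 convolution and ReLU (an assumed fusion operator)."""
    fused = []
    for d, s, w in zip(depth_feats, sem_feats, weights):
        if d.shape[1:] != s.shape[1:]:
            raise ValueError("depth and semantic maps must share spatial size")
        stacked = np.concatenate([d, s], axis=0)            # (C_d + C_s) x H x W
        fused.append(np.maximum(conv1x1(stacked, w), 0.0))  # ReLU
    return fused


# Two hypothetical resolutions: 16x16 and 8x8.
rng = np.random.default_rng(0)
depth_feats = [rng.standard_normal((8, 16, 16)), rng.standard_normal((16, 8, 8))]
sem_feats = [rng.standard_normal((4, 16, 16)), rng.standard_normal((8, 8, 8))]
weights = [rng.standard_normal((6, 12)), rng.standard_normal((10, 24))]
fused = fuse_features(depth_feats, sem_feats, weights)
```

Because the two modalities meet at the feature level rather than through a joint loss, the semantic branch can be trained on a segmentation dataset and the depth branch on a separate depth dataset, which is the property the abstract emphasizes.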

Monocular Segment-Wise Depth: Monocular Depth Estimation Based on a Semantic Segmentation Prior

2019 IEEE International Conference on Image Processing (ICIP)

10.1109/icip.2019.8803551

2019

Cited By ~ 1

Author(s):

Amir Atapour-Abarghouei

Toby P. Breckon

Keyword(s):

Semantic Segmentation

Depth Estimation

Monocular Depth

Unsupervised Monocular Depth Estimation for Autonomous Driving

Proceedings of the International Display Workshops

10.36463/idw.2019.3dsap2_3dp2-2

2019

pp. 128

Author(s):

Chih-Shuan Huang

Wan-Nung Tsung

Wei-Jong Yang

Chin-Hsing Chen

Keyword(s):

Depth Estimation

Autonomous Driving

Monocular Depth

On the Uncertainty of Self-Supervised Monocular Depth Estimation

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

10.1109/cvpr42600.2020.00329

2020

Cited By ~ 1

Author(s):

Matteo Poggi

Filippo Aleotti

Fabio Tosi

Stefano Mattoccia

Keyword(s):

Depth Estimation

Monocular Depth

Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

European Conference on Visual Media Production

10.1145/3429341.3429355

2020

Author(s):

Hang Zhou

David Greenwood

Sarah Taylor

Han Gong

Keyword(s):

Constant Velocity

Depth Estimation

Monocular Depth

Velocity Constraints

Hierarchical Object Relationship Constrained Monocular Depth Estimation

Pattern Recognition

10.1016/j.patcog.2021.108116

2021

pp. 108116

Author(s):

Shuai Li

Jiaying Shi

Wenfeng Song

Aimin Hao

Hong Qin

Keyword(s):

Depth Estimation

Monocular Depth

Object Relationship

Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function

Sensors

10.3390/s21010054

2020

Vol 21 (1)

pp. 54

Author(s):

Peng Liu

Zonghua Zhang

Zhaozong Meng

Nan Gao

Keyword(s):

Joint Attention

Loss Function

Depth Estimation

Depth Information

3D Vision

Network Training

Crucial Component

Benchmark Datasets

Ill Posed

Monocular Depth

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest because of its flexibility and extremely low system requirements, but the problem's inherently ill-posed and ambiguous nature still leads to unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Compared with previous methods, it offers two improvements. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination: the network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates them using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which makes the loss more effective by capturing more structural detail. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority over current state-of-the-art methods.
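A wavelet-based depth loss compares prediction and ground truth in a multi-scale frequency decomposition rather than only pixel by pixel. The abstract does not specify the wavelet, the number of levels, or the band weights, so the sketch below is a generic stand-in: an orthonormal 2D Haar transform applied recursively to the approximation band, with an L1 penalty on every subband. The helper names `haar_dwt2` and `wavelet_l1_loss` and the `detail_weight` parameter are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One level of the orthonormal 2D Haar transform of an HxW array
    (H and W even). Returns the approximation band and three detail bands."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    col_detail = (a - b + c - d) / 2.0   # left-right differences (vertical edges)
    row_detail = (a + b - c - d) / 2.0   # top-bottom differences (horizontal edges)
    diag_detail = (a - b - c + d) / 2.0  # diagonal differences
    return approx, (col_detail, row_detail, diag_detail)


def wavelet_l1_loss(pred, target, levels=2, detail_weight=1.0):
    """Mean absolute error between Haar subbands of pred and target.
    Detail bands are compared at every level, then the transform recurses
    on the approximation band; H and W must be divisible by 2**levels."""
    loss = 0.0
    p = np.asarray(pred, dtype=float)
    t = np.asarray(target, dtype=float)
    for _ in range(levels):
        p, p_details = haar_dwt2(p)
        t, t_details = haar_dwt2(t)
        for dp, dt in zip(p_details, t_details):
            loss += detail_weight * np.mean(np.abs(dp - dt))
    loss += np.mean(np.abs(p - t))  # coarsest approximation band
    return float(loss)
```

Detail bands respond only to local differences, so a constant depth offset leaves them unchanged and is penalized only through the approximation band, whereas errors at depth discontinuities appear directly in the detail terms. This is one way such a loss can emphasize structural detail.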

Time- and Resource-Efficient Time-to-Collision Forecasting for Indoor Pedestrian Obstacles Avoidance

Journal of Imaging

10.3390/jimaging7040061

2021

Vol 7 (4)

pp. 61

Author(s):

David Urban

Alice Caplier

Keyword(s):

Neural Network

Autonomous Vehicles

Depth Estimation

Video Camera

Obstacle Detection

Navigation Systems

Time To Collision

Static Data

Monocular Depth

Fully Connected

As difficult vision-based tasks such as object detection and monocular depth estimation make their way into real-time applications, and as more lightweight solutions for autonomous vehicle navigation systems emerge, obstacle detection and collision prediction are two very challenging tasks for small embedded devices such as drones. We propose a novel, lightweight, and time-efficient vision-based solution that predicts Time-to-Collision from a monocular video camera embedded in a smart-glasses device, as a module of a navigation system for visually impaired pedestrians. It consists of two modules: a static data extractor, made of a convolutional neural network, that predicts the obstacle position and distance; and a dynamic data extractor that stacks the obstacle data from multiple frames and predicts the Time-to-Collision with a simple fully connected neural network. This paper focuses on the Time-to-Collision network’s ability to adapt, through supervised learning, to new scenes with different types of obstacles.
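The dynamic data extractor described above learns Time-to-Collision from obstacle distances stacked over several frames. For reference, the classical closed-form estimate under a constant-velocity assumption is sketched below. This analytic least-squares fit is a swapped-in baseline, not the authors' fully connected network, and the function name `time_to_collision` and its `min_closing_speed` parameter are hypothetical.

```python
import numpy as np


def time_to_collision(distances, timestamps, min_closing_speed=1e-6):
    """Constant-velocity time-to-collision from a short history of obstacle
    distances (metres) and capture times (seconds).

    Fits d(t) = d0 + v*t by least squares over the stacked frames and returns
    the seconds until the fitted distance reaches zero, measured from the
    latest frame. Returns inf when the obstacle is not approaching."""
    t = np.asarray(timestamps, dtype=float)
    d = np.asarray(distances, dtype=float)
    if t.size < 2 or t.size != d.size:
        raise ValueError("need at least two (distance, time) pairs")
    v, d0 = np.polyfit(t, d, 1)  # slope (m/s) first, then intercept
    if v > -min_closing_speed:
        return float("inf")      # receding or effectively stationary
    d_now = d0 + v * t[-1]       # fitted distance at the latest frame
    return float(d_now / -v)


# Obstacle approaching at 1 m/s, sampled every 0.5 s.
ttc = time_to_collision([5.0, 4.5, 4.0, 3.5], [0.0, 0.5, 1.0, 1.5])
```

Fitting a line over several frames, rather than differencing only the last two, smooths out per-frame distance noise; a learned model such as the one in the paper can additionally absorb non-constant motion and sensor-specific error patterns that this baseline ignores.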
