Self-Supervised Monocular Depth Learning in Low-Texture Areas

Wanpeng Xu; Ling Zou; Lingda Wu; Zhipeng Fu

doi:10.3390/rs13091673

Remote Sensing; 2021; Vol 13 (9); pp. 1673

Keyword(s): Supervised Learning; Network Architecture; Depth Estimation; Reference Image; Camera Motion; Feature Maps; Data Set; Edge Location; Monocular Depth; Texture Boundaries

For the task of monocular depth estimation, self-supervised learning supervises training by computing the pixel difference between the target image and the warped reference image, and obtains results comparable to those of fully supervised methods. However, problematic pixels in low-texture regions are ignored: when stereo pairs are taken as input in self-supervised learning, most researchers assume that no pixels violate the camera-motion assumption, which leads to optimization problems in these regions. To tackle this problem, we compute the photometric loss on the lowest-level feature maps instead of the raw pixels and apply first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as the backbone, we propose a new depth estimation network architecture that improves edge location accuracy and recovers clear outlines even at smoothed low-texture boundaries. To obtain more stable and reliable quantitative evaluation results, we introduce a virtual data set into the self-supervised task, because virtual data sets provide dense, pixel-by-pixel depth maps. Taking stereo pairs as input, we achieve performance that exceeds that of prior methods on both the Eigen split of KITTI and the VKITTI2 data set.
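The two loss ingredients named in this abstract, a photometric term on feature maps and first- plus second-order depth smoothing, can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature maps are assumed to be already extracted (for example by the first convolution stage of an encoder) and already warped from the reference view, and the plain L1 distance, the unweighted gradients, and the weights `w1`/`w2` are hypothetical choices.

```python
import numpy as np


def l1_photometric_loss(target_feat: np.ndarray, warped_feat: np.ndarray) -> float:
    """Mean absolute difference between two (C, H, W) feature maps."""
    return float(np.mean(np.abs(target_feat - warped_feat)))


def smoothness_loss(depth: np.ndarray) -> tuple[float, float]:
    """First- and second-order smoothness penalties on an (H, W) depth map."""
    # First-order terms penalise slope (differences between neighbours).
    first = np.mean(np.abs(np.diff(depth, n=1, axis=1))) + np.mean(
        np.abs(np.diff(depth, n=1, axis=0))
    )
    # Second-order terms penalise curvature (differences of differences).
    second = np.mean(np.abs(np.diff(depth, n=2, axis=1))) + np.mean(
        np.abs(np.diff(depth, n=2, axis=0))
    )
    return float(first), float(second)


def total_loss(target_feat, warped_feat, depth, w1=1e-3, w2=1e-3) -> float:
    """Feature-level photometric term plus weighted smoothness terms (hypothetical weights)."""
    first, second = smoothness_loss(depth)
    return l1_photometric_loss(target_feat, warped_feat) + w1 * first + w2 * second
```

A linear depth ramp (a slanted plane) incurs a first-order penalty but zero second-order penalty, which is one reason to combine both orders: the second-order term penalizes curvature rather than slope, so it does not flatten planar surfaces.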

SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches

Sensors; 10.3390/s21165476; 2021; Vol 21 (16); pp. 5476

Author(s): Rui Wang; Jialing Zou; James Zhiqing Wen

Keyword(s): Network Architecture; Semantic Segmentation; Depth Estimation; Semantic Feature; Semantic Features; Feature Maps; Task Learning; Vision Sensors; Monocular Depth; Public Datasets

Monocular depth estimation based on unsupervised learning has attracted great attention due to the rising demand for lightweight monocular vision sensors. Inspired by multi-task learning, semantic information has been used to improve monocular depth estimation models. However, multi-task learning is still limited by the need for multiple types of annotations, and to the best of our knowledge, few large public datasets provide all the necessary information. We therefore propose a novel network architecture, the Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN), which extracts multi-resolution depth features and semantic features and merges them before feeding them into the decoder, so that depth is predicted with the support of semantics. Instead of using loss functions to relate semantics and depth, SFA-MDEN fuses the semantic and depth feature maps to predict monocular depth. As a result, two accessible datasets on similar topics, one for depth estimation and one for semantic segmentation, can meet SFA-MDEN's training requirements. We evaluate the proposed SFA-MDEN on different datasets, including KITTI, Make3D, and our own dataset BHDE-v1. The experimental results demonstrate that SFA-MDEN achieves competitive accuracy and generalization capacity compared to state-of-the-art methods.
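The abstract states that depth and semantic feature maps are merged before the decoder but does not say how. The sketch below assumes channel-wise concatenation at each resolution, with integer-factor nearest-neighbour upsampling of the semantic features when their spatial size is smaller; both choices and the function names are hypothetical, not SFA-MDEN's actual fusion module.

```python
import numpy as np


def upsample_nearest(feat: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_features(depth_feats, sem_feats):
    """Concatenate depth and semantic features scale by scale along the channel axis."""
    fused = []
    for d, s in zip(depth_feats, sem_feats):
        if s.shape[1:] != d.shape[1:]:
            # Assume the semantic map is smaller by the same integer factor in H and W.
            s = upsample_nearest(s, d.shape[1] // s.shape[1])
        if s.shape[1:] != d.shape[1:]:
            raise ValueError(f"cannot align {s.shape} with {d.shape}")
        fused.append(np.concatenate([d, s], axis=0))
    return fused
```

Each fused map keeps the depth branch's resolution and carries both sets of channels, so a decoder consuming it needs no semantic loss term, which matches the abstract's point that fusion replaces loss-based coupling.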

Attention-Based Self-Supervised Learning Monocular Depth Estimation With Edge Refinement

10.1109/icip42928.2021.9506510; 2021

Author(s): Chenweinan Jiang; Haichun Liu; Lanzhen Li; Changchun Pan

Keyword(s): Supervised Learning; Depth Estimation; Monocular Depth

Self-Supervised Learning for Monocular Depth Estimation on Minimally Invasive Surgery Scenes

10.1109/icra48506.2021.9561508; 2021

Author(s): Shuwei Shao; Zhongcai Pei; Weihai Chen; Baochang Zhang; Xingming Wu; ...

Keyword(s): Minimally Invasive Surgery; Minimally Invasive; Supervised Learning; Invasive Surgery; Depth Estimation; Monocular Depth

Monocular depth estimation for vision-based vehicles based on a self-supervised learning method

Autonomous Systems: Sensors, Processing, and Security for Vehicles and Infrastructure 2020; 10.1117/12.2558478; 2020

Author(s): Marco Tektonidis; David Monnin

Keyword(s): Supervised Learning; Depth Estimation; Learning Method; Monocular Depth

Self-Supervised Learning of Monocular Depth Estimation Based on Progressive Strategy

IEEE Transactions on Computational Imaging; 10.1109/tci.2021.3069785; 2021; pp. 1-1

Author(s): Huachun Wang; Xinzhu Sang; Duo Chen; Peng Wang; Binbin Yan; ...

Keyword(s): Supervised Learning; Depth Estimation; Monocular Depth

OptiDepthNet: A Real-time Unsupervised Monocular Depth Estimation Network

10.21203/rs.3.rs-812743/v1; 2021

Author(s): Feng Wei; XingHui Yin; Jie Shen; HuiBin Wang

Keyword(s): Neural Network; Network Architecture; Reference Value; Depth Estimation; Estimation Algorithm; Development Direction; Monocular Depth; Small Robot; Computing Speed; Depth Learning

With the development of deep learning, the accuracy and effectiveness of algorithms for monocular depth estimation have improved greatly, but existing algorithms require substantial computing resources. There is an urgent need to apply these algorithms to UAVs and small robots. Building on a fully convolutional neural network and the KITTI dataset, this paper uses depthwise separable convolution to optimize the network architecture, reducing the number of training parameters and improving computing speed. Experimental results show that the method is effective and offers a useful reference for the future development of monocular depth estimation algorithms.
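Why depthwise separable convolution reduces parameters can be shown with a weight count. The sketch ignores biases, and the 3 x 3 kernel with 64 input and 128 output channels is a hypothetical layer size, not one taken from OptiDepthNet.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_params(3, 64, 128)
    sep = depthwise_separable_params(3, 64, 128)
    print(std, sep, round(sep / std, 3))  # prints: 73728 8768 0.119
```

In general the ratio is 1/c_out + 1/k², so for 3 x 3 kernels the separable form needs roughly a ninth of the weights once c_out is large.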

Non-Uniform Discretization-based Ordinal Regression for Monocular Depth Estimation of an Indoor Drone

Electronics; 10.3390/electronics9111767; 2020; Vol 9 (11); pp. 1767

Author(s): Xiangzhu Zhang; Lijia Zhang; Frank L. Lewis; Hailong Pei

Keyword(s): Deep Learning; Binary Classification; Depth Estimation; Ordinal Regression; Classification Model; Security Requirements; Data Set; Decision Algorithm; Decision Area; Monocular Depth

At present, the main approaches to monocular depth estimation for indoor drones are simultaneous localization and mapping (SLAM) algorithms and deep learning algorithms. SLAM requires building a depth map of the unknown environment, which is slow to compute and generally requires expensive sensors, whereas current deep learning algorithms are mostly based on binary classification or regression. The output of a binary classification model gives the decision algorithm only rough control over the unmanned aerial vehicle. Regression models overcome this limitation of binary classification, but they treat long and short distances identically, which degrades short-range prediction performance. To solve these problems, and exploiting the strong ordinal correlation of distance values, we propose a non-uniform spacing-increasing discretization-based ordinal regression algorithm (NSIDORA) for monocular depth estimation in indoor drone tasks. Following the security requirements of this task, the distance labels of the data set are discretized into three major areas (the dangerous area, the decision area, and the safety area), and the decision area is further discretized by spacing-increasing discretization. To address the inconsistency of ordinal regression, a new distance decoder is proposed. Experimental evaluation shows that the root-mean-square error (RMSE) of NSIDORA in the decision area is 33.5% lower than that of non-uniform discretization (NUD)-based ordinal regression methods. Although its overall RMSE is higher than that of the state-of-the-art two-stream regression algorithm, its RMSE on the top 10 categories of the decision area is 21.8% lower than that of the two-stream regression algorithm. NSIDORA's inference is 3.4 times faster than that of two-stream ordinal regression. Furthermore, ablation experiments demonstrate the effectiveness of the decoder.
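Spacing-increasing discretization (SID) and ordinal decoding can be sketched in a few lines. The SID formula below is the one popularized by DORN, with edges t_i = exp(log α + log(β/α)·i/K); whether NSIDORA uses exactly this form is an assumption, as is the bin-midpoint decoder. The three-area split is not modelled: α and β stand for hypothetical boundaries of the decision area, and the paper's own distance decoder is not reproduced.

```python
import math


def sid_edges(alpha: float, beta: float, k: int) -> list[float]:
    """Spacing-increasing discretization of [alpha, beta] into k bins (k + 1 edges)."""
    return [math.exp(math.log(alpha) + math.log(beta / alpha) * i / k) for i in range(k + 1)]


def ordinal_encode(d: float, edges: list[float]) -> list[int]:
    """One binary target per interior edge: 1 if the distance lies beyond that edge."""
    return [1 if d > e else 0 for e in edges[1:-1]]


def ordinal_decode(probs: list[float], edges: list[float]) -> float:
    """Count edges predicted as exceeded, then return the midpoint of the selected bin."""
    b = sum(1 for p in probs if p > 0.5)
    return 0.5 * (edges[b] + edges[b + 1])
```

With α = 1, β = 16, and K = 4, the edges are 1, 2, 4, 8, 16, so bins widen with distance and short ranges get the finest resolution. Counting exceeded thresholds, rather than stopping at the first one below 0.5, still yields an answer when per-threshold predictions are not monotone, which is one simple response to the ordinal inconsistency the abstract mentions.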

YOLO MDE: Object Detection with Monocular Depth Estimation

Electronics; 10.3390/electronics11010076; 2021; Vol 11 (1); pp. 76

Author(s): Jongsub Yu; Hyukdoo Choi

Keyword(s): Risk Assessment; Object Detection; Network Architecture; Ground Truth; Depth Estimation; Autonomous Driving; Depth Prediction; Bounding Box; Monocular Depth; Bounding Boxes

This paper presents an object detector that also estimates depth from monocular camera images. Previous detection studies have typically focused on detecting objects with 2D or 3D bounding boxes. A 3D bounding box consists of the center point, its size parameters, and heading information. However, predicting such complex output compositions generally lowers model performance, and this level of detail is not necessary for risk assessment in autonomous driving. We focus on predicting a single depth per object, which is essential for risk assessment in autonomous driving. Our network architecture is based on YOLO v4, a fast and accurate one-stage object detector, to whose output layer we add an extra channel for depth estimation. To train depth prediction, we extract the closest depth from the 3D bounding box coordinates of the ground-truth labels in the dataset. We compare our model with recent 3D object detection studies on the KITTI object detection benchmark. The results show that our model achieves higher detection performance and faster detection than existing models, with comparable depth accuracy.
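The abstract does not specify how the closest depth is derived from the 3D box labels. This sketch assumes KITTI object-label conventions (location is the bottom centre of the box in camera coordinates, dimensions h, w, l, and rotation_y about the camera's downward-pointing y axis) and takes "closest depth" to mean the smallest camera-frame z among the eight box corners; that interpretation is an assumption.

```python
import numpy as np


def box_corners_camera(x, y, z, h, w, l, ry):
    """Eight corners (3, 8) of a KITTI-style 3D box in camera coordinates."""
    # Corners in the object frame; the origin is the bottom centre of the box.
    xc = np.array([l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2])
    yc = np.array([0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h])
    zc = np.array([w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2])
    # Rotation about the camera y axis by rotation_y.
    c, s = np.cos(ry), np.sin(ry)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rot @ np.vstack([xc, yc, zc]) + np.array([[x], [y], [z]])


def closest_depth(x, y, z, h, w, l, ry) -> float:
    """Smallest camera-frame z over the box corners (assumed meaning of 'closest depth')."""
    return float(box_corners_camera(x, y, z, h, w, l, ry)[2].min())
```

For a 4 m long, 2 m wide box centred 10 m ahead, the closest depth is 9 m when the box is oriented with its width facing the camera (rotation_y = 0) and 8 m when rotated by π/2, showing why heading matters even though only a single depth is regressed.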

Multi-level Feature Maps Attention for Monocular Depth Estimation

10.1109/icce-asia53811.2021.9641955; 2021

Author(s): Seunghoon Lee; Minhyeok Lee; Sangyoon Lee

Keyword(s): Depth Estimation; Feature Maps; Multi Level; Monocular Depth

Occlusion-Aware Unsupervised Learning of Monocular Depth, Optical Flow and Camera Pose with Geometric Constraints

Future Internet; 10.3390/fi10100092; 2018; Vol 10 (10); pp. 92; Cited By ~ 2

Author(s): Qianru Teng; Yimin Chen; Chen Huang

Keyword(s): Optical Flow; Empirical Evaluation; Depth Estimation; Bundle Adjustment; Geometric Constraints; Camera Motion; Low Level; Unsupervised Neural Network; Consistency Constraints; Monocular Depth

We present an occlusion-aware unsupervised neural network that jointly learns three low-level vision tasks from monocular videos: depth, optical flow, and camera motion. The system consists of three prediction sub-networks that are simultaneously coupled through combined loss terms, and each task can be computed independently on test samples. Geometric constraints derived from scene geometry, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as self-supervisory signals in our end-to-end learning approach. Unlike prior works, our image reconstruction loss also takes optical flow into account. Moreover, we impose novel 3D flow consistency constraints on the predictions of all three tasks. By explicitly modeling occlusion and making use of both 2D and 3D geometric relationships, we impose abundant geometric constraints on the estimated outputs, enabling the system to capture both low-level representations and high-level cues and to infer thinner scene structures. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: (1) monocular depth estimation outperforms state-of-the-art unsupervised methods and is comparable to stereo-supervised ones; (2) optical flow prediction ranks at the top among prior works and even beats supervised and traditional methods, especially in non-occluded regions; (3) pose estimation outperforms established SLAM systems under comparable input settings by a reasonable margin.
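The coupling between depth, camera motion, and optical flow rests on the fact that, for a static scene, depth plus camera motion determines a "rigid" flow field. The sketch below computes that field and compares it with a predicted optical flow on a mask of pixels assumed to be non-occluded and static. It is a 2D simplification, not the paper's 3D flow consistency or its occlusion model; the convention that points move into the second camera frame as P2 = R P1 + t, and both function names, are assumptions.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """(2, H, W) flow induced by camera motion (R, t) on a static scene with given depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # homogeneous pixels (3, N)
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()          # back-project to 3D
    pts2 = R @ pts + t.reshape(3, 1)                        # move into the second camera frame
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]                             # perspective division
    return (proj - pix[:2]).reshape(2, h, w)


def flow_consistency(pred_flow, depth, K, R, t, valid):
    """Mean L1 gap between predicted flow and rigid flow over a boolean (H, W) mask."""
    diff = np.abs(pred_flow - rigid_flow(depth, K, R, t)).sum(axis=0)
    return float(diff[valid].mean())
```

As a check on the geometry, a sideways translation t_x at constant depth Z produces a uniform horizontal flow of f_x · t_x / Z pixels; with f_x = 100, t_x = 0.5, and Z = 10 that is 5 pixels. Restricting the comparison to a mask is where occlusion handling would enter, since occluded pixels have no valid correspondence.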
