Occlusion-Aware Unsupervised Learning of Monocular Depth, Optical Flow and Camera Pose with Geometric Constraints

2018 ◽  
Vol 10 (10) ◽  
pp. 92 ◽  
Author(s):  
Qianru Teng ◽  
Yimin Chen ◽  
Chen Huang

We present an occlusion-aware unsupervised neural network for jointly learning three low-level vision tasks from monocular videos: depth, optical flow, and camera motion. The system consists of three predicting sub-networks coupled by combined loss terms during training, yet capable of computing each task independently on test samples. Geometric constraints derived from scene geometry, which have traditionally been used in bundle adjustment or pose-graph optimization, serve as self-supervisory signals in our end-to-end learning approach. Unlike prior works, our image reconstruction loss also takes optical flow into account. Moreover, we impose novel 3D flow consistency constraints over the predictions of all three tasks. By explicitly modeling occlusion and exploiting both 2D and 3D geometric relationships, abundant geometric constraints are formed over the estimated outputs, enabling the system to capture both low-level representations and high-level cues to infer thinner scene structures. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: (1) monocular depth estimation outperforms state-of-the-art unsupervised methods and is comparable to stereo-supervised ones; (2) optical flow prediction ranks top among prior works and even beats supervised and traditional methods, especially in non-occluded regions; (3) pose estimation outperforms established SLAM systems by a reasonable margin under comparable input settings.
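The occlusion-aware reconstruction loss described in this abstract can be illustrated with a small sketch. The code below is a minimal PyTorch illustration, not the authors' implementation: the tensor shapes, the forward-backward consistency thresholds (alpha, beta), and all function names are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B,C,H,W) with optical flow (B,2,H,W) via bilinear sampling."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow.device)   # (2,H,W), x then y
    coords = base.unsqueeze(0) + flow                             # absolute sampling positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                       # normalise to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                          # (B,H,W,2)
    return F.grid_sample(img, grid, align_corners=True)

def occlusion_mask(flow_fwd, flow_bwd, alpha=0.01, beta=0.5):
    """Forward-backward check: pixels whose flows disagree are marked occluded."""
    flow_bwd_w = warp(flow_bwd, flow_fwd)
    diff = (flow_fwd + flow_bwd_w).pow(2).sum(1, keepdim=True)
    bound = alpha * (flow_fwd.pow(2).sum(1, keepdim=True)
                     + flow_bwd_w.pow(2).sum(1, keepdim=True)) + beta
    return (diff < bound).float()                                 # 1 = visible, 0 = occluded

def reconstruction_loss(target, source, flow_fwd, flow_bwd):
    """Occlusion-aware photometric (L1) reconstruction loss."""
    mask = occlusion_mask(flow_fwd, flow_bwd)
    warped = warp(source, flow_fwd)
    return (mask * (target - warped).abs()).sum() / (mask.sum() + 1e-7)
```

The idea is simply that occluded pixels, detected by forward-backward flow disagreement, are excluded from the photometric comparison so that they do not corrupt the self-supervisory signal.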

2021 ◽  
Vol 13 (9) ◽  
pp. 1673
Author(s):  
Wanpeng Xu ◽  
Ling Zou ◽  
Lingda Wu ◽  
Zhipeng Fu

For the task of monocular depth estimation, self-supervised learning supervises training by computing the pixel difference between the target image and the warped reference image, obtaining results comparable to those of fully supervised methods. However, problematic pixels in low-texture regions are usually ignored: when stereo pairs are taken as input in self-supervised learning, it is commonly assumed that no pixels violate the camera-motion assumption, which leads to an optimization problem in these regions. To tackle this, we compute the photometric loss on the lowest-level feature maps instead and apply first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as the backbone, we propose a new depth estimation network architecture to improve edge localization accuracy and obtain clear outline information even at smoothed low-texture boundaries. To acquire more stable and reliable quantitative evaluation results, we introduce a virtual dataset into the self-supervised task, since it provides dense depth maps with pixel-wise correspondence. Taking stereo pairs as input, we achieve performance that exceeds that of prior methods on both the Eigen split of KITTI and the VKITTI2 dataset.
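The first- and second-order depth smoothness terms mentioned above can be sketched as follows. This is a hedged PyTorch illustration with assumed (B,C,H,W) tensor shapes and edge-aware weighting; the paper's exact formulation may differ.

```python
import torch

def gradient_x(t):
    return t[:, :, :, :-1] - t[:, :, :, 1:]

def gradient_y(t):
    return t[:, :, :-1, :] - t[:, :, 1:, :]

def smoothness_loss(depth, image, w1=1.0, w2=1.0):
    """First- and second-order edge-aware smoothness on mean-normalised depth."""
    d = depth / (depth.mean(dim=(2, 3), keepdim=True) + 1e-7)
    dx, dy = gradient_x(d), gradient_y(d)
    dxx, dyy = gradient_x(dx), gradient_y(dy)
    # Down-weight smoothing across strong image edges.
    wx = torch.exp(-gradient_x(image).abs().mean(1, keepdim=True))
    wy = torch.exp(-gradient_y(image).abs().mean(1, keepdim=True))
    first = (dx.abs() * wx).mean() + (dy.abs() * wy).mean()
    second = (dxx.abs() * wx[:, :, :, 1:]).mean() + (dyy.abs() * wy[:, :, 1:, :]).mean()
    return w1 * first + w2 * second
```

Penalizing both first- and second-order depth gradients keeps flat, low-texture regions smooth while the edge-aware weights preserve discontinuities at object boundaries.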


Author(s):  
Mingkang Xiong ◽  
Zhenghong Zhang ◽  
Weilin Zhong ◽  
Jinsheng Ji ◽  
Jiyuan Liu ◽  
...  

The self-supervised learning-based depth and visual odometry (VO) estimators trained on monocular videos without ground truth have drawn significant attention recently. Prior works use photometric consistency as supervision, which is fragile in complex realistic environments due to illumination variations. More importantly, it suffers from scale inconsistency in the depth and pose estimation results. In this paper, robust geometric losses are proposed to deal with this problem. Specifically, we first align the scales of two reconstructed depth maps estimated from adjacent image frames, and then enforce forward-backward relative pose consistency to formulate scale-consistent geometric constraints. Finally, a novel training framework is constructed to implement the proposed losses. Extensive evaluations on the KITTI and Make3D datasets demonstrate that (i) by incorporating the proposed constraints as supervision, the depth estimation model achieves state-of-the-art (SOTA) performance among self-supervised methods, and (ii) the proposed training framework is effective for obtaining a VO model with a uniform global scale.
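A rough sketch of the scale-alignment and forward-backward pose consistency idea follows, assuming batched 4x4 relative-pose matrices and median-based depth scaling; the actual losses in the paper may be formulated differently.

```python
import torch

def align_depth_scales(depth_a, depth_b):
    """Align two depth maps by the ratio of their medians before comparing them."""
    scale = depth_a.median() / (depth_b.median() + 1e-7)
    return depth_a, depth_b * scale

def pose_consistency_loss(T_ab, T_ba):
    """Forward-backward pose consistency: composing the relative pose from frame a
    to frame b with the pose from b back to a should give the identity transform."""
    eye = torch.eye(4, device=T_ab.device).expand_as(T_ab)   # (B,4,4)
    return (T_ab @ T_ba - eye).abs().mean()
```

Both terms are purely geometric, so they remain informative when photometric consistency breaks down under illumination change, and they tie consecutive predictions to a common scale.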


2021 ◽  
Vol E104.D (5) ◽  
pp. 785-788
Author(s):  
Koichiro YAMANAKA ◽  
Keita TAKAHASHI ◽  
Toshiaki FUJII ◽  
Ryutaroh MATSUMOTO

Author(s):  
Huijuan Hu ◽  
Chuan Hu ◽  
Xuetao Zhang

In this paper, a new direct computational approach to dense 3D reconstruction in autonomous driving is proposed to simultaneously estimate the depth and the camera motion for the motion stereo problem. A traditional Structure from Motion framework is utilized to establish geometric constraints for our variational model. The model is mainly composed of a texture constancy constraint, a first-order motion smoothness constraint, a second-order depth regularization constraint and a soft constraint. The texture constancy constraint improves robustness against illumination changes. The first-order motion smoothness constraint reduces noise in the estimation of dense correspondence. The depth regularization constraint is used to handle inherent ambiguities and guarantee a smooth or piecewise-smooth surface, and the soft constraint provides a dense correspondence as an initial estimate of the camera matrix to further improve robustness. Compared with traditional dense Structure from Motion approaches and popular stereo approaches, our monocular depth estimation results are more accurate and more robust. Even in contrast to popular single-image depth networks, our variational approach still performs well in estimating monocular depth and camera motion.
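A schematic evaluation of such a variational energy is sketched below, combining texture (gradient) constancy, first-order motion smoothness, second-order depth regularization, and a soft constraint toward an initial correspondence. The weights, discretization, and argument names are illustrative assumptions, not the paper's formulation.

```python
import torch

def grad(t):
    """Forward differences in x and y (zero at the border)."""
    dx = torch.zeros_like(t)
    dy = torch.zeros_like(t)
    dx[..., :, :-1] = t[..., :, 1:] - t[..., :, :-1]
    dy[..., :-1, :] = t[..., 1:, :] - t[..., :-1, :]
    return dx, dy

def energy(I0, I1_warped, flow, depth, flow_init, lam_m=0.1, lam_d=0.05, lam_s=0.01):
    """One evaluation of the variational energy for the current flow/depth estimate."""
    # Texture (gradient) constancy data term: comparing image gradients is more
    # robust to illumination changes than comparing raw intensities.
    g0x, g0y = grad(I0)
    g1x, g1y = grad(I1_warped)
    data = ((g1x - g0x).abs() + (g1y - g0y).abs()).mean()
    # First-order motion smoothness on the dense correspondence field.
    fx, fy = grad(flow)
    motion = (fx.abs() + fy.abs()).mean()
    # Second-order depth regularizer favouring smooth or piecewise-smooth surfaces.
    dx, dy = grad(depth)
    dxx, _ = grad(dx)
    _, dyy = grad(dy)
    depth_reg = (dxx.abs() + dyy.abs()).mean()
    # Soft constraint tying the flow to an initial correspondence estimate.
    soft = (flow - flow_init).pow(2).mean()
    return data + lam_m * motion + lam_d * depth_reg + lam_s * soft
```

In a variational pipeline this energy would be minimized iteratively (e.g. by gradient descent or a coarse-to-fine scheme) over the flow and depth fields, with I1 re-warped after each update.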


Author(s):  
Chih-Shuan Huang ◽  
Wan-Nung Tsung ◽  
Wei-Jong Yang ◽  
Chin-Hsing Chen

2021 ◽  
pp. 108116
Author(s):  
Shuai Li ◽  
Jiaying Shi ◽  
Wenfeng Song ◽  
Aimin Hao ◽  
Hong Qin
