Occlusion-Aware Unsupervised Learning of Monocular Depth, Optical Flow and Camera Pose with Geometric Constraints

2018 ◽  
Vol 10 (10) ◽  
pp. 92 ◽  
Author(s):  
Qianru Teng ◽  
Yimin Chen ◽  
Chen Huang

We present an occlusion-aware unsupervised neural network for jointly learning three low-level vision tasks from monocular videos: depth, optical flow, and camera motion. The system consists of three predicting sub-networks coupled by combined loss terms during training, yet capable of computing each task independently on test samples. Geometric constraints derived from scene geometry, which have traditionally been used in bundle adjustment or pose-graph optimization, serve as self-supervisory signals in our end-to-end learning approach. Unlike prior works, our image reconstruction loss also takes optical flow into account. Moreover, we impose novel 3D flow consistency constraints over the predictions of all three tasks. By explicitly modeling occlusion and exploiting both 2D and 3D geometric relationships, abundant geometric constraints are formed over the estimated outputs, enabling the system to capture both low-level representations and high-level cues to infer thinner scene structures. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: (1) monocular depth estimation outperforms state-of-the-art unsupervised methods and is comparable to stereo-supervised ones; (2) optical flow prediction ranks top among prior works and even beats supervised and traditional methods, especially in non-occluded regions; (3) pose estimation outperforms established SLAM systems by a reasonable margin under comparable input settings.
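The occlusion-aware reconstruction loss described in this abstract can be illustrated with a small sketch. The code below is a minimal PyTorch illustration, not the authors' implementation: the tensor shapes, the forward-backward consistency thresholds (alpha, beta), and all function names are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B,C,H,W) with optical flow (B,2,H,W) via bilinear sampling."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow.device)   # (2,H,W), x then y
    coords = base.unsqueeze(0) + flow                             # absolute sampling positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                       # normalise to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                          # (B,H,W,2)
    return F.grid_sample(img, grid, align_corners=True)

def occlusion_mask(flow_fwd, flow_bwd, alpha=0.01, beta=0.5):
    """Forward-backward check: pixels whose flows disagree are marked occluded."""
    flow_bwd_w = warp(flow_bwd, flow_fwd)
    diff = (flow_fwd + flow_bwd_w).pow(2).sum(1, keepdim=True)
    bound = alpha * (flow_fwd.pow(2).sum(1, keepdim=True)
                     + flow_bwd_w.pow(2).sum(1, keepdim=True)) + beta
    return (diff < bound).float()                                 # 1 = visible, 0 = occluded

def reconstruction_loss(target, source, flow_fwd, flow_bwd):
    """Occlusion-aware photometric (L1) reconstruction loss."""
    mask = occlusion_mask(flow_fwd, flow_bwd)
    warped = warp(source, flow_fwd)
    return (mask * (target - warped).abs()).sum() / (mask.sum() + 1e-7)
```

The idea is simply that occluded pixels, detected by forward-backward flow disagreement, are excluded from the photometric comparison so that they do not corrupt the self-supervisory signal.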

2021 ◽  
Vol 13 (9) ◽  
pp. 1673
Author(s):  
Wanpeng Xu ◽  
Ling Zou ◽  
Lingda Wu ◽  
Zhipeng Fu

For the task of monocular depth estimation, self-supervised learning supervises training by computing the pixel difference between the target image and the warped reference image, obtaining results comparable to those of fully supervised methods. However, problematic pixels in low-texture regions are usually ignored: when stereo pairs are taken as input in self-supervised learning, it is commonly assumed that no pixels violate the camera-motion assumption, which leads to an optimization problem in these regions. To tackle this, we compute the photometric loss on the lowest-level feature maps instead and apply first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as the backbone, we propose a new depth estimation network architecture to improve edge localization accuracy and obtain clear outline information even at smoothed low-texture boundaries. To acquire more stable and reliable quantitative evaluation results, we introduce a virtual dataset into the self-supervised task, since it provides dense depth maps with pixel-wise correspondence. Taking stereo pairs as input, we achieve performance that exceeds that of prior methods on both the Eigen split of KITTI and the VKITTI2 dataset.
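The first- and second-order depth smoothness terms mentioned above can be sketched as follows. This is a hedged PyTorch illustration with assumed (B,C,H,W) tensor shapes and edge-aware weighting; the paper's exact formulation may differ.

```python
import torch

def gradient_x(t):
    return t[:, :, :, :-1] - t[:, :, :, 1:]

def gradient_y(t):
    return t[:, :, :-1, :] - t[:, :, 1:, :]

def smoothness_loss(depth, image, w1=1.0, w2=1.0):
    """First- and second-order edge-aware smoothness on mean-normalised depth."""
    d = depth / (depth.mean(dim=(2, 3), keepdim=True) + 1e-7)
    dx, dy = gradient_x(d), gradient_y(d)
    dxx, dyy = gradient_x(dx), gradient_y(dy)
    # Down-weight smoothing across strong image edges.
    wx = torch.exp(-gradient_x(image).abs().mean(1, keepdim=True))
    wy = torch.exp(-gradient_y(image).abs().mean(1, keepdim=True))
    first = (dx.abs() * wx).mean() + (dy.abs() * wy).mean()
    second = (dxx.abs() * wx[:, :, :, 1:]).mean() + (dyy.abs() * wy[:, :, 1:, :]).mean()
    return w1 * first + w2 * second
```

Penalizing both first- and second-order depth gradients keeps flat, low-texture regions smooth while the edge-aware weights preserve discontinuities at object boundaries.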


Author(s):  
Mingkang Xiong ◽  
Zhenghong Zhang ◽  
Weilin Zhong ◽  
Jinsheng Ji ◽  
Jiyuan Liu ◽  
...  

The self-supervised learning-based depth and visual odometry (VO) estimators trained on monocular videos without ground truth have drawn significant attention recently. Prior works use photometric consistency as supervision, which is fragile in complex realistic environments due to illumination variations. More importantly, it suffers from scale inconsistency in the depth and pose estimation results. In this paper, robust geometric losses are proposed to deal with this problem. Specifically, we first align the scales of two reconstructed depth maps estimated from adjacent image frames, and then enforce forward-backward relative pose consistency to formulate scale-consistent geometric constraints. Finally, a novel training framework is constructed to implement the proposed losses. Extensive evaluations on the KITTI and Make3D datasets demonstrate that (i) by incorporating the proposed constraints as supervision, the depth estimation model achieves state-of-the-art (SOTA) performance among self-supervised methods, and (ii) the proposed training framework is effective for obtaining a VO model with a uniform global scale.
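A rough sketch of the scale-alignment and forward-backward pose consistency idea follows, assuming batched 4x4 relative-pose matrices and median-based depth scaling; the actual losses in the paper may be formulated differently.

```python
import torch

def align_depth_scales(depth_a, depth_b):
    """Align two depth maps by the ratio of their medians before comparing them."""
    scale = depth_a.median() / (depth_b.median() + 1e-7)
    return depth_a, depth_b * scale

def pose_consistency_loss(T_ab, T_ba):
    """Forward-backward pose consistency: composing the relative pose from frame a
    to frame b with the pose from b back to a should give the identity transform."""
    eye = torch.eye(4, device=T_ab.device).expand_as(T_ab)   # (B,4,4)
    return (T_ab @ T_ba - eye).abs().mean()
```

Both terms are purely geometric, so they remain informative when photometric consistency breaks down under illumination change, and they tie consecutive predictions to a common scale.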


2021 ◽  
Vol E104.D (5) ◽  
pp. 785-788
Author(s):  
Koichiro YAMANAKA ◽  
Keita TAKAHASHI ◽  
Toshiaki FUJII ◽  
Ryutaroh MATSUMOTO

Author(s):  
Huijuan Hu ◽  
Chuan Hu ◽  
Xuetao Zhang

In this paper, a new direct computational approach to dense 3D reconstruction in autonomous driving is proposed to simultaneously estimate the depth and the camera motion for the motion stereo problem. A traditional Structure from Motion framework is utilized to establish geometric constraints for our variational model. The model is mainly composed of a texture constancy constraint, a first-order motion smoothness constraint, a second-order depth regularization constraint and a soft constraint. The texture constancy constraint improves robustness against illumination changes. The first-order motion smoothness constraint reduces noise in the estimation of dense correspondence. The depth regularization constraint is used to handle inherent ambiguities and guarantee a smooth or piecewise-smooth surface, and the soft constraint provides a dense correspondence as an initial estimate of the camera matrix to further improve robustness. Compared with traditional dense Structure from Motion approaches and popular stereo approaches, our monocular depth estimation results are more accurate and more robust. Even in contrast to popular single-image depth networks, our variational approach still performs well in estimating monocular depth and camera motion.
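A schematic evaluation of such a variational energy is sketched below, combining texture (gradient) constancy, first-order motion smoothness, second-order depth regularization, and a soft constraint toward an initial correspondence. The weights, discretization, and argument names are illustrative assumptions, not the paper's formulation.

```python
import torch

def grad(t):
    """Forward differences in x and y (zero at the border)."""
    dx = torch.zeros_like(t)
    dy = torch.zeros_like(t)
    dx[..., :, :-1] = t[..., :, 1:] - t[..., :, :-1]
    dy[..., :-1, :] = t[..., 1:, :] - t[..., :-1, :]
    return dx, dy

def energy(I0, I1_warped, flow, depth, flow_init, lam_m=0.1, lam_d=0.05, lam_s=0.01):
    """One evaluation of the variational energy for the current flow/depth estimate."""
    # Texture (gradient) constancy data term: comparing image gradients is more
    # robust to illumination changes than comparing raw intensities.
    g0x, g0y = grad(I0)
    g1x, g1y = grad(I1_warped)
    data = ((g1x - g0x).abs() + (g1y - g0y).abs()).mean()
    # First-order motion smoothness on the dense correspondence field.
    fx, fy = grad(flow)
    motion = (fx.abs() + fy.abs()).mean()
    # Second-order depth regularizer favouring smooth or piecewise-smooth surfaces.
    dx, dy = grad(depth)
    dxx, _ = grad(dx)
    _, dyy = grad(dy)
    depth_reg = (dxx.abs() + dyy.abs()).mean()
    # Soft constraint tying the flow to an initial correspondence estimate.
    soft = (flow - flow_init).pow(2).mean()
    return data + lam_m * motion + lam_d * depth_reg + lam_s * soft
```

In a variational pipeline this energy would be minimized iteratively (e.g. by gradient descent or a coarse-to-fine scheme) over the flow and depth fields, with I1 re-warped after each update.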


Author(s):  
Chih-Shuan Huang ◽  
Wan-Nung Tsung ◽  
Wei-Jong Yang ◽  
Chin-Hsing Chen

2021 ◽  
pp. 108116
Author(s):  
Shuai Li ◽  
Jiaying Shi ◽  
Wenfeng Song ◽  
Aimin Hao ◽  
Hong Qin
