Monocular depth estimation for vision-based vehicles based on a self-supervised learning method

2021 ◽  
Vol 13 (9) ◽  
pp. 1673
Author(s):  
Wanpeng Xu ◽  
Ling Zou ◽  
Lingda Wu ◽  
Zhipeng Fu

For the task of monocular depth estimation, self-supervised learning supervises training by minimizing the pixel-wise photometric difference between the target image and the warped reference image, obtaining results comparable to those of fully supervised methods. However, when stereo pairs are taken as input, problematic pixels in low-texture regions are usually ignored, since it is commonly assumed that no pixels there violate the camera-motion assumption; this leads to poorly constrained optimization in these regions. To tackle this problem, we compute the photometric loss on the lowest-level feature maps instead, and apply first- and second-order smoothing to the depth, ensuring consistent gradients during optimization. Given the shortcomings of ResNet as the backbone, we propose a new depth estimation network architecture that improves edge localization accuracy and yields clear outline information even at smoothed low-texture boundaries. To obtain more stable and reliable quantitative evaluation results, we introduce a virtual data set into the self-supervised task, since it provides dense, pixel-wise ground-truth depth maps. Taking stereo pairs as input, we achieve performance that exceeds that of prior methods on both the Eigen split of KITTI and the VKITTI2 data set.
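The training signal described above — a photometric reconstruction term plus first- and second-order depth-smoothness terms — can be sketched as follows. This is a minimal, illustrative NumPy version of these commonly used loss terms, not the authors' exact implementation; the plain L1 photometric term, the edge-aware exponential weighting, and the weights `lambda1`/`lambda2` are assumptions for illustration.

```python
import numpy as np

def photometric_loss(target, warped):
    """Mean absolute pixel difference between the target image and the
    reference image warped into the target view. (Published methods
    typically combine this L1 term with SSIM; only L1 is shown here.)"""
    return np.mean(np.abs(target - warped))

def edge_aware_smoothness(depth, image, order=1):
    """Penalize depth gradients of the given order, down-weighted where
    the image itself has strong gradients (i.e. at likely true edges)."""
    d_dx = np.abs(np.diff(depth, n=order, axis=1))
    d_dy = np.abs(np.diff(depth, n=order, axis=0))
    i_dx = np.abs(np.diff(image, n=order, axis=1))
    i_dy = np.abs(np.diff(image, n=order, axis=0))
    return np.mean(d_dx * np.exp(-i_dx)) + np.mean(d_dy * np.exp(-i_dy))

def total_loss(target, warped, depth, lambda1=1e-3, lambda2=1e-3):
    """Photometric term plus first- and second-order smoothness terms."""
    return (photometric_loss(target, warped)
            + lambda1 * edge_aware_smoothness(depth, target, order=1)
            + lambda2 * edge_aware_smoothness(depth, target, order=2))
```

A perfectly flat depth map incurs zero smoothness penalty and identical target/warped images incur zero photometric penalty, so the combined loss vanishes exactly when reconstruction and smoothness agree.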



Author(s):  
M. Hermann ◽  
B. Ruf ◽  
M. Weinmann ◽  
S. Hinz

Abstract. Supervised learning-based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning of monocular depth estimation from aerial imagery that does not require annotated training data. For this, we use only an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ1.25 of up to 93.5 %. In addition, we have paid particular attention to the generalization of a trained model to unknown data and to the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching, or to provide estimates in regions where image matching fails, e.g. occluded or texture-less regions.
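The δ1.25 accuracy quoted above is the standard threshold metric for depth evaluation: the fraction of pixels whose predicted-to-ground-truth depth ratio (taken in whichever direction is larger) falls below 1.25. A minimal NumPy sketch of the metric as commonly defined (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def delta_accuracy(pred_depth, gt_depth, threshold=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) < threshold."""
    ratio = np.maximum(pred_depth / gt_depth, gt_depth / pred_depth)
    return float(np.mean(ratio < threshold))
```

Higher is better, with 1.0 meaning every pixel's depth is within a factor of 1.25 of the ground truth; δ1.25², and δ1.25³ are computed the same way with thresholds 1.25² and 1.25³.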


