A Fast Stereo Matching Network with Multi-Cross Attention

Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6016
Author(s):  
Ming Wei ◽  
Ming Zhu ◽  
Yi Wu ◽  
Jiaqi Sun ◽  
Jiarong Wang ◽  
...  

Stereo matching networks based on deep learning have been widely developed and can obtain excellent disparity estimates. In this work, we present a new end-to-end fast deep learning stereo matching network that aims to determine the corresponding disparity from a stereo image pair. We extract the characteristics of the low-resolution feature images using a stacked hourglass feature extractor and build a multi-level detailed cost volume. We also use the edges of the left image to guide disparity optimization and sub-sample with the low-resolution data, ensuring excellent accuracy and speed at the same time. Furthermore, we design a multi-cross attention model for binocular stereo matching to improve the matching accuracy and achieve end-to-end disparity regression effectively. We evaluate our network on the Scene Flow, KITTI2012, and KITTI2015 datasets, and the experimental results show that the speed and accuracy of our method are excellent.

2021 ◽  
Vol 13 (2) ◽  
pp. 274
Author(s):  
Guobiao Yao ◽  
Alper Yilmaz ◽  
Li Zhang ◽  
Fei Meng ◽  
Haibin Ai ◽  
...  

The available stereo matching algorithms produce a large number of false-positive matches or only a few true positives across oblique stereo images with a large baseline. This undesired result is caused by the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine invariant Hessian regions using a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on the negative samples using K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, the conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms the state-of-the-art methods in terms of accuracy, distribution and correct ratio. The main contributions of this article are: (i) our proposed MTHardNets can generate high quality descriptors; and (ii) the IHesAffNet can produce substantial affine invariant corresponding features with reliable transform parameters.
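The Euclidean distance ratio used as the matching metric in the abstract above can be illustrated with a generic nearest-neighbor ratio test. This is a minimal sketch, not the authors' implementation: the function name `ratio_test_matches`, the threshold `max_ratio=0.8`, and the toy two-dimensional descriptors are all assumptions made for illustration.

```python
import math

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) pairs where desc_a[i] matches desc_b[j] under a distance ratio test.

    max_ratio is a hypothetical threshold; the abstract does not state a value.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from this descriptor to every candidate, nearest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio needs a best and a second-best candidate
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Keep the match only if the best candidate is clearly closer than the runner-up.
        if d1 < max_ratio * d2:
            matches.append((i, j1))
    return matches

# Toy descriptors: the first query has one clear neighbor, the second is ambiguous.
desc_a = [(0.0, 0.0), (6.0, 6.5)]
desc_b = [(0.1, 0.0), (3.0, 4.0), (9.0, 9.0)]
matches = ratio_test_matches(desc_a, desc_b)  # [(0, 0)]
```

The second query descriptor is equally far from two candidates, so the ratio is 1 and the match is rejected as ambiguous.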


2020 ◽  
Vol 2020 ◽  
pp. 1-12 ◽  
Author(s):  
Kun Zhou ◽  
Xiangxi Meng ◽  
Bo Cheng

Stereo vision is a flourishing field, attracting the attention of many researchers. Recently, leveraging the development of deep learning, stereo matching algorithms have achieved remarkable performance far exceeding traditional approaches. This review presents an overview of different stereo matching algorithms based on deep learning. For convenience, we classify the algorithms into three categories: (1) non-end-to-end learning algorithms, (2) end-to-end learning algorithms, and (3) unsupervised learning algorithms. We provide comprehensive coverage of the remarkable approaches in each category and summarize their strengths, weaknesses, and major challenges. Speed, accuracy, and time consumption are used to compare the different algorithms.


Author(s):  
M. Mehltretter

Abstract. Motivated by the need to identify erroneous disparity assignments, various approaches for uncertainty and confidence estimation of dense stereo matching have been presented in recent years. As in many other fields, deep learning based methods in particular have shown convincing results. However, most of these methods only model the uncertainty contained in the data, while ignoring the uncertainty of the employed dense stereo matching procedure. Additionally modelling the latter is particularly beneficial if the domain of the training data differs from that of the data to be processed. For this purpose, in the present work the idea of probabilistic deep learning is applied to the task of dense stereo matching for the first time. Based on the well-known and commonly employed GC-Net architecture, a novel probabilistic neural network is presented for the task of joint depth and uncertainty estimation from epipolar rectified stereo image pairs. Instead of learning the network parameters directly, the proposed probabilistic neural network learns a probability distribution from which parameters are sampled for every prediction. The variations between multiple such predictions on the same image pair allow the model uncertainty to be approximated. The quality of the estimated depth and uncertainty information is assessed in an extensive evaluation on three different datasets.
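The idea of sampling parameters for every prediction and reading model uncertainty from the spread of the predictions can be sketched with a toy model. The example below is an illustration under strong assumptions, not the GC-Net based network from the abstract: the "network" is a single weight drawn from a Gaussian, and the names `sample_predictions`, `w_mean`, and `w_std` are hypothetical.

```python
import random
import statistics

def sample_predictions(x, w_mean, w_std, n_samples=200, seed=0):
    """Predict n_samples times, drawing a fresh weight from N(w_mean, w_std) each time."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        w = rng.gauss(w_mean, w_std)  # sample one set of parameters
        preds.append(w * x)           # toy stand-in for a forward pass
    return preds

preds = sample_predictions(x=2.0, w_mean=1.5, w_std=0.1)
depth_estimate = statistics.fmean(preds)         # mean prediction
model_uncertainty = statistics.pvariance(preds)  # spread across sampled parameters
```

If the learned distribution collapses to a point (`w_std=0`), every prediction is identical and the estimated model uncertainty is zero.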


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 924 ◽  
Author(s):  
Zhao Pei ◽  
Deqiang Wen ◽  
Yanning Zhang ◽  
Miao Ma ◽  
Min Guo ◽  
...  

In recent years, disparity estimation of a scene based on deep learning methods has been extensively studied and significant progress has been made. In contrast, a traditional image disparity estimation method requires considerable resources and consumes much time in processes such as stereo matching and 3D reconstruction. At present, most deep learning based disparity estimation methods focus on estimating disparity based on monocular images. Motivated by the results of traditional methods that multi-view methods are more accurate than monocular methods, especially for scenes that are textureless and have thin structures, in this paper, we present MDEAN, a new deep convolutional neural network to estimate disparity using multi-view images with an asymmetric encoder–decoder network structure. First, our method takes an arbitrary number of multi-view images as input. Next, we use these images to produce a set of plane-sweep cost volumes, which are combined to compute a high quality disparity map using an end-to-end asymmetric network. The results show that our method performs better than state-of-the-art methods, in particular, for outdoor scenes with the sky, flat surfaces and buildings.


2020 ◽  
Vol 12 (3) ◽  
pp. 588
Author(s):  
Wei Chen ◽  
Xin Luo ◽  
Zhengfa Liang ◽  
Chen Li ◽  
Mingfei Wu ◽  
...  

Depth information has long been an important issue in computer vision. The methods for obtaining it can be categorized into (1) depth prediction from a single image and (2) binocular stereo matching. However, these two methods are generally regarded as separate tasks, which are accomplished in different network architectures when using deep learning-based methods. This study argues that these two tasks can be achieved using only one network with the same weights. We modify existing networks for stereo matching to perform the two tasks. We first enable the network to accept both a single image and an image pair by duplicating the left image when the right image is absent. Then, we introduce a training procedure that alternately selects training samples of depth prediction from a single image and binocular stereo matching. In this manner, the trained network can perform both tasks, and single-image depth prediction even benefits from stereo matching to achieve better performance. Experimental results on the KITTI raw dataset show that our model achieves state-of-the-art performance for accomplishing depth prediction from a single image and binocular stereo matching in the same architecture.
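The two mechanisms described in the abstract above, duplicating the left image when no right image is given and alternating between the two kinds of training sample, can be sketched as follows. The helper names `make_stereo_input` and `alternating_tasks` are hypothetical, images are plain nested lists, and strict one-by-one alternation is an assumption; the abstract does not specify the schedule.

```python
def make_stereo_input(left, right=None):
    """Return a (left, right) pair, copying the left image when the right one is absent."""
    if right is None:
        right = [row[:] for row in left]  # duplicate the left image row by row
    return left, right

def alternating_tasks(n_steps):
    """Assumed schedule: even steps use single-image samples, odd steps use stereo pairs."""
    return ["mono" if step % 2 == 0 else "stereo" for step in range(n_steps)]

left_image = [[1, 2, 3], [4, 5, 6]]
mono_pair = make_stereo_input(left_image)
schedule = alternating_tasks(4)
```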


2020 ◽  
Vol 34 (07) ◽  
pp. 12926-12934
Author(s):  
Youmin Zhang ◽  
Yimin Chen ◽  
Xiao Bai ◽  
Suihanjin Yu ◽  
Kun Yu ◽  
...  

State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is directly defined on true disparities and their estimated ones. However, disparity is just a byproduct of a matching process modeled by the cost volume, and indirectly learning the cost volume driven by disparity regression is prone to overfitting since the cost volume is under-constrained. In this paper, we propose to directly add constraints to the cost volume by filtering the cost volume with a unimodal distribution peaked at the true disparities. In addition, the variances of the unimodal distributions for each pixel are estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked 1st in the KITTI 2012 evaluation and 4th in the KITTI 2015 evaluation (recorded on 2019.8.20). The code of AcfNet is available at: https://github.com/youmi-zym/AcfNet.
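One way to build a unimodal distribution peaked at the true disparity is a softmax over the negative distance to the true value, scaled by a per-pixel spread. The sketch below assumes this Laplacian-style form for illustration; the abstract does not state the exact formula, and the names `unimodal_target` and `sigma` are hypothetical.

```python
import math

def unimodal_target(d_true, max_disp, sigma=1.0):
    """Distribution over integer disparities 0..max_disp-1, peaked at d_true.

    Assumed form: softmax(-|d - d_true| / sigma). A smaller sigma gives a sharper peak.
    """
    logits = [-abs(d - d_true) / sigma for d in range(max_disp)]
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

target = unimodal_target(d_true=3.0, max_disp=8, sigma=1.0)
peak = max(range(len(target)), key=target.__getitem__)
```

A target of this kind could then be compared against the softmax of the negated cost volume along the disparity axis, which constrains the whole cost distribution rather than only the regressed disparity.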


2011 ◽  
Vol 10 (3) ◽  
pp. 65-72
Author(s):  
Shujun Zhang ◽  
Jianbo Zhang ◽  
Yun Liu

Current methods to solve the problem of binocular stereo matching can be divided into two categories: sparse-point methods and dense-point methods. However, both have shortcomings and limitations, and there is no perfect method to solve the disparity problem. Dense-point techniques obtain relatively more accurate results but at a higher computational cost. A large number of window-based adaptive correspondence techniques have emerged in recent years. To reduce the high time complexity and large amount of calculation in the matching process, we propose a new window-based correspondence search algorithm using mean shift and disparity estimation. Mean shift can aggregate the same or similar colors, so it can be applied to pre-process the source images to reduce their dynamic color range. Disparity estimation is conducted on the two pre-processed images to compute the disparities of uniform texture regions. Adaptive window matching through similarity computation and window-based support aggregation is finally executed, and an exact depth map is obtained. Experimental results show that our algorithm is more efficient and preserves smooth disparity better than the prior window method.
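Window-based correspondence search can be illustrated with a much simpler baseline than the method in the abstract above: a fixed window and a sum of absolute differences (SAD) cost on a single scanline. This sketch omits the mean shift pre-processing and the adaptive window entirely; the function name `sad_disparity` and the toy scanlines are assumptions.

```python
def sad_disparity(left, right, x, half_window=1, max_disp=4):
    """Disparity at column x of a 1-D scanline, minimising the sum of absolute differences.

    A pixel at column x in the left image is compared with column x - d in the right image.
    """
    best_d, best_cost = 0, float("inf")
    lo, hi = x - half_window, x + half_window
    for d in range(max_disp + 1):
        if lo - d < 0 or hi >= len(left):
            continue  # window would fall outside the scanline
        cost = sum(abs(left[i] - right[i - d]) for i in range(lo, hi + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# The bright pattern 80, 90, 70 sits two columns further right in the left scanline.
right = [10, 10, 80, 90, 70, 10, 10, 10, 10, 10]
left = [10, 10, 10, 10, 80, 90, 70, 10, 10, 10]
disparity = sad_disparity(left, right, x=5)  # 2
```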


2020 ◽  
Vol 2020 (14) ◽  
pp. 342-1-342-8
Author(s):  
Jeonghun Kim ◽  
Munchurl Kim

Recently, stereo cameras have been widely integrated into smartphones and autonomous vehicles thanks to their low cost and small-sized packages. Nevertheless, acquiring high resolution (HR) stereo images is still a challenging problem. While traditional stereo image processing tasks have mainly focused on stereo matching, stereo super-resolution (SR), which is needed to obtain HR images, has drawn less attention. Some deep learning based stereo image SR works have recently shown promising results. However, they have not fully exploited binocular parallax in SR, which may lead to unrealistic visual perception. In this paper, we present a novel and computationally efficient convolutional neural network (CNN) based deep SR network for stereo images, called ProPaCoL-Net, which learns parallax coherency between the left and right SR images. The proposed ProPaCoL-Net progressively learns parallax coherency via a novel recursive parallax coherency (RPC) module with shared parameters. The RPC module is designed to extract parallax information as a prior for the left image SR from its right view input images and vice versa. Furthermore, we propose a parallax coherency loss to reliably train the ProPaCoL-Net. In extensive experiments, the ProPaCoL-Net is shown to outperform the very recent state-of-the-art method by an average of 1.15 dB in PSNR.
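The PSNR figure quoted in the abstract above follows the standard definition, 10 * log10(peak^2 / MSE). The sketch below shows that computation on small nested lists; the 8-bit peak value of 255 and the function name `psnr` are assumptions, since the abstract does not describe the evaluation setup.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[100, 110], [120, 130]]
estimate = [[101, 109], [121, 129]]  # every pixel is off by one, so MSE = 1
value_db = psnr(reference, estimate)
```

Because the scale is logarithmic, a gain of 1.15 dB corresponds to reducing the mean squared error by a factor of about 10^0.115, roughly 1.3.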


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0251657
Author(s):  
Zedong Huang ◽  
Jinan Gu ◽  
Jing Li ◽  
Xuefei Yu

Deep learning based on a convolutional neural network (CNN) has been successfully applied to stereo matching. Compared with the traditional method, the speed and accuracy of this method have been greatly improved. However, existing CNN-based stereo matching frameworks often encounter two problems. First, the existing stereo matching network has many parameters, which makes the matching run time too long. Second, the disparity estimation is inadequate in some regions where reflections, repeated textures, and fine structures may lead to ill-posed problems. Through a lightweight improvement of the PSMNet (Pyramid Stereo Matching Network) model, the matching quality in ill-conditioned areas such as repeated texture areas and weak texture areas is improved. In the feature extraction part, ResNeXt is introduced to learn unitary feature extraction, and the ASPP (Atrous Spatial Pyramid Pooling) module is trained to extract multiscale spatial feature information. The feature fusion module is designed to effectively fuse the feature information of different scales to construct the matching cost volume. The improved 3D CNN uses a stacked encoding and decoding structure to further regularize the matching cost volume and obtain the corresponding relationship between feature points under different parallax conditions. Finally, the disparity map is obtained by regression. We evaluate our method on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets. The experiments show that the proposed stereo matching network achieves comparable prediction accuracy and a much faster running speed compared with PSMNet.
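Disparity regression from a cost volume is commonly done with a soft argmin, as in GC-Net and PSMNet: the costs along the disparity axis are negated, passed through a softmax, and used to weight the candidate disparities. The sketch below shows this for one pixel; whether the paper above uses exactly this operator is an assumption, and the name `soft_argmin` is illustrative.

```python
import math

def soft_argmin(costs):
    """Expected disparity for one pixel: sum of d * softmax(-cost)[d] over candidates d."""
    logits = [-c for c in costs]               # low cost means high probability
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return sum(d * e / total for d, e in enumerate(exps))

# Costs for candidate disparities 0..3; candidates 1 and 2 are equally good.
subpixel_disparity = soft_argmin([5.0, 1.0, 1.0, 5.0])
```

Unlike a hard argmin, the result is differentiable and can fall between integer disparities, which is what allows end-to-end training with a regression loss.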

