A Fast Stereo Matching Network with Multi-Cross Attention

Ming Wei; Ming Zhu; Yi Wu; Jiaqi Sun; Jiarong Wang; Changji Liu

doi:10.3390/s21186016

A Fast Stereo Matching Network with Multi-Cross Attention

Sensors ◽

10.3390/s21186016 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6016

Author(s):

Ming Wei ◽

Ming Zhu ◽

Yi Wu ◽

Jiaqi Sun ◽

Jiarong Wang ◽

...

Keyword(s):

Deep Learning ◽

Stereo Matching ◽

Disparity Estimation ◽

Stereo Image ◽

Matching Network ◽

Low Resolution ◽

Attention Model ◽

Binocular Stereo ◽

End To End ◽

Left Image

Deep learning-based stereo matching networks have been widely developed and achieve excellent disparity estimation. In this work, we present a new fast end-to-end deep learning stereo matching network that estimates the disparity from a stereo image pair. We extract features from the low-resolution feature images using a stacked hourglass feature extractor and build a multi-level detailed cost volume. We also use the edges of the left image to guide disparity optimization and sub-sample with the low-resolution data, ensuring both high accuracy and high speed. Furthermore, we design a multi-cross attention model for binocular stereo matching to improve the matching accuracy and achieve effective end-to-end disparity regression. We evaluate our network on the Scene Flow, KITTI2012, and KITTI2015 datasets, and the experimental results show that our method is both fast and accurate.
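The abstract above mentions end-to-end disparity regression but does not say which form it takes. As an illustration only, the sketch below shows soft-argmin regression, a form commonly used in end-to-end stereo networks; whether this paper uses exactly this operator is an assumption, and the function name `soft_argmin_disparity` is hypothetical.

```python
import math


def soft_argmin_disparity(costs):
    """Return the expected disparity under softmax(-cost) for one pixel.

    costs[d] is the matching cost of disparity candidate d (0..D-1).
    Assumption: lower cost means a better match.
    """
    # Shift by the minimum cost so the exponentials stay numerically stable.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min)) for c in costs]
    total = sum(weights)
    # Probability-weighted average of the candidate disparities.
    return sum(d * w for d, w in enumerate(weights)) / total
```

Because the result is a weighted average rather than a hard argmin, it is differentiable and can take sub-pixel values, which is what makes this kind of regression trainable end to end.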

Matching Large Baseline Oblique Stereo Images Using an End-to-End Convolutional Neural Network

Remote Sensing ◽

10.3390/rs13020274 ◽

2021 ◽

Vol 13 (2) ◽

pp. 274

Author(s):

Guobiao Yao ◽

Alper Yilmaz ◽

Li Zhang ◽

Fei Meng ◽

Haibin Ai ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Stereo Matching ◽

Least Square ◽

Affine Invariant ◽

Stereo Images ◽

Distance Ratio ◽

Matching Algorithm ◽

End To End

Available stereo matching algorithms either produce a large number of false-positive matches or yield only a few true positives across oblique stereo images with a large baseline. This undesired result is caused by the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine invariant Hessian regions using a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on the negative samples using K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, the conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms state-of-the-art methods in terms of accuracy, distribution, and correct ratio. The main contributions of this article are: (i) our proposed MTHardNets can generate high-quality descriptors; and (ii) the IHesAffNet can produce substantial affine invariant corresponding features with reliable transform parameters.
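The Euclidean distance ratio used as the matching metric above can be sketched as a nearest-neighbor ratio test. This is a minimal illustration, not the authors' implementation: the function name `ratio_test_matches` and the threshold `max_ratio=0.8` are hypothetical, and descriptors are plain tuples of floats.

```python
import math


def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Match descriptors in desc_a to desc_b with a distance ratio test.

    A match (i, j) is kept only if the nearest neighbor j is clearly
    closer than the second-nearest one. max_ratio is an assumed value.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every candidate in desc_b.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((i, j1))
    return matches
```

A descriptor whose two nearest candidates are about equally far away is rejected as ambiguous, which is how the ratio suppresses false positives in repetitive regions.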

Review of Stereo Matching Algorithms Based on Deep Learning

Computational Intelligence and Neuroscience ◽

10.1155/2020/8562323 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Kun Zhou ◽

Xiangxi Meng ◽

Bo Cheng

Keyword(s):

Deep Learning ◽

Unsupervised Learning ◽

Stereo Vision ◽

Stereo Matching ◽

Learning Algorithms ◽

Time Consumption ◽

End To End ◽

Traditional Approaches ◽

Speed Accuracy ◽

Comprehensive Coverage

Stereo vision is a flourishing field that attracts the attention of many researchers. Recently, leveraging the development of deep learning, stereo matching algorithms have achieved remarkable performance far exceeding that of traditional approaches. This review presents an overview of different stereo matching algorithms based on deep learning. For convenience, we classify the algorithms into three categories: (1) non-end-to-end learning algorithms, (2) end-to-end learning algorithms, and (3) unsupervised learning algorithms. We provide comprehensive coverage of the remarkable approaches in each category and summarize their strengths, weaknesses, and major challenges. Speed, accuracy, and time consumption are used to compare the different algorithms.

End-to-End Stereo Matching Network with Local Adaptive Awareness

Proceedings of the 2020 2nd International Conference on Image, Video and Signal Processing ◽

10.1145/3388818.3388822 ◽

2020 ◽

Author(s):

Chenggang Guo ◽

Dongyi Chen ◽

Zhiqi Huang

Keyword(s):

Stereo Matching ◽

Matching Network ◽

End To End

UNCERTAINTY ESTIMATION FOR END-TO-END LEARNED DENSE STEREO MATCHING VIA PROBABILISTIC DEEP LEARNING

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-161-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 161-169

Author(s):

M. Mehltretter

Keyword(s):

Neural Network ◽

Deep Learning ◽

Stereo Matching ◽

Probabilistic Neural Network ◽

Uncertainty Estimation ◽

Training Data ◽

Stereo Image ◽

Confidence Estimation ◽

Extensive Evaluation ◽

Dense Stereo Matching

Motivated by the need to identify erroneous disparity assignments, various approaches for uncertainty and confidence estimation of dense stereo matching have been presented in recent years. As in many other fields, deep learning based methods in particular have shown convincing results. However, most of these methods only model the uncertainty contained in the data, while ignoring the uncertainty of the employed dense stereo matching procedure. Additionally modelling the latter is particularly beneficial if the domain of the training data differs from that of the data to be processed. For this purpose, in the present work the idea of probabilistic deep learning is applied to the task of dense stereo matching for the first time. Based on the well-known and commonly employed GC-Net architecture, a novel probabilistic neural network is presented for the task of joint depth and uncertainty estimation from epipolar rectified stereo image pairs. Instead of learning the network parameters directly, the proposed probabilistic neural network learns a probability distribution from which parameters are sampled for every prediction. The variations between multiple such predictions on the same image pair make it possible to approximate the model uncertainty. The quality of the estimated depth and uncertainty information is assessed in an extensive evaluation on three different datasets.
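The idea of approximating model uncertainty from the variation between several sampled predictions can be sketched as a per-pixel mean and variance. The snippet is an illustration under stated assumptions: disparity maps are flat lists of equal length, the function name `predictive_mean_and_variance` is hypothetical, and the sampling of network parameters itself is not shown.

```python
from statistics import mean, pvariance


def predictive_mean_and_variance(samples):
    """Summarize several predictions made for the same image pair.

    samples: list of disparity maps (flat lists of equal length), one per
    sampled parameter set. Returns (per-pixel mean, per-pixel variance).
    """
    # Regroup so that each tuple holds all sampled values of one pixel.
    per_pixel = list(zip(*samples))
    means = [mean(values) for values in per_pixel]
    # The spread across samples serves as the model uncertainty estimate.
    variances = [pvariance(values) for values in per_pixel]
    return means, variances
```

Pixels on which all sampled predictions agree get a variance of zero, while pixels where the predictions disagree get a large variance and can be flagged as unreliable.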

MDEAN: Multi-View Disparity Estimation with an Asymmetric Network

Electronics ◽

10.3390/electronics9060924 ◽

2020 ◽

Vol 9 (6) ◽

pp. 924 ◽

Cited By ~ 1

Author(s):

Zhao Pei ◽

Deqiang Wen ◽

Yanning Zhang ◽

Miao Ma ◽

Min Guo ◽

...

Keyword(s):

Deep Learning ◽

Stereo Matching ◽

Estimation Method ◽

Disparity Estimation ◽

Estimation Methods ◽

Disparity Map ◽

Plane Sweep ◽

Flat Surfaces ◽

Outdoor Scenes ◽

Traditional Image

In recent years, disparity estimation of a scene based on deep learning methods has been extensively studied and significant progress has been made. In contrast, traditional image disparity estimation methods require considerable resources and consume much time in processes such as stereo matching and 3D reconstruction. At present, most deep learning based disparity estimation methods focus on estimating disparity from monocular images. Motivated by results from traditional methods showing that multi-view methods are more accurate than monocular methods, especially for textureless scenes and scenes with thin structures, we present MDEAN, a new deep convolutional neural network that estimates disparity from multi-view images with an asymmetric encoder–decoder network structure. First, our method takes an arbitrary number of multi-view images as input. Next, we use these images to produce a set of plane-sweep cost volumes, which are combined to compute a high-quality disparity map using an end-to-end asymmetric network. The results show that our method performs better than state-of-the-art methods, in particular for outdoor scenes with sky, flat surfaces, and buildings.

A Unified Framework for Depth Prediction from a Single Image and Binocular Stereo Matching

Remote Sensing ◽

10.3390/rs12030588 ◽

2020 ◽

Vol 12 (3) ◽

pp. 588

Author(s):

Wei Chen ◽

Xin Luo ◽

Zhengfa Liang ◽

Chen Li ◽

Mingfei Wu ◽

...

Keyword(s):

Stereo Matching ◽

Depth Information ◽

Training Procedure ◽

Single Image ◽

Unified Framework ◽

Depth Prediction ◽

Training Samples ◽

Binocular Stereo ◽

Left Image ◽

The Right

Depth information has long been an important issue in computer vision. The methods for obtaining it can be categorized into (1) depth prediction from a single image and (2) binocular stereo matching. However, these two methods are generally regarded as separate tasks, which are accomplished with different network architectures when using deep learning-based methods. This study argues that these two tasks can be achieved using only one network with the same weights. We modify existing networks for stereo matching to perform the two tasks. We first enable the network to accept both a single image and an image pair by duplicating the left image when the right image is absent. Then, we introduce a training procedure that alternately selects training samples of depth prediction from a single image and binocular stereo matching. In this manner, the trained network can perform both tasks, and single-image depth prediction even benefits from stereo matching to achieve better performance. Experimental results on the KITTI raw dataset show that our model achieves state-of-the-art performance for accomplishing depth prediction from a single image and binocular stereo matching in the same architecture.
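The two mechanisms described above, duplicating the left image when the right one is absent and alternating between the two kinds of training samples, can be sketched as follows. This is a simplified illustration: the names `make_input_pair` and `alternate_samples` are hypothetical, images are stand-in values, and strict one-by-one alternation is an assumption about the training procedure.

```python
def make_input_pair(left, right=None):
    """Build the (left, right) network input; reuse left if right is absent."""
    return (left, left if right is None else right)


def alternate_samples(mono_samples, stereo_samples):
    """Yield input pairs, alternating single-image and stereo samples.

    mono_samples: iterable of left images.
    stereo_samples: iterable of (left, right) tuples.
    Stops when either source is exhausted (a simplifying assumption).
    """
    for mono_left, (left, right) in zip(mono_samples, stereo_samples):
        yield make_input_pair(mono_left)    # single-image depth sample
        yield make_input_pair(left, right)  # binocular stereo sample
```

Because both kinds of samples arrive as a pair of the same shape, one network with one set of weights can consume them without any architectural switch.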

Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6991 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12926-12934

Author(s):

Youmin Zhang ◽

Yimin Chen ◽

Xiao Bai ◽

Suihanjin Yu ◽

Kun Yu ◽

...

Keyword(s):

Deep Learning ◽

Loss Function ◽

Stereo Matching ◽

State Of The Art ◽

Disparity Estimation ◽

Model Matching ◽

Regression Problem ◽

Unimodal Distribution ◽

Matching Process ◽

The Cost

State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is directly defined on true disparities and their estimated ones. However, disparity is just a byproduct of a matching process modeled by the cost volume, and indirectly learning the cost volume driven by disparity regression is prone to overfitting since the cost volume is under-constrained. In this paper, we propose to directly add constraints to the cost volume by filtering it with a unimodal distribution peaked at the true disparities. In addition, the variances of the unimodal distributions for each pixel are estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked 1st in the KITTI 2012 evaluation and 4th in the KITTI 2015 evaluation (recorded on 2019.8.20). The code of AcfNet is available at: https://github.com/youmi-zym/AcfNet.
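A unimodal distribution peaked at the true disparity, with a per-pixel spread, can be illustrated for a single pixel as below. The exact form is an assumption: the sketch uses a softmax over the negative scaled absolute distance to the ground truth, and the names `unimodal_target`, `d_gt`, and `sigma` are hypothetical rather than taken from the released code.

```python
import math


def unimodal_target(d_gt, num_disparities, sigma=1.0):
    """Return a distribution over disparities 0..D-1 peaked at d_gt.

    sigma controls the spread (assumed to be > 0): a small sigma gives a
    sharp peak, a large sigma a flatter, more uncertain distribution.
    """
    # Unnormalized log-probabilities fall off linearly with |d - d_gt|.
    logits = [-abs(d - d_gt) / sigma for d in range(num_disparities)]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]  # stable softmax
    total = sum(exps)
    return [e / total for e in exps]
```

Supervising the cost volume with such a target constrains every disparity candidate of a pixel, not only the regressed value, which is the motivation stated in the abstract.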

A Window-Based Adaptive Correspondence Search Algorithm Using Mean Shift and Disparity Estimation

International Journal of Virtual Reality ◽

10.20870/ijvr.2011.10.3.2822 ◽

2011 ◽

Vol 10 (3) ◽

pp. 65-72

Author(s):

Shujun Zhang ◽

Jianbo Zhang ◽

Yun Liu

Keyword(s):

Stereo Matching ◽

Search Algorithm ◽

Mean Shift ◽

Depth Map ◽

Disparity Estimation ◽

Matching Process ◽

Window Method ◽

Binocular Stereo ◽

Correspondence Search ◽

Better Than

Current methods for binocular stereo matching can be divided into two categories: sparse point-based methods and dense point-based methods. However, both have shortcomings and limitations, and no method solves the disparity problem perfectly. Dense point-based techniques obtain relatively more accurate results but at a higher computational cost. A large number of window-based adaptive correspondence techniques have emerged in recent years. To address the high time complexity and large amount of computation in the matching process, we propose a new window-based correspondence search algorithm using mean shift and disparity estimation. Mean shift can aggregate the same or similar colors, so it can be applied to pre-process the source images and reduce their dynamic color range. Disparity estimation is conducted on the two pre-processed images to compute the disparities of uniform texture regions. Finally, adaptive window matching through similarity computation and window-based support aggregation is executed to obtain an exact depth map. Experimental results show that our algorithm is more efficient and preserves smooth disparity better than the prior window method.
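For orientation, the basic window-based correspondence search that such methods build on can be sketched on a single scanline. This is a plain fixed-window sum of absolute differences (SAD) search, not the adaptive mean-shift algorithm proposed above; the function name `sad_disparity` and its default parameters are hypothetical, and both rows are assumed to be rectified and of equal length.

```python
def sad_disparity(left_row, right_row, x, half_window=1, max_disparity=4):
    """Find the disparity of left pixel x by fixed-window SAD matching.

    The left pixel at x is compared with right pixels at x - d for
    d = 0..max_disparity; the d with the lowest window cost wins.
    """
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window does not fit around x")
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        start = x - d - half_window  # first index of the right window
        if start < 0:
            break  # larger disparities would leave the image
        cost = sum(
            abs(left_row[x - half_window + k] - right_row[start + k])
            for k in range(2 * half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

The nested loop over pixels, disparities, and window positions is the source of the high computational cost mentioned in the abstract, which pre-processing and adaptive windows aim to reduce.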

ProPaCoL-Net: A Novel Recursive Stereo Image SR Network with Progressive Parallax Coherency Learning

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.14.coimg-342 ◽

2020 ◽

Vol 2020 (14) ◽

pp. 342-1-342-8

Author(s):

Jeonghun Kim ◽

Munchurl Kim

Keyword(s):

Autonomous Vehicles ◽

Stereo Matching ◽

Low Cost ◽

Super Resolution ◽

Stereo Image ◽

Stereo Images ◽

Computationally Efficient ◽

Stereo Cameras ◽

Left Image ◽

Coherency Loss

Recently, stereo cameras have been widely integrated into smartphones and autonomous vehicles thanks to their low cost and small-sized packages. Nevertheless, acquiring high resolution (HR) stereo images is still a challenging problem. While traditional stereo image processing tasks have mainly focused on stereo matching, stereo super-resolution (SR), which is necessary for obtaining HR images, has drawn less attention. Some deep learning based stereo image SR works have recently shown promising results. However, they have not fully exploited binocular parallax in SR, which may lead to unrealistic visual perception. In this paper, we present a novel and computationally efficient convolutional neural network (CNN) based deep SR network for stereo images that learns parallax coherency between the left and right SR images, which is called ProPaCoL-Net. The proposed ProPaCoL-Net progressively learns parallax coherency via a novel recursive parallax coherency (RPC) module with shared parameters. The RPC module is designed to extract parallax information as a prior for the left image SR from its right view input images and vice versa. Furthermore, we propose a parallax coherency loss to reliably train the ProPaCoL-Net. In extensive experiments, the ProPaCoL-Net is shown to outperform the very recent state-of-the-art method by 1.15 dB in PSNR on average.
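The PSNR figure quoted above follows the standard definition, 10·log10(MAX² / MSE). A minimal sketch for flat lists of pixel values is given below; the function name `psnr` and the 8-bit default `max_value=255.0` are assumptions for illustration, and the paper's exact evaluation protocol (color channels, border cropping) is not specified here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat lists of pixel values; max_value is the largest
    possible pixel value (255 for 8-bit data, an assumed default).
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Since the scale is logarithmic, a gain of about 1 dB corresponds to roughly a 20% reduction in mean squared error.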

A stereo matching algorithm based on the improved PSMNet

PLoS ONE ◽

10.1371/journal.pone.0251657 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0251657

Author(s):

Zedong Huang ◽

Jinan Gu ◽

Jing Li ◽

Xuefei Yu

Keyword(s):

Feature Extraction ◽

Stereo Matching ◽

Feature Fusion ◽

Disparity Estimation ◽

Disparity Map ◽

Matching Network ◽

Matching Effect ◽

Matching Cost ◽

Feature Information ◽

3D Cnn

Deep learning based on a convolutional neural network (CNN) has been successfully applied to stereo matching. Compared with traditional methods, the speed and accuracy of this approach have been greatly improved. However, existing CNN-based stereo matching frameworks often encounter two problems. First, the existing stereo matching network has many parameters, which makes the matching run time too long. Second, the disparity estimation is inadequate in some regions where reflections, repeated textures, and fine structures may lead to ill-posed problems. We propose a lightweight improvement of the PSMNet (Pyramid Stereo Matching Network) model that improves matching in ill-conditioned areas such as repeated-texture and weak-texture regions. In the feature extraction part, ResNeXt is introduced to learn unitary feature extraction, and the ASPP (Atrous Spatial Pyramid Pooling) module is trained to extract multiscale spatial feature information. The feature fusion module is designed to effectively fuse the feature information of different scales to construct the matching cost volume. The improved 3D CNN uses the stacked encoding and decoding structure to further regularize the matching cost volume and obtain the corresponding relationship between feature points under different parallax conditions. Finally, the disparity map is obtained by regression. We evaluate our method on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets. The experiments show that the proposed stereo matching network achieves comparable prediction accuracy and a much faster running speed compared with PSMNet.
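The building block behind ASPP is atrous (dilated) convolution, which samples the input with gaps so that the same kernel covers a wider context. The one-dimensional sketch below illustrates only that operation; it is not the paper's module, the name `atrous_conv1d` is hypothetical, and a real ASPP applies several such filters with different rates in parallel on 2D feature maps.

```python
def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1D dilated correlation, as used in CNN convolution layers.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    is (len(kernel) - 1) * rate + 1 without adding any parameters.
    """
    span = (len(kernel) - 1) * rate
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]
```

Running the same three-tap kernel with rates 1 and 2 covers windows of 3 and 5 samples respectively, which is how multiscale context is gathered at a fixed parameter count.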
