A Semi-Supervised Monocular Stereo Matching Method

Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 690
Author(s):  
Zhimin Zhang ◽  
Jianzhong Qiao ◽  
Shukuan Lin

Learning-based supervised monocular depth estimation methods have shown promising results compared with traditional methods. However, they require large amounts of high-quality ground-truth depth data as supervision labels, and because of the limitations of acquisition equipment, recording ground-truth depth for different scenes is expensive and impractical. Self-supervised monocular depth estimation, which does not use ground-truth depth, is therefore a promising research direction, but self-supervised depth estimation from a single image is geometrically ambiguous and suboptimal. In this paper, we build on existing approaches to propose a novel semi-supervised monocular stereo matching method that improves the accuracy of depth estimation. The idea is inspired by experimental results showing that, within the same self-supervised network model, depth estimated from a stereo pair is more accurate than depth estimated from a single monocular view. We therefore decompose monocular depth estimation into two sub-problems: a right-view synthesis process followed by a semi-supervised stereo matching process. To improve the accuracy of the synthesized right view, we extend the existing view synthesis method Deep3D with a left-right consistency constraint and a smoothness constraint. To reduce the error introduced by the reconstructed right view, we propose a semi-supervised stereo matching model that uses disparity maps generated by a self-supervised stereo matching model as supervision cues, jointly with self-supervised cues, to optimize the stereo matching network. At test time, the two networks are connected in a pipeline and predict the depth map directly from a single image. Both procedures obey geometric principles and improve estimation accuracy. 
Test results on the KITTI dataset show that this method outperforms current mainstream monocular self-supervised depth estimation methods under the same conditions.
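The left-right consistency and smoothness constraints added to Deep3D can be illustrated with a minimal NumPy sketch. It assumes the common convention that a left-image pixel at column x matches column x - d_L(x) in the right image, and it uses nearest-neighbour sampling for brevity; it is not the authors' exact loss formulation, and the function names are hypothetical.

```python
import numpy as np

def lr_consistency_loss(disp_left: np.ndarray, disp_right: np.ndarray) -> float:
    """Mean |d_L(x) - d_R(x - d_L(x))| over an H x W disparity pair.

    Assumption: nearest-neighbour sampling along each row; target columns
    that fall outside the image are clamped to the border.
    """
    h, w = disp_left.shape
    rows, cols = np.indices((h, w))
    # Column in the right image that each left pixel maps to.
    target = np.clip(np.rint(cols - disp_left).astype(int), 0, w - 1)
    projected = disp_right[rows, target]
    return float(np.mean(np.abs(disp_left - projected)))

def smoothness_loss(disp: np.ndarray) -> float:
    """Mean absolute horizontal plus vertical disparity gradient."""
    dx = np.abs(np.diff(disp, axis=1)).mean()
    dy = np.abs(np.diff(disp, axis=0)).mean()
    return float(dx + dy)
```

In a training loop these two terms would typically be added, with tuned weights, to the view-synthesis loss; the weights are not given in the abstract.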

2021 ◽  
Author(s):  
Zhimin Zhang ◽  
Jianzhong Qiao ◽  
Shukuan Lin ◽  
...  

Depth and pose information are fundamental to robotics, autonomous driving, and virtual reality, and estimating them is also a central and difficult problem in computer vision research. Supervised monocular depth and pose learning is not feasible in environments where labeled data are scarce. Self-supervised monocular video methods can learn using only photometric constraints, without expensive ground-truth depth labels, but this results in an inefficient training process and suboptimal estimation accuracy. To solve these problems, this paper proposes a monocular weakly supervised depth and pose estimation method based on multi-information fusion. First, we design a high-precision stereo matching method that generates depth and pose data to serve as "Ground Truth" labels, addressing the difficulty of obtaining real ground-truth labels. Then, we construct a multi-information fusion network model based on the "Ground Truth" labels, video sequences, and IMU information to improve estimation accuracy. Finally, we design a loss function that combines supervised cues from the "Ground Truth" labels with self-supervised cues to optimize the model. In the testing phase, the network model can separately output high-precision depth and pose data from a monocular video sequence. On the challenging KITTI dataset, the resulting model outperforms mainstream monocular depth and pose estimation methods, as well as some stereo matching methods, while using only a small amount of real training data (200 pairs).
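Turning stereo-matching output into depth "Ground Truth" labels relies on the rectified-stereo relation Z = f · B / d. The sketch below assumes calibration values close to the KITTI colour stereo rig (focal length of roughly 721.5 px, baseline of roughly 0.54 m) and an 80 m depth cap. All three are illustrative assumptions, as are the hypothetical helper names; the paper's actual label pipeline is not described here.

```python
import numpy as np

# Assumed calibration, roughly that of the KITTI colour stereo rig.
FOCAL_PX = 721.5377
BASELINE_M = 0.54

def disparity_to_depth(disp, focal_px=FOCAL_PX, baseline_m=BASELINE_M, max_depth=80.0):
    """Convert a disparity map (pixels) to metric depth Z = f * B / d.

    Pixels with non-positive disparity are marked invalid (NaN) so they can be
    excluded from a supervised loss; valid depths are capped at max_depth.
    """
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.full_like(disp, np.nan)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return np.minimum(depth, max_depth)  # NaN entries stay NaN

def masked_l1(pred, label):
    """L1 loss against pseudo-labels, ignoring invalid (NaN) label pixels."""
    mask = ~np.isnan(label)
    return float(np.abs(pred[mask] - label[mask]).mean())
```

A weakly supervised objective could then add `masked_l1` on these pseudo-labels to a photometric self-supervised term.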


2018 ◽  
Vol 15 (1) ◽  
pp. 172988141775275 ◽  
Author(s):  
Zhen Xie ◽  
Jianhua Zhang ◽  
Pengfei Wang

In this article, we focus on the problem of depth estimation from a stereo pair of event-based sensors. These sensors asynchronously capture pixel-level brightness changes (events) instead of standard intensity images at a fixed frame rate. They therefore provide sparse data with low latency and high temporal resolution over a wide intrascene dynamic range. However, new asynchronous, event-based algorithms are required to process the event streams. We propose a fully event-based stereo three-dimensional depth estimation algorithm inspired by semiglobal matching. The algorithm applies smoothness constraints between nearby events to remove the ambiguous and wrong matches that arise when only the properties of a single event or local features are used. Experimental validation and comparison with several state-of-the-art event-based stereo matching methods are provided on five different scenes from event-based stereo datasets. The results show that our method operates well in an event-driven way and achieves higher estimation accuracy.
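For reference, the core of classical (frame-based) semiglobal matching, which the algorithm above takes as inspiration, is a path-wise cost aggregation that penalizes disparity changes of one level by P1 and larger jumps by P2. The sketch below aggregates along a single left-to-right path. It is the standard recurrence, not the authors' event-based variant, and full SGM sums such aggregates over 8 or 16 directions.

```python
import numpy as np

def aggregate_path(cost: np.ndarray, p1: float, p2: float) -> np.ndarray:
    """Aggregate a matching cost along one scan direction (left to right).

    cost has shape (N, D): N positions along the path, D disparity candidates.
    L(p, d) = C(p, d) + min(L(p-1, d), L(p-1, d-1) + P1, L(p-1, d+1) + P1,
                            min_k L(p-1, k) + P2) - min_k L(p-1, k)
    """
    n, num_d = cost.shape
    agg = np.empty_like(cost, dtype=np.float64)
    agg[0] = cost[0]
    for i in range(1, n):
        prev = agg[i - 1]
        prev_min = prev.min()
        # Neighbouring disparity levels, padded with +inf at the borders.
        from_lower = np.concatenate(([np.inf], prev[:-1])) + p1  # came from d-1
        from_upper = np.concatenate((prev[1:], [np.inf])) + p1   # came from d+1
        best = np.minimum.reduce([prev, from_lower, from_upper,
                                  np.full(num_d, prev_min + p2)])
        # Subtracting prev_min keeps the aggregated values bounded.
        agg[i] = cost[i] + best - prev_min
    return agg
```

On an ambiguous position (equal raw costs for every disparity), the aggregated cost favours the disparity chosen at the previous position, which is the smoothness effect the abstract relies on.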


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1179 ◽  
Author(s):  
Tao Huang ◽  
Shuanfeng Zhao ◽  
Longlong Geng ◽  
Qian Xu

To take full advantage of the information in images captured by drones, and given that most existing supervised monocular depth estimation methods require vast quantities of corresponding ground-truth depth data for training, this paper proposes an unsupervised monocular depth estimation model for drones based on a residual neural network with coarse–refined feature extraction. Inspired by the principle of binocular depth estimation, a virtual camera is introduced through a deep residual convolutional neural network with coarse–refined feature extraction, which turns unsupervised monocular depth estimation into an image reconstruction problem. To improve the performance of the model, the following innovations are proposed. First, pyramid processing of the input image is proposed to build a topological relationship between the resolution of the input image and its depth, which improves sensitivity to depth information in a single image and reduces the impact of input resolution on depth estimation. Second, a residual neural network with coarse–refined feature extraction for corresponding-image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict highly detailed depth maps, long skip connections are designed between corresponding layers of the coarse feature extraction network and the refined feature extraction deconvolution network. Third, a reconstruction loss based on the structural similarity index (SSIM), an approximate disparity smoothness loss, and a depth map loss are combined into a novel training loss to better train the model. 
The experimental results show that, compared with state-of-the-art monocular depth estimation methods, our model achieves superior performance on the KITTI dataset, which consists of corresponding left and right views, and on the Make3D dataset, which consists of images and corresponding ground-truth depth maps. When trained on KITTI, the model basically meets the depth information requirements for images captured by drones.
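An SSIM-based reconstruction loss is commonly written as α(1 − SSIM)/2 + (1 − α)·|I − Î|, with α = 0.85 and a 3×3 window. The weighting, window size, and constants below are assumptions drawn from that common form, not values given in this abstract. The sketch works on single-channel images with intensities in [0, 1].

```python
import numpy as np

# SSIM stabilising constants for a dynamic range of 1.0 (assumption).
C1 = 0.01 ** 2
C2 = 0.03 ** 2

def box3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter over the valid region (output is 2 px smaller per axis)."""
    h, w = x.shape
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            acc += x[dy:dy + h - 2, dx:dx + w - 2]
    return acc / 9.0

def ssim_map(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-window SSIM using local means, variances and covariance."""
    mu_x, mu_y = box3(x), box3(y)
    sigma_x = box3(x * x) - mu_x ** 2
    sigma_y = box3(y * y) - mu_y ** 2
    sigma_xy = box3(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return num / den

def reconstruction_loss(target: np.ndarray, reconstructed: np.ndarray, alpha: float = 0.85) -> float:
    """alpha * (1 - SSIM) / 2 + (1 - alpha) * L1, averaged over pixels."""
    ssim_term = np.clip((1 - ssim_map(target, reconstructed)) / 2, 0, 1).mean()
    l1_term = np.abs(target - reconstructed).mean()
    return float(alpha * ssim_term + (1 - alpha) * l1_term)
```

Identical images give a loss of zero; in the model described above, this term would be combined with the disparity smoothness and depth map losses.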


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5389
Author(s):  
Chuanxue Song ◽  
Chunyang Qi ◽  
Shixin Song ◽  
Feng Xiao

Depth estimation from a single image is a classic problem in computer vision and is important for 3D scene reconstruction, augmented reality, and object detection. At present, most researchers have begun to focus on unsupervised monocular depth estimation. This paper proposes solutions to current problems in depth estimation. First, it proposes a monocular depth estimation method based on uncertainty analysis, which addresses the fact that a neural network has strong expressive ability but cannot evaluate the reliability of its output. In addition, it proposes a photometric loss function based on the Retinex algorithm, which addresses the problem of surrounding pixels being pulled along by moving objects. We objectively compare our method with current mainstream monocular depth estimation methods and obtain satisfactory results.
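One widely used way to let a network report the reliability of its own output is to predict a per-pixel scale b and train with a Laplace negative log-likelihood, |r|/b + log b. The abstract does not state which uncertainty formulation the authors use, so this is only a hedged illustration of the general idea; the function name is hypothetical.

```python
import numpy as np

def laplace_nll(residual, log_b):
    """Per-pixel Laplace negative log-likelihood (constant log 2 dropped).

    residual: prediction error per pixel (e.g. photometric error).
    log_b: predicted log-scale; predicting log b keeps b positive.
    A large b down-weights an unreliable pixel, while the + log b term
    stops the network from inflating b everywhere.
    """
    residual = np.asarray(residual, dtype=np.float64)
    log_b = np.asarray(log_b, dtype=np.float64)
    return np.abs(residual) * np.exp(-log_b) + log_b
```

For a fixed residual r, the loss is minimized at b = |r|, so the learned scale acts as a per-pixel estimate of the expected error.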


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 15
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, which are often incompatible with low-power embedded systems. In this paper, we therefore investigate both issues in depth, showing how they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results on real-time, depth-aware augmented reality and image blurring with smartphones in the wild.


Author(s):  
L. Madhuanand ◽  
F. Nex ◽  
M. Y. Yang

Abstract. Depth is an essential component of various scene understanding tasks and of reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene, which often cannot be captured when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest with recent advances in computer vision and deep learning. This research has focused mainly on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited by additional complexities arising from increased camera distance, wider area coverage, and frequent occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features, and points of view. The single image depth estimation is based on image reconstruction techniques that use stereo images during training to learn to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN), are used to learn depth from UAV aerial images. These models generate pixel-wise disparity images that can be converted into depth information. The internal quality of the generated disparity maps is evaluated using various error metrics. The results show that the CNN model generates smoother images with higher disparity ranges, whereas the GAN model generates sharper images with a smaller disparity range. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model performs better than the GAN and produces depth similar to that of Pix4D. This comparison helps streamline efforts to produce depth from a single aerial image.
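The abstract does not list the error metrics used. As a point of reference, the sketch below computes metrics that are standard in the monocular depth literature: absolute relative error, squared relative error, RMSE, and the δ < 1.25 accuracy. Whether the authors used exactly these is an assumption. Pixels with non-positive reference depth are treated as missing.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common monocular depth error metrics over pixels with valid reference depth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # non-positive reference depth is treated as missing
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    # Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": float(abs_rel), "sq_rel": float(sq_rel),
            "rmse": float(rmse), "delta1": float(delta1)}
```

Comparing against Pix4D point clouds, as described above, would additionally require projecting the points into the image to form a sparse reference depth map.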


2015 ◽  
Vol 2015 ◽  
pp. 1-8
Author(s):  
Xue-he Zhang ◽  
Ge Li ◽  
Chang-le Li ◽  
He Zhang ◽  
Jie Zhao ◽  
...  

For robot vision applications, a stereo matching method for depth estimation should be efficient in terms of both running speed and disparity accuracy. To meet this requirement, this paper proposes a Delaunay-based stereo matching method. First, a Canny edge operator detects the edge points of an image, which serve as support points. These points are then processed with a Delaunay triangulation algorithm that divides the whole image into a series of linked triangular facets. A proposed module composed of these facets performs a rough estimation of image disparity. Based on the triangular property of shared vertices, the estimated disparity is then refined to generate the disparity map. The method is tested on Middlebury stereo pairs. The running time of the proposed method is about 1 s and the matching accuracy is 93%. Experimental results show that the proposed method improves both running speed and disparity accuracy, which provides a solid foundation and good application prospects for robot path planning systems with stereo camera devices.
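A common way to obtain a rough disparity estimate from a triangular facet is to fit a plane d = a·u + b·v + c through the disparities of its three support points and evaluate it at interior pixels. The abstract does not specify the module's exact form, so this is an assumed illustration with hypothetical helper names. The triangulation itself could come from a library routine such as `scipy.spatial.Delaunay`; only the per-facet step is shown.

```python
import numpy as np

def fit_disparity_plane(support):
    """Fit d = a*u + b*v + c through three support points given as (u, v, d).

    The three points must not be collinear, otherwise the system is singular.
    """
    pts = np.asarray(support, dtype=np.float64)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    return np.linalg.solve(A, pts[:, 2])  # returns (a, b, c)

def interpolate_disparity(plane, u, v):
    """Evaluate the facet's disparity plane at pixel (u, v)."""
    a, b, c = plane
    return a * u + b * v + c
```

Because neighbouring facets share vertices, their planes agree at those vertices, which is one way to read the shared-vertex property the refinement step exploits.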


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2272 ◽  
Author(s):  
Faisal Khan ◽  
Saqib Salahuddin ◽  
Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on open challenges in monocular depth estimation that require further investigation.


2016 ◽  
Vol 13 (6) ◽  
pp. 172988141666337 ◽  
Author(s):  
Lei He ◽  
Qiulei Dong ◽  
Guanghui Wang

Predicting depth from a single image is an important problem for understanding the 3-D geometry of a scene. Recently, nonparametric depth sampling (DepthTransfer) has shown great potential for this problem. Its two key components are a Scale Invariant Feature Transform (SIFT) flow–based depth warping between the input image and its retrieved similar images, and a pixel-wise fusion of all warped depth maps. Besides the heavy computational load inherent in SIFT flow computation, even under a coarse-to-fine scheme, the fusion reliability is low because pixel-wise descriptions are weakly discriminative. This article aims to solve these two problems. First, a novel sparse SIFT flow algorithm is proposed that reduces the complexity from subquadratic to sublinear. Then, a reweighting technique is introduced in which the variance of the SIFT flow descriptor is computed at every pixel and used to reweight the data term in the conditional Markov random fields. The proposed depth transfer method is tested on the Make3D Range Image Data and NYU Depth Dataset V2. With comparable depth estimation accuracy, our method is 2–3 times faster than DepthTransfer.
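To illustrate how per-pixel weights change a fusion data term, the sketch below minimizes only a reweighted quadratic data term, Σₖ wₖ(x)·(D(x) − Dₖ(x))², independently at each pixel. Its closed-form minimizer is the weighted mean of the warped candidates. This is a simplification: the conditional MRF described above also has other terms, such as smoothness, that are omitted here. How the descriptor variance maps to the weights wₖ is not stated in the abstract, so the weights are left as an input.

```python
import numpy as np

def fuse_depths(candidates, weights):
    """Pixel-wise fusion of K warped depth maps via a reweighted data term.

    Minimises sum_k w_k(x) * (D(x) - D_k(x))**2 independently at every pixel;
    the closed-form solution is the weighted mean of the candidates.
    candidates, weights: arrays of shape (K, H, W). Weights must be
    non-negative with a positive sum at each pixel.
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return (weights * candidates).sum(axis=0) / weights.sum(axis=0)
```

One hypothetical choice of weights would be derived from the per-pixel descriptor variance, so that more distinctive matches contribute more to the fused depth.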

