Absolute Distance Prediction Based on Deep Learning Object Detection and Monocular Depth Estimation Models

Mapping Intimacies ◽

10.3233/faia210151 ◽

2021 ◽

Author(s):

Armin Masoumian ◽

David G.F. Marei ◽

Saddam Abdulwahab ◽

Julián Cristiano ◽

Domenec Puig ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Distance Estimation ◽

Depth Estimation ◽

Depth Image ◽

Absolute Distance ◽

Learning Framework ◽

Depth Images ◽

Monocular Depth ◽

Distance Prediction

Estimating the distance between objects in a scene and the camera sensor is feasible when depth images are available from stereo or 3D cameras. Depth estimation yields relative distances, which must then be converted into absolute distances to be usable in practice. With a single 2D monocular camera, however, distance estimation is very challenging. This paper presents a deep learning framework consisting of two deep networks, one for depth estimation and one for object detection, that operates on a single image. First, objects in the scene are detected and localized using the You Only Look Once (YOLOv5) network. In parallel, a deep autoencoder network computes an estimated depth image from which the relative distances are obtained. The YOLO-based object detector was trained with supervised learning, whereas the depth estimation network was trained in a self-supervised manner. The distance estimation framework was evaluated on real images of outdoor scenes. The results show that the proposed framework is promising: it predicts absolute distance with an accuracy of 96% and an RMSE of 0.203.
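
The abstract does not say how detections and the depth image are combined, so the following is only a minimal sketch of one plausible approach: take the median depth inside each detected box and multiply it by a scale factor. The box format `(x1, y1, x2, y2)`, the function names, and the `scale` parameter are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt
from statistics import median


def object_distances(depth_map, boxes, scale=1.0):
    """Return one distance per box: median depth inside the box times `scale`.

    depth_map: 2D list (rows of floats) of per-pixel relative depth.
    boxes: list of (x1, y1, x2, y2) pixel boxes, with x2 and y2 exclusive.
    scale: hypothetical factor that maps relative depth to absolute units.
    """
    distances = []
    for x1, y1, x2, y2 in boxes:
        pixels = [depth_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        distances.append(scale * median(pixels))
    return distances


def rmse(predicted, actual):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


# Toy 3x4 depth image and two hypothetical detections.
depth = [
    [1.0, 1.0, 4.0, 4.0],
    [1.0, 1.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 9.0],
]
boxes = [(0, 0, 2, 2), (2, 0, 4, 3)]
dists = object_distances(depth, boxes, scale=2.0)
error = rmse(dists, [2.0, 7.0])
```

The median is used here rather than the mean so that a few background pixels inside a box (the `9.0` in the toy image) do not skew the object's distance.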

Deep Learning Based Object Recognition Using Physically-Realistic Synthetic Depth Scenes

Machine Learning and Knowledge Extraction ◽

10.3390/make1030051 ◽

2019 ◽

Vol 1 (3) ◽

pp. 883-903 ◽

Cited By ~ 1

Author(s):

Daulet Baimukashev ◽

Alikhan Zhilisbayev ◽

Askat Kuzdeuov ◽

Artemiy Oleinikov ◽

Denis Fadeyev ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Object Recognition ◽

Data Collection ◽

Object Detection ◽

Depth Image ◽

Cluttered Environment ◽

Depth Data ◽

Depth Images ◽

Image Dataset

Recognizing objects and estimating their poses have a wide range of applications in robotics. For instance, to grasp objects, robots need the position and orientation of objects in 3D. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to this problem is to use a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data requires time and extensive human labor for manual labeling. In this study, our objective was to develop and validate a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box, and automatically labeled all objects with an occlusion rate below 70%. A Faster Region Convolutional Neural Network (R-CNN) architecture was trained on a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset consisting of 2000 samples. The deep object recognizer achieves 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the deep learning model with noise-added synthetic images improves the recognition accuracy on real images to 46.3%. The object detection framework can thus be trained on synthetically generated depth data and then employed for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
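
Two steps mentioned in the abstract, filtering objects by occlusion rate and adding noise to synthetic depth images, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the occlusion rate is assumed to be the fraction of an object's pixels that are hidden, and the noise is assumed to be zero-mean Gaussian, neither of which the abstract specifies. All function names are hypothetical.

```python
import random


def occlusion_rate(visible_pixels, total_pixels):
    """Fraction of the object's pixels that are hidden (assumed definition)."""
    return 1.0 - visible_pixels / total_pixels


def labelable(objects, max_occlusion=0.70):
    """Keep the names of objects whose occlusion rate is below the threshold.

    objects: list of (name, visible_pixels, total_pixels) tuples.
    """
    return [name for name, visible, total in objects
            if occlusion_rate(visible, total) < max_occlusion]


def add_depth_noise(depth_image, sigma, seed=0):
    """Add zero-mean Gaussian noise to every pixel, clamping depth at zero.

    The Gaussian model and `sigma` are assumptions for illustration only.
    """
    rng = random.Random(seed)
    return [[max(0.0, d + rng.gauss(0.0, sigma)) for d in row]
            for row in depth_image]


objects = [("mug", 900, 1000), ("box", 250, 1000), ("pen", 400, 1000)]
kept = labelable(objects)          # "box" is 75% occluded and is dropped
noisy = add_depth_noise([[0.50, 0.60], [0.55, 0.52]], sigma=0.01)
```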

Deep Learning-Based Object Detection, Localisation and Tracking for Smart Wheelchair Healthcare Mobility

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18010091 ◽

2020 ◽

Vol 18 (1) ◽

pp. 91

Author(s):

Louis Lecrosnier ◽

Redouane Khemmar ◽

Nicolas Ragot ◽

Benoit Decoux ◽

Romain Rossi ◽

...

Keyword(s):

Object Detection ◽

Object Tracking ◽

Indoor Environment ◽

Distance Estimation ◽

Depth Estimation ◽

Detection Algorithm ◽

Use Case ◽

Detection Distance ◽

Detection Depth ◽

Sort Algorithm

This paper presents the development of an Advanced Driver Assistance System (ADAS) for a smart electric wheelchair, with the aim of improving the autonomy of disabled people. Our use case, built from a formal clinical study, is based on the detection, depth estimation, localization and tracking of objects in the wheelchair’s indoor environment, namely doors and door handles. The aim of this work is to provide the wheelchair with a perception layer that detects these keypoints in its immediate surroundings and constructs a short-lifespan semantic map. First, we present an adaptation of the YOLOv3 object detection algorithm to our use case. Then, we present our depth estimation approach using an Intel RealSense camera. Finally, we present our 3D object tracking approach based on the SORT algorithm. To validate these developments, we carried out several experiments in a controlled indoor environment. Detection, distance estimation and object tracking are evaluated on our own dataset, which includes doors and door handles.
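
Localizing a detected object in 3D from a depth camera typically relies on pinhole back-projection of a pixel and its measured depth. The sketch below shows that standard calculation only; it is not the authors' code. The intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values, lens distortion is ignored, and using the box centre as the object's pixel is an assumption.

```python
def box_center(box):
    """Centre pixel (u, v) of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at depth_m metres to camera-frame (X, Y, Z).

    Uses the distortion-free pinhole model:
        X = (u - cx) / fx * Z,  Y = (v - cy) / fy * Z
    """
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return (x, y, depth_m)


# Hypothetical door-handle detection, 2 m away, with illustrative intrinsics.
u, v = box_center((360, 200, 400, 280))
point = deproject(u, v, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Here the handle sits 60 pixels to the right of the principal point, which at 2 m and a 600-pixel focal length corresponds to a lateral offset of about 0.2 m.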

Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery

Computer Vision – ECCV 2018 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-01261-8_48 ◽

2018 ◽

pp. 812-830 ◽

Cited By ~ 14

Author(s):

Grégoire Payen de La Garanderie ◽

Amir Atapour Abarghouei ◽

Toby P. Breckon

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Blind Spot ◽

3D Object ◽

Monocular Depth ◽

3D Object Detection

DEEP LEARNING FOR MONOCULAR DEPTH ESTIMATION FROM UAV IMAGES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-451-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 451-458

Author(s):

L. Madhuanand ◽

F. Nex ◽

M. Y. Yang

Keyword(s):

Deep Learning ◽

Ground Level ◽

Depth Estimation ◽

Aerial Images ◽

Aerial Image ◽

Depth Information ◽

Single Image ◽

Monocular Depth ◽

Uav Images ◽

Image Depth

Depth is an essential component of various scene understanding tasks and of reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest, driven by recent advances in computer vision and deep learning. Research so far has focused largely on indoor scenes or outdoor scenes captured at ground level. Single-image depth estimation from aerial images has been limited by additional complexities arising from the increased camera distance and the wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features and points of view. The single-image depth estimation is based on image reconstruction techniques, which use stereo images during training to learn to estimate depth from single images. Among the various models available for ground-level single-image depth estimation, two are used to learn depth from UAV aerial images: 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN) model. These models generate pixel-wise disparity images, which can be converted into depth information. The disparity maps generated by these models are evaluated for their internal quality using various error metrics. The results show that the CNN model produces smoother images with higher disparity ranges, while the GAN model produces sharper images with a smaller disparity range. The disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model is found to perform better than the GAN model and to produce depth similar to that of Pix4D. This comparison helps in streamlining efforts to produce depth from a single aerial image.
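
The conversion from disparity to depth mentioned above is commonly done with the rectified-stereo relation Z = f · B / d. The sketch below assumes that relation applies; the focal length, baseline, and function name are illustrative and do not come from the paper.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a 2D list of disparities (pixels) to depths (metres).

    Applies Z = focal_px * baseline_m / d per pixel. Disparities at or below
    `min_disparity` are treated as infinitely far to avoid division by zero.
    """
    return [
        [focal_px * baseline_m / d if d > min_disparity else float("inf")
         for d in row]
        for row in disparity_px
    ]


# Toy 2x2 disparity map with a hypothetical 1000 px focal length
# and 0.2 m baseline.
depth = disparity_to_depth(
    [[10.0, 20.0], [0.0, 40.0]], focal_px=1000.0, baseline_m=0.2
)
```

Because depth is inversely proportional to disparity, doubling the disparity halves the depth, and small disparity errors matter most for distant points.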

Deep Learning-Based Monocular Depth Estimation Methods—A State-of-the-Art Review

Sensors ◽

10.3390/s20082272 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2272 ◽

Cited By ~ 5

Author(s):

Faisal Khan ◽

Saqib Salahuddin ◽

Hossein Javidnia

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Research Work ◽

Depth Estimation ◽

Autonomous Driving ◽

Estimation Methods ◽

Future Research ◽

Comprehensive Overview ◽

Ill Posed ◽

Monocular Depth

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied, ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics and autonomous driving. This survey provides a comprehensive overview of the research topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated and discussed. We conclude with a perspective on the challenges in monocular depth estimation that require further investigation.

Task-Aware Monocular Depth Estimation for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6908 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12257-12264 ◽

Cited By ~ 1

Author(s):

Xinlong Wang ◽

Wei Yin ◽

Tao Kong ◽

Yuning Jiang ◽

Lei Li ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Depth Estimation ◽

3D Perception ◽

Research Attention ◽

3D Object ◽

Depth Prediction ◽

Monocular Depth ◽

Almost All ◽

3D Object Detection

Monocular depth estimation enables 3D perception from a single 2D image and has therefore attracted much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal: the depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, and then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.

SUW-Learn: Joint Supervised, Unsupervised, Weakly Supervised Deep Learning for Monocular Depth Estimation

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) ◽

10.1109/cvprw50498.2020.00383 ◽

2020 ◽

Author(s):

Haoyu Ren ◽

Aman Raj ◽

Mostafa El-Khamy ◽

Jungwon Lee

Keyword(s):

Deep Learning ◽

Depth Estimation ◽

Monocular Depth ◽

Weakly Supervised

RGBD Salient Object Detection using Spatially Coherent Deep Learning Framework

2018 IEEE 23rd International Conference on Digital Signal Processing (DSP) ◽

10.1109/icdsp.2018.8631584 ◽

2018 ◽

Cited By ~ 4

Author(s):

Posheng Huang ◽

Chin-Han Shen ◽

Hsu-Feng Hsiao

Keyword(s):

Deep Learning ◽

Object Detection ◽

Salient Object Detection ◽

Salient Object ◽

Learning Framework

Benchmark for Deep Learning based Visual Odometry and Monocular Depth Estimation

The Journal of Korea Robotics Society ◽

10.7746/jkros.2019.14.2.114 ◽

2019 ◽

Vol 14 (2) ◽

pp. 114-121

Author(s):

Hyukdoo Choi

Keyword(s):

Deep Learning ◽

Depth Estimation ◽

Visual Odometry ◽

Monocular Depth

The Constraints between Edge Depth and Uncertainty for Monocular Depth Estimation

Electronics ◽

10.3390/electronics10243153 ◽

2021 ◽

Vol 10 (24) ◽

pp. 3153

Author(s):

Shouying Wu ◽

Wei Li ◽

Binbin Liang ◽

Guoxin Huang

Keyword(s):

Error Rate ◽

Positive Impact ◽

Depth Estimation ◽

Multi Scale ◽

Teacher Networks ◽

Depth Images ◽

Teacher Student ◽

Object Edge ◽

Monocular Depth ◽

The Impact

The self-supervised monocular depth estimation paradigm has become an important branch of computer vision depth-estimation tasks. However, the depth estimation problem arising from object edge depth pulling or occlusion is still unsolved. The grayscale discontinuity at object edges leads to a relatively high depth uncertainty for pixels in these regions. We improve the geometric edge prediction results by taking uncertainty into account in the depth-estimation task. To this end, we explore how uncertainty affects this task and propose a new self-supervised monocular depth estimation technique based on multi-scale uncertainty. In addition, we introduce a teacher–student architecture into the models and investigate the impact of different teacher networks on the depth and uncertainty results. We evaluate the performance of our paradigm in detail on the standard KITTI dataset. Compared with the Monodepth2 benchmark, the experimental results show that the accuracy of our method increased from 87.7% to 88.2%, the AbsRel error decreased from 0.115 to 0.11, the SqRel error decreased from 0.903 to 0.822, and the RMSE decreased from 4.863 to 4.686. Our approach has a positive impact on the problem of texture replication or inaccurate object boundaries, producing sharper and smoother depth images.
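
The metrics quoted above are conventionally computed as in the sketch below. It uses the standard definitions from the monocular depth literature, and it assumes that the reported "accuracy" is the threshold metric δ < 1.25, which the abstract does not state explicitly. The function name and sample values are illustrative.

```python
from math import sqrt


def depth_metrics(pred, gt):
    """Standard depth-estimation metrics over paired positive depth values.

    abs_rel: mean(|pred - gt| / gt)
    sq_rel:  mean((pred - gt)^2 / gt)
    rmse:    sqrt(mean((pred - gt)^2))
    delta1:  fraction of pixels with max(pred/gt, gt/pred) < 1.25
    """
    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta1": delta1}


# Four toy pixels: predicted versus ground-truth depth in metres.
m = depth_metrics(pred=[10.0, 22.0, 30.0, 60.0], gt=[10.0, 20.0, 40.0, 50.0])
```

Lower is better for `abs_rel`, `sq_rel` and `rmse`, while higher is better for `delta1`.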
