Unsupervised Deep Learning-Based RGB-D Visual Odometry

2020 ◽  
Vol 10 (16) ◽  
pp. 5426 ◽  
Author(s):  
Qiang Liu ◽  
Haidong Zhang ◽  
Yiming Xu ◽  
Li Wang

Recently, deep learning frameworks have been deployed in visual odometry systems and have achieved results comparable to those of traditional feature-matching-based systems. However, most deep learning-based frameworks need labeled data as ground truth for training. In addition, monocular odometry systems cannot recover absolute scale, so external or prior information has to be introduced for scale recovery. To solve these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are: (i) a dual-stream deep neural network is proposed, in which the depth images are fed into the network alongside the RGB images during both training and pose estimation; and (ii) the system adopts an unsupervised end-to-end training method, so the labor-intensive data labeling task is not required. We have tested our system on the KITTI dataset, and the results show that the proposed RGB-D Visual Odometry (VO) system has clear advantages over other state-of-the-art systems in terms of both translation and rotation errors.

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4735
Author(s):  
Sumin Zhang ◽  
Shouyi Lu ◽  
Rui He ◽  
Zhipeng Bao

Visual simultaneous localization and mapping (VSLAM) plays a vital role in the field of positioning and navigation. At the heart of VSLAM is visual odometry (VO), which uses continuous images to estimate the camera’s ego-motion. However, because the classical VO system relies on many assumptions, robots can hardly operate in challenging environments. To address this problem, we combine the multiview geometry constraints of the classical stereo VO system with the robustness of deep learning to present an unsupervised pose correction network for the classical stereo VO system. The pose correction network regresses a correction for the positioning error that results from violations of the modeling assumptions, making the classical stereo VO positioning more accurate. The pose correction network does not rely on a dataset with ground truth poses for training. It also simultaneously generates a depth map and an explainability mask. Extensive experiments on the KITTI dataset show that the pose correction network can significantly improve the positioning accuracy of the classical stereo VO system. Notably, the corrected classical stereo VO system’s average absolute trajectory error, average translational relative pose error, and average translational root-mean-square drift over lengths of 100–800 m in the KITTI dataset are 13.77 cm, 0.038 m, and 1.08%, respectively. Therefore, the improved stereo VO system has almost reached the state of the art.
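As an illustration of the absolute trajectory error mentioned above, the following minimal Python sketch computes the root-mean-square of per-frame position errors. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same coordinate frame; the function name `ate_rmse` is hypothetical, and this is not the authors' evaluation code.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame position error between two aligned trajectories.

    Both arguments are sequences of (x, y, z) positions of equal length.
    Alignment and time association are assumed to be done beforehand.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    # Squared Euclidean distance between matching positions.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(ate_rmse(est, gt))
```

The result is in the same unit as the input positions, so positions in metres give an error in metres.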


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christian Crouzet ◽  
Gwangjin Jeong ◽  
Rachel H. Chae ◽  
Krystal T. LoPresti ◽  
Cody E. Dunn ◽  
...  

Abstract Cerebral microhemorrhages (CMHs) are associated with cerebrovascular disease, cognitive impairment, and normal aging. One method to study CMHs is to analyze histological sections (5–40 μm) stained with Prussian blue. Currently, users manually and subjectively identify and quantify Prussian blue-stained regions of interest, which is prone to inter-individual variability and can lead to significant delays in data analysis. To improve this labor-intensive process, we developed and compared three digital pathology approaches to identify and quantify CMHs from Prussian blue-stained brain sections: (1) ratiometric analysis of RGB pixel values, (2) phasor analysis of RGB images, and (3) deep learning using a mask region-based convolutional neural network. We applied these approaches to a preclinical mouse model of inflammation-induced CMHs. One-hundred CMHs were imaged using a 20 × objective and RGB color camera. To determine the ground truth, four users independently annotated Prussian blue-labeled CMHs. The deep learning and ratiometric approaches performed better than the phasor analysis approach compared to the ground truth. The deep learning approach had the most precision of the three methods. The ratiometric approach has the most versatility and maintained accuracy, albeit with less precision. Our data suggest that implementing these methods to analyze CMH images can drastically increase the processing speed while maintaining precision and accuracy.
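The abstract does not say which channel ratio or threshold the ratiometric approach uses. Purely as an illustration of the general idea, the minimal Python sketch below marks a pixel as stained when its blue-to-red ratio exceeds a threshold. The choice of the blue/red ratio, the threshold value of 1.5, and the function names are all assumptions, not the authors' method.

```python
def ratiometric_mask(pixels, threshold=1.5):
    """Return a boolean mask of pixels whose blue/red ratio exceeds threshold.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    The blue/red ratio and the default threshold are illustrative assumptions.
    """
    mask = []
    for row in pixels:
        # max(r, 1) avoids division by zero for pure-black red channels.
        mask.append([(b / max(r, 1)) > threshold for r, g, b in row])
    return mask


def stained_fraction(mask):
    """Fraction of pixels flagged as stained in a boolean mask."""
    total = sum(len(row) for row in mask)
    flagged = sum(sum(1 for v in row if v) for row in mask)
    return flagged / total if total else 0.0


if __name__ == "__main__":
    image = [[(200, 200, 200), (50, 60, 180)]]  # grey pixel, bluish pixel
    mask = ratiometric_mask(image)
    print(mask, stained_fraction(mask))
```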


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 18076-18088 ◽  
Author(s):  
Qiang Liu ◽  
Ruihao Li ◽  
Huosheng Hu ◽  
Dongbing Gu

Author(s):  
Vinícius da Silva Ramalho ◽  
Rômulo Francisco Lepinsk Lopes ◽  
Ricardo Luhm Silva ◽  
Marcelo Rudek

Synthetic datasets have been used to train 2D and 3D image-based deep learning models, and they also serve as performance benchmarks. Although some authors already use 3D models for the development of navigation systems, their applications do not consider the noise sources that affect 3D sensors. Time-of-Flight sensors are susceptible to noise, and conventional filters have limitations depending on the scenario in which they are applied. On the other hand, deep learning filters can be more invariant to changes and can take contextual information into consideration to attenuate noise. However, training a deep learning filter requires a noiseless ground truth, which would need highly accurate hardware to capture. Synthetic datasets are provided with ground truth data, and similar noise can be applied to them, creating a noisy dataset for a deep learning approach. This research explores the training of a noise removal application using deep learning trained only with the Flying Things synthetic dataset, using its ground truth data and applying random noise to it. The trained model is validated with the Middlebury dataset, which contains real-world data. The results show that a deep learning architecture for noise removal trained with only a synthetic dataset can achieve near state-of-the-art performance, and the proposed model is able to process 12-bit resolution depth images instead of 8-bit images. Future studies will evaluate the algorithm's performance regarding real-time noise removal to allow embedded applications.
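The idea of building a noisy training input from a clean synthetic depth map can be sketched as follows. The abstract only says that random noise is applied, so the Gaussian noise model, the `sigma` value, and the function name `add_depth_noise` are assumptions made for illustration. The 12-bit range (0 to 4095) follows the resolution mentioned in the abstract.

```python
import random

MAX_12BIT = 4095  # largest value representable in a 12-bit depth image


def add_depth_noise(depth, sigma=20.0, seed=None):
    """Return a noisy copy of a 12-bit depth map (list of rows of ints).

    Zero-mean Gaussian noise is an assumed noise model; values are
    rounded and clipped back into the 12-bit range.
    """
    rng = random.Random(seed)
    return [
        [min(MAX_12BIT, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
        for row in depth
    ]


if __name__ == "__main__":
    clean = [[0, 1000, 4095], [2048, 512, 3000]]
    noisy = add_depth_noise(clean, sigma=20.0, seed=0)
    # (noisy, clean) forms one input/target pair for a denoising model.
    print(noisy)
```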


Author(s):  
Kavita Srivastava

The steep rise in autonomous systems and the internet of things in recent years has influenced the way in which computation is performed. With built-in AI (artificial intelligence) in IoT and cyber-physical systems, the need for high-performance computing has emerged. Cloud computing is no longer sufficient for sensor-driven systems that continuously collect data from the environment. Sensor-based systems such as autonomous vehicles require data analysis and predictions in real time, which is not possible with the centralized cloud alone. This scenario has given rise to a new computing paradigm called edge computing. Edge computing requires data storage, analysis, and prediction to be performed at the network edge rather than on a cloud server, thereby enabling quick response and less storage overhead. Intelligence at the edge can be obtained through deep learning. This chapter contains information about various deep learning frameworks, hardware, and systems for edge computing, and examples of deep neural network training using the Caffe 2 framework.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Jiacheng Li ◽  
Ruirui Li ◽  
Ruize Han ◽  
Song Wang

Abstract Background Retinal vessel segmentation benefits significantly from deep learning. Its performance relies on sufficient training images with accurate ground-truth segmentation, which are usually manually annotated in the form of binary pixel-wise label maps. Manually annotated ground-truth label maps, more or less, contain errors for part of the pixels. Due to the thin structure of retina vessels, such errors are more frequent and serious in manual annotations, which negatively affect deep learning performance. Methods In this paper, we develop a new method to automatically and iteratively identify and correct such noisy segmentation labels in the process of network training. We consider historical predicted label maps of network-in-training from different epochs and jointly use them to self-supervise the predicted labels during training and dynamically correct the supervised labels with noises. Results We conducted experiments on the three datasets of DRIVE, STARE and CHASE-DB1 with synthetic noises, pseudo-labeled noises, and manually labeled noises. For synthetic noise, the proposed method corrects the original noisy label maps to a more accurate label map by 4.0–9.8% on F1 and 10.7–16.8% on PR on three testing datasets. For the other two types of noise, the method could also improve the label map quality. Conclusions Experiment results verified that the proposed method could achieve better retinal image segmentation performance than many existing methods by simultaneously correcting the noise in the initial label map.
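The abstract does not give the exact rule used to combine historical predictions. As a rough illustration of the general idea only, the Python sketch below averages per-pixel vessel probabilities over several past epochs and flips a binary label when the averaged prediction confidently disagrees with it. The averaging rule, the `confidence` threshold of 0.8, and the function name `correct_labels` are assumptions, not the authors' algorithm.

```python
def correct_labels(labels, history, confidence=0.8):
    """Flip noisy binary labels that past predictions confidently contradict.

    labels:  flat list of 0/1 pixel labels (possibly noisy).
    history: list of per-epoch lists of predicted probabilities in [0, 1],
             each the same length as `labels`.
    The mean-over-epochs rule and the threshold are illustrative assumptions.
    """
    if not history:
        return list(labels)
    corrected = []
    for i, label in enumerate(labels):
        mean_prob = sum(epoch[i] for epoch in history) / len(history)
        if label == 0 and mean_prob >= confidence:
            corrected.append(1)  # predictions consistently say "vessel"
        elif label == 1 and mean_prob <= 1 - confidence:
            corrected.append(0)  # predictions consistently say "background"
        else:
            corrected.append(label)  # keep the annotated label
    return corrected


if __name__ == "__main__":
    noisy = [0, 1, 1, 0]
    past = [[0.9, 0.1, 0.7, 0.4], [0.95, 0.15, 0.6, 0.5]]
    print(correct_labels(noisy, past))
```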


2021 ◽  
Author(s):  
Christian Crouzet ◽  
Gwangjin Jeong ◽  
Rachel Chae ◽  
Krystal LoPresti ◽  
Cody Dunn ◽  
...  

Abstract Cerebral microhemorrhages (CMHs) are associated with cerebrovascular disease, cognitive impairment, and normal aging. One method to study CMHs is to analyze histological sections (5-40 μm) stained with Prussian blue. Currently, users manually and subjectively identify and quantify Prussian blue-stained regions of interest, which is prone to inter-individual variability and can lead to significant delays in data analysis. To improve this labor-intensive process, we developed and compared three digital pathology approaches to identify and quantify CMHs from Prussian blue-stained brain sections: 1) ratiometric analysis of RGB pixel values, 2) phasor analysis of RGB images, and 3) deep learning using a mask region-based convolutional neural network. We applied these approaches to a preclinical mouse model of inflammation-induced CMHs. One-hundred CMHs were imaged using a 20x objective and RGB color camera. To determine the ground truth, four users independently annotated Prussian blue-labeled CMHs. The deep learning and ratiometric approaches performed better than the phasor analysis approach compared to the ground truth. The deep learning approach had the most precision of the three methods. The ratiometric approach has the most versatility and maintained accuracy, albeit with less precision. Our data suggest that implementing these methods to analyze CMH images can drastically increase the processing speed while maintaining precision and accuracy.


2019 ◽  
Vol 38 (14) ◽  
pp. 1538-1548
Author(s):  
Tonci Novkovic ◽  
Fadri Furrer ◽  
Marko Panjek ◽  
Margarita Grinvald ◽  
Roland Siegwart ◽  
...  

With the progress of machine learning, the demand for realistic data with high-quality annotations has been thriving. In order to generalize well, considerable amounts of data are required, especially realistic ground-truth data, for tasks such as object detection and scene segmentation. Such data can be difficult, time-consuming, and expensive to collect. This article presents a dataset of household objects and box scenes commonly found in warehouse environments. The dataset was obtained using a robotic setup with four different cameras. It contains reconstructed objects and scenes, as well as raw RGB and depth images, camera poses, pixel-wise labels of objects directly in the RGB images, and 3D bounding boxes with poses in the world frame. Furthermore, raw calibration data are provided, together with the intrinsic and extrinsic parameters for all the sensors. By providing object labels as pixel-wise masks, 3D, and 2D object bounding boxes, this dataset is useful for both object recognition and instance segmentation. The realistic scenes provided will serve for learning-based algorithms applied to scenarios where boxes of objects are often found, such as in the logistics sector. Both the dataset and the tools for data processing are published and available online.

