Stereo Visual Odometry Pose Correction through Unsupervised Deep Learning

Sumin Zhang; Shouyi Lu; Rui He; Zhipeng Bao

doi:10.3390/s21144735

Sensors ◽

2021 ◽

Vol 21 (14) ◽

pp. 4735

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Depth Map ◽

Ground Truth ◽

Visual Odometry ◽

Vital Role ◽

Positioning Accuracy ◽

Multiview Geometry ◽

Localization And Mapping ◽

Unsupervised Deep Learning

Visual simultaneous localization and mapping (VSLAM) plays a vital role in the field of positioning and navigation. At the heart of VSLAM is visual odometry (VO), which uses consecutive images to estimate the camera’s ego-motion. However, because the classical VO system relies on many modeling assumptions, robots can hardly operate in challenging environments where those assumptions break down. To address this challenge, we combine the multiview geometry constraints of the classical stereo VO system with the robustness of deep learning to present an unsupervised pose correction network for the classical stereo VO system. The network regresses a pose correction that compensates for the positioning error caused by violations of the modeling assumptions, making the classical stereo VO positioning more accurate. Training the pose correction network does not require a dataset with ground truth poses. The network also simultaneously generates a depth map and an explainability mask. Extensive experiments on the KITTI dataset show that the pose correction network can significantly improve the positioning accuracy of the classical stereo VO system. Notably, the corrected classical stereo VO system’s average absolute trajectory error, average translational relative pose error, and average translational root-mean-square drift over lengths of 100–800 m on the KITTI dataset are 13.77 cm, 0.038 m, and 1.08%, respectively. The corrected stereo VO system therefore comes close to the state of the art.
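The abstract describes correcting a classical stereo VO pose estimate and scoring the result with absolute trajectory error. The plain-Python sketch below shows one way such a correction could be applied and evaluated. It assumes the correction is a 4x4 rigid transform that left-multiplies each frame-to-frame VO pose and that estimated and ground truth trajectories are already in the same frame (no alignment step); the helpers `translation`, `accumulate`, and `ate_rmse` are hypothetical and are not the authors' code.

```python
import math

Matrix = list[list[float]]


def translation(x: float, y: float, z: float) -> Matrix:
    """4x4 homogeneous transform with identity rotation and translation (x, y, z)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def accumulate(relative_poses: list[Matrix], corrections: list[Matrix]) -> list[tuple[float, float, float]]:
    """Chain frame-to-frame VO poses into a trajectory, correcting each one first.

    Assumption: the correction left-multiplies the VO relative pose,
    T_corrected = dT @ T_vo. The paper's exact convention may differ.
    """
    pose = translation(0.0, 0.0, 0.0)  # start at the origin (identity transform)
    positions = [(0.0, 0.0, 0.0)]
    for t_vo, d_t in zip(relative_poses, corrections):
        pose = matmul(pose, matmul(d_t, t_vo))
        positions.append((pose[0][3], pose[1][3], pose[2][3]))
    return positions


def ate_rmse(estimated, ground_truth) -> float:
    """Translational absolute trajectory error (RMSE), assuming pre-aligned trajectories."""
    squared = [
        sum((e - g) ** 2 for e, g in zip(p, q)) for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: the VO front end overestimates forward motion by 0.1 m per frame.
vo = [translation(0.0, 0.0, 1.1)] * 3
gt = [(0.0, 0.0, float(i)) for i in range(4)]
raw = accumulate(vo, [translation(0.0, 0.0, 0.0)] * 3)
fixed = accumulate(vo, [translation(0.0, 0.0, -0.1)] * 3)
print(ate_rmse(raw, gt), ate_rmse(fixed, gt))
```

In this toy case the uncorrected error grows linearly with distance, and a constant correction of -0.1 m per frame removes it; a learned correction would instead vary per frame with the scene.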

Unsupervised Deep Learning-Based RGB-D Visual Odometry

Applied Sciences ◽

10.3390/app10165426 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5426 ◽

Cited By ~ 1

Author(s):

Qiang Liu ◽

Haidong Zhang ◽

Yiming Xu ◽

Li Wang

Keyword(s):

Deep Learning ◽

Feature Matching ◽

Ground Truth ◽

Visual Odometry ◽

Depth Images ◽

Network Training ◽

Stream Structure ◽

Unsupervised Deep Learning ◽

Rgb Images ◽

Learning Frameworks

Recently, deep learning frameworks have been deployed in visual odometry systems and have achieved results comparable to traditional feature-matching-based systems. However, most deep learning-based frameworks inevitably need labeled data as ground truth for training. In addition, monocular odometry systems cannot recover absolute scale, so external or prior information has to be introduced for scale recovery. To solve these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are: (i) during network training and pose estimation, the depth images are fed into the network alongside the RGB images to form a dual-stream structure, and a dual-stream deep neural network is proposed; (ii) the system adopts an unsupervised end-to-end training method, so the labor-intensive data labeling task is not required. We have tested our system on the KITTI dataset, and the results show that the proposed RGB-D visual odometry (VO) system has clear advantages over other state-of-the-art systems in terms of both translation and rotation errors.
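Unsupervised VO systems are commonly trained by warping one frame into another with predicted depth and pose and penalizing the photometric difference, so no pose labels are needed; with a metric depth image as input, the warp is expressed in real-world units, which is why depth helps with absolute scale. The sketch below shows only that geometric warp under a pinhole camera model. The intrinsics, the target-to-source pose convention, and the L1 loss are assumptions for illustration, not a description of the authors' dual-stream network.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth -> 3D point in the target camera frame (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def rigid_transform(point, rotation, translation):
    """p' = R p + t, mapping a point from the target frame into the source frame."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """3D point in the source camera frame -> pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def warp_pixel(u, v, depth, rotation, translation, intrinsics):
    """Where a target pixel lands in the source image, given its depth and the relative pose."""
    fx, fy, cx, cy = intrinsics
    point = back_project(u, v, depth, fx, fy, cx, cy)
    return project(rigid_transform(point, rotation, translation), fx, fy, cx, cy)


def photometric_l1(target_values, warped_source_values):
    """Mean absolute intensity difference: the self-supervision signal (no pose labels)."""
    pairs = list(zip(target_values, warped_source_values))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


K = (500.0, 500.0, 320.0, 240.0)  # hypothetical fx, fy, cx, cy
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# The camera moves 1 m forward, so a point 10 m away is 9 m away in the source frame.
print(warp_pixel(420.0, 240.0, 10.0, I3, (0.0, 0.0, -1.0), K))
```

During training, the source image would be sampled at the warped coordinates (typically with bilinear interpolation) and compared with the target pixels through a loss such as `photometric_l1`.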

Unsupervised Deep Learning via Affinity Diffusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6757 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11029-11036

Author(s):

Jiabo Huang ◽

Qi Dong ◽

Shaogang Gong ◽

Xiatian Zhu

Keyword(s):

Deep Learning ◽

State Of The Art ◽

General Purpose ◽

Training Data ◽

Learning Approach ◽

Model Learning ◽

Feature Representations ◽

Discriminative Feature ◽

Training Samples ◽

Unsupervised Deep Learning

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning, which needs massive labelled training data and dramatically limits their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the superior performance of the proposed method over state-of-the-art unsupervised learning models on six common image recognition benchmarks: MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.
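To make "affinity diffusion" concrete, the sketch below builds a k-nearest-neighbour graph from cosine similarities and spreads affinities along it with a personalized-PageRank-style iteration, so samples connected through chains of neighbours end up with high affinity. This is a generic diffusion scheme chosen for illustration, not the authors' progressive affinity diffusion; `k`, `alpha`, and `steps` are hypothetical parameters.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_transition(features, k):
    """Row-stochastic transition matrix over each sample's k most similar neighbours."""
    n = len(features)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i), reverse=True
        )[:k]
        total = sum(max(s, 0.0) for s, _ in sims)
        for s, j in sims:
            P[i][j] = max(s, 0.0) / total if total > 0 else 1.0 / len(sims)
    return P


def diffuse(P, alpha=0.5, steps=20):
    """Iterate F <- alpha * P @ F + (1 - alpha) * I to spread affinity along graph paths."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    F = [row[:] for row in identity]
    for _ in range(steps):
        PF = [[sum(P[i][m] * F[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
        F = [[alpha * PF[i][j] + (1 - alpha) * identity[i][j] for j in range(n)] for i in range(n)]
    return F


def strongest_partner(F, i):
    """Index of the sample with the highest diffused affinity to sample i (excluding i)."""
    return max((j for j in range(len(F)) if j != i), key=lambda j: F[i][j])


# Two toy groups of unlabelled 2D features.
feats = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
F = diffuse(knn_transition(feats, k=1))
print([strongest_partner(F, i) for i in range(len(feats))])
```

Samples in different groups never share a path in this graph, so their diffused affinity stays at zero; grouping by strongest partner recovers the two groups without labels.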

Review of the State of the Art of Deep Learning for Plant Diseases: A Broad Analysis and Discussion

Plants ◽

10.3390/plants9101302 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1302 ◽

Cited By ~ 1

Author(s):

Reem Ibrahim Hasan ◽

Suhaila Mohd Yusuf ◽

Laith Alzubaidi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Early Detection ◽

Feature Fusion ◽

State Of The Art ◽

Vital Role ◽

Considerable Improvement ◽

Plant Diseases ◽

Recent Emergence

Deep learning (DL) represents a golden era in the machine learning (ML) domain, and it has gradually become the leading approach in many fields. It currently plays a vital role in the early detection and classification of plant diseases. The use of ML techniques in this field is viewed as having brought considerable improvement to the cultivation productivity sectors, particularly with the recent emergence of DL, which appears to have increased accuracy levels. Recently, many DL architectures have been implemented together with visualisation techniques that are essential for determining symptoms and classifying plant diseases. This review investigates and analyses the most recent methods, developed over the three years leading up to 2020, for training, augmentation, feature fusion and extraction, recognising and counting crops, and detecting plant diseases, including how these methods can be harnessed to feed deep classifiers and their effects on classifier accuracy.

Exploitation of deep learning in the automatic detection of cracks on paved roads

GEOMATICA ◽

10.1139/geomat-2019-0008 ◽

2019 ◽

Vol 73 (2) ◽

pp. 29-44

Author(s):

Won Mo Jung ◽

Faizaan Naveed ◽

Baoxin Hu ◽

Jianguo Wang ◽

Ningyuan Li

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Ground Truth ◽

Learning Networks ◽

Test Image ◽

Convolutional Network ◽

Image Patches ◽

Severity Levels ◽

First Time ◽

Different Levels

With the advance of deep learning networks, their applications in the assessment of pavement conditions are gaining more attention. A convolutional neural network (CNN) is the most commonly used network in image classification. In terms of pavement assessment, most existing CNNs are designed only to distinguish between cracks and non-cracks; few networks classify cracks by level of severity. Information on the severity of pavement cracks is critical for pavement repair services. In this study, the state-of-the-art CNN used in the detection of pavement cracks was improved to localize the cracks and identify their distress levels in three categories (low, medium, and high). In addition, a fully convolutional network (FCN) was, for the first time, used to detect pavement cracks. These architectures were validated using data acquired on four highways in Ontario, Canada, and compared with the ground truth provided by the Ministry of Transportation of Ontario (MTO). The results showed that with the improved CNN, the prediction precision on a series of test image patches was 72.9%, 73.9%, and 73.1% for cracks with low, medium, and high severity, respectively. The FCN was tested on whole pavement images, yielding precisions of 62.8%, 63.3%, and 66.4% for cracks with low, medium, and high severity, respectively. Note that the ground truth contained some uncertainties, which partially contributed to the relatively low precision.
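The results above are reported as per-severity precision. As a reminder of what that figure measures, the sketch below computes precision = TP / (TP + FP) for each severity class from predicted and ground truth labels. The label strings and the toy data are hypothetical; the study's actual evaluation protocol (patch matching, non-crack handling) is not specified here.

```python
from collections import Counter

SEVERITIES = ("low", "medium", "high")


def per_class_precision(predicted, actual, classes=SEVERITIES):
    """Precision = TP / (TP + FP) for each class; None if the class is never predicted."""
    predicted_count = Counter(predicted)  # TP + FP for each predicted class
    true_positives = Counter(p for p, a in zip(predicted, actual) if p == a)
    return {
        c: (true_positives[c] / predicted_count[c] if predicted_count[c] else None)
        for c in classes
    }


# Hypothetical predictions for five image patches.
pred = ["low", "low", "medium", "high", "high"]
truth = ["low", "medium", "medium", "high", "low"]
print(per_class_precision(pred, truth))
```

Precision only penalizes false alarms; cracks that the network misses entirely (false negatives) affect recall, which is a separate figure.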

Using Unsupervised Deep Learning Technique for Monocular Visual Odometry

IEEE Access ◽

10.1109/access.2019.2896988 ◽

2019 ◽

Vol 7 ◽

pp. 18076-18088 ◽

Cited By ~ 5

Author(s):

Qiang Liu ◽

Ruihao Li ◽

Huosheng Hu ◽

Dongbing Gu

Keyword(s):

Deep Learning ◽

Visual Odometry ◽

Learning Technique ◽

Unsupervised Deep Learning

LIFT-SLAM: a deep-learning feature-based monocular visual SLAM method

10.5753/wtdr_ctdr.2020.14954 ◽

2020 ◽

Author(s):

Hudson Bruno ◽

Esther Colombini

Keyword(s):

Deep Learning ◽

Deep Neural Networks ◽

State Of The Art ◽

Parameter Tuning ◽

Robot Motion ◽

Visual Slam ◽

Feature Descriptors ◽

Localization And Mapping ◽

Feature Based

The Simultaneous Localization and Mapping (SLAM) problem addresses how a robot can localize itself in an unknown environment while simultaneously building a consistent map of this environment. Recently, cameras have been successfully used to extract the environment’s features to perform SLAM, which is referred to as visual SLAM (VSLAM). However, classical VSLAM algorithms can easily fail when the robot motion or the environment is too challenging. Although new approaches based on Deep Neural Networks (DNNs) have achieved promising results in VSLAM, they are still unable to outperform traditional methods. To leverage the robustness of deep learning to enhance traditional VSLAM systems, we propose to combine deep learning-based feature descriptors with traditional geometry-based VSLAM, building a new VSLAM system called LIFT-SLAM. Experiments conducted on the KITTI and EuRoC datasets show that deep learning can be used to improve the performance of traditional VSLAM systems, as the proposed approach achieved results comparable to the state of the art while being robust to sensor noise. We further enhance the proposed VSLAM pipeline with an adaptive approach that avoids dataset-specific parameter tuning, and we evaluate how transfer learning affects the quality of the extracted features.
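Once learned descriptors replace hand-crafted ones, a feature-based pipeline still has to decide which descriptors correspond across frames. The sketch below shows one common way to do that, nearest-neighbour matching with Lowe's ratio test under Euclidean distance. Whether LIFT-SLAM matches descriptors exactly this way is an assumption; the `ratio` threshold and the toy 2D descriptors are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b, keeping the
    match only if it is clearly closer than the second-nearest candidate."""
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(desc_a):
        dists = sorted((l2(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Toy descriptors: the second query is ambiguous (two equally close candidates).
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
print(ratio_test_matches(desc_a, desc_b))
```

Rejecting ambiguous matches before pose estimation keeps repetitive textures from injecting outliers into the geometric back end.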

Unsupervised Representation High-Resolution Remote Sensing Image Scene Classification via Contrastive Learning Convolutional Neural Network

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.8.577 ◽

2021 ◽

Vol 87 (8) ◽

pp. 577-591

Author(s):

Fengpeng Li ◽

Jiabao Li ◽

Wei Han ◽

Ruyi Feng ◽

Lizhe Wang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Convolutional Neural Network ◽

State Of The Art ◽

Remote Sensing Image ◽

Scene Classification ◽

Data Set ◽

Unsupervised Deep Learning

Inspired by the outstanding achievements of deep learning, supervised deep learning representation methods for high-spatial-resolution remote sensing image scene classification have obtained state-of-the-art performance. However, these methods need a considerable amount of labeled data to capture class-specific features, which limits the application of deep learning-based methods when only a few labeled training samples are available. To address this issue, this work proposes an unsupervised deep learning representation method for high-resolution remote sensing image scene classification. The proposed method, called contrastive learning, narrows the distance between positive views (color channels belonging to the same image) and widens the gap between negative view pairs (color channels from different images), obtaining class-specific representations of the input data without any supervised information. The classifier uses the features extracted by the convolutional neural network (CNN)-based feature extractor, together with the label information of the training data, to set the space of each category, and then makes predictions in the testing procedure using linear regression. Compared with existing unsupervised deep learning representation methods for high-resolution remote sensing image scene classification, the contrastive learning CNN achieves state-of-the-art performance on three benchmark data sets of different scales: the small-scale RSSCN7 data set, the midscale aerial image data set, and the large-scale NWPU-RESISC45 data set.
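The objective described above, pulling together views from the same image and pushing apart views from different images, is often written as an InfoNCE loss. The sketch below is a generic InfoNCE over a batch, assuming `anchors[i]` and `positives[i]` are embeddings of two color-channel views of image i and every other `positives[j]` serves as a negative. The temperature value and the toy embeddings are hypothetical, and the paper's exact loss may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should score highest against positives[i];
    positives[j] for j != i act as negatives from different images."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]    # views of the same images, in the same order
shuffled = [[0.1, 1.0], [1.0, 0.1]]   # pairs mismatched across images
print(info_nce(anchors, aligned), info_nce(anchors, shuffled))
```

Minimizing this loss drives matching views together in embedding space, which is the property a downstream linear classifier then exploits.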

Diversity-Generated Image Inpainting with Style Extraction

10.20944/preprints201912.0028.v1 ◽

2019 ◽

Author(s):

Weiwei Cai ◽

Zhanguo Wei

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Image Inpainting ◽

Ground Truth ◽

Generative Model ◽

Input Noise ◽

Latent Vector ◽

Proposed Model ◽

Ground Truth Image

The latest deep learning-based methods have achieved impressive results on the complex task of inpainting large missing areas in an image. Such methods generally attempt to generate a single "optimal" inpainting result, ignoring many other plausible results. However, given the uncertainty of the inpainting task, one sole result can hardly be regarded as the desired regeneration of the missing area. To address this weakness, which stems from the design of previous algorithms, we propose a novel deep generative model equipped with a new style extractor that extracts the style noise (a latent vector) from the ground truth image. The extracted style noise and the ground truth image are then both input into the generator. We also craft a consistency loss that guides the generated image to approximate the ground truth. Meanwhile, the same extractor captures the style noise from the generated image, which is forced to approach the input noise according to the consistency loss. After iterations, our generator learns the styles corresponding to multiple sets of noise. The proposed model can generate a (sufficiently large) number of inpainting results consistent with the context semantics of the image. We evaluate the effectiveness of our model on three databases, i.e., CelebA, Agricultural Disease, and MauFlex. Compared with state-of-the-art inpainting methods, the model offers desirable inpainting results with both better quality and higher diversity. The code and model will be made available at https://github.com/vivitsai/SEGAN.

DeepSide: A Deep Learning Framework for Drug Side Effect Prediction

10.1101/843029 ◽

2019 ◽

Cited By ~ 3

Author(s):

Onur Can Uner ◽

Ramazan Gokberk Cinbis ◽

Oznur Tastan ◽

A. Ercument Cicek

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Side Effect ◽

State Of The Art ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Ground Truth ◽

The State ◽

Specific Information ◽

Context Specific

Drug failures due to unforeseen adverse effects in clinical trials pose health risks for the participants and lead to substantial financial losses. Side effect prediction algorithms have the potential to guide the drug design process. The LINCS L1000 dataset provides a vast resource of cell line gene expression data perturbed by different drugs and creates a knowledge base for context-specific features. The state-of-the-art approach that aims at using context-specific information relies only on the high-quality experiments in LINCS L1000 and discards a large portion of the experiments. In this study, our goal is to boost prediction performance by utilizing this data to its full extent. We experiment with five deep learning architectures. We find that a multi-modal architecture produces the best predictive performance among multi-layer perceptron-based architectures when drug chemical structure (CS) and the full set of drug-perturbed gene expression profiles (GEX) are used as modalities. Overall, we observe that the CS is more informative than the GEX. A convolutional neural network-based model that uses only the SMILES string representation of the drugs achieves the best results and provides 13.0% macro-AUC and 3.1% micro-AUC improvements over the state of the art. We also show that the model is able to predict side effect-drug pairs that are reported in the literature but were missing from the ground truth side effect dataset. DeepSide is available at http://github.com/OnurUner/DeepSide.
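The improvements above are reported separately for macro-AUC and micro-AUC, which weight a multi-label problem differently: macro averages the AUC of each side effect, while micro pools every (drug, side effect) pair into one ranking. The sketch below computes both with the Mann-Whitney formulation of ROC AUC on a hypothetical 3-drug, 2-side-effect toy matrix; it is illustrative and not the DeepSide evaluation code.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(score_matrix, label_matrix):
    """Average of per-side-effect AUCs: every label column is weighted equally."""
    cols = range(len(score_matrix[0]))
    per_label = [
        auc([row[c] for row in score_matrix], [row[c] for row in label_matrix]) for c in cols
    ]
    return sum(per_label) / len(per_label)


def micro_auc(score_matrix, label_matrix):
    """Single AUC over all (drug, side effect) pairs pooled together."""
    scores = [s for row in score_matrix for s in row]
    labels = [y for row in label_matrix for y in row]
    return auc(scores, labels)


# Rows are drugs, columns are side effects (hypothetical scores and labels).
labels = [[1, 0], [0, 1], [1, 0]]
scores = [[0.9, 0.2], [0.4, 0.3], [0.8, 0.1]]
print(macro_auc(scores, labels), micro_auc(scores, labels))
```

Here each side effect is ranked perfectly on its own (macro-AUC 1.0), but the pooled ranking mixes score scales across labels (micro-AUC 8/9), which is why the two figures can improve by different amounts.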

Fast and Effective Region-based Depth Map Upsampling with Application to Location Map-Free Reversible Data Hiding

Journal of Advances in Mathematics and Computer Science ◽

10.9734/jamcs/2020/v35i430268 ◽

2020 ◽

pp. 24-45

Author(s):

Kuo-Liang Chung ◽

Yu-Ling Tseng ◽

Tzu-Hsien Chan ◽

Ching-Sheng Wang

Keyword(s):

Data Hiding ◽

Reversible Data Hiding ◽

State Of The Art ◽

Depth Map ◽

Ground Truth ◽

The State ◽

Embedding Capacity ◽

Art Methods ◽

Location Map ◽

Depth Map Upsampling

In this paper, we first propose a fast and effective region-based depth map upsampling method, and then propose a joint upsampling and location map-free reversible data hiding method, simply called the JUR method. In the proposed upsampling method, all the missing depth pixels are partitioned into three disjoint regions: the homogeneous, semi-homogeneous, and non-homogeneous regions. Then, we propose the depth copying, mean value, and bicubic interpolation approaches to quickly reconstruct the three kinds of missing depth pixels, respectively. The proposed JUR method requires no location map overhead: it uses the neighboring ground truth depth pixels of each missing depth pixel and achieves substantial quality and embedding capacity merits. Comprehensive experiments justify the execution-time and quality merits of the depth maps upsampled by our method relative to state-of-the-art upsampling methods, as well as the embedding capacity and quality merits of our JUR method compared with state-of-the-art methods.
