Object Detection-Based One-Shot Imitation Learning with an RGB-D Camera

Quanquan Shao; Jin Qi; Jin Ma; Yi Fang; Weiming Wang; Jie Hu

doi:10.3390/app10030803

Applied Sciences ◽

10.3390/app10030803 ◽

2020 ◽

Vol 10 (3) ◽

pp. 803 ◽

Cited By ~ 2

Author(s):

Quanquan Shao ◽

Jin Qi ◽

Jin Ma ◽

Yi Fang ◽

Weiming Wang ◽

...

Keyword(s):

Object Detection ◽

Visual Information ◽

Depth Map ◽

Robot Learning ◽

Target Object ◽

Fine Tuning ◽

Great Success ◽

Learning Approaches ◽

Policy Network ◽

End To End

End-to-end robot learning has achieved great success in enabling robots to acquire various manipulation skills. It learns a function that maps visual information directly to robot actions. Because target objects are diverse, most end-to-end robot learning approaches have focused on a single object-specific task and generalize poorly. In this work, an object detection-based one-shot learning method is proposed that separates semantic understanding from robot control. It enables a robot to acquire similar manipulation skills efficiently and to cope with new objects from a single demonstration. The approach has two main modules: an object detection network and a motion policy network. From RGB images, the object detection network outputs the task-related semantic keypoint of the target object, which is the center of the container in this application, and the motion policy network generates the motion action from the depth map and the detected keypoint. To evaluate the proposed pipeline, a series of experiments is conducted on typical placing tasks in different simulation scenarios; additionally, the learned policy is transferred from simulation to the real world without any fine-tuning.
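The hand-off between the two modules can be illustrated with a small sketch. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and replaces the learned motion policy network with a hand-written clipped step towards the target; neither the function names nor this geometric shortcut come from the paper, which learns the policy from the depth map and keypoint.

```python
from typing import List, Sequence, Tuple


def keypoint_to_camera_xyz(
    depth_map: List[List[float]],
    keypoint_uv: Tuple[int, int],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float, float]:
    """Back-project a detected pixel keypoint into the camera frame.

    Assumes a pinhole model and a depth map in metres indexed as [row][col].
    """
    u, v = keypoint_uv
    z = depth_map[v][u]
    if z <= 0.0:
        raise ValueError("no valid depth reading at the keypoint")
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def step_towards(
    current: Sequence[float],
    target: Sequence[float],
    max_step: float = 0.05,
) -> Tuple[float, ...]:
    """Hypothetical stand-in for the learned policy: a bounded move to the target."""
    delta = [t - c for c, t in zip(current, target)]
    norm = sum(d * d for d in delta) ** 0.5
    if norm <= max_step:
        return tuple(delta)
    scale = max_step / norm
    return tuple(d * scale for d in delta)
```

The point of the split is that only `keypoint_to_camera_xyz` depends on what the detector found; the motion step sees a position, not an object category, which is what lets a new object be handled by changing the detector alone.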

Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network

2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) ◽

10.1109/vcip49819.2020.9301859 ◽

2020 ◽

Author(s):

Chang Liu ◽

Kebin Jia ◽

Pengyu Liu

Keyword(s):

Edge Detection ◽

Depth Map ◽

Intra Coding ◽

End To End

Fine-Tuning of Pre-Trained End-to-End Speech Recognition with Generative Adversarial Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413703 ◽

2021 ◽

Author(s):

Md. Akmal Haidar ◽

Mehdi Rezagholizadeh

Keyword(s):

Speech Recognition ◽

Fine Tuning ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

End To End

BLPNet: An End-to-End Model Towards Voxelization Free 3D Object Detection

2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR) ◽

10.1109/icievicivpr48672.2020.9306563 ◽

2020 ◽

Author(s):

Zhihao Cui ◽

Zhenhua Zhang

Keyword(s):

Object Detection ◽

3D Object ◽

End To End ◽

3D Object Detection

Dynamic Low-Light Image Enhancement for Object Detection via End-to-End Training

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412802 ◽

2021 ◽

Author(s):

Haifeng Guo ◽

Tong Lu ◽

Yirui Wu

Keyword(s):

Object Detection ◽

Image Enhancement ◽

Low Light ◽

Light Image ◽

End To End

Object Detection, Distributed Cloud Computing and Parallelization Techniques for Autonomous Driving Systems

Applied Sciences ◽

10.3390/app11072925 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2925

Author(s):

Edgar Cortés Gallardo Medina ◽

Victor Miguel Velazquez Espitia ◽

Daniela Chípuli Silva ◽

Sebastián Fernández Ruiz de las Cuevas ◽

Marco Palacios Hirata ◽

...

Keyword(s):

Neural Networks ◽

Cloud Computing ◽

Path Planning ◽

Object Detection ◽

Short Term Memory ◽

Smart Cities ◽

Autonomous Vehicle ◽

Computational Time ◽

End To End ◽

On The Road

Autonomous vehicles are increasingly becoming a necessary trend towards building the smart cities of the future. Numerous proposals have been presented in recent years to tackle particular aspects of the working pipeline towards creating a functional end-to-end system, such as object detection, tracking, path planning, sentiment or intent detection, amongst others. Nevertheless, few efforts have been made to systematically compile all of these systems into a single proposal that also considers the real challenges these systems will have on the road, such as real-time computation, hardware capabilities, etc. This paper reviews the latest techniques towards creating our own end-to-end autonomous vehicle system, considering the state-of-the-art methods on object detection, and the possible incorporation of distributed systems and parallelization to deploy these methods. Our findings show that while techniques such as convolutional neural networks, recurrent neural networks, and long short-term memory can effectively handle the initial detection and path planning tasks, more efforts are required to implement cloud computing to reduce the computational time that these methods demand. Additionally, we have mapped different strategies to handle the parallelization task, both within and between the networks.

How to train your differentiable filter

Autonomous Robots ◽

10.1007/s10514-021-09990-9 ◽

2021 ◽

Author(s):

Alina Kloss ◽

Georg Martius ◽

Jeannette Bohg

Keyword(s):

Bayesian Filtering ◽

Learning Approaches ◽

Filtering Algorithms ◽

Recursive Filtering ◽

Process Dynamics ◽

Noise Characteristics ◽

Complex Models ◽

End To End ◽

Robotic Applications ◽

And Training

In many robotic applications, it is crucial to maintain a belief about the state of a system, which serves as input for planning and decision making and provides feedback during task execution. Bayesian Filtering algorithms address this state estimation problem, but they require models of process dynamics and sensory observations and the respective noise characteristics of these models. Recently, multiple works have demonstrated that these models can be learned by end-to-end training through differentiable versions of recursive filtering algorithms. In this work, we investigate the advantages of differentiable filters (DFs) over both unstructured learning approaches and manually-tuned filtering algorithms, and provide practical guidance to researchers interested in applying such differentiable filters. For this, we implement DFs with four different underlying filtering algorithms and compare them in extensive experiments. Specifically, we (i) evaluate different implementation choices and training approaches, (ii) investigate how well complex models of uncertainty can be learned in DFs, (iii) evaluate the effect of end-to-end training through DFs and (iv) compare the DFs among each other and to unstructured LSTM models.
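For readers unfamiliar with recursive filtering, the following is a minimal sketch of one predict/update cycle of a scalar Kalman filter, assuming identity process dynamics and a direct observation of the state. It is not the authors' implementation: the process noise `q` and observation noise `r` are fixed numbers here, whereas a differentiable filter would express the same arithmetic in an autodiff framework and learn these quantities (and the models) by backpropagating through the recursion.

```python
from typing import List, Tuple


def kalman_step(
    mean: float, var: float, z: float, q: float, r: float
) -> Tuple[float, float]:
    """One predict/update cycle of a scalar Kalman filter.

    Assumes identity dynamics (x_t = x_{t-1} + noise) and z_t = x_t + noise.
    """
    # Predict: the mean is unchanged, uncertainty grows by the process noise.
    mean_pred = mean
    var_pred = var + q
    # Update: blend prediction and observation according to the Kalman gain.
    gain = var_pred / (var_pred + r)
    mean_new = mean_pred + gain * (z - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new


def run_filter(
    observations: List[float], mean0: float, var0: float, q: float, r: float
) -> List[Tuple[float, float]]:
    """Apply kalman_step recursively and return the belief after each step."""
    beliefs = []
    mean, var = mean0, var0
    for z in observations:
        mean, var = kalman_step(mean, var, z, q, r)
        beliefs.append((mean, var))
    return beliefs
```

Every operation in `kalman_step` is a sum, product, or division, which is why the recursion can be differentiated end to end with respect to `q`, `r`, and any model parameters.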

Machine learning approaches for visual information retrieval

Visual Indexing and Retrieval - SpringerBriefs in Computer Science ◽

10.1007/978-1-4614-3588-4_3 ◽

2012 ◽

pp. 21-40 ◽

Cited By ~ 3

Author(s):

Frédéric Precioso ◽

Matthieu Cord

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Visual Information ◽

Learning Approaches ◽

Visual Information Retrieval

A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms, and off-the-shelf OCR components cannot process it reliably. An essential step in document analysis pipelines is therefore to detect these graphical components. Detection leads to a high-level conceptual understanding of documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images. We discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with their quantitative evaluation, and briefly discuss promising directions for further improvement.

Visual attention strategies for target object detection

10.26686/wgtn.17067635 ◽

2021 ◽

Author(s):

Ibrahim Mohammad Hussain Rahman

Keyword(s):

Visual Attention ◽

Object Detection ◽

Target Object ◽

Detection Accuracy ◽

Estimation Model ◽

Top Down ◽

Bottom Up ◽

Feature Map ◽

Low Level ◽

Visual Tasks

The human visual attention system (HVA) encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to those regions that are salient. Two contrasting biological mechanisms exist in the HVA system: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection. Very few computational models have been proposed to model top-down attention, mainly for three reasons. First, the functionality of the top-down process involves many influential factors. Second, top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to come up with a generalized top-down model that could be applied to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step for analyzing images to further perform complex visual tasks. It has not been investigated thoroughly when modelling top-down saliency and hence constitutes the main application domain for this thesis. The thesis will investigate methods to model top-down attention through various high-level data acquired from images. Furthermore, the thesis will investigate different strategies to dynamically combine bottom-up and top-down processes to improve the detection accuracy, as well as the computational efficiency, of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:

1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor plays the role of tuning the weighting of low-level features to maximize detection accuracy. By incorporating context into the feature weighting mechanism we improve the quality of the assigned weights to these features.
2. Two modules of target features combined with contextual weighting to improve detection accuracy of the target object. In this proposed model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects.
3. A top-down and bottom-up attention combination model based on feature interaction. This model provides a dynamic way of combining both processes by formulating the problem as feature selection. The feature selection exploits the interaction between these features, yielding a robust set of features that would maximize both the detection accuracy and the overall efficiency of the system.
4. A feature map quality score estimation model that is able to accurately predict the detection accuracy score of any previously unseen feature map without the need for ground-truth data. The model extracts various local, global, geometrical and statistical characteristic features from a feature map. These characteristics guide a regression model to estimate the quality of a novel map.
5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model is able to predict the quality score of any novel feature map accurately, then it is possible to perform dynamic feature map integration based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with a minimum number of feature maps than that achieved by combining all feature maps.

The work proposed in this thesis provides new directions in modelling top-down saliency for target object detection. In addition, dynamic approaches for top-down and bottom-up combination show considerable improvements over existing approaches in both efficiency and accuracy.
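The idea of integrating feature maps according to an estimated quality score can be sketched as follows. This is an illustrative guess at one simple scheme (keep the `k` best-scored maps and average them with score-proportional weights), not the thesis's actual frameworks; the function name, the top-`k` rule, and the weighting are all assumptions, and the scores are taken as given rather than produced by a regression model.

```python
from typing import List

FeatureMap = List[List[float]]


def integrate_feature_maps(
    maps: List[FeatureMap], scores: List[float], k: int
) -> FeatureMap:
    """Combine the k highest-scored feature maps into one saliency map.

    `scores` stands in for the output of a quality estimation model;
    each selected map is weighted by its share of the selected scores.
    """
    if not maps or len(maps) != len(scores):
        raise ValueError("maps and scores must be non-empty and the same length")
    # Indices of the k maps with the highest estimated quality.
    ranked = sorted(range(len(maps)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in ranked)
    if total <= 0.0:
        raise ValueError("selected scores must sum to a positive value")
    height, width = len(maps[0]), len(maps[0][0])
    combined = [[0.0] * width for _ in range(height)]
    for i in ranked:
        weight = scores[i] / total
        for row in range(height):
            for col in range(width):
                combined[row][col] += weight * maps[i][row][col]
    return combined
```

Dropping low-scored maps before combining is what allows fewer maps to be used at runtime, at the cost of trusting the score estimates.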

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v1 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, with classification accuracy gains of 1.91% to 6.69%.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
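The pretext task described above (predict the contaminated observations of a pixel's time series) can be sketched at the data level. The sketch assumes additive uniform noise as the contamination and a mean squared error restricted to the contaminated time steps; the abstract does not specify either choice, and `mask_rate`, `noise_scale`, and both function names are hypothetical. The Transformer itself is omitted.

```python
import random
from typing import List, Tuple

Series = List[List[float]]  # one spectral vector per time step


def contaminate(
    series: Series, mask_rate: float, noise_scale: float, rng: random.Random
) -> Tuple[Series, List[bool]]:
    """Randomly corrupt time steps and record which ones were corrupted."""
    corrupted: Series = []
    mask: List[bool] = []
    for obs in series:
        if rng.random() < mask_rate:
            # Assumed contamination: additive uniform noise on every band.
            corrupted.append(
                [x + rng.uniform(-noise_scale, noise_scale) for x in obs]
            )
            mask.append(True)
        else:
            corrupted.append(list(obs))
            mask.append(False)
    return corrupted, mask


def masked_mse(pred: Series, target: Series, mask: List[bool]) -> float:
    """Mean squared error over the contaminated time steps only."""
    errors = [
        (p - t) ** 2
        for ps, ts, m in zip(pred, target, mask)
        if m
        for p, t in zip(ps, ts)
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

During pre-training, a network would receive the corrupted series and be trained to minimise `masked_mse` against the clean one, so no labels are needed; fine-tuning then reuses the same weights with a classification head.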
