Pose Estimation of Primitive-Shaped Objects from a Depth Image Using Superquadric Representation

Ryo Hachiuma; Hideo Saito

doi:10.3390/app10165442

Pose Estimation of Primitive-Shaped Objects from a Depth Image Using Superquadric Representation

Applied Sciences ◽

10.3390/app10165442 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5442

Author(s):

Ryo Hachiuma ◽

Hideo Saito

Keyword(s):

Pose Estimation ◽

Degrees Of Freedom ◽

Shape Representation ◽

Estimation Method ◽

Depth Image ◽

Six Degrees Of Freedom ◽

Depth Images ◽

Object Pose Estimation ◽

Primitive Shape ◽

Conventional Methods

This paper presents a method for estimating the six degrees of freedom (6DoF) pose of texture-less primitive-shaped objects from depth images. Because conventional methods for object pose estimation require rich texture or geometric features on the target objects, they are not suitable for texture-less, geometrically simple objects. To estimate the pose of a primitive-shaped object, existing methods estimate the parameters that represent primitive shapes. However, these methods explicitly limit the number of types of primitive shapes that can be estimated. We employ superquadrics as a primitive shape representation that can represent various types of primitive shapes with only a few parameters. To estimate the superquadric parameters of primitive-shaped objects, the point cloud of the object must be segmented from a depth image. Parameter estimation is known to be sensitive to outliers, which are caused by mis-segmentation of the depth image. Therefore, we propose a novel estimation method for superquadric parameters that is robust to outliers. In the experiment, we constructed a dataset in which a person grasps and moves primitive-shaped objects. The experimental results show that our estimation method outperformed three conventional methods and the baseline method.
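
To illustrate how a superquadric covers several primitive shapes with a few parameters, the following minimal sketch evaluates the inside-outside function commonly used in the superquadric literature. The abstract does not state the exact parameterization, so the formula, the function name, and the argument names are assumptions made for illustration; the robust estimation method itself is not reproduced here.

```python
import math


def superquadric_inside_outside(point, scale, eps1, eps2):
    """Evaluate a superquadric inside-outside function F at a point.

    F < 1 means the point is inside the surface, F == 1 on it, F > 1 outside.
    `scale` = (a1, a2, a3) are the semi-axis lengths; eps1 and eps2 are the
    shape exponents. The point is assumed to be expressed in the
    superquadric's own canonical frame (an assumption of this sketch).
    """
    x, y, z = point
    a1, a2, a3 = scale
    # Absolute values keep the fractional powers real-valued.
    xy_term = abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)
    return xy_term ** (eps2 / eps1) + abs(z / a3) ** (2.0 / eps1)


if __name__ == "__main__":
    corner = (0.9, 0.9, 0.9)
    unit = (1.0, 1.0, 1.0)
    # eps1 = eps2 = 1 gives an ellipsoid; small exponents give a box-like shape.
    print("sphere-like:", superquadric_inside_outside(corner, unit, 1.0, 1.0))
    print("box-like:   ", superquadric_inside_outside(corner, unit, 0.1, 0.1))
```

With unit scale, the point (0.9, 0.9, 0.9) lies outside the sphere-like shape but inside the box-like one, which shows how the two exponents alone change the primitive being represented.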


RobotP: A Benchmark Dataset for 6D Object Pose Estimation

Sensors ◽

10.3390/s21041299 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1299

Author(s):

Honglin Yuan ◽

Tim Hoogenkamp ◽

Remco C. Veltkamp

Keyword(s):

Pose Estimation ◽

Ground Truth ◽

3D Models ◽

Depth Image ◽

Great Success ◽

Estimation Algorithms ◽

Depth Images ◽

Object Pose Estimation ◽

Image Pairs ◽

Bounding Boxes

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help improve the performance of pose estimation algorithms.
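
As a rough illustration of deriving a 2D bounding box automatically from a segmentation mask, the sketch below computes the tight box around the nonzero pixels of a binary mask. The abstract does not describe the dataset's actual procedure or box convention, so the function name, the list-of-rows mask format, and the inclusive pixel coordinates are assumptions.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) around nonzero pixels, or None.

    `mask` is a list of rows, each a list of 0/1 values (assumed format).
    Coordinates are inclusive pixel indices (assumed convention).
    """
    if not mask:
        return None
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])


if __name__ == "__main__":
    example = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(mask_to_bbox(example))
```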


Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6781 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11221-11228

Author(s):

Yueying Kao ◽

Weiming Li ◽

Qiang Wang ◽

Zhouchen Lin ◽

Wooshik Kim ◽

...

Keyword(s):

Pose Estimation ◽

Large Scale ◽

Synthetic Data ◽

Real Data ◽

Depth Image ◽

Depth Images ◽

In The Wild ◽

Object Pose Estimation ◽

Image Pairs ◽

Rgb Image

Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, which makes them difficult to use in many applications. In this paper, we aim to extract RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns the distributions of synthetic and real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset on all metrics, which substantiates the superiority of our method and the above modules.


Real-time Robust Six Degrees of Freedom Object Pose Estimation with a Time-of-flight Camera and a Color Camera

Journal of Field Robotics ◽

10.1002/rob.21519 ◽

2014 ◽

Vol 32 (1) ◽

pp. 61-84 ◽

Cited By ~ 3

Author(s):

Kaipeng Sun ◽

Robin Heß ◽

Zhihao Xu ◽

Klaus Schilling

Keyword(s):

Real Time ◽

Pose Estimation ◽

Degrees Of Freedom ◽

Time Of Flight ◽

Six Degrees Of Freedom ◽

Object Pose Estimation


Median-Shape Representation Learning for Category-Level Object Pose Estimation in Cluttered Environments

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412318 ◽

2021 ◽

Author(s):

Hiroki Tatemichi ◽

Yasutomo Kawanishi ◽

Daisuke Deguchi ◽

Ichiro Ide ◽

Ayako Amma ◽

...

Keyword(s):

Pose Estimation ◽

Shape Representation ◽

Representation Learning ◽

Cluttered Environments ◽

Object Pose Estimation


CSA6D: Channel-Spatial Attention Networks for 6D Object Pose Estimation

Cognitive Computation ◽

10.1007/s12559-021-09966-y ◽

2021 ◽

Author(s):

Tao Chen ◽

Dongbing Gu

Keyword(s):

Spatial Attention ◽

Pose Estimation ◽

Estimation Accuracy ◽

Pose Prediction ◽

Attention Networks ◽

Depth Images ◽

Segmented Object ◽

Benchmark Datasets ◽

Object Pose Estimation ◽

Network Channel

6D object pose estimation plays a crucial role in robotic manipulation and grasping tasks. The aim of estimating the 6D object pose from RGB or RGB-D images is to detect objects and estimate their orientations and translations relative to the given canonical models. RGB-D cameras provide two sensory modalities, RGB and depth images, which could benefit the estimation accuracy. However, exploiting two different modality sources remains a challenging issue. In this paper, inspired by recent works on attention networks that can focus on important regions and ignore unnecessary information, we propose a novel network, the Channel-Spatial Attention Network (CSA6D), to estimate the 6D object pose from an RGB-D camera. The proposed CSA6D includes a pre-trained 2D network that segments the objects of interest from the RGB image. It then uses two separate networks to extract appearance and geometrical features from the RGB and depth images for each segmented object. The two feature vectors for each pixel are stacked together as a fusion vector, which is refined by an attention module to generate an aggregated feature vector. The attention module includes a channel attention block and a spatial attention block, which can effectively leverage the concatenated embeddings for accurate 6D pose prediction on known objects. We evaluate the proposed network on two benchmark datasets, the YCB-Video dataset and the LineMod dataset, and the results show that it can outperform previous state-of-the-art methods under the ADD and ADD-S metrics. In addition, the attention map demonstrates that our proposed network searches for unique geometry information as the most likely features for pose estimation. From the experiments, we conclude that the proposed network can accurately estimate the object pose by effectively leveraging multi-modality features.
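
The ADD and ADD-S metrics named above can be sketched as follows, using their usual definitions: ADD averages the distance between corresponding model points under the ground-truth and estimated poses, while ADD-S averages the distance from each ground-truth point to its nearest estimated point, which suits symmetric objects. This is a minimal pure-Python illustration, not the evaluation code of the paper; the function names and the (rotation, translation) pose format are assumptions.

```python
import math


def transform(points, rotation, translation):
    """Apply a rigid transform (3x3 row-major rotation, 3-vector) to points."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


def add_metric(points, gt_pose, est_pose):
    """Mean distance between corresponding transformed model points (ADD)."""
    gt = transform(points, *gt_pose)
    est = transform(points, *est_pose)
    return sum(math.dist(a, b) for a, b in zip(gt, est)) / len(points)


def add_s_metric(points, gt_pose, est_pose):
    """Mean closest-point distance between transformed model points (ADD-S)."""
    gt = transform(points, *gt_pose)
    est = transform(points, *est_pose)
    return sum(min(math.dist(a, b) for b in est) for a in gt) / len(points)


if __name__ == "__main__":
    model = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # symmetric toy model
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    half_turn_z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
    gt_pose = (identity, (0.0, 0.0, 0.0))
    est_pose = (half_turn_z, (0.0, 0.0, 0.0))
    print("ADD:  ", add_metric(model, gt_pose, est_pose))
    print("ADD-S:", add_s_metric(model, gt_pose, est_pose))
```

For the symmetric toy model, a half-turn about the z-axis swaps the two points, so ADD penalizes the estimate while ADD-S does not.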


Pose Estimation of a Six Degrees of Freedom Pipe-Bender using a 3D-Visual Measurement System of High Accuracy

IFToMM International Symposium on Robotics and Mechatronics ◽

10.3850/978-981-07-7744-9_044 ◽

2013 ◽

Author(s):

E. Castillo-Castaneda

Keyword(s):

Pose Estimation ◽

Measurement System ◽

Degrees Of Freedom ◽

High Accuracy ◽

Visual Measurement ◽

Six Degrees Of Freedom


DGGAN: Depth-image Guided Generative Adversarial Networks for Disentangling RGB and Depth Images in 3D Hand Pose Estimation

2020 IEEE Winter Conference on Applications of Computer Vision (WACV) ◽

10.1109/wacv45572.2020.9093380 ◽

2020 ◽

Author(s):

Liangjian Chen ◽

Shih-Yao Lin ◽

Yusheng Xie ◽

Yen-Yu Lin ◽

Wei Fan ◽

...

Keyword(s):

Pose Estimation ◽

Depth Image ◽

Generative Adversarial Networks ◽

Hand Pose Estimation ◽

Image Guided ◽

Depth Images ◽

Adversarial Networks ◽

Hand Pose


Depth Image–Based Deep Learning of Grasp Planning for Textureless Planar-Faced Objects in Vision-Guided Robotic Bin-Picking

Sensors ◽

10.3390/s20030706 ◽

2020 ◽

Vol 20 (3) ◽

pp. 706 ◽

Cited By ~ 4

Author(s):

Ping Jiang ◽

Yoshiyuki Ishihara ◽

Nobukatsu Sugiyama ◽

Junji Oaki ◽

Seiji Tokura ◽

...

Keyword(s):

Feature Extraction ◽

Degrees Of Freedom ◽

Color Image ◽

Texture Features ◽

Depth Image ◽

Feature Descriptor ◽

Grasp Planning ◽

Depth Images ◽

Bin Picking ◽

Picking System

Bin-picking of small parcels and other textureless planar-faced objects is a common task at warehouses. A general color image–based vision-guided robot picking system requires feature extraction and goal image preparation of various objects. However, feature extraction for goal image matching is difficult for textureless objects. Further, prior preparation of huge numbers of goal images is impractical at a warehouse. In this paper, we propose a novel depth image–based vision-guided robot bin-picking system for textureless planar-faced objects. Our method uses a deep convolutional neural network (DCNN) model that is trained on 15,000 annotated depth images synthetically generated in a physics simulator to directly predict grasp points without object segmentation. Unlike previous studies that predicted grasp points for a robot suction hand with only one vacuum cup, our DCNN also predicts optimal grasp patterns for a hand with two vacuum cups (left cup on, right cup on, or both cups on). Further, we propose a surface feature descriptor to extract surface features (center position and normal) and refine the predicted grasp point position, removing the need for texture features for vision-guided robot control and sim-to-real modification for DCNN model training. Experimental results demonstrate the efficiency of our system, namely that a robot with 7 degrees of freedom can pick randomly posed textureless boxes in a cluttered environment with a 97.5% success rate at speeds exceeding 1000 pieces per hour.


Can we Localize an AV from a Single Image? Deep-Geometric 6 DoF Localization in Topo-metric Maps

Journal of Autonomous Vehicles and Systems ◽

10.1115/1.4052604 ◽

2021 ◽

pp. 1-13

Author(s):

Punarjay Chakravarty ◽

Tom Roussel ◽

Gaurav Pandey ◽

Tinne Tuytelaars

Keyword(s):

Pose Estimation ◽

Autonomous Vehicles ◽

Hybrid Algorithm ◽

Degrees Of Freedom ◽

Single Camera ◽

Single Image ◽

Six Degrees Of Freedom ◽

Localization Algorithms ◽

Mapping Process ◽

Set Of Points

We describe a Deep-Geometric Localizer that is able to estimate the full six degrees-of-freedom (6DoF) global pose of the camera from a single image in a previously mapped environment. Our map is a topo-metric one, with discrete topological nodes whose 6DoF poses are known. Each topo-node in our map also comprises a set of points, whose 2D features and 3D locations are stored as part of the mapping process. For the mapping phase, we utilise a stereo camera and a regular stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using Deep Learning, and use a geometric algorithm (PnP) on the matched 2D features (and their 3D positions in the topo map) to determine the full 6DoF globally consistent pose of the camera. Our method divorces the mapping and the localization algorithms and sensors (stereo and mono), and allows accurate 6DoF pose estimation in a previously mapped environment using a single camera. With results in simulated and real environments, our hybrid algorithm is particularly useful for autonomous vehicles (AVs) and shuttles that might repeatedly traverse the same route.


Upper-Body Pose Recognition Using Cylinder Pattern Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.781 ◽

2014 ◽

Vol 571-572 ◽

pp. 781-787

Author(s):

Jae Wan Park ◽

Seok Jin Lee ◽

Chil Woo Lee

Keyword(s):

Coordinate System ◽

Pose Estimation ◽

Estimation Method ◽

Cylindrical Coordinate System ◽

Upper Body ◽

Feature Points ◽

Feature Vectors ◽

Depth Images ◽

Cylindrical Coordinate ◽

Cylinder Model

In this paper, we propose a cylindrical coordinate system for analyzing upper-body pose in depth images. The method extracts the human body region from the depth images and sets the center of that region as the origin of the cylindrical coordinate system. Multi-layered cylinders are then configured around the origin, and the pose is analyzed using the pattern formed where the cylinders intersect the upper body in the depth image. Because the intersection points of the cylinders and the upper body are obtained as brightness values, they can be converted into feature vectors in the cylindrical coordinate system. The extracted feature vectors are mapped to a circular feature space and categorized into pose patterns. The pose patterns are learned using the average values of the feature vectors, and a pose is classified by comparing it with the pre-defined pose patterns using Euclidean distance. We apply a dynamic cylinder model to the upper-body region, so the head, body, and arms can be classified through simple computation and pose information can be extracted effectively. The proposed pose estimation method has two benefits. First, pose estimation is possible using only a minimal set of feature points, obtained by applying the cylinder model to the region connecting the torso, head, and arms. Second, because the cylinder model is applied when obtaining the feature points, the user's feature points and angles of rotation can be extracted easily. This paper does not consider cases where the user's body is tilted; it uses only upper-body poses in which the user stands upright facing the front. The method therefore cannot distinguish changes caused by tilting of the torso, although it can detect the vectors of the hands and arms using the cylindrical coordinate system. In future work, we will study how to recognize poses in the tilted state.
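
Two steps described above, expressing a point in cylindrical coordinates about a chosen origin and classifying a feature vector by Euclidean distance to stored pose patterns, can be sketched as follows. The abstract does not specify the cylinder axis, the feature layout, or the pose labels, so the vertical y-axis, the function names, and the example labels are all hypothetical choices for illustration.

```python
import math


def to_cylindrical(point, origin):
    """Convert a 3D point to (radius, angle, height) relative to `origin`.

    The cylinder axis is assumed to be the vertical y-axis; the angle is
    measured in the x-z plane, in radians.
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    dz = point[2] - origin[2]
    return (math.hypot(dx, dz), math.atan2(dz, dx), dy)


def classify_pose(feature, patterns):
    """Return the label of the stored pattern nearest in Euclidean distance.

    `patterns` maps a label to a mean feature vector (hypothetical layout).
    """
    return min(patterns, key=lambda label: math.dist(feature, patterns[label]))


if __name__ == "__main__":
    print(to_cylindrical((1.0, 2.0, 0.0), (0.0, 0.0, 0.0)))
    stored = {"arms_up": [1.0, 0.0], "arms_down": [0.0, 1.0]}  # made-up labels
    print(classify_pose([0.9, 0.1], stored))
```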
