Pose Estimation of Primitive-Shaped Objects from a Depth Image Using Superquadric Representation

2020 ◽  
Vol 10 (16) ◽  
pp. 5442
Author(s):  
Ryo Hachiuma ◽  
Hideo Saito

This paper presents a method for estimating the six Degrees of Freedom (6DoF) pose of texture-less primitive-shaped objects from depth images. As conventional methods for object pose estimation require rich texture or geometric features on the target objects, they are not suitable for texture-less objects with geometrically simple shapes. To estimate the pose of a primitive-shaped object, existing approaches estimate the parameters that represent primitive shapes. However, these methods explicitly limit the number of types of primitive shapes that can be estimated. We employ superquadrics as a primitive shape representation that can represent various types of primitive shapes with only a few parameters. To estimate the superquadric parameters of a primitive-shaped object, the point cloud of the object must be segmented from a depth image. Parameter estimation is known to be sensitive to outliers, which are caused by mis-segmentation of the depth image. Therefore, we propose a novel estimation method for superquadric parameters that is robust to outliers. In the experiment, we constructed a dataset in which a person grasps and moves primitive-shaped objects. The experimental results show that our estimation method outperformed three conventional methods and the baseline method.
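To make the "few parameters" claim concrete, the following is a minimal sketch of the standard superquadric inside-outside function in the shape's canonical frame. It is not the authors' robust estimator; the function name, argument layout, and sample values are assumptions chosen for illustration.

```python
def superquadric_inside_outside(point, scales, exponents):
    """Inside-outside function F of a superquadric in its canonical frame.

    F < 1: the point is inside; F == 1: on the surface; F > 1: outside.
    scales = (a1, a2, a3) are the semi-axis lengths.
    exponents = (e1, e2) control how rounded or square the shape is.
    """
    x, y, z = point
    a1, a2, a3 = scales
    e1, e2 = exponents
    # Absolute values keep the fractional powers real for negative coordinates.
    xy_term = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy_term ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)


# Illustrative parameter sets (assumed values, not taken from the paper).
sphere = ((1.0, 1.0, 1.0), (1.0, 1.0))    # e1 = e2 = 1 gives an ellipsoid
box_like = ((1.0, 1.0, 1.0), (0.1, 0.1))  # small exponents approach a box

on_sphere = superquadric_inside_outside((0.0, 0.6, 0.8), *sphere)
inside_sphere = superquadric_inside_outside((0.2, 0.2, 0.2), *sphere)
# The same point near a corner is inside the box-like shape
# but outside the sphere.
corner_in_box = superquadric_inside_outside((0.9, 0.9, 0.9), *box_like)
corner_in_sphere = superquadric_inside_outside((0.9, 0.9, 0.9), *sphere)
```

Only the two exponents change between the sphere and the box-like shape, which is why a single five-parameter family can cover several primitive types. A fitting procedure would typically minimize some function of F over the segmented point cloud, which is where outliers from mis-segmentation become a concern.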

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, when compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset consisting of commonly used objects for benchmarking in 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
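As a rough illustration of how a 2D bounding box can be derived automatically from a segmentation mask, here is a minimal sketch. The abstract does not describe the dataset's actual tooling, so the function name `mask_to_bbox` and the nested-list mask format are assumptions.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None.

    mask is a list of rows, each a list of 0/1 values (assumed format).
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: the object is not visible
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# Toy 4x5 mask with an object covering columns 1..3 and rows 1..2.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bbox = mask_to_bbox(mask)
empty_bbox = mask_to_bbox([[0, 0], [0, 0]])
```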


2020 ◽  
Vol 34 (07) ◽  
pp. 11221-11228
Author(s):  
Yueying Kao ◽  
Weiming Li ◽  
Qiang Wang ◽  
Zhouchen Lin ◽  
Wooshik Kim ◽  
...  

Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, which makes them difficult to use in many applications. In this paper, we aim at extracting RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns the distributions of synthetic and real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset on all metrics, which substantiates the superiority of our method and the above modules.


Author(s):  
Tao Chen ◽  
Dongbing Gu

6D object pose estimation plays a crucial role in robotic manipulation and grasping tasks. The aim of estimating the 6D object pose from RGB or RGB-D images is to detect objects and estimate their orientations and translations relative to the given canonical models. RGB-D cameras provide two sensory modalities, RGB and depth images, which could benefit the estimation accuracy. However, exploiting two different modality sources remains a challenging issue. In this paper, inspired by recent works on attention networks that can focus on important regions and ignore unnecessary information, we propose a novel network, the Channel-Spatial Attention Network (CSA6D), to estimate the 6D object pose from an RGB-D camera. The proposed CSA6D includes a pre-trained 2D network to segment the objects of interest from the RGB image. It then uses two separate networks to extract appearance and geometrical features from the RGB and depth images for each segmented object. The two feature vectors for each pixel are stacked together as a fusion vector, which is refined by an attention module to generate an aggregated feature vector. The attention module includes a channel attention block and a spatial attention block, which can effectively leverage the concatenated embeddings for accurate 6D pose prediction on known objects. We evaluate the proposed network on two benchmark datasets, YCB-Video and LineMod, and the results show that it can outperform previous state-of-the-art methods under the ADD and ADD-S metrics. In addition, the attention map demonstrates that the proposed network searches for unique geometric information as the most likely features for pose estimation. From the experiments, we conclude that the proposed network can accurately estimate the object pose by effectively leveraging multi-modality features.
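The ADD and ADD-S metrics mentioned above can be sketched as follows, using their commonly cited definitions: ADD averages the distance between corresponding model points under the ground-truth and estimated poses, while ADD-S uses the closest estimated point instead, so that symmetric objects are not penalized. This is a plain-Python sketch, not the authors' evaluation code; the helper names and the toy model are assumptions.

```python
import math


def transform(R, t, p):
    """Apply a 3x3 rotation matrix R (nested lists) and translation t to p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding transformed model points."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(math.dist(a, b) for a, b in zip(gt, est)) / len(points)


def add_s_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to the closest
    estimated point (brute force, quadratic in the number of points)."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(min(math.dist(a, b) for b in est) for a in gt) / len(points)


# Toy model: four points on a square, symmetric under 90-degree turns about z.
model = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
zero = (0.0, 0.0, 0.0)

add_value = add_metric(model, identity, zero, rot_z_90, zero)
add_s_value = add_s_metric(model, identity, zero, rot_z_90, zero)
```

In this toy case the estimated pose is off by a quarter turn, so ADD reports an error of sqrt(2) per point, while ADD-S reports zero because the rotated point set coincides with the original one.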


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 706
Author(s):  
Ping Jiang ◽  
Yoshiyuki Ishihara ◽  
Nobukatsu Sugiyama ◽  
Junji Oaki ◽  
Seiji Tokura ◽  
...  

Bin-picking of small parcels and other textureless planar-faced objects is a common task at warehouses. A general color image–based vision-guided robot picking system requires feature extraction and goal image preparation of various objects. However, feature extraction for goal image matching is difficult for textureless objects. Further, prior preparation of huge numbers of goal images is impractical at a warehouse. In this paper, we propose a novel depth image–based vision-guided robot bin-picking system for textureless planar-faced objects. Our method uses a deep convolutional neural network (DCNN) model that is trained on 15,000 annotated depth images synthetically generated in a physics simulator to directly predict grasp points without object segmentation. Unlike previous studies that predicted grasp points for a robot suction hand with only one vacuum cup, our DCNN also predicts optimal grasp patterns for a hand with two vacuum cups (left cup on, right cup on, or both cups on). Further, we propose a surface feature descriptor to extract surface features (center position and normal) and refine the predicted grasp point position, removing the need for texture features for vision-guided robot control and sim-to-real modification for DCNN model training. Experimental results demonstrate the efficiency of our system, namely that a robot with 7 degrees of freedom can pick randomly posed textureless boxes in a cluttered environment with a 97.5% success rate at speeds exceeding 1000 pieces per hour.


Author(s):  
Punarjay Chakravarty ◽  
Tom Roussel ◽  
Gaurav Pandey ◽  
Tinne Tuytelaars

We describe a Deep-Geometric Localizer that is able to estimate the full six degrees-of-freedom (6DoF) global pose of the camera from a single image in a previously mapped environment. Our map is a topo-metric one, with discrete topological nodes whose 6DoF poses are known. Each topo-node in our map also comprises a set of points, whose 2D features and 3D locations are stored as part of the mapping process. For the mapping phase, we utilise a stereo camera and a regular stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using Deep Learning, and use a geometric algorithm (PnP) on the matched 2D features (and their 3D positions in the topo map) to determine the full 6DoF globally consistent pose of the camera. Our method decouples the mapping and the localization algorithms and sensors (stereo and mono), and allows accurate 6DoF pose estimation in a previously mapped environment using a single camera. We present results in simulated and real environments; our hybrid algorithm is particularly useful for autonomous vehicles (AVs) and shuttles that might repeatedly traverse the same route.
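The geometric step can be illustrated by the quantity a PnP solver typically drives toward zero: the reprojection error between observed 2D features and the stored 3D map points projected through a candidate camera pose. The sketch below only evaluates that error for a given pose and does not solve for the pose; the pinhole intrinsics, point values, and function names are assumptions.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixels."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(points_3d, pixels_2d, R, t, intrinsics):
    """Mean pixel distance between observed features and projected map points.

    (R, t) maps map coordinates into the camera frame (assumed convention).
    """
    total = 0.0
    for p, uv in zip(points_3d, pixels_2d):
        p_cam = tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                      for i in range(3))
        total += math.dist(project(p_cam, *intrinsics), uv)
    return total / len(points_3d)


# Hypothetical intrinsics (fx, fy, cx, cy) and three stored map points.
intrinsics = (500.0, 500.0, 320.0, 240.0)
map_points = [(0.0, 0.0, 4.0), (1.0, 0.0, 4.0), (0.0, 1.0, 5.0)]
# Pixel observations generated from the identity pose.
observed = [(320.0, 240.0), (445.0, 240.0), (320.0, 340.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

error_true_pose = reprojection_error(
    map_points, observed, identity, (0.0, 0.0, 0.0), intrinsics)
error_shifted_pose = reprojection_error(
    map_points, observed, identity, (0.1, 0.0, 0.0), intrinsics)
```

The pose that generated the observations yields zero error, while a 10 cm sideways shift yields an error of several pixels, which is the signal a PnP solver exploits once the 2D-3D matches for the retrieved topo-node are known.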


2014 ◽  
Vol 571-572 ◽  
pp. 781-787
Author(s):  
Jae Wan Park ◽  
Seok Jin Lee ◽  
Chil Woo Lee

In this paper, we propose a cylindrical coordinate system for accurately analyzing upper-body pose in depth images. The method extracts the human body region from the depth image and sets the center of that region as the origin of the cylindrical coordinate system. Multi-layered cylinders are then configured around this origin, and the pose is analyzed using the pattern formed where the cylinders intersect the upper body in the depth image. Since the intersection points of the cylinders and the upper body are obtained as brightness values, they can be converted into feature vectors in the cylindrical coordinate system. The extracted feature vectors are mapped to a circular feature space and categorized into pose patterns. The pose patterns are learned using the average value of the feature vectors, and an input pose is classified by comparing it with the pre-defined pose patterns using Euclidean distance. We apply a dynamic cylinder model to the upper-body region, so the region can be classified into head, body, and arms through simple computation, and pose information can be extracted effectively. The proposed pose estimation method has the following advantages. First, pose estimation is possible using only a minimal set of feature points, obtained by applying the cylinder model to the region connecting the torso, head, and arms. Second, because the cylinder model is applied when obtaining the feature points, the user's feature points and angles of rotation can be extracted easily. In this paper, we do not consider the case where the user's body is tilted; we use only upper-body poses in which the user stands upright facing the front. A disadvantage of the method is that it cannot distinguish changes caused by the tilt of the torso, although it can detect the vectors of the hands and arms using the cylindrical coordinate system. Therefore, in future work, we will study how to recognize poses in the tilted state.
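Two of the steps described above, converting a point to cylindrical coordinates about a body-centered origin and assigning a feature vector to the nearest learned pose pattern by Euclidean distance, can be sketched as follows. The abstract does not specify the axis convention or the feature layout, so the vertical y axis, the pattern names, and the two-element feature vectors are assumptions.

```python
import math


def to_cylindrical(point, origin):
    """Convert a Cartesian point to (radius, angle, height) about a vertical
    axis through origin. Assumes y is the vertical axis."""
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    dz = point[2] - origin[2]
    return (math.hypot(dx, dz), math.atan2(dz, dx), dy)


def classify_pose(feature, patterns):
    """Return the name of the pattern whose mean feature vector is closest
    to feature in Euclidean distance."""
    return min(patterns, key=lambda name: math.dist(feature, patterns[name]))


# Hypothetical mean feature vectors for two learned pose patterns.
patterns = {
    "arms_down": (0.1, 0.1),
    "arms_raised": (0.9, 0.8),
}

radius, angle, height = to_cylindrical((1.0, 2.0, 1.0), (0.0, 0.0, 0.0))
label = classify_pose((0.8, 0.9), patterns)
```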

