Marker-Less 3d Object Recognition and 6d Pose Estimation for Homogeneous Textureless Objects: An RGB-D Approach

Nasim Hajari; Gabriel Lugo Bustillo; Harsh Sharma; Irene Cheng

doi:10.3390/s20185098

Sensors, 2020, Vol 20 (18), pp. 5098

Keyword(s): Object Recognition, Pose Estimation, Template Matching, Low Cost, Region of Interest, Depth Information, Natural Scenes, Proposed Model, Industrial Assembly, RGB Images

The task of recognising an object and estimating its 6D pose in a scene has received considerable attention in recent years. The accessibility and low cost of consumer RGB-D cameras make object recognition and pose estimation feasible even for small industrial businesses. An example is the industrial assembly line, where a robotic arm must pick up a small, textureless and mostly homogeneous object and place it in a designated location. Despite recent advances in object recognition and pose estimation techniques for natural scenes, the problem remains challenging for industrial parts. In this paper, we present a framework that simultaneously recognises the object’s class and estimates its 6D pose from RGB-D data. The proposed model adopts a global approach, in which the object and its Region of Interest (ROI) are first recognised from RGB images. The object’s pose is then estimated from the corresponding depth information. We train various classifiers on extracted Histogram of Oriented Gradients (HOG) features to detect and recognise the objects. We then perform template matching on the point cloud, based on surface normals and Fast Point Feature Histograms (FPFH), to estimate the pose of the object. Experimental results show that our system is efficient, accurate and robust to illumination and background changes, even for the challenging objects of the T-LESS dataset.
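
The abstract does not say how the surface normals used for template matching are computed. A common approach, assumed here rather than taken from the paper, fits a local plane to each point's k nearest neighbours and takes the covariance eigenvector with the smallest eigenvalue as the normal. The function names, the default `k` and the `viewpoint` argument below are illustrative assumptions.

```python
import numpy as np


def estimate_normal(neighbours: np.ndarray) -> np.ndarray:
    """Unit normal of a local neighbourhood given as an (N, 3) array, N >= 3."""
    centred = neighbours - neighbours.mean(axis=0)
    cov = centred.T @ centred / len(neighbours)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order,
    # so column 0 is the direction of least spread (the local plane normal).
    _, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]
    return normal / np.linalg.norm(normal)


def estimate_normals(points: np.ndarray, k: int = 10,
                     viewpoint=(0.0, 0.0, 0.0)) -> np.ndarray:
    """Estimate a normal for every point from its k nearest neighbours (brute force)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbour_idx = np.argsort(dists, axis=1)[:, :k]  # includes the point itself
    normals = np.stack([estimate_normal(points[idx]) for idx in neighbour_idx])
    # The eigenvector sign is arbitrary; flip normals so they face the sensor.
    to_view = np.asarray(viewpoint, dtype=float) - points
    facing_away = np.sum(normals * to_view, axis=1) < 0
    normals[facing_away] *= -1.0
    return normals
```

The pairwise distance matrix makes this O(N²), so it only suits small clouds; point cloud libraries such as PCL provide optimised equivalents with spatial indexing.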

CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation

Sensors, 2021, Vol 21 (18), pp. 6095, doi:10.3390/s21186095

Author(s): Xiaojing Sun, Bin Wang, Longxiang Huang, Qian Zhang, Sulei Zhu, ...

Keyword(s): Pose Estimation, Depth Map, Depth Information, Feature Maps, Hand Pose Estimation, Depth Sensors, Key Points, RGB Images, Public Datasets, Hand Pose

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity. Depth sensors depend heavily on the sensing distance and can only be used indoors, which limits the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks. Their feature maps are then combined in a fusion module, for which we propose a new approach to merging the information from the two modalities. Finally, the 3D keypoints are regressed from heatmaps using a common approach. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
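
The abstract only says that a common method regresses the 3D keypoints from heatmaps. One widely used option, assumed here and not confirmed by the source, is soft-argmax (integral regression): normalise each heatmap with a softmax and take the expected coordinate along each axis. The sketch assumes one volumetric heatmap of shape (D, H, W) per joint; `beta` is a hypothetical sharpness parameter.

```python
import numpy as np


def soft_argmax(heatmaps: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Differentiable keypoint coordinates from volumetric heatmaps.

    heatmaps: array of shape (J, D, H, W), one 3-D heatmap per joint.
    Returns an array of shape (J, 3) holding (x, y, z) in voxel units.
    """
    j = heatmaps.shape[0]
    flat = beta * heatmaps.reshape(j, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(heatmaps.shape)
    d, h, w = heatmaps.shape[1:]
    # Expected index along each axis under the normalised heatmap.
    z = (probs.sum(axis=(2, 3)) * np.arange(d)).sum(axis=1)
    y = (probs.sum(axis=(1, 3)) * np.arange(h)).sum(axis=1)
    x = (probs.sum(axis=(1, 2)) * np.arange(w)).sum(axis=1)
    return np.stack([x, y, z], axis=1)
```

Unlike a hard argmax, the expectation is differentiable and gives sub-voxel coordinates, which is why it is often preferred when the network is trained end to end.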

InterNet+: A Light Network for Hand Pose Estimation

Sensors, 2021, Vol 21 (20), pp. 6747, doi:10.3390/s21206747

Author(s): Yang Liu, Jie Jiang, Jiahao Sun, Xianghan Wang

Keyword(s): Performance Improvement, Pose Estimation, Network Architecture, Activation Function, Depth Information, Global Features, Hand Pose Estimation, Feature Extractor, RGB Images, Hand Pose

Hand pose estimation from RGB images has always been difficult owing to the incompleteness of depth information. Moon et al. improved the accuracy of hand pose estimation with a new network, InterNet, through their unique design. However, the network still has room for improvement. Based on the MobileNet v3 and MoGA architectures, we redesigned a feature extractor that incorporates recent advances in computer vision, such as the ACON activation function and a new attention mechanism module. By using these modules effectively, our network architecture can better extract global features from an RGB image of the hand and achieves a greater performance improvement than InterNet and other similar networks.
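
The ACON activation mentioned above generalises Swish. In the ACON-C form defined in the original ACON paper (Ma et al., 2021), f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x, where p1, p2 and beta are learnable, typically per channel. With p1 = 1 and p2 = 0 it reduces to Swish, and a large beta approaches max(p1 * x, p2 * x). The NumPy sketch below uses fixed parameters for illustration; how InterNet+ parameterises them is not stated in the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x.

    In a network p1, p2 and beta would be learned parameters; here they are
    plain scalars or arrays that broadcast against x.
    """
    dpx = (p1 - p2) * np.asarray(x, dtype=float)
    return dpx * sigmoid(beta * dpx) + p2 * x
```

Because beta controls how sharply the unit switches between the two linear pieces, the network can learn per channel whether to behave almost linearly or almost like a ReLU-style maximum.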

Investigation of Five Algorithms for Selection of the Optimal Region of Interest in Smartphone Photoplethysmography

Journal of Sensors, 2016, Vol 2016, pp. 1-7, doi:10.1155/2016/6830152. Cited by ~4

Author(s): Rong-Chao Peng, Wen-Rong Yan, Ning-Ling Zhang, Wan-Hua Lin, Xiao-Lin Zhou, ...

Keyword(s): Template Matching, Low Cost, Region of Interest, Physiological Parameters, Home Healthcare, Spectral Energy, Optimal Region, Subject Experiment, Parameters Measurement

Smartphone photoplethysmography is a recently developed technique that can detect several physiological parameters from the photoplethysmographic signal obtained by a smartphone's built-in camera. It is simple, low-cost, and easy to use, with great potential for remote medicine and home healthcare services. However, the determination of the optimal region of interest (ROI), an important step in extracting photoplethysmographic signals from the camera video, has not been well studied. In this paper, we propose five algorithms for ROI selection: variance (VAR), spectral energy ratio (SER), template matching (TM), temporal difference (TD), and gradient (GRAD). Their performance was evaluated in a 50-subject experiment that compared heart rates measured from the electrocardiogram with those obtained from the smartphone using each of the five algorithms. The results revealed that the TM and TD algorithms outperformed the other three, with a lower standard error of estimate (<1.5 bpm) and narrower limits of agreement (<3 bpm). The TD algorithm was slightly better than the TM algorithm and more suitable for smartphone applications. These results may help improve the accuracy of physiological parameter measurement and make smartphone photoplethysmography more practical.
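
The abstract names the spectral energy ratio (SER) criterion without defining it. One plausible reading, assumed here, is the fraction of a ROI's spectral energy that falls in the heart-rate band, with the ROI of highest ratio selected. The 0.7 to 4 Hz band (42 to 240 bpm), the function names and the use of each ROI's mean-intensity trace are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np


def spectral_energy_ratio(signal, fs, band=(0.7, 4.0)):
    """Fraction of the DC-removed spectral energy inside the assumed heart-rate band."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    energy = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = energy.sum()
    if total == 0:
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(energy[in_band].sum() / total)


def select_roi(roi_signals, fs, band=(0.7, 4.0)):
    """Return (index of the ROI with the highest SER, list of all SER scores)."""
    scores = [spectral_energy_ratio(s, fs, band) for s in roi_signals]
    return int(np.argmax(scores)), scores
```

In this reading, a ROI dominated by motion or illumination drift concentrates its energy below the band and scores low, while a ROI carrying a clean pulse scores close to 1.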

Scene Semantic Segmentation from Indoor RGB-D Images Using Encode-Decoder Fully Convolutional Networks

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, Vol XLII-2/W7, pp. 397-404, doi:10.5194/isprs-archives-xlii-2-w7-397-2017

Author(s): Z. Wang, T. Li, L. Pan, Z. Kang

Keyword(s): Distance Measure, Low Cost, Semantic Segmentation, Depth Information, Maximum Mean Discrepancy, Convolutional Networks, Fully Convolutional Networks, Multiple Kernel, Indoor Scenes, RGB Images

With increasing attention to indoor environments and the development of low-cost RGB-D sensors, indoor RGB-D images are now easily acquired. However, scene semantic segmentation remains an open problem, which limits indoor applications. Depth information can help distinguish regions that are difficult to segment from RGB images alone because they have similar color or texture. How to exploit depth information is therefore the key problem in semantic segmentation of RGB-D images. In this paper, we propose an Encode-Decoder Fully Convolutional Network for RGB-D image classification. We use Multiple Kernel Maximum Mean Discrepancy (MK-MMD) as a distance measure to find the common and specific features of the RGB and depth images within the network, which automatically improves classification performance. To explore better ways of applying MMD, we designed two strategies: the first calculates MMD for each feature map, and the second calculates MMD over the features of the whole batch. Based on the classification result, we use fully connected CRFs for semantic segmentation. The experimental results show that our method achieves good performance on indoor RGB-D image semantic segmentation.
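
MK-MMD measures the distance between two feature distributions through a kernel built from several base kernels. The sketch below computes the biased estimate of squared MMD with an equally weighted sum of Gaussian kernels. In the MK-MMD literature the kernel weights are usually optimised, and the bandwidths here are arbitrary assumptions; the paper's two strategies would correspond to calling such a function per feature map or on the features of the whole batch.

```python
import numpy as np


def mk_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased squared MMD between samples x (n, d) and y (m, d).

    The kernel is an equally weighted sum of Gaussian kernels with the given
    bandwidths, a simple stand-in for learned MK-MMD kernel weights.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return sum(np.exp(-sq / (2.0 * s ** 2)) for s in bandwidths) / len(bandwidths)

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())
```

Identical samples give a value of zero, and the value grows as the two feature distributions drift apart, which is what makes it usable as a training signal for separating shared and modality-specific features.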

Human Pose Estimation Algorithm for Low-Cost Computing Platform Using Depth Information Only

Advances in Intelligent Systems and Computing - Robot Intelligence Technology and Applications 3, 2015, pp. 675-684, doi:10.1007/978-3-319-16841-8_61

Author(s): Hanguen Kim, Sangwon Lee, Youngjae Kim, Dongsung Lee, Jinsun Ju, ...

Keyword(s): Pose Estimation, Low Cost, Estimation Algorithm, Human Pose Estimation, Depth Information, Computing Platform, Human Pose

An Object Detection and Pose Estimation Approach for Position Based Visual Servoing

Electrical Control and Communication Engineering, 2017, Vol 12 (1), pp. 34-39, doi:10.1515/ecce-2017-0005

Author(s): Lei Shi

Keyword(s): Object Recognition, Object Detection, Pose Estimation, Stereo Vision, Visual Servoing, Correction Method, Depth Information, Recognition Method, Lighting Condition, Detection and Recognition

In this paper, an object recognition method and a pose estimation approach using stereo vision are presented. The proposed approach was used for position-based visual servoing of a 6-DoF manipulator. The object detection and recognition method was designed to increase robustness. An RGB color-based object descriptor and an online correction method are proposed for object detection and recognition. Pose was estimated using the depth information derived from a stereo vision camera and an SVD-based method. The transformation between the desired pose and the object pose was calculated and then used for position-based visual servoing. Experiments were carried out to verify the proposed object recognition approach. The stereo camera was also tested to determine whether its depth accuracy is adequate. The proposed object recognition method is invariant to scale, orientation and lighting conditions, which increases its robustness. The accuracy of the stereo vision camera can reach 1 mm, which is adequate for tasks such as grasping and manipulation.
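
The abstract mentions an SVD-based method without details. A standard SVD approach for recovering a rigid pose from corresponding 3D points, assumed here, is the Kabsch algorithm: centre both point sets, take the SVD of their cross-covariance, and correct for reflections. The sketch assumes the point correspondences (for example, stereo-triangulated model features matched to the scene) are already known.

```python
import numpy as np


def rigid_transform_svd(src, dst):
    """Least-squares rotation r and translation t with dst ~= src @ r.T + t.

    src, dst: (N, 3) arrays of corresponding 3-D points (N >= 3, not collinear).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that det(r) = +1 (a rotation, not a reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t
```

The recovered (r, t) can then be compared with the desired pose to form the error that drives position-based visual servoing.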

DRNet: A Depth-Based Regression Network for 6D Object Pose Estimation

Sensors, 2021, Vol 21 (5), pp. 1692, doi:10.3390/s21051692

Author(s): Lei Jin, Xiaojuan Wang, Mingshu He, Jingyue Wang

Keyword(s): Pose Estimation, State of the Art, Region of Interest, Depth Map, Depth Information, Challenging Problem, Optimization Framework, Art Object, Object Pose Estimation, Difficult Cases

This paper focuses on 6-DoF object pose estimation from a single RGB image. We tackle this challenging problem with a two-stage optimization framework. More specifically, we first introduce a translation estimation module that provides an initial translation based on an estimated depth map. Then, a pose regression module combines the ROI (Region of Interest) with the original image to predict the rotation and refine the translation. Compared with previous end-to-end methods that directly predict rotations and translations, our method can use depth information as weak guidance and significantly reduce the search space for the subsequent module. Furthermore, we design a new loss function for symmetric objects, which have been exceptionally difficult cases for prior work. Experiments show that our model achieves state-of-the-art object pose estimation on the YCB-Video (Yale-CMU-Berkeley) dataset.
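
The abstract does not describe DRNet's symmetric loss. For context only, a widely used symmetry-aware distance on YCB-Video is ADD-S, which matches each ground-truth model point to its closest predicted point rather than to its corresponding point, so poses that are indistinguishable under the object's symmetry incur no penalty. The sketch contrasts ADD with ADD-S; it is not the paper's loss, and the function names are hypothetical.

```python
import numpy as np


def transform(points, r, t):
    """Apply rotation r (3x3) and translation t (3,) to (N, 3) model points."""
    return points @ r.T + t


def add_metric(points, r_pred, t_pred, r_gt, t_gt):
    """ADD: mean distance between corresponding transformed model points."""
    diff = transform(points, r_pred, t_pred) - transform(points, r_gt, t_gt)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s_metric(points, r_pred, t_pred, r_gt, t_gt):
    """ADD-S: mean distance from each ground-truth point to the closest predicted point."""
    pred = transform(points, r_pred, t_pred)
    gt = transform(points, r_gt, t_gt)
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())
```

For an object with 90-degree rotational symmetry, a prediction rotated by 90 degrees is penalised by ADD but scores zero under ADD-S, which is why plain point-correspondence losses struggle on symmetric objects.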

A technique for protecting delicate specimens during processing

Proceedings, annual meeting, Electron Microscopy Society of America, 1989, Vol 47, pp. 982-983, doi:10.1017/s0424820100156894

Author(s): R.J. Mount, R.V. Harrison

Keyword(s): Basilar Membrane, Low Cost, Organ of Corti, Region of Interest, Sensory Epithelium, Physical Damage, Air Drying, Specimen Handling, Two Component

The sensory end organ of the ear, the organ of Corti, rests on a thin basilar membrane that lies between the bone of the central modiolus and the bony wall of the cochlea. In vivo, the organ of Corti is protected by the bony wall that totally surrounds it. To examine the sensory epithelium by scanning electron microscopy, it is necessary to dissect away the protective bone and expose the region of interest (Fig. 1). This leaves the fragile organ of Corti susceptible to physical damage during subsequent handling. In our laboratory, cochlear specimens are routinely prepared after dissection by the O-T-O-T-O technique, critical point dried and then lightly sputter coated with gold. This processing involves considerable specimen handling, including several hours on a rotator, during which the organ of Corti is at risk of physical damage. The following procedure uses low-cost, readily available materials to hold the specimen during processing, preventing physical damage while allowing an unhindered exchange of fluids.

Following fixation, the cochlea is dehydrated to 70% ethanol and then dissected under ethanol to prevent air drying. The holder is prepared by punching a hole in the flexible snap cap of a Wheaton vial with a paper hole punch. A small amount of two-component epoxy putty is mixed well and then pushed through the hole in the cap. The putty on the inner side of the cap is formed into a “cup” to hold the specimen (Fig. 2), and the putty on the outside is smoothed into a “button” to give good attachment even when the cap is flexed during handling (Fig. 3). The cap is submerged in the 70% ethanol, the bone at the base of the cochlea is seated into the cup, and the sides of the cup are squeezed with forceps to grip it (Fig. 4). Several types of epoxy putty have been tried; most are either soluble in ethanol to some degree or do not set in ethanol. The only putty we have found successful is “DURO™ MASTERMEND™ Epoxy Extra Strength Ribbon” (Loctite Corp., Cleveland, Ohio), a blue and yellow ribbon that is kneaded to form a green putty and is available at many hardware stores.

Drug Target Group Prediction with Multiple Drug Networks

Combinatorial Chemistry & High Throughput Screening, 2020, Vol 23 (4), pp. 274-284, doi:10.2174/1386207322666190702103927. Cited by ~12

Author(s): Jingang Che, Lei Chen, Zi-Han Guo, Shuaiqun Wang, Aorigele

Keyword(s): Drug Target, Low Cost, Machine Learning Algorithms, Classification Model, Support Vector, Multiple Drug, Property A, Multiple Networks, Proposed Model, The One

Background: Identification of drug-target interactions is essential in drug discovery. It also helps predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are fast and low-cost compared with traditional wet-lab experiments. Methods: In this study, we approached the problem differently. Following KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs to the correct target groups. To make full use of known drug properties, five networks were constructed, each representing drug associations in terms of one property. A powerful network embedding method, Mashup, was adopted to extract drug features from these networks, and on these features several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816 and a Hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the model using multiple networks was found to be superior to the one using a single network and to the classic model, indicating the superiority of the proposed model.
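
The reported accuracy, exact match and Hamming loss are standard multi-label metrics. The sketch below uses common example-based definitions, which are assumed rather than taken from the paper: accuracy averages |Y ∩ Z| / |Y ∪ Z| over drugs, exact match is the fraction of drugs whose predicted label set is exactly right, and Hamming loss is the fraction of misclassified drug-label pairs.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Example-based metrics for 0/1 label matrices of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    inter = (y_true & y_pred).sum(axis=1)
    union = (y_true | y_pred).sum(axis=1)
    # A sample with no true and no predicted labels counts as a perfect match.
    per_sample = np.where(union == 0, 1.0, inter / np.maximum(union, 1))
    return {
        "accuracy": float(per_sample.mean()),
        "exact_match": float((y_true == y_pred).all(axis=1).mean()),
        "hamming_loss": float((y_true != y_pred).mean()),
    }
```

Exact match is the strictest of the three, since a single wrong target group makes the whole prediction count as a miss, which is consistent with it being lower than accuracy in the reported results.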

Tower Crane Location Optimization for Heavy Unit Lifting in High-Rise Modular Construction

Buildings, 2021, Vol 11 (3), pp. 121, doi:10.3390/buildings11030121

Author(s): Hosang Hyun, Moonseo Park, Dowan Lee, Jeonghoon Lee

Keyword(s): Low Cost, Modular Construction, Algorithm Optimization, Location Selection, Project Completion, High Rise, Tower Crane, Proposed Model, Study Results, Crane Selection

Modular construction, which involves unit production in factories and on-site work, offers benefits such as low cost, high quality, and short duration that result from the controlled factory environment. An efficient tower crane lifting plan ensures the successful completion of high-rise modular projects. For improved efficiency, the lifting plan should minimize the reaching distance of the tower crane, because this distance directly affects the required tower crane capacity, which is directly related to crane operation cost. When units are lifted from trailers, the trailer-to-tower crane distance can have a significant impact on tower crane operation efficiency. However, optimizing this distance to improve efficiency has not been sufficiently considered. This research proposes a genetic algorithm optimization model that suggests optimized tower crane and trailer locations. The case study results show that, with the proposed model, the project manager can identify the optimal locations and tower crane selection options at minimal cost.
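
The abstract does not give the genetic algorithm's chromosome, cost function or operators. As a toy illustration only, the sketch below encodes a candidate solution as a pair of indices into hypothetical lists of crane and trailer sites, scores it by the largest hook radius required (a stand-in for the reach-dependent crane capacity), and evolves a population with truncation selection, crossover and mutation. All names, parameters and the cost model are assumptions.

```python
import math
import random


def reach_cost(crane, trailer, installs):
    """Hypothetical cost: the largest radius the crane must reach (trailer or install point)."""
    radii = [math.dist(crane, trailer)] + [math.dist(crane, p) for p in installs]
    return max(radii)


def genetic_search(crane_sites, trailer_sites, installs,
                   pop=20, gens=40, p_mut=0.2, seed=0):
    """Search (crane_index, trailer_index) pairs with a small elitist GA."""
    rng = random.Random(seed)

    def cost(ind):
        return reach_cost(crane_sites[ind[0]], trailer_sites[ind[1]], installs)

    population = [(rng.randrange(len(crane_sites)), rng.randrange(len(trailer_sites)))
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[: pop // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [a[0], b[1]]  # crossover: crane gene from a, trailer gene from b
            if rng.random() < p_mut:
                child[0] = rng.randrange(len(crane_sites))
            if rng.random() < p_mut:
                child[1] = rng.randrange(len(trailer_sites))
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)
```

With only two genes an exhaustive search is cheap, so this toy mainly shows the mechanics; a GA becomes worthwhile when the search space, for example continuous coordinates or many lifts with per-unit weights, is too large to enumerate.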
