3D Capsule Hand Pose Estimation Network Based on Structural Relationship Information

Yiqi Wu; Shichao Ma; Dejun Zhang; Jun Sun

doi:10.3390/sym12101636

3D Capsule Hand Pose Estimation Network Based on Structural Relationship Information

Symmetry ◽

10.3390/sym12101636 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1636

Author(s):

Yiqi Wu ◽

Shichao Ma ◽

Dejun Zhang ◽

Jun Sun

Keyword(s):

Pose Estimation ◽

Point Cloud ◽

Point Clouds ◽

Structural Relationship ◽

Estimation Methods ◽

Hand Pose Estimation ◽

3D Data ◽

Structural Relationships ◽

The Mean ◽

Hand Pose

Hand pose estimation from 3D data is a key challenge in computer vision as well as an essential step for human–computer interaction. A lot of deep learning-based hand pose estimation methods have made significant progress but give less consideration to the inner interactions of input data, especially when consuming hand point clouds. Therefore, this paper proposes an end-to-end capsule-based hand pose estimation network (Capsule-HandNet), which processes hand point clouds directly with the consideration of structural relationships among local parts, including symmetry, junction, relative location, etc. Firstly, an encoder is adopted in Capsule-HandNet to extract multi-level features into the latent capsule by dynamic routing. The latent capsule represents the structural relationship information of the hand point cloud explicitly. Then, a decoder recovers a point cloud to fit the input hand point cloud via a latent capsule. This auto-encoder procedure is designed to ensure the effectiveness of the latent capsule. Finally, the hand pose is regressed from the combined feature, which consists of the global feature and the latent capsule. The Capsule-HandNet is evaluated on public hand pose datasets under the metrics of the mean error and the fraction of frames. The mean joint errors of Capsule-HandNet on MSRA and ICVL datasets reach 8.85 mm and 7.49 mm, respectively, and Capsule-HandNet outperforms the state-of-the-art methods on most thresholds under the fraction of frames metric. The experimental results demonstrate the effectiveness of Capsule-HandNet for 3D hand pose estimation.

Download Full-text

A Survey on Hand Pose Estimation with Wearable Sensors and Computer-Vision-Based Methods

Sensors ◽

10.3390/s20041074 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1074 ◽

Cited By ~ 3

Author(s):

Weiya Chen ◽

Chenchen Yu ◽

Chenyu Tu ◽

Zehua Lyu ◽

Jing Tang ◽

...

Keyword(s):

Computer Vision ◽

Pose Estimation ◽

Wearable Sensors ◽

Complex Structure ◽

Estimation Methods ◽

Human Computer Interactions ◽

Hand Pose Estimation ◽

Timely Review ◽

Kinematic Models ◽

Hand Pose

Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for various applicative purposes such as in natural human computer interactions. Hand pose estimation is a big academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advancements from both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed for accurate and rapid hand pose estimation in recent years. However, existing reviews either focused on data gloves or on vision methods or were even based on a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, including wearable and vision-based solutions. Hand kinematic models are firstly discussed. An in-depth review is conducted on data gloves and vision-based sensor systems with corresponding modeling methods. Particularly, this review also discusses deep-learning-based methods, which are very promising in hand pose estimation. Moreover, the advantages and drawbacks of the current hand gesture estimation methods, the applicative scope, and related challenges are also discussed.

Download Full-text

Real-Time Energy Efficient Hand Pose Estimation: A Case Study

Sensors ◽

10.3390/s20102828 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2828

Author(s):

Mhd Rashed Al Koutayni ◽

Vladimir Rybalkin ◽

Jameel Malik ◽

Ahmed Elhayek ◽

Christian Weis ◽

...

Keyword(s):

Neural Network ◽

Real Time ◽

Pose Estimation ◽

Energy Efficient ◽

Graphics Processing Units ◽

Estimation Algorithm ◽

High Energy ◽

Estimation Methods ◽

Hand Pose Estimation ◽

Hand Pose

The estimation of human hand pose has become the basis for many vital applications where the user depends mainly on the hand pose as a system input. Virtual reality (VR) headset, shadow dexterous hand and in-air signature verification are a few examples of applications that require to track the hand movements in real-time. The state-of-the-art 3D hand pose estimation methods are based on the Convolutional Neural Network (CNN). These methods are implemented on Graphics Processing Units (GPUs) mainly due to their extensive computational requirements. However, GPUs are not suitable for the practical application scenarios, where the low power consumption is crucial. Furthermore, the difficulty of embedding a bulky GPU into a small device prevents the portability of such applications on mobile devices. The goal of this work is to provide an energy efficient solution for an existing depth camera based hand pose estimation algorithm. First, we compress the deep neural network model by applying the dynamic quantization techniques on different layers to achieve maximum compression without compromising accuracy. Afterwards, we design a custom hardware architecture. For our device we selected the FPGA as a target platform because FPGAs provide high energy efficiency and can be integrated in portable devices. Our solution implemented on Xilinx UltraScale+ MPSoC FPGA is 4.2× faster and 577.3× more energy efficient than the original implementation of the hand pose estimation algorithm on NVIDIA GeForce GTX 1070.

Download Full-text

A 4.45 ms Low-Latency 3D Point-Cloud-Based Neural Network Processor for Hand Pose Estimation in Immersive Wearable Devices

2020 IEEE Symposium on VLSI Circuits ◽

10.1109/vlsicircuits18222.2020.9162895 ◽

2020 ◽

Cited By ~ 1

Author(s):

Dongseok Im ◽

Sanghoon Kang ◽

Donghyeon Han ◽

Sungpill Choi ◽

Hoi-Jun Yoo

Keyword(s):

Neural Network ◽

Pose Estimation ◽

Point Cloud ◽

Wearable Devices ◽

Network Processor ◽

Low Latency ◽

3D Point Cloud ◽

Hand Pose Estimation ◽

Hand Pose

Download Full-text

3D Hand Pose Estimation in Point Cloud Using 3D Convolutional Neural Network on Egocentric Datasets

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research.v2020.n2.936 ◽

2021 ◽

Vol 2020 (2) ◽

pp. 87-97

Author(s):

Lê Văn Hùng

Keyword(s):

Pose Estimation ◽

Point Cloud ◽

Color Image ◽

Point Cloud Data ◽

Complex Scene ◽

Cloud Data ◽

Hand Pose Estimation ◽

Processing Step ◽

Egocentric Vision ◽

Hand Pose

3D hand pose estimation from egocentric vision is an important study in the construction of assistance systems and modeling of robot hand in robotics. In this paper, we propose a complete method for estimating 3D hand posefrom the complex scene data obtained from the egocentric sensor. In which we propose a simple yet highly efficient pre-processing step for hand segmentation. In the estimation process, we used the Hand PointNet (HPN), V2V-PoseNet(V2V), Point-to-Point Regression PointNet (PtoP) for finetuning to estimate the 3D hand pose from the collected data obtained from the egocentric sensor, such as CVRA, FPHA (First-Person Hand Action) datasets. HPN, V2V, PtoP are thedeep networks/Convolutional Neural Networks (CNNs) for estimating 3D hand pose that uses the point cloud data of the hand. We evaluate the estimation results using the preprocessing step and do not use the pre-processing step to see the effectiveness of the proposed method. The results show that 3D distance error is increased many times compared to estimates on the hand datasets are not obstructed (the hand data obtained from surveillance cameras, are viewed from top view, front view, sides view) such as MSRA, NYU, ICVL datasets. The results are quantified, analyzed, shown on the point cloud data of CVAR dataset and projected on the color image of FPHA dataset.

Download Full-text

Depth-Based Hand Pose Estimation: Methods, Data, and Challenges

International Journal of Computer Vision ◽

10.1007/s11263-018-1081-7 ◽

2018 ◽

Vol 126 (11) ◽

pp. 1180-1198 ◽

Cited By ~ 11

Author(s):

James Steven Supančič ◽

Grégory Rogez ◽

Yi Yang ◽

Jamie Shotton ◽

Deva Ramanan

Keyword(s):

Pose Estimation ◽

Estimation Methods ◽

Hand Pose Estimation ◽

Hand Pose

Download Full-text

CFAM: Estimating 3D Hand Poses from a Single RGB Image with Attention

Applied Sciences ◽

10.3390/app10020618 ◽

2020 ◽

Vol 10 (2) ◽

pp. 618

Author(s):

Xianghan Wang ◽

Jie Jiang ◽

Yanming Guo ◽

Lai Kang ◽

Yingmei Wei ◽

...

Keyword(s):

Computer Vision ◽

Pose Estimation ◽

Spatial Information ◽

Image Features ◽

Estimation Methods ◽

Feature Maps ◽

Hand Pose Estimation ◽

Rgb Images ◽

Rgb Image ◽

Hand Pose

Precise 3D hand pose estimation can be used to improve the performance of human–computer interaction (HCI). Specifically, computer-vision-based hand pose estimation can make this process more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment. Estimation through a single RGB image is more convenient and less expensive. Previous methods based on RGB images utilize only 2D keypoint score maps to recover 3D hand poses but ignore the hand texture features and the underlying spatial information in the RGB image, which leads to a relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method replans weights by using cascading RGB images and 2D keypoint features, which enables rational planning and the utilization of various features. Moreover, our method improves the fusion performance of different types of feature maps. Multiple contrast experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state-of-the-art accuracy.

Download Full-text

LPPM-Net: Local-aware point processing module based 3D hand pose estimation for point cloud

Signal Processing Image Communication ◽

10.1016/j.image.2020.116036 ◽

2021 ◽

Vol 90 ◽

pp. 116036

Author(s):

Jian Yang ◽

Xiaohong Ma ◽

Yi Sun ◽

Xiangbo Lin

Keyword(s):

Pose Estimation ◽

Point Cloud ◽

Hand Pose Estimation ◽

Processing Module ◽

Hand Pose

Download Full-text

Multihead Self Attention Hand Pose Estimation

E3S Web of Conferences ◽

10.1051/e3sconf/202021803023 ◽

2020 ◽

Vol 218 ◽

pp. 03023

Author(s):

Zhiqin Zhang ◽

Bo Zhang ◽

Fen Li ◽

Dehua Kong

Keyword(s):

Neural Networks ◽

Pose Estimation ◽

State Of The Art ◽

Salient Feature ◽

Estimation Methods ◽

Feature Maps ◽

Test Dataset ◽

Hand Pose Estimation ◽

Network Backbone ◽

Hand Pose

In This paper, we propose a hand pose estimation neural networks architecture named MSAHP which can improve PCK (percentage correct keypoints) greatly by fusing self-attention module in CNN (Convolutional Neural Networks). The proposed network is based on a ResNet (Residual Neural Network) backbone and concatenate discriminative features through multiple different scale feature maps, then multiple head self-attention module was used to focus on the salient feature map area. In recent years, self-attention mechanism was applicated widely in NLP and speech recognition, which can improve greatly key metrics. But in compute vision especially for hand pose estimation, we did not find the application. Experiments on hand pose estimation dataset demonstrate the improved PCK of our MSAHP than the existing state-of-the-art hand pose estimation methods. Specifically, the proposed method can achieve 93.68% PCK score on our mixed test dataset.

Download Full-text

Learning hand latent features for unsupervised 3D hand pose estimation

Journal of Autonomous Intelligence ◽

10.32629/jai.v2i1.36 ◽

2019 ◽

Vol 2 (1) ◽

pp. 1

Author(s):

Jamal Firmat Banzi1,2 ◽

Isack Bulugu3 ◽

Zhongfu Ye1

Keyword(s):

Pose Estimation ◽

Predictive Coding ◽

Depth Map ◽

Real Data ◽

Training Data ◽

Estimation Methods ◽

Hand Pose Estimation ◽

Latent Features ◽

Estimation System ◽

Hand Pose

Recent hand pose estimation methods require large numbers of annotated training data to extract the dynamic information from a hand representation. Nevertheless, precise and dense annotation on the real data is difficult to come by and the amount of information passed to the training algorithm is significantly higher. This paper presents an approach to developing a hand pose estimation system which can accurately regress a 3D pose in an unsupervised manner. The whole process is performed in three stages. Firstly, the hand is modelled by a novel latent tree dependency model (LTDM) which transforms internal joints location to an explicit representation. Secondly, we perform predictive coding of image sequences of hand poses in order to capture latent features underlying a given image without supervision. A mapping is then performed between an image depth and a generated representation. Thirdly, the hand joints are regressed using convolutional neural networks to finally estimate the latent pose given some depth map. Finally, an unsupervised error term which is a part of the recurrent architecture ensures smooth estimations of the final pose. To demonstrate the performance of the proposed system, a complete experiment is conducted on three challenging public datasets, ICVL, MSRA, and NYU. The empirical results show the significant performance of our method which is comparable or better than state-of-the-art approaches.

Download Full-text

A Comprehensive Study on Deep Learning-Based 3D Hand Pose Estimation Methods

Applied Sciences ◽

10.3390/app10196850 ◽

2020 ◽

Vol 10 (19) ◽

pp. 6850

Author(s):

Theocharis Chatzis ◽

Andreas Stergioulas ◽

Dimitrios Konstantinidis ◽

Kosmas Dimitropoulos ◽

Petros Daras

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Estimation Methods ◽

Depth Cameras ◽

Hand Pose Estimation ◽

Technological Advances ◽

Multimodal Information ◽

Cost Efficient ◽

Hand Pose ◽

Comprehensive Study

The field of 3D hand pose estimation has been gaining a lot of attention recently, due to its significance in several applications that require human-computer interaction (HCI). The utilization of technological advances, such as cost-efficient depth cameras coupled with the explosive progress of Deep Neural Networks (DNNs), has led to a significant boost in the development of robust markerless 3D hand pose estimation methods. Nonetheless, finger occlusions and rapid motions still pose significant challenges to the accuracy of such methods. In this survey, we provide a comprehensive study of the most representative deep learning-based methods in literature and propose a new taxonomy heavily based on the input data modality, being RGB, depth, or multimodal information. Finally, we demonstrate results on the most popular RGB and depth-based datasets and discuss potential research directions in this rapidly growing field.

Download Full-text