Self-Attention Network for Human Pose Estimation

Hailun Xia; Tianyang Zhang

doi:10.3390/app11041826

Applied Sciences ◽

10.3390/app11041826 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1826

Author(s):

Hailun Xia ◽

Tianyang Zhang

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Attention Network ◽

Learning Framework ◽

Benchmark Datasets ◽

Rgb Images ◽

Human Pose ◽

Human Joints ◽

Symmetric Relations ◽

2D And 3D

Estimating the positions of human joints from a single monocular RGB image has been a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem remains: relationships and constraints, such as the symmetric relations of human structures, are not well exploited by previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In SANs, attention-driven, long-range dependency modeling is adopted between joints to compensate for local content and mine details from all feature locations. To enable an SAN for both 2D and 3D pose estimation, we also design a compatible, effective and general joint learning framework that mixes data of different dimensions. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII and COCO datasets.
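The abstract does not spell out the attention formulation, so the following is only a minimal sketch of generic scaled dot-product self-attention, in which every feature location attends to every other one. It is not the authors' SAN: the function names, the identity query/key/value projections, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity projections (assumption).

    features: list of N vectors of length D, one per joint or feature location.
    Returns (attended features, N x N attention weights).
    """
    n, d = len(features), len(features[0])
    attended, weights = [], []
    for q in features:
        # Similarity of this location to every location, scaled by sqrt(D).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted sum over all locations (long-range mixing).
        attended.append([sum(w[j] * features[j][c] for j in range(n)) for c in range(d)])
    return attended, weights

# Toy input: two similar "joints" (e.g. a symmetric pair) and one dissimilar one.
joints = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attended, attn = self_attention(joints)
```

In this toy input the first two vectors are identical, so the first location weights them equally and gives less weight to the third; that is the sense in which attention can relate distant but structurally similar joints.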

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

Computer Vision -- ACCV 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-16808-1_21 ◽

2015 ◽

pp. 302-315 ◽

Cited By ~ 31

Author(s):

Arjun Jain ◽

Jonathan Tompson ◽

Yann LeCun ◽

Christoph Bregler

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Human Pose Estimation ◽

Learning Framework ◽

Motion Features ◽

Human Pose

Generative 2D and 3D Human Pose Estimation with Vote Distributions

Advances in Visual Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-33179-4_45 ◽

2012 ◽

pp. 470-481 ◽

Cited By ~ 2

Author(s):

Jürgen Brauer ◽

Wolfgang Hübner ◽

Michael Arens

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Human Pose ◽

2D And 3D ◽

3D Human Pose Estimation

Semantically Synchronizing Multiple-Camera Systems with Human Pose Estimation

Sensors ◽

10.3390/s21072464 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2464

Author(s):

Zhe Zhang ◽

Chunyu Wang ◽

Wenhu Qin

Keyword(s):

Pose Estimation ◽

Energy Function ◽

Human Pose Estimation ◽

Frame Synchronization ◽

Camera System ◽

Multiple Camera ◽

Camera Systems ◽

Benchmark Datasets ◽

Improved Performance ◽

Human Pose

Multiple-camera systems can expand coverage and mitigate occlusion problems. However, temporal synchronization remains a problem for budget cameras and capture devices. We propose an out-of-the-box framework to temporally synchronize multiple cameras using semantic human pose estimation from the videos. Human pose predictions are obtained with an off-the-shelf pose estimator for each camera. Our method first calibrates each pair of cameras by minimizing an energy function related to epipolar distances. We also propose a simple yet effective multiple-person association algorithm across cameras and a score-regularized energy function for improved performance. Second, we integrate the synchronized camera pairs into a graph and derive the optimal temporal displacement configuration for the multiple-camera system. We evaluate our method on four public benchmark datasets and demonstrate robust sub-frame synchronization accuracy on all of them.

Deep Full-Body HPE for Activity Recognition from RGB Frames Only

Informatics ◽

10.3390/informatics8010002 ◽

2021 ◽

Vol 8 (1) ◽

pp. 2

Author(s):

Sameh Neili Boualia ◽

Najoua Essoukri Ben Amara

Keyword(s):

Computer Vision ◽

Activity Recognition ◽

Pose Estimation ◽

Human Robot Interaction ◽

Svm Classifier ◽

Estimation Model ◽

Rgb Images ◽

Human Pose ◽

Human Joints ◽

Full Body

Human Pose Estimation (HPE) is defined as the problem of localizing human joints (also known as keypoints: elbows, wrists, etc.) in images or videos. It is also defined as the search for a specific pose in the space of all articulated joints. HPE has recently received significant attention from the scientific community. The main reason behind this trend is that pose estimation is considered a key step for many computer vision tasks. Although many approaches have reported promising results, this domain remains largely unsolved due to several challenges such as occlusions, small and barely visible joints, and variations in clothing and lighting. In the last few years, the power of deep neural networks has been demonstrated in a wide variety of computer vision problems, especially the HPE task. In this context, we present in this paper a Deep Full-Body-HPE (DFB-HPE) approach from RGB images only. Based on ConvNets, fifteen human joint positions are predicted and can be further exploited for a large range of applications such as gesture recognition, sports performance analysis, or human-robot interaction. To evaluate the proposed deep pose estimation model, we apply it to recognize the daily activities of a person in an unconstrained environment. To this end, the extracted features, represented by deep estimated poses, are fed to an SVM classifier. To validate the proposed architecture, our approach is tested on two publicly available benchmarks for pose estimation and activity recognition, namely the J-HMDB and CAD-60 datasets. The obtained results demonstrate the efficiency of the proposed method based on ConvNets and SVM and show how deep pose estimation can improve recognition accuracy. In comparison with state-of-the-art methods, we achieve the best HPE performance, as well as the best activity recognition precision on the CAD-60 dataset.

DRPose3D: Depth Ranking in 3D Human Pose Estimation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/136 ◽

2018 ◽

Cited By ~ 11

Author(s):

Min Wang ◽

Xipeng Chen ◽

Wentao Liu ◽

Chen Qian ◽

Liang Lin ◽

...

Keyword(s):

Neural Network ◽

Pose Estimation ◽

Human Pose Estimation ◽

Geometric Feature ◽

Classification Problems ◽

Two Stage ◽

Human Pose ◽

Human Joints ◽

3D Information ◽

3D Human Pose Estimation

In this paper, we propose a two-stage depth-ranking-based method (DRPose3D) to tackle the problem of 3D human pose estimation. Unlike accurate 3D positions, depth rankings can be identified intuitively by humans and learned more easily by a deep neural network by solving classification problems. Moreover, depth ranking contains rich 3D information. It prevents the 2D-to-3D pose regression in two-stage methods from being ill-posed. In our method, first, we design a Pairwise Ranking Convolutional Neural Network (PRCNN) to extract depth rankings of human joints from images. Second, a coarse-to-fine 3D Pose Network (DPNet) is proposed to estimate 3D poses from both depth rankings and 2D human joint locations. Additionally, to improve the generality of our model, we introduce a statistical method to augment depth rankings. Our approach outperforms the state-of-the-art methods on the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature which can be learned to improve 3D pose estimation.
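To make the idea of treating depth ranking as a classification problem concrete, here is a small sketch that converts per-joint depths into a pairwise ranking matrix with three classes. The abstract does not give the label encoding, so the function name, the -1/0/+1 encoding, and the tolerance parameter `tol` are assumptions for illustration, not the paper's PRCNN targets.

```python
def pairwise_depth_ranking(depths, tol=0.0):
    """Build an N x N matrix of pairwise depth-ranking labels.

    rank[i][j] is +1 if joint i is farther than joint j by more than tol,
    -1 if it is closer by more than tol, and 0 if the depths are within tol.
    The three-class encoding and `tol` are illustrative assumptions.
    """
    n = len(depths)
    rank = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diff = depths[i] - depths[j]
            if diff > tol:
                rank[i][j] = 1
            elif diff < -tol:
                rank[i][j] = -1
    return rank

# Hypothetical depths (in metres) for three joints.
example_depths = [2.0, 3.5, 2.05]
ranks = pairwise_depth_ranking(example_depths, tol=0.1)
```

Each entry of the matrix is a discrete label, so predicting it is a per-pair classification task rather than a regression of metric depth.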

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Wireless Communications and Mobile Computing ◽

10.1155/2021/1333250 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Shili Niu ◽

Weihua Ou ◽

Shihua Feng ◽

Jianping Gou ◽

Fei Long ◽

...

Keyword(s):

Pose Estimation ◽

State Of The Art ◽

Computational Cost ◽

Estimation Accuracy ◽

Human Pose Estimation ◽

Model Parameters ◽

Resource Limited ◽

Benchmark Datasets ◽

Human Pose ◽

Low Computational Cost

Existing methods for human pose estimation usually use a large intermediate tensor, leading to a high computational load, which is detrimental to resource-limited devices. To solve this problem, we propose a low-computational-cost pose estimation network, MobilePoseNet, which includes an encoder, a decoder, and a parallel non-maximum suppression operation. Specifically, we design a lightweight upsampling block to replace transposed convolution in the decoder and use a lightweight network as our downsampling part. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel OKS-NMS, which significantly outperforms the conventional NMS in terms of accuracy and speed. Experimental results on the benchmark datasets show that MobilePoseNet obtains almost comparable results to state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet has only 4% of the parameters, while its estimation accuracy reaches 98%.
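For readers unfamiliar with OKS-based suppression, the sketch below shows a conventional greedy (sequential) OKS-NMS of the kind the abstract compares against; the parallel variant proposed in the paper is not described in enough detail here to reproduce. The OKS form follows the COCO keypoint definition with visibility flags omitted for brevity, and the function names, threshold, and toy poses are assumptions.

```python
import math

def oks(pose_a, pose_b, area, k):
    """Object keypoint similarity between two poses given as lists of (x, y).

    area: object scale squared (s^2); k: per-keypoint falloff constants.
    Visibility flags are omitted in this simplified sketch.
    """
    total = 0.0
    for (xa, ya), (xb, yb), ki in zip(pose_a, pose_b, k):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
    return total / len(k)

def oks_nms(poses, scores, areas, k, threshold=0.9):
    """Greedy NMS: keep the highest-scoring pose, drop near-duplicates by OKS."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(oks(poses[i], poses[j], areas[j], k) < threshold for j in keep):
            keep.append(i)
    return keep

# Toy detections: poses 0 and 1 are near-duplicates, pose 2 is a different person.
poses = [[(10, 10), (20, 20)], [(10.5, 10), (20, 20.5)], [(100, 100), (120, 120)]]
scores = [0.9, 0.8, 0.7]
areas = [400.0, 400.0, 400.0]
kept = oks_nms(poses, scores, areas, k=[0.1, 0.1])
```

The loop over `keep` is inherently sequential, which is the cost a parallel formulation would aim to avoid.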

Progressive Bi-C3D Pose Grammar for Human Pose Estimation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.7004 ◽

2020 ◽

Vol 34 (07) ◽

pp. 13033-13040 ◽

Cited By ~ 2

Author(s):

Lu Zhou ◽

Yingying Chen ◽

Jinqiao Wang ◽

Hanqing Lu

Keyword(s):

Pose Estimation ◽

Human Body ◽

Message Passing ◽

Contextual Information ◽

Human Pose Estimation ◽

Body Parts ◽

Multi Scale ◽

Human Pose ◽

Human Joints ◽

Body Joints

In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among human body parts proves effective in solving problems such as complex articulation and occlusion. Therefore, we propose two articulated grammars learned with Bi-C3D to build the relationships among human joints and exploit the contextual information of human body structure. First, a local multi-scale Bi-C3D kinematics grammar is proposed to promote the message passing process among locally related joints. The multi-scale kinematics grammar extracts different levels of human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both the MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.

Deep Learning-Based 2D and 3D Human Pose Estimation: A Survey

Proceedings of Second International Conference on Computing, Communications, and Cyber-Security - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-0733-2_38 ◽

2021 ◽

pp. 541-556

Author(s):

Pooja Parekh ◽

Atul Patel

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Human Pose Estimation ◽

Human Pose ◽

2D And 3D ◽

3D Human Pose Estimation

UniPose+: A unified framework for 2D and 3D human pose estimation in images and videos

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2021.3124736 ◽

2021 ◽

pp. 1-1

Author(s):

Bruno Artacho ◽

Andreas Savakis

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Unified Framework ◽

Human Pose ◽

2D And 3D ◽

3D Human Pose Estimation

3D Human Pose Estimation in Vietnamese Traditional Martial Art Videos

Journal of Advanced Engineering and Computation ◽

10.25073/jaec.201933.252 ◽

2019 ◽

Vol 3 (3) ◽

pp. 471

Author(s):

Tuong Thanh Nguyen ◽

Van-Hung Le ◽

Duy-Long Duong ◽

Thanh-Cong Pham ◽

Dung Le

Keyword(s):

Pose Estimation ◽

Martial Arts ◽

Social Life ◽

The Body ◽

Human Pose Estimation ◽

Body Parts ◽

Creative Commons ◽

Martial Art ◽

Benchmark Datasets ◽

Human Pose

Preserving, maintaining and teaching traditional martial arts are very important activities in social life. They help preserve national culture and provide exercise and self-defense for practitioners. However, traditional martial arts involve many different postures, and the movements of the body and body parts are diverse. Estimating the actions of the human body still poses many challenges, such as accuracy and obscurity. In this paper, we survey several strong studies from recent years on 3-D human pose estimation. Statistical tables are compiled by year, and typical results of these studies on the Human 3.6m dataset are summarized. We also present a comparative study of 3-D human pose estimation based on methods that use a single image. This study is based on methods that use a Convolutional Neural Network (CNN) for 2-D pose estimation and then use a 3-D pose library to map the 2-D results into 3-D space. The CNN models are trained on benchmark datasets such as the MSCOCO Keypoints Challenge dataset [1], Human 3.6m [2], the MPII dataset [3], LSP [4], [5], etc. Finally, we publish a dataset of Vietnamese traditional martial arts from Binh Dinh province for evaluating 3-D human pose estimation. Quantitative results are presented and evaluated. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.
