3D Human Pose Estimation in Vietnamese Traditional Martial Art Videos

Tuong Thanh Nguyen; Van-Hung Le; Duy-Long Duong; Thanh-Cong Pham; Dung Le

doi:10.25073/jaec.201933.252

Journal of Advanced Engineering and Computation ◽

10.25073/jaec.201933.252 ◽

2019 ◽

Vol 3 (3) ◽

pp. 471

Author(s):

Tuong Thanh Nguyen ◽

Van-Hung Le ◽

Duy-Long Duong ◽

Thanh-Cong Pham ◽

Dung Le

Keyword(s):

Pose Estimation ◽

Martial Arts ◽

Social Life ◽

The Body ◽

Human Pose Estimation ◽

Body Parts ◽

Creative Commons ◽

Martial Art ◽

Benchmark Datasets ◽

Human Pose

Preserving, maintaining, and teaching traditional martial arts are very important activities in social life. They help preserve national culture and provide exercise and self-defense for practitioners. However, traditional martial arts involve many different postures, and the movements of the body and body parts are diverse. Estimating the actions of the human body still poses many challenges, such as accuracy and occlusion. In this paper, we survey several strong recent studies on 3-D human pose estimation. We compile statistical tables by year and summarize typical results of these studies on the Human3.6M dataset. We also present a comparative study of 3-D human pose estimation from a single image. This study is based on methods that use a Convolutional Neural Network (CNN) for 2-D pose estimation and then use a 3-D pose library to map the 2-D results into 3-D space. The CNN model is trained on benchmark datasets such as the MSCOCO Keypoints Challenge dataset [1], Human3.6M [2], MPII [3], LSP [4], [5], etc. Finally, we publish a dataset of Vietnamese traditional martial arts from Binh Dinh province for evaluating 3-D human pose estimation. Quantitative results are presented and evaluated. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.
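The abstract does not say how the 3-D pose library is searched, so the following is only a minimal sketch of one plausible reading: a nearest-neighbor lookup that compares the 2-D keypoints with the orthographic projection of each library pose. The function names `normalize` and `lift_to_3d`, the choice to drop the z coordinate as the projection, and the toy three-joint library are all assumptions for illustration, not the authors' method.

```python
import math

def normalize(points):
    """Center 2-D points at their mean and scale them to unit RMS radius."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Fall back to 1.0 so a degenerate pose (all points equal) does not divide by zero.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def lift_to_3d(pose_2d, library):
    """Return the library 3-D pose whose projection best matches pose_2d.

    Assumption: the camera is orthographic and looks along the z axis,
    so projecting a library pose simply drops its z coordinate.
    """
    query = normalize(pose_2d)

    def cost(pose_3d):
        projected = normalize([(x, y) for x, y, _ in pose_3d])
        return sum(math.dist(q, p) for q, p in zip(query, projected))

    return min(library, key=cost)

# Hypothetical library with two three-joint poses: a straight and a bent limb.
library = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0.5)],
]
# A detected 2-D pose that is a shifted, scaled copy of the bent limb.
detected = [(10, 10), (12, 10), (12, 12)]
match = lift_to_3d(detected, library)
```

Normalizing both sides makes the lookup insensitive to where the person stands in the image and how large they appear; it does not handle camera rotation, which a practical system would need to address.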

An Evaluation of Pose Estimation in Video of Traditional Martial Arts Presentation

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research.v2019.n2.864 ◽

2019 ◽

Vol 2019 (2) ◽

pp. 114-126

Author(s):

Nguyễn Tường Thành ◽

Lê Văn Hùng ◽

Phạm Thành Công

Keyword(s):

National Culture ◽

Pose Estimation ◽

Martial Arts ◽

Social Life ◽

Body Parts ◽

Deviation Angle ◽

Self Defense ◽

RGB Images ◽

Human Pose ◽

Small Dataset

Preserving, maintaining, and teaching traditional martial arts are very important activities in social life. They help individuals preserve national culture, exercise, and practice self-defense. However, traditional martial arts have many different postures as well as varied movements of the body and body parts. Estimating the actions of the human body still poses many challenges, such as accuracy, occlusion, and so forth. This paper begins with a review of several methods of 2-D human pose estimation on RGB images, among which methods using Convolutional Neural Network (CNN) models have outstanding advantages in processing time and accuracy. In this work we built a small dataset and used a CNN to estimate keypoints and joints of actions in traditional martial arts videos. Next we applied three measurements (length of joints, deviation angle of joints, and deviation of keypoints) to evaluate pose estimation in 2-D and 3-D spaces. The estimator was trained on the classic MSCOCO Keypoints Challenge dataset, and the results were evaluated on the well-known Martial Arts, Dancing, and Sports dataset. The results were quantitatively evaluated and reported in this paper.
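The three measurements named in this abstract can be illustrated with a short sketch. The exact definitions used in the paper are not given here, so the code assumes the common ones: joint length as the Euclidean distance between two connected keypoints, joint angle as the angle at the middle keypoint of a three-keypoint chain, and keypoint deviation as the mean Euclidean distance between predicted and ground-truth keypoints. All function names and the toy arm coordinates are hypothetical.

```python
import math

def joint_length(a, b):
    """Euclidean distance between two connected keypoints (2-D or 3-D)."""
    return math.dist(a, b)

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b between segments b->a and b->c.

    Assumes both segments have non-zero length.
    """
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def mean_keypoint_deviation(predicted, truth):
    """Mean Euclidean distance between matching keypoints."""
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(truth)

# Toy arm (shoulder, elbow, wrist): ground truth versus a prediction
# whose wrist is misplaced by one unit.
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]

length_error = abs(joint_length(*predicted[1:]) - joint_length(*truth[1:]))
angle_deviation = abs(joint_angle(*predicted) - joint_angle(*truth))
keypoint_deviation = mean_keypoint_deviation(predicted, truth)
```

In this toy case the elbow angle changes from 90 to 135 degrees, so the angle deviation is about 45 degrees even though only one keypoint moved. The same functions work unchanged on 3-D tuples.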

A survey of human pose estimation: The body parts parsing based methods

Journal of Visual Communication and Image Representation ◽

10.1016/j.jvcir.2015.06.013 ◽

2015 ◽

Vol 32 ◽

pp. 10-19 ◽

Cited By ~ 32

Author(s):

Zhao Liu ◽

Jianke Zhu ◽

Jiajun Bu ◽

Chun Chen

Keyword(s):

Pose Estimation ◽

The Body ◽

Human Pose Estimation ◽

Body Parts ◽

Human Pose

Martial Arts, Dancing and Sports dataset: A challenging stereo and multi-view dataset for 3D human pose estimation

Image and Vision Computing ◽

10.1016/j.imavis.2017.02.002 ◽

2017 ◽

Vol 61 ◽

pp. 22-39 ◽

Cited By ~ 18

Author(s):

Weichen Zhang ◽

Zhiguang Liu ◽

Liuyang Zhou ◽

Howard Leung ◽

Antoni B. Chan

Keyword(s):

Pose Estimation ◽

Martial Arts ◽

Human Pose Estimation ◽

Human Pose ◽

3D Human Pose Estimation

Mixing body‐parts model for 2D human pose estimation in stereo videos

IET Computer Vision ◽

10.1049/iet-cvi.2016.0249 ◽

2017 ◽

Vol 11 (6) ◽

pp. 426-433 ◽

Cited By ~ 3

Author(s):

Manuel I. López‐Quintero ◽

Manuel J. Marín‐Jiménez ◽

Rafael Muñoz‐Salinas ◽

Rafael Medina‐Carnicer

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Body Parts ◽

Human Pose

Human pose estimation based on parallel atrous convolution and body structure constraints

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212061 ◽

2021 ◽

pp. 1-11

Author(s):

Min Zhang ◽

Haijie Yang ◽

Pengfei Li ◽

Ming Jiang

Keyword(s):

Pose Estimation ◽

The Body ◽

Human Pose Estimation ◽

Spatial Relationships ◽

Body Structure ◽

Convolutional Network ◽

In Series ◽

Human Pose ◽

Ablation Study ◽

View Transformation

Human pose estimation is still a challenging task in computer vision, and it becomes harder under camera view transformation and joint occlusion or overlap. Most existing methods pass the input through a network that typically consists of high-to-low resolution sub-networks connected in series. During the up-sampling process, however, spatial relationships and details might be lost. This paper designs a parallel atrous convolutional network with body structure constraints (PAC-BCNet) to address the problem. The parallel atrous convolution (PAC) deals with scale changes by connecting multiple different atrous convolution sub-networks in parallel, and it extracts features at different scales without reducing the resolution. The body structure constraints (BC), which enhance the correlation between keypoints, are constructed to obtain better spatial relationships of the body by designing keypoint constraint sets and improving the loss function. Comparative experiments on serial versus parallel atrous convolution, together with an ablation study with and without body structure constraints, support the effectiveness of the approach. The model is evaluated on two widely used human pose estimation benchmarks (MPII and LSP) and achieves better performance on both datasets.
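To make the idea of parallel atrous (dilated) convolution concrete, here is a minimal 1-D sketch in plain Python. It is not PAC-BCNet: the real network works on 2-D feature maps with learned weights, and the abstract does not state how the branches are fused. The function names, the zero padding, and the fusion by element-wise sum are assumptions chosen for illustration.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Same-size 1-D dilated convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        output.append(acc)
    return output

def parallel_atrous(signal, kernel, dilations):
    """Run one branch per dilation rate on the same input and sum the results.

    Summation as the fusion step is an assumption made for this sketch.
    """
    branches = [atrous_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]

impulse = [0, 0, 0, 1, 0, 0, 0]
box = [1, 1, 1]
narrow = atrous_conv1d(impulse, box, dilation=1)   # 3-sample receptive field
wide = atrous_conv1d(impulse, box, dilation=2)     # 5-sample receptive field
fused = parallel_atrous(impulse, box, dilations=[1, 2])
```

Each branch returns as many samples as it receives, which is the property the abstract relies on: a larger dilation widens the receptive field without any down-sampling, so features at several scales can be combined at full resolution.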

Learning human poses in natural scenes

10.32469/10355/66196 ◽

2018 ◽

Author(s):

Guanghan Ning

Keyword(s):

Computer Vision ◽

Pose Estimation ◽

The Body ◽

Human Pose Estimation ◽

Natural Scenes ◽

Top Down ◽

University Of Missouri ◽

Single Person ◽

Human Pose ◽

High Level

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] The task of human pose estimation in natural scenes is to determine the precise pixel locations of body keypoints. It is very important for many high-level computer vision tasks, including action and activity recognition, human-computer interaction, motion capture, and animation. We cover two different approaches to this task: the top-down approach and the bottom-up approach. In the top-down approach, we propose a human tracking method called ROLO that localizes each person. We then propose a state-of-the-art single-person human pose estimator that predicts the body keypoints of each individual. In the bottom-up approach, we propose an efficient multi-person pose estimator with which we participated in a PoseTrack challenge [11]. On top of these, we propose to employ adversarial training to further boost the performance of the single-person human pose estimator while generating synthetic images. We also propose a novel PoSeg network that jointly estimates multi-person human poses and semantically segments the portraits of these persons at the pixel level. Lastly, we extend some of the proposed methods for human pose estimation and portrait segmentation to the task of human parsing, a more fine-grained computer vision perception of humans.

Body parts relevance learning via expectation–maximization for human pose estimation

Multimedia Systems ◽

10.1007/s00530-021-00755-z ◽

2021 ◽

Author(s):

Luhui Yue ◽

Junxia Li ◽

Qingshan Liu

Keyword(s):

Pose Estimation ◽

Expectation Maximization ◽

Human Pose Estimation ◽

Body Parts ◽

Human Pose ◽

Relevance Learning

Occlusion-free appearance modeling of body parts for human pose estimation

2015 14th IAPR International Conference on Machine Vision Applications (MVA) ◽

10.1109/mva.2015.7153195 ◽

2015 ◽

Cited By ~ 1

Author(s):

Yuki Kawana ◽

Norimichi Ukita ◽

Norihiro Hagita

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Body Parts ◽

Appearance Modeling ◽

Human Pose

Semantically Synchronizing Multiple-Camera Systems with Human Pose Estimation

Sensors ◽

10.3390/s21072464 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2464

Author(s):

Zhe Zhang ◽

Chunyu Wang ◽

Wenhu Qin

Keyword(s):

Pose Estimation ◽

Energy Function ◽

Human Pose Estimation ◽

Frame Synchronization ◽

Camera System ◽

Multiple Camera ◽

Camera Systems ◽

Benchmark Datasets ◽

Improved Performance ◽

Human Pose

Multiple-camera systems can expand coverage and mitigate occlusion problems. However, temporal synchronization remains a problem for budget cameras and capture devices. We propose an out-of-the-box framework to temporally synchronize multiple cameras using semantic human pose estimation from the videos. Human pose predictions are obtained with an off-the-shelf pose estimator for each camera. First, our method calibrates each pair of cameras by minimizing an energy function related to epipolar distances. We also propose a simple yet effective multiple-person association algorithm across cameras and a score-regularized energy function for improved performance. Second, we integrate the synchronized camera pairs into a graph and derive the optimal temporal displacement configuration for the multiple-camera system. We evaluate our method on four public benchmark datasets and demonstrate robust sub-frame synchronization accuracy on all of them.
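The pairwise step, minimizing an energy built from epipolar distances, can be sketched for a single tracked keypoint as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it searches integer frame offsets only (the paper reports sub-frame accuracy), it omits the multiple-person association and score regularization, and the fundamental matrix `F`, the synthetic tracks, and all function names are hypothetical.

```python
import math

def epipolar_distance(F, x1, x2):
    """Distance from point x2 in camera B to the epipolar line F @ x1 of point x1 in camera A."""
    p = (x1[0], x1[1], 1.0)
    a, b, c = (sum(F[r][k] * p[k] for k in range(3)) for r in range(3))
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)

def sync_energy(F, track_a, track_b, offset):
    """Mean epipolar distance when frame t of A is paired with frame t + offset of B."""
    pairs = [(track_a[t], track_b[t + offset])
             for t in range(len(track_a))
             if 0 <= t + offset < len(track_b)]
    return sum(epipolar_distance(F, p, q) for p, q in pairs) / len(pairs)

def best_offset(F, track_a, track_b, max_shift):
    """Integer offset in [-max_shift, max_shift] with the lowest energy.

    Assumes max_shift is smaller than the track length, so every
    candidate offset leaves at least one overlapping frame pair.
    """
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: sync_energy(F, track_a, track_b, s))

# Fundamental matrix of an ideal rectified stereo pair (assumption):
# corresponding points share the same image row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
# Synthetic keypoint track; camera B records the same motion 3 frames late.
track_a = [(10.0, float(t * t)) for t in range(20)]
track_b = [(7.0, float((t - 3) ** 2)) for t in range(20)]
offset = best_offset(F, track_a, track_b, max_shift=5)
```

When the two streams are paired with the correct offset, every keypoint lies on its epipolar line and the energy drops to zero; any other offset pairs poses from different instants and the distances grow.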

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Wireless Communications and Mobile Computing ◽

10.1155/2021/1333250 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Shili Niu ◽

Weihua Ou ◽

Shihua Feng ◽

Jianping Gou ◽

Fei Long ◽

...

Keyword(s):

Pose Estimation ◽

State Of The Art ◽

Computational Cost ◽

Estimation Accuracy ◽

Human Pose Estimation ◽

Model Parameters ◽

Resource Limited ◽

Benchmark Datasets ◽

Human Pose ◽

Low Computational Cost

Existing methods for human pose estimation usually use a large intermediate tensor, leading to a high computational load, which is detrimental to resource-limited devices. To solve this problem, we propose a low computational cost pose estimation network, MobilePoseNet, which includes an encoder, a decoder, and a parallel non-maximum suppression operation. Specifically, we design a lightweight upsampling block to replace transposed convolution as the decoder and use a lightweight network as our downsampling part. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel OKS-NMS, which significantly outperforms the conventional NMS in terms of accuracy and speed. Experimental results on the benchmark datasets show that MobilePoseNet obtains almost comparable results to state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet has only 4% of the parameters, while its estimation accuracy reaches 98%.
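For readers unfamiliar with OKS-based suppression, the sketch below shows the conventional greedy form that uses object keypoint similarity (OKS) in place of box overlap. It is not the parallel OKS-NMS proposed in the paper, whose details the abstract does not give. The OKS formula follows the usual COCO-style definition but ignores visibility flags; the function names, the per-keypoint constants in `kappas`, the threshold, and the toy poses are assumptions.

```python
import math

def oks(pose_a, pose_b, area, kappas):
    """Object keypoint similarity between two poses with the same keypoint order.

    area is the object scale squared; kappas are per-keypoint constants.
    Visibility flags are ignored in this sketch.
    """
    total = 0.0
    for (xa, ya), (xb, yb), k in zip(pose_a, pose_b, kappas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
    return total / len(pose_a)

def greedy_oks_nms(poses, scores, areas, kappas, threshold=0.9):
    """Keep the highest-scoring poses, dropping any pose too similar to a kept one."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(oks(poses[i], poses[j], areas[j], kappas) <= threshold for j in keep):
            keep.append(i)
    return keep

# Three candidate two-keypoint poses: the second nearly duplicates the first.
poses = [
    [(10.0, 10.0), (20.0, 20.0)],
    [(10.5, 10.0), (20.0, 20.5)],
    [(100.0, 100.0), (110.0, 110.0)],
]
scores = [0.9, 0.8, 0.7]
areas = [400.0, 400.0, 400.0]
kappas = [0.1, 0.1]  # hypothetical per-keypoint constants
kept = greedy_oks_nms(poses, scores, areas, kappas)
```

The loop is sequential because each decision depends on the poses already kept, which is the bottleneck a parallel variant would aim to remove.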
