Parallel Deep Learning Ensembles for Human Pose Estimation

Author(s):  
Hailin Ren ◽  
Anil Kumar ◽  
Xinran Wang ◽  
Pinhas Ben-Tzvi

This paper presents an efficient method to detect human pose from monocular color imagery using a parallel architecture based on deep neural networks. The network presented in this approach consists of two sequentially connected stages of 13 parallel CNN ensembles, where each ensemble is trained to detect one specific kind of linkage of the human skeleton structure. After detecting all skeleton linkages, a voting score-based post-processing algorithm assembles the individual linkages to form a complete human structure. This algorithm exploits human structural heuristics while assembling skeleton links and searches only for adjacent link pairs around the expected common joint area. The use of structural heuristics in the presented approach greatly simplifies the post-processing computations. Furthermore, the parallel architecture of the presented network enables mutually independent computing nodes to be efficiently deployed on parallel computing devices such as GPUs for computationally efficient training. The proposed network has been trained and tested on the COCO 2017 person-keypoints dataset and delivers pose estimation performance matching state-of-the-art networks. The parallel ensemble architecture also improves adaptability in applications aimed at identifying only specific body parts, while saving computational resources.
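The link-assembly step described in this abstract can be illustrated with a minimal sketch. The abstract does not define the voting score or the size of the search area, so the summed detection score, the `radius` parameter, the greedy strategy, and the name `pair_links` are all assumptions made for illustration, not the authors' algorithm.

```python
import math

def pair_links(parents, children, radius):
    """Join each parent link to one adjacent child link.

    Each link is a tuple (start_xy, end_xy, score). A child is a candidate
    only if its start joint lies within `radius` of the parent's end joint,
    i.e. inside the expected common joint area (assumed to be a circle).
    The "vote" is assumed to be the sum of the two detection scores.
    Returns a list of (parent_index, child_index) pairs.
    """
    pairs = []
    used = set()  # child links already assigned to a parent
    for pi, (_, p_end, p_score) in enumerate(parents):
        best = None
        for ci, (c_start, _, c_score) in enumerate(children):
            if ci in used or math.dist(p_end, c_start) > radius:
                continue  # outside the common joint area: never scored
            vote = p_score + c_score
            if best is None or vote > best[1]:
                best = (ci, vote)
        if best is not None:
            used.add(best[0])
            pairs.append((pi, best[0]))
    return pairs

# Hypothetical detections: two upper-arm links and three forearm links.
upper_arms = [((0, 0), (0, 10), 0.9), ((50, 0), (50, 10), 0.8)]
forearms = [((51, 11), (55, 20), 0.7), ((0, 11), (3, 20), 0.6), ((1, 10), (4, 19), 0.4)]
matches = pair_links(upper_arms, forearms, radius=3)
```

Restricting candidates to the neighbourhood of the shared joint is what keeps the search cheap: distant links are rejected by one distance check and never enter the scoring step.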

2017 ◽  
Vol 11 (6) ◽  
pp. 426-433 ◽  
Author(s):  
Manuel I. López‐Quintero ◽  
Manuel J. Marín‐Jiménez ◽  
Rafael Muñoz‐Salinas ◽  
Rafael Medina‐Carnicer

Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3865 ◽  
Author(s):  
Sungjin Hong ◽  
Yejin Kim

Human poses are difficult to estimate due to the complicated body structure and the self-occlusion problem. In this paper, we introduce a marker-less system for human pose estimation by detecting and tracking key body parts, namely the head, hands, and feet. Given color and depth images captured by multiple red, green, blue, and depth (RGB-D) cameras, our system constructs a graph model with segmented regions from each camera and detects the key body parts as a set of extreme points based on accumulative geodesic distances in the graph. During the search process, local detection using a supervised learning model is utilized to match local body features. A final set of extreme points is selected with a voting scheme and tracked with physical constraints from the unified data received from the multiple cameras. During the tracking process, a Kalman filter-based method is introduced to reduce positional noise and to recover from failures in tracking the extreme points. Our system shows an average of 87% accuracy against the commercial system, which outperforms the previous multi-Kinects system, and can be applied to recognize a human action or to synthesize a motion sequence from a few key poses using a small set of extremes as input data.
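To make the Kalman filtering step concrete, the following is a minimal sketch of a scalar filter applied to one coordinate of a tracked point. The abstract does not give the state model or noise settings, so the constant-position model, the default variances, and the class name `ScalarKalman` are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScalarKalman:
    """Constant-position Kalman filter for one coordinate of a tracked point."""
    x: float          # current state estimate
    p: float = 1.0    # variance of the estimate
    q: float = 1e-3   # process noise variance (assumed value)
    r: float = 1e-1   # measurement noise variance (assumed value)

    def step(self, z=None):
        """Advance one frame; pass z=None when the detection is missing."""
        # Predict: the position is assumed unchanged, so only uncertainty grows.
        self.p += self.q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = self.p / (self.p + self.r)
            self.x += k * (z - self.x)
            self.p *= 1.0 - k
        return self.x

# One filter per coordinate; a missing measurement falls back to the prediction.
fx = ScalarKalman(x=0.0)
smoothed = [fx.step(z) for z in [1.0, 1.2, None, 0.9, 1.1]]
```

Calling `step(None)` on a frame where detection fails keeps the previous estimate while its variance grows, which is one simple way a filter can bridge a short tracking failure.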


Author(s):  
Nguyễn Tường Thành ◽  
Lê Văn Hùng ◽  
Phạm Thành Công

Preserving, maintaining, and teaching traditional martial arts are very important activities in social life. They help individuals preserve national culture, exercise, and practice self-defense. However, traditional martial arts have many different postures as well as varied movements of the body and body parts. Estimating the actions of the human body still faces many challenges, such as accuracy and occlusion. This paper begins with a review of several methods of 2-D human pose estimation on RGB images, among which methods based on Convolutional Neural Network (CNN) models have outstanding advantages in terms of processing time and accuracy. In this work, we built a small dataset and used a CNN to estimate keypoints and joints of actions in traditional martial arts videos. Next, we applied three measurements (length of joints, deviation angle of joints, and deviation of keypoints) to evaluate pose estimation in 2-D and 3-D spaces. The estimator was trained on the classic MSCOCO Keypoints Challenge dataset, and the results were evaluated on the well-known Martial Arts, Dancing, and Sports dataset. The results were quantitatively evaluated and reported in this paper.
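The three evaluation measurements named in this abstract can be sketched as follows. The abstract does not give formulas, so the definitions here (Euclidean segment length, angle between estimated and reference segment directions, Euclidean keypoint error) and all function names are assumptions about what the measurements mean.

```python
import math

def limb_length(a, b):
    """Euclidean length of the segment joining keypoints a and b (2-D or 3-D)."""
    return math.dist(a, b)

def keypoint_deviation(estimated, reference):
    """Euclidean distance between an estimated keypoint and its reference."""
    return math.dist(estimated, reference)

def deviation_angle(est_a, est_b, ref_a, ref_b):
    """Angle in degrees between the estimated and the reference segment."""
    u = [q - p for p, q in zip(est_a, est_b)]
    v = [q - p for p, q in zip(ref_a, ref_b)]
    norm_u, norm_v = math.hypot(*u), math.hypot(*v)
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a segment has zero length, so its direction is undefined")
    cos_angle = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# The same functions accept 2-D and 3-D coordinates.
length_2d = limb_length((0, 0), (3, 4))
length_3d = limb_length((0, 0, 0), (1, 2, 2))
angle = deviation_angle((0, 0), (1, 0), (0, 0), (0, 1))
```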


2021 ◽  
Vol 2129 (1) ◽  
pp. 012027
Author(s):  
Qing Zhang ◽  
Lei Ding ◽  
Kai Qing Zhou ◽  
Jian Feng Li

Because traditional human pose estimation models rely on a large amount of human body feature information, this paper proposes an optimization model that uses a genetic algorithm to solve the problem of multi-person body part assembly. Unlike other body part assembly methods, the proposed method depends on joint position information: it takes the sum of the connection distances between the joints as the objective function and searches for the optimal value to obtain the best human pose assembly. The simulation results show that, compared with the traditional OpenPose model, the proposed model can obtain the same human skeleton using less position information.
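The objective described in this abstract, the sum of connection distances between joints, can be sketched together with a small genetic search over assignments. The permutation encoding, the prefix crossover, the swap mutation, the population settings, and the names `assembly_cost` and `evolve_assignment` are assumptions for illustration; the abstract does not specify the operators used in the paper.

```python
import math
import random

def assembly_cost(perm, starts, ends):
    """Sum of connection distances when starts[i] is linked to ends[perm[i]]."""
    return sum(math.dist(starts[i], ends[j]) for i, j in enumerate(perm))

def evolve_assignment(starts, ends, pop_size=30, generations=60, seed=0):
    """Search for a low-cost joint assignment; assumes len(starts) == len(ends)."""
    rng = random.Random(seed)
    n = len(starts)
    identity = list(range(n))
    if n < 2:
        return identity, assembly_cost(identity, starts, ends)

    def cost(perm):
        return assembly_cost(perm, starts, ends)

    population = [identity] + [rng.sample(identity, n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # elitist selection: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            head = a[:cut]
            # Crossover: prefix from a, remaining genes in the order they appear in b.
            child = head + [g for g in b if g not in head]
            if rng.random() < 0.3:
                # Mutation: swap two positions, which keeps the permutation valid.
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)

# Hypothetical joints for three people: link each start joint to one end joint.
starts = [(0, 0), (10, 0), (20, 0)]
ends = [(20, 1), (0, 1), (10, 1)]
best_perm, best_cost = evolve_assignment(starts, ends)
```

Because the best half of each generation is carried over unchanged, the best cost never increases from one generation to the next, although a stochastic search of this kind does not guarantee the global optimum.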


Author(s):  
Rahul Ratusaria ◽  
Tushar Baghel ◽  
Ayush Chander Vanshi ◽  
Neeraj Garg

Human pose estimation has attracted the attention of the computer vision community for the past few decades. It is a vital step toward understanding people in images and videos. Strong articulations, small and barely visible joints, occlusions, clothing, and lighting changes make pose estimation very difficult. Human pose estimation is therefore an important problem that needs to be studied. It is used to detect human anatomical keypoints (e.g., shoulders, elbows, legs, wrists) in real time using limited computational resources. There are many artificial intelligence models for real-time human pose estimation, e.g., PoseNet, OpenPose, and MediaPipe. Many experiments have been performed to find the most suitable model for human pose estimation. These experiments indicate that PoseNet is suitable for running on lightweight platforms such as browsers, whereas OpenPose is meant to run on GPU-powered devices and is more accurate. MediaPipe, on the other hand, is very fast, modular, reusable, and highly efficient. Hence, our model uses MediaPipe to perform its estimation. Keywords: Pose estimation, Gym Rep Tracker, MediaPipe, Python, Machine learning


2020 ◽  
Vol 34 (07) ◽  
pp. 13033-13040 ◽  
Author(s):  
Lu Zhou ◽  
Yingying Chen ◽  
Jinqiao Wang ◽  
Hanqing Lu

In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among human body parts proves effective in solving problems such as complex articulation and occlusion. Therefore, we propose two articulated grammars learned with Bi-C3D to build the relationships among the human joints and exploit the contextual information of the human body structure. First, a local multi-scale Bi-C3D kinematics grammar is proposed to promote message passing among locally related joints. The multi-scale kinematics grammar extracts different levels of human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among the human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both the MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.


2014 ◽  
Vol 36 (11) ◽  
pp. 2131-2143 ◽  
Author(s):  
Matthias Dantone ◽  
Juergen Gall ◽  
Christian Leistner ◽  
Luc Van Gool
