Recover Human Pose from Monocular Image Under Weak Perspective Projection

Author(s): Minglei Tong, Yuncai Liu, Thomas S. Huang
2020, Vol 34 (07), pp. 12378-12385
Author(s): Haiping Wu, Bin Xiao

In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image. First, we propose using densely generated limb depth maps, which are well aligned with image cues, to ease the learning of body joint depths. Then, we design a lifting module from 2D pixel coordinates to 3D camera coordinates that explicitly takes the depth values as inputs and is consistent with the camera perspective projection model. We show that our method achieves superior performance on the large-scale 3D pose datasets Human3.6M and MPI-INF-3DHP, setting a new state of the art.
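The geometric relation behind lifting 2D pixel coordinates plus depth into 3D camera coordinates can be sketched as below. This is only the standard pinhole back-projection, not the learned lifting module from the abstract; the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_to_camera(pixels: Sequence[Point2D], depths: Sequence[float],
                   fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Back-project pixel coordinates (u, v) with depth Z into camera space.

    Assumes an ideal pinhole camera (no lens distortion); fx, fy are focal
    lengths in pixels and (cx, cy) is the principal point.
    """
    joints = []
    for (u, v), z in zip(pixels, depths):
        x = (u - cx) * z / fx  # X = (u - cx) * Z / fx
        y = (v - cy) * z / fy  # Y = (v - cy) * Z / fy
        joints.append((x, y, z))
    return joints


def project_to_pixels(joints: Sequence[Point3D],
                      fx: float, fy: float, cx: float, cy: float) -> List[Point2D]:
    """Perspective projection, the inverse of lift_to_camera for Z > 0."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in joints]


# Hypothetical intrinsics and a single joint, for illustration only.
joints_3d = lift_to_camera([(600.0, 400.0)], [2000.0],
                           fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
```

Because the back-projection divides out exactly what the projection multiplies in, re-projecting `joints_3d` with the same intrinsics returns the original pixel coordinates, which is one way to read "aligned with the camera perspective projection model".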


Author(s): Cao Hui, Noboru Ohnishi, Yoshinori Takeuchi, Tetsuya Matsumoto, Hiroaki Kudo

2020, Vol 34 (07), pp. 11312-11319
Author(s): Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, Venkatesh Babu Radhakrishnan, ...

Estimation of 3D human pose from a monocular image has gained considerable attention as a key step in several human-centric applications. However, the generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable, as these models often perform unsatisfactorily in unseen in-the-wild environments. Though weakly-supervised models have been proposed to address this shortcoming, the performance of such models relies on the availability of paired supervision on some related task, such as 2D pose or multi-view image pairs. In contrast, we propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our pose estimation framework relies on a minimal set of prior knowledge that defines the underlying kinematic 3D structure, such as skeletal joint connectivity information with bone-length ratios in a fixed canonical scale. The proposed model employs three consecutive differentiable transformations, namely forward kinematics, camera projection, and spatial-map transformation. This design not only acts as a suitable bottleneck stimulating effective pose disentanglement, but also yields interpretable latent pose representations, avoiding the training of an explicit mapper from latent embedding to pose. Furthermore, avoiding an unstable adversarial setup, we reuse the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings. Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both the Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments further establish our superior generalization ability.
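The first two transformations named above, forward kinematics from a fixed skeleton followed by camera projection, can be illustrated with a minimal sketch. The 4-joint chain, the bone-length ratios, the focal length, and the camera distance are all hypothetical values chosen for illustration; this is not the authors' implementation, and the spatial-map transformation is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical skeleton: parent index per joint (-1 marks the root).
# Parents must appear before their children in this list.
PARENTS = [-1, 0, 1, 2]
# Illustrative bone-length ratios in a canonical scale (root has no bone).
BONE_LENGTHS = [0.0, 1.0, 0.5, 0.5]


def forward_kinematics(directions: Sequence[Vec3],
                       parents: Sequence[int] = PARENTS,
                       lengths: Sequence[float] = BONE_LENGTHS) -> List[Vec3]:
    """Place each joint at parent + bone_length * unit(direction).

    The root is fixed at the origin, so its direction entry is ignored.
    Bone lengths come from the prior, so only directions are free.
    """
    joints: List[Vec3] = []
    for j, parent in enumerate(parents):
        if parent < 0:
            joints.append((0.0, 0.0, 0.0))
            continue
        dx, dy, dz = directions[j]
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        scale = lengths[j] / norm  # normalise, then apply the fixed length
        px, py, pz = joints[parent]
        joints.append((px + scale * dx, py + scale * dy, pz + scale * dz))
    return joints


def project_points(joints: Sequence[Vec3], focal: float = 1.0,
                   distance: float = 5.0) -> List[Tuple[float, float]]:
    """Perspective projection with the skeleton root placed `distance`
    in front of the camera (both parameters are assumptions)."""
    return [(focal * x / (z + distance), focal * y / (z + distance))
            for x, y, z in joints]


# Directions need not be unit length; they are normalised internally.
pose_3d = forward_kinematics([(0, 0, 0), (0, 2, 0), (1, 0, 0), (0, 0, 3)])
pose_2d = project_points(pose_3d)
```

Since bone lengths are fixed by the prior, every output of `forward_kinematics` is a skeleton with valid proportions, which is the sense in which such a chain can act as a structural bottleneck.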


Author(s): Huei-Yung Lin, Ting-Wen Chen, Chih-Chang Chen, Chia-Hao Hsieh, Wen-Nung Lie
