An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks

Hashim Yasin; Björn Krüger

doi:10.3390/s21072415

An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks

Sensors ◽

10.3390/s21072415 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2415

Author(s):

Hashim Yasin ◽

Björn Krüger

Keyword(s):

Feature Space ◽

Ground Truth ◽

Reconstruction Error ◽

Synthetic Image ◽

Camera Model ◽

Retrieval Task ◽

In The Wild ◽

Internet Images ◽

Human Pose ◽

Image Planes

We propose an efficient and novel architecture for 3D articulated human pose retrieval and reconstruction from 2D landmarks extracted from a 2D synthetic image, an annotated 2D image, an in-the-wild real RGB image, or even a hand-drawn sketch. Given 2D joint positions in a single image, we devise a data-driven framework to infer the corresponding 3D human pose. To this end, we first normalize 3D human poses from a Motion Capture (MoCap) dataset by eliminating translation, orientation, and skeleton size discrepancies from the poses. We then build a knowledge base by projecting a subset of joints of the normalized 3D poses onto 2D image planes using a variety of virtual cameras. With this approach, we not only transform the 3D pose space into a normalized 2D pose space but also resolve the 2D-3D cross-domain retrieval task efficiently. The proposed architecture searches the MoCap dataset for poses that are close to a given 2D query pose in a feature space made up of specific joint sets. These retrieved poses are then used to construct a weak perspective camera and a final 3D posture under the camera model that minimizes the reconstruction error. To estimate the unknown camera parameters, we introduce a nonlinear, two-fold method. We exploit the retrieved similar poses and the viewing directions at which the MoCap dataset was sampled to minimize the projection error. Finally, we evaluate our approach thoroughly on a large number of heterogeneous, synthetically generated 2D examples, 2D images with ground truth, a variety of real in-the-wild internet images, and, as a proof of concept, 2D hand-drawn sketches of human poses. We conduct a set of experiments for a quantitative study on the PARSE dataset. We also show that the proposed system yields competitive results in comparison with other state-of-the-art methods.
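The retrieval idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of root joint and reference bone, and the use of orthographic virtual cameras that differ only by azimuth are all assumptions made for illustration. Orientation normalization and the weak perspective camera estimation are left out, and the query is assumed to be already expressed in the normalized 2D space.

```python
import math

def normalize_pose(joints, root=0, ref_bone=(0, 1)):
    """Remove translation and skeleton size from a 3D pose.

    joints: list of (x, y, z). The root joint and reference bone are
    hypothetical choices; orientation normalization is omitted here.
    """
    rx, ry, rz = joints[root]
    centered = [(x - rx, y - ry, z - rz) for x, y, z in joints]
    ref_len = math.dist(centered[ref_bone[0]], centered[ref_bone[1]])
    return [(x / ref_len, y / ref_len, z / ref_len) for x, y, z in centered]

def project(joints, azimuth_deg):
    """Rotate about the vertical (y) axis, then drop depth (orthographic view)."""
    t = math.radians(azimuth_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y) for x, y, z in joints]

def build_knowledge_base(poses, azimuths):
    """Store one 2D projection per (pose index, virtual camera azimuth)."""
    kb = []
    for idx, pose in enumerate(poses):
        norm = normalize_pose(pose)
        for az in azimuths:
            kb.append((idx, az, project(norm, az)))
    return kb

def retrieve(query_2d, kb, k=1):
    """Return the k entries whose 2D joints are closest to the query."""
    def sq_dist(entry):
        return sum(math.dist(p, q) ** 2 for p, q in zip(entry[2], query_2d))
    return sorted(kb, key=sq_dist)[:k]

# Two toy 3-joint "poses"; the second is translated and twice as large.
poses = [
    [(0, 0, 0), (0, 1, 0), (1, 1, 0)],
    [(5, 5, 5), (5, 7, 5), (5, 7, 3)],
]
kb = build_knowledge_base(poses, azimuths=[0, 90])
best = retrieve([(0, 0), (0, 1), (-1, 1)], kb, k=1)[0]
print(best[0], best[1])
```

Each retrieved entry carries the index of the source 3D pose and the viewing direction of its virtual camera, which is the information the abstract says is later used to estimate the camera and the final 3D pose.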

MeshLifter: Weakly Supervised Approach for 3D Human Mesh Reconstruction from a Single 2D Pose Based on Loop Structure

Sensors ◽

10.3390/s20154257 ◽

2020 ◽

Vol 20 (15) ◽

pp. 4257

Author(s):

Sunwon Jeong ◽

Ju Yong Chang

Keyword(s):

Ground Truth ◽

Reconstruction Error ◽

Loop Structure ◽

Ground Truth Data ◽

3D Data ◽

Reconstruction Performance ◽

Human Pose ◽

Mesh Reconstruction ◽

Weakly Supervised ◽

2D And 3D

In this paper, we address the problem of 3D human mesh reconstruction from a single 2D human pose based on deep learning. We propose MeshLifter, a network that estimates a 3D human mesh from an input 2D human pose. Unlike most existing 3D human mesh reconstruction studies that train models using paired 2D and 3D data, we propose a weakly supervised learning method based on a loop structure to train the MeshLifter. The proposed method alleviates the difficulty of obtaining ground-truth 3D data to ensure that the MeshLifter can be trained successfully from a 2D human pose dataset and an unpaired 3D motion capture dataset. We compare the proposed method with recent state-of-the-art studies through various experiments and show that the proposed method achieves effective 3D human mesh reconstruction performance. Notably, our proposed method achieves a reconstruction error of 59.1 mm without using the 3D ground-truth data of Human3.6M, the standard dataset for 3D human mesh reconstruction.
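The abstract reports a reconstruction error in millimetres without defining it. A common choice for such a figure is the mean Euclidean distance between predicted and ground-truth joint positions; the sketch below assumes that definition, which may differ from the exact protocol used in the paper. The function name is hypothetical.

```python
import math

def mean_joint_error(predicted, ground_truth):
    """Mean Euclidean distance between corresponding 3D joints.

    Both arguments are lists of (x, y, z) in the same unit (e.g. mm).
    This is an assumed definition of "reconstruction error".
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("pose sizes differ")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(ground_truth)

# One joint matches exactly, the other is off by a 3-4-5 triangle.
error = mean_joint_error([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])
print(error)
```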

Listen2Cough

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies ◽

10.1145/3448124 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-22

Author(s):

Xuhai Xu ◽

Ebrahim Nemati ◽

Korosh Vatanparvar ◽

Viswam Nathan ◽

Tousif Ahmed ◽

...

Keyword(s):

Health Assessment ◽

Ground Truth ◽

Health Condition ◽

Fine Tuning ◽

Detection Model ◽

Assessment Tasks ◽

Augmentation Techniques ◽

In The Wild ◽

The Rich ◽

Lung Health

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, the generalizability of a cough detection model when applied to external datasets, especially in real-world implementation, is questionable and not explored adequately. Beyond detecting coughs, researchers have looked into how cough sounds can be used in assessing lung health. However, due to the challenges in collecting both cough sounds and lung health condition ground truth, previous studies have been hindered by limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model achieves an F1 score of 0.948 when tested against a new clean dataset, and 0.884 on another in-the-wild noisy dataset, leading to an advantage of 5.8% and 8.4% on average over the best baseline model, respectively. Then, to mitigate the issue of limited lung health data, we propose to transfer the cough detection task to lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and utilize similar effective representations from cough sounds. We embed the cough detection model into a multi-instance learning framework with an attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1 score of 0.912 on healthy vs. unhealthy, 0.870 on obstructive vs. non-obstructive, and 0.813 on COPD vs. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively. Moreover, the weight values in the attention layer can be used to identify important coughs highly correlated with lung health, which can potentially provide interpretability for expert diagnosis in the future.
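As a rough illustration of attention-based multi-instance pooling, the sketch below turns a bag of per-cough feature vectors into one pooled vector and exposes the attention weights that indicate which coughs contributed most. The linear scoring vector `w` and the function name are assumptions for illustration; the abstract does not describe the actual scoring network.

```python
import math

def attention_pool(instances, w):
    """Pool a bag of feature vectors with softmax attention weights.

    instances: list of equal-length feature vectors (one per cough).
    w: hypothetical scoring vector; a real model would learn it.
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, inst)) for inst in instances]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [
        sum(a * inst[d] for a, inst in zip(weights, instances))
        for d in range(dim)
    ]
    return pooled, weights

bag = [[1.0, 0.0], [0.0, 1.0]]
pooled_flat, weights_flat = attention_pool(bag, w=[0.0, 0.0])
pooled_peak, weights_peak = attention_pool(bag, w=[10.0, 0.0])
print(weights_flat, weights_peak)
```

With a zero scoring vector every instance gets the same weight; with a scoring vector that favours the first feature, the first instance dominates the pooled representation.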

3D Human Pose Estimation in the Wild by Adversarial Learning

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition ◽

10.1109/cvpr.2018.00551 ◽

2018 ◽

Cited By ~ 85

Author(s):

Wei Yang ◽

Wanli Ouyang ◽

Xiaolong Wang ◽

Jimmy Ren ◽

Hongsheng Li ◽

...

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Adversarial Learning ◽

In The Wild ◽

Human Pose ◽

3D Human Pose Estimation

Markerless 3D Human Pose Tracking in the Wild with Fusion of Multiple Depth Cameras: Comparative Experimental Study with Kinect 2 and 3

Smart Innovation, Systems and Technologies - Activity and Behavior Computing ◽

10.1007/978-981-15-8944-7_8 ◽

2020 ◽

pp. 119-134

Author(s):

Jessica Colombel ◽

David Daney ◽

Vincent Bonnet ◽

François Charpillet

Keyword(s):

Experimental Study ◽

Depth Cameras ◽

Pose Tracking ◽

Human Pose Tracking ◽

In The Wild ◽

Human Pose

Automated single particle detection and tracking for large microscopy datasets

Royal Society Open Science ◽

10.1098/rsos.160225 ◽

2016 ◽

Vol 3 (5) ◽

pp. 160225 ◽

Cited By ~ 11

Author(s):

Rhodri S. Wilson ◽

Lei Yang ◽

Alison Dun ◽

Annya M. Smyth ◽

Rory R. Duncan ◽

...

Keyword(s):

Single Molecule ◽

Single Particle ◽

Image Data ◽

Ground Truth ◽

Detection Algorithm ◽

Large Datasets ◽

Single Particle Tracking ◽

Synthetic Image ◽

Particle Detection ◽

Very Large Datasets

Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.

Comparative Analysis of Supervised and Unsupervised Approaches Applied to Large-Scale “In The Wild” Face Verification

Symmetry ◽

10.3390/sym12111832 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1832

Author(s):

Tomasz Hachaj ◽

Patryk Mazurek

Keyword(s):

Pattern Recognition ◽

Large Scale ◽

Statistical Significance ◽

Ground Truth ◽

Classification Algorithm ◽

Adjusted Rand Index ◽

Face Verification ◽

Data Set ◽

In The Wild ◽

Unsupervised Approaches

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using triplet-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, except in highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so we can use the SC as an indicator of clustering performance when the ground truth of face identities is not known. All of these conclusions are important findings for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves a lot of time and resources.
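For readers unfamiliar with the Fowlkes–Mallows score, the following sketch computes it by pair counting, using the standard definition FMS = TP / sqrt((TP + FP)(TP + FN)), where TP counts pairs of samples grouped together in both labelings. The function name is illustrative and the toy labels are not taken from the paper.

```python
import math
from itertools import combinations

def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows score computed over all pairs of samples."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # grouped in the prediction only
        elif same_true:
            fn += 1  # grouped in the ground truth only
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))

score = fowlkes_mallows([0, 0, 1, 1], [0, 0, 0, 1])
print(round(score, 4))
```

Because the score only looks at whether pairs share a label, it is unaffected by how clusters are named, which makes it usable for comparing cluster assignments against identity labels.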

DensePose: Dense Human Pose Estimation in the Wild

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition ◽

10.1109/cvpr.2018.00762 ◽

2018 ◽

Cited By ~ 169

Author(s):

Riza Alp Guler ◽

Natalia Neverova ◽

Iasonas Kokkinos

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

In The Wild ◽

Human Pose

Estimation of Robot Position and Orientation Using a Stationary Fisheye Camera

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015600040 ◽

2015 ◽

Vol 24 (06) ◽

pp. 1560004 ◽

Cited By ~ 1

Author(s):

Konstantinos K. Delibasis ◽

Vassilis P. Plagianakos ◽

Ilias Maglogiannis

Keyword(s):

Mobile Robot ◽

Ground Truth ◽

Positional Error ◽

Indoor Environments ◽

Basic Operation ◽

Robot Localization ◽

Camera Model ◽

Fisheye Camera ◽

True Position ◽

Robot Tasks

A core problem in robotics is the determination of the location and pose of a mobile robot in its environment. Localization is a basic operation that must be carried out successfully in complex environments using imprecise and/or contaminated data, and it is essential for a broad range of mobile robot tasks, since the robot's behavior depends on its position. In this work, we propose the use of a stationary fisheye camera for real-time robot localization in indoor environments. We employ a model of image formation by the fisheye camera, which can be used for accelerating the segmentation of the robot's top surface, as well as for calculating the robot's true position in the real-world frame of reference. The proposed algorithm for robot localization exploits the calibrated fisheye camera model and the known dimensions of the robot; it does not depend on any information from the robot's sensors and does not require visual landmarks in the indoor environment. Furthermore, the pose (orientation) of the robot is determined from a triangular shape placed on the robot's flat top surface, using Hu's moment invariants, appropriately modified using the calibrated fisheye camera model. Initial results from video sequences are presented and compared to the ground truth position obtained by the robot's sensors. The dependence of the average positional error on the distance from the camera is also measured.
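As background for the moment-based step, the sketch below computes the first two of Hu's moment invariants for a shape given as a set of pixel coordinates. It uses the textbook definitions only and does not include the fisheye-model modification mentioned in the abstract; the function name and the toy shape are hypothetical.

```python
def hu_first_two(points):
    """First two Hu moment invariants of a binary shape.

    points: list of (x, y) pixel coordinates belonging to the shape.
    Normalized central moments: eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2),
    where mu_00 is the number of pixels.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    norm = n ** 2  # exponent 1 + (2 / 2) = 2 for second-order moments
    eta20, eta02, eta11 = mu20 / norm, mu02 / norm, mu11 / norm
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2

shape = [(0, 0), (1, 0), (2, 0), (0, 1)]
# Same shape rotated by 90 degrees and shifted.
moved = [(-y + 7, x - 3) for x, y in shape]
print(hu_first_two(shape), hu_first_two(moved))
```

The two printed tuples agree up to floating-point rounding, which shows the invariance to translation and rotation that makes such descriptors useful for recognizing a marker shape regardless of where it appears.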

Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision

2017 International Conference on 3D Vision (3DV) ◽

10.1109/3dv.2017.00064 ◽

2017 ◽

Cited By ~ 122

Author(s):

Dushyant Mehta ◽

Helge Rhodin ◽

Dan Casas ◽

Pascal Fua ◽

Oleksandr Sotnychenko ◽

...

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

In The Wild ◽

Human Pose ◽

3D Human Pose Estimation

Whole-Body Human Pose Estimation in the Wild

Computer Vision – ECCV 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-58545-7_12 ◽

2020 ◽

pp. 196-214 ◽

Cited By ~ 1

Author(s):

Sheng Jin ◽

Lumin Xu ◽

Jin Xu ◽

Can Wang ◽

Wentao Liu ◽

...

Keyword(s):

Pose Estimation ◽

Whole Body ◽

Human Pose Estimation ◽

In The Wild ◽

Human Pose
