scholarly journals An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2415
Author(s):  
Hashim Yasin ◽  
Björn Krüger

We propose an efficient and novel architecture for 3D articulated human pose retrieval and reconstruction from 2D landmarks extracted from a 2D synthetic image, an annotated 2D image, an in-the-wild real RGB image or even a hand-drawn sketch. Given 2D joint positions in a single image, we devise a data-driven framework to infer the corresponding 3D human pose. To this end, we first normalize 3D human poses from Motion Capture (MoCap) dataset by eliminating translation, orientation, and the skeleton size discrepancies from the poses and then build a knowledge-base by projecting a subset of joints of the normalized 3D poses onto 2D image-planes by fully exploiting a variety of virtual cameras. With this approach, we not only transform 3D pose space to the normalized 2D pose space but also resolve the 2D-3D cross-domain retrieval task efficiently. The proposed architecture searches for poses from a MoCap dataset that are near to a given 2D query pose in a definite feature space made up of specific joint sets. These retrieved poses are then used to construct a weak perspective camera and a final 3D posture under the camera model that minimizes the reconstruction error. To estimate unknown camera parameters, we introduce a nonlinear, two-fold method. We exploit the retrieved similar poses and the viewing directions at which the MoCap dataset was sampled to minimize the projection error. Finally, we evaluate our approach thoroughly on a large number of heterogeneous 2D examples generated synthetically, 2D images with ground-truth, a variety of real in-the-wild internet images, and a proof of concept using 2D hand-drawn sketches of human poses. We conduct a pool of experiments to perform a quantitative study on PARSE dataset. We also show that the proposed system yields competitive, convincing results in comparison to other state-of-the-art methods.

Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4257
Author(s):  
Sunwon Jeong ◽  
Ju Yong Chang

In this paper, we address the problem of 3D human mesh reconstruction from a single 2D human pose based on deep learning. We propose MeshLifter, a network that estimates a 3D human mesh from an input 2D human pose. Unlike most existing 3D human mesh reconstruction studies that train models using paired 2D and 3D data, we propose a weakly supervised learning method based on a loop structure to train the MeshLifter. The proposed method alleviates the difficulty of obtaining ground-truth 3D data to ensure that the MeshLifter can be trained successfully from a 2D human pose dataset and an unpaired 3D motion capture dataset. We compare the proposed method with recent state-of-the-art studies through various experiments and show that the proposed method achieves effective 3D human mesh reconstruction performance. Notably, our proposed method achieves a reconstruction error of 59.1 mm without using the 3D ground-truth data of Human3.6M, the standard dataset for 3D human mesh reconstruction.


Author(s):  
Xuhai Xu ◽  
Ebrahim Nemati ◽  
Korosh Vatanparvar ◽  
Viswam Nathan ◽  
Tousif Ahmed ◽  
...  

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, the generalizability of a cough detection model when applied to external datasets, especially in real-world implementation, is questionable and not explored adequately. Beyond detecting coughs, researchers have looked into how cough sounds can be used in assessing lung health. However, due to the challenges in collecting both cough sounds and lung health condition ground truth, previous studies have been hindered by the limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model is able to achieve an F1 score of 0.948 when tested against a new clean dataset, and 0.884 on another in-the-wild noisy dataset, leading to an advantage of 5.8% and 8.4% on average over the best baseline model, respectively. Then, to mitigate the issue of limited lung health data, we propose to transform the cough detection task to lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and utilize similar effective representation from cough sounds. We embed the cough detection model into a multi-instance learning framework with the attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1-score of 0.912 on healthy v.s. unhealthy, 0.870 on obstructive v.s. non-obstructive, and 0.813 on COPD v.s. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively. Moreover, the weight value in the attention layer can be used to identify important coughs highly correlated with lung health, which can potentially provide interpretability for expert diagnosis in the future.


2016 ◽  
Vol 3 (5) ◽  
pp. 160225 ◽  
Author(s):  
Rhodri S. Wilson ◽  
Lei Yang ◽  
Alison Dun ◽  
Annya M. Smyth ◽  
Rory R. Duncan ◽  
...  

Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in order to understand many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components from organelles to single molecules. We begin with validating the performance of our method on synthetic image data, and then extend the validation to include experiment images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells with very high temporal rates. Our analysis of the dynamics of very large cohorts of 10 000 s of membrane-associated protein molecules show that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provides a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.


Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1832
Author(s):  
Tomasz Hachaj ◽  
Patryk Mazurek

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using tripled-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, beside highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score and, so, we can use the SC as an indicator of clustering performance when the ground truth of face identities is not known. All of these conclusions are important findings for large-scale face verification problems. The reason for this is the fact that skipping the verification of people’s identities before supervised training saves a lot of time and resources.


2015 ◽  
Vol 24 (06) ◽  
pp. 1560004 ◽  
Author(s):  
Konstantinos K. Delibasis ◽  
Vassilis P. Plagianakos ◽  
Ilias Maglogiannis

A core problem in robotics is the determination of the location and pose of a mobile robot in its environment. The localization is a basic operation, which must be successfully carried out in complex environments using imprecise and/or contaminated data and is essential for a broad range of mobile robot tasks, since the robot behavior depends on its position. In this work, we propose the use of a stationary fisheye camera for real time robot localization in indoor environments. We employ a model for the formation of the image by the fisheye camera, which can be used for accelerating the segmentation of the robot's top surface, as well as for calculating the robot's true position in the real world frame of reference. The proposed algorithm for robot localization exploits the calibrated fisheye camera model and the known dimensions of the robot, whereas it does not depend on any information from the robot's sensors and does not require visual landmarks in the indoor environment. Furthermore, the pose (orientation) of the robot is determined using a triangular shape placed on top of the robot's flat top surface, using Hu's moment invariants, appropriately modified using the calibrated fisheye camera model. Initial results are presented from video sequences and are compared to the ground truth position, obtained by the robot's sensors. The dependence of the average positional error with the distance from the camera is also measured.


Author(s):  
Dushyant Mehta ◽  
Helge Rhodin ◽  
Dan Casas ◽  
Pascal Fua ◽  
Oleksandr Sotnychenko ◽  
...  

Author(s):  
Sheng Jin ◽  
Lumin Xu ◽  
Jin Xu ◽  
Can Wang ◽  
Wentao Liu ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document