Using DeepLabCut for 3D markerless pose estimation across species and behaviors

2018 ◽  
Author(s):  
Tanmay Nath ◽  
Alexander Mathis ◽  
An Chi Chen ◽  
Amir Patel ◽  
Matthias Bethge ◽  
...  

Noninvasive behavioral tracking of animals during experiments is crucial to many scientific pursuits. Extracting the poses of animals without markers is often essential for measuring behavioral effects in biomechanics, genetics, ethology, and neuroscience. Yet extracting detailed poses without markers in dynamically changing backgrounds has been challenging. We recently introduced an open-source toolbox called DeepLabCut that builds on a state-of-the-art human pose estimation algorithm and allows a user to train a deep neural network with limited training data to precisely track user-defined features, matching human labeling accuracy. Here we provide an updated toolbox, self-contained within a Python package, that includes new features such as graphical user interfaces and active-learning-based network refinement. Lastly, we provide a step-by-step guide for using DeepLabCut.
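The active-learning-based refinement mentioned above amounts to selecting the frames where the trained network is least confident and returning them to a human for relabeling. A minimal numpy sketch of that selection step (the function and data are hypothetical illustrations, not the DeepLabCut API):

```python
import numpy as np

def select_frames_for_refinement(confidences, k):
    """Pick the k frames where the network is least confident,
    so a human can relabel them (active learning)."""
    order = np.argsort(confidences)  # indices, ascending confidence
    return sorted(order[:k].tolist())

# Toy per-frame mean keypoint confidences after a first training round.
conf = np.array([0.95, 0.40, 0.88, 0.55, 0.99, 0.30])
frames = select_frames_for_refinement(conf, 2)  # frames 1 and 5
```

After relabeling the selected frames, the augmented training set is used to retrain the network, and the cycle repeats until accuracy is satisfactory.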

Author(s):  
Jaymie Strecker ◽  
Atif M. Memon

This chapter describes the state of the art in testing GUI-based software. Traditionally, GUI testing has been performed manually or semimanually, with the aid of capture-replay tools. Since this process may be too slow and ineffective to meet the demands of today’s developers and users, recent research in GUI testing has pushed toward automation. Model-based approaches are being used to generate and execute test cases, implement test oracles, and perform regression testing of GUIs automatically. This chapter shows how research to date has addressed the difficulties of testing GUIs in today’s rapidly evolving technological world, and it points to the many challenges that lie ahead.
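Model-based test-case generation as described here typically walks a model of the GUI, such as an event-flow graph, to enumerate event sequences. A toy sketch (the graph and function names are hypothetical, not any specific tool's API):

```python
# A toy event-flow graph: which GUI events may follow which.
follows = {
    "open_menu":  ["click_save", "click_exit"],
    "click_save": ["open_menu"],
    "click_exit": [],
}

def generate_test_cases(start, max_len):
    """Enumerate event sequences of length <= max_len by walking the
    event-flow graph, the way model-based GUI testing derives test cases."""
    cases, frontier = [], [[start]]
    while frontier:
        seq = frontier.pop()
        cases.append(seq)
        if len(seq) < max_len:
            for nxt in follows[seq[-1]]:
                frontier.append(seq + [nxt])
    return cases

cases = generate_test_cases("open_menu", 3)  # 4 sequences
```

Each generated sequence is then replayed against the GUI, with a test oracle checking the resulting widget states.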


Author(s):  
Sha Xin Wei

Since 1984, graphical user interfaces have typically relied on visual icons that mimic physical objects like the folder, button, and trash can, or on canonical geometric elements like menus and spreadsheet cells. GUIs leverage our intuition about the physical environment. But the world can be thought of as being made of stuff as well as things. Making interfaces from this point of view requires a way to simulate the physics of stuff in real-time response to continuous gesture, driven by behavior logic that can be understood by the user and the designer. The author argues for leveraging the corporeal intuition that people learn from birth about heat flow, water, and smoke to develop interfaces at the density of matter that leverage, in turn, the state of the art in computational physics.


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints in images with multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where the relations between keypoints are the key to grouping them. These relations spontaneously construct a graph of keypoints, where the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, while omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take into account all relations (all edges of the graph) and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoint and MPII datasets, respectively.
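The graph convolution underlying modules like the DGCM can be illustrated with a single fixed-graph GCN layer over keypoint features; the paper's module builds dynamic graphs on top of this idea. A minimal numpy sketch (the shapes and the three-keypoint chain are illustrative assumptions):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step, H = ReLU(D^-1 (A + I) X W), over a
    keypoint graph: each keypoint mixes features from its neighbours."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalise by degree
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)

# 3 keypoints in a chain (e.g. shoulder-elbow-wrist), 2-dim features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)  # identity weights, so the mixing is easy to read off
H = gcn_layer(X, A, W)
```

With identity weights, each output row is simply the average of a keypoint's own features and its neighbours'; learning W (and, in the DGCM, the graph itself) is what makes the module expressive.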


2020 ◽  
Vol 34 (07) ◽  
pp. 10631-10638
Author(s):  
Yu Cheng ◽  
Bo Yang ◽  
Bo Wang ◽  
Robby T. Tan

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress made in recent years. Generally, the performance of existing methods drops when the target person is too small/large, or the motion is too fast/slow, relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained to handle severe occlusion, which compromises their performance when occlusion occurs. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate occlusion cases from minor to severe, so that our network learns better and becomes robust to various degrees of occlusion. As 3D ground truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public datasets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.
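The occlusion-simulating training step described above can be sketched as a simple keypoint-masking augmentation: randomly zero out keypoints and flag them as occluded. A minimal numpy illustration (the array layout and drop probability are assumptions, not the paper's exact procedure):

```python
import numpy as np

def mask_keypoints(keypoints, drop_prob, rng):
    """Simulate occlusion during training: randomly zero out some
    2D keypoints and set their confidence to 0 (occluded)."""
    kps = keypoints.copy()                     # shape (N, 3): x, y, confidence
    occluded = rng.random(len(kps)) < drop_prob
    kps[occluded] = 0.0
    return kps, occluded

rng = np.random.default_rng(0)
kps = np.array([[10.0, 20.0, 0.9],
                [30.0, 40.0, 0.8],
                [50.0, 60.0, 0.7]])
masked, occ = mask_keypoints(kps, 0.5, rng)
```

Varying `drop_prob` over training exposes the network to everything from minor to severe occlusion, which is the point of the augmentation.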


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zhengtuo Wang ◽  
Yuetong Xu ◽  
Guanhua Xu ◽  
Jianzhong Fu ◽  
Jiongyan Yu ◽  
...  

Purpose In this work, the authors aim to provide a set of convenient methods for generating training data, and then develop a deep learning method based on point clouds to estimate the pose of a target for robot grasping. Design/methodology/approach This work presents PointSimGrasp, a deep learning method on point clouds for robot grasping. In PointSimGrasp, a point cloud emulator is introduced to generate training data, and a deep-learning-based pose estimation algorithm is designed. After training with the emulated data set, the pose estimation algorithm can estimate the pose of a target. Findings In the experiments, an experimental platform is built, which contains a six-axis industrial robot, a binocular structured-light sensor and a base platform with adjustable inclination. A data set that contains three subsets is set up on the experimental platform. After training with the emulated data set, PointSimGrasp is tested on the experimental data set, and an average translation error of about 2–3 mm and an average rotation error of about 2–5 degrees are obtained. Originality/value The contributions are as follows: first, a deep learning method on point clouds is proposed to estimate the 6D pose of a target; second, a convenient training method for the pose estimation algorithm is presented, and a point cloud emulator is introduced to generate training data; finally, an experimental platform is built, and PointSimGrasp is tested on it.
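The reported translation and rotation errors are standard 6D-pose metrics: the Euclidean distance between translation vectors, and the angle of the relative rotation between the estimated and ground-truth rotation matrices. A small numpy sketch of how such errors might be computed (PointSimGrasp's own evaluation details are not specified here):

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (Euclidean, same units as t) and rotation
    error (angle, in degrees, of the relative rotation R_est R_gt^T)."""
    t_err = np.linalg.norm(t_est - t_gt)
    R_delta = R_est @ R_gt.T
    cos_theta = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, np.degrees(np.arccos(cos_theta))

# Ground truth: identity rotation, zero translation.
# Estimate: 3 degrees about z, displaced 2 mm along x.
theta = np.radians(3.0)
R_est = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
t_err, r_err = pose_errors(R_est, np.array([2.0, 0.0, 0.0]),
                           np.eye(3), np.zeros(3))
```

The trace formula follows from the axis-angle representation: for a rotation by angle θ, tr(R) = 1 + 2 cos θ.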


Author(s):  
Jielu Yan ◽  
MingLiang Zhou ◽  
Jinli Pan ◽  
Meng Yin ◽  
Bin Fang

3D human pose estimation refers to estimating the 3D articulated structure of a person from an image or a video. The technology has massive potential because it enables tracking people and analyzing motion in real time. Recently, much research has been conducted to optimize human pose estimation, but few works have focused on reviewing 3D human pose estimation. In this paper, we offer a comprehensive survey of state-of-the-art methods for 3D human pose estimation, covering pose estimation solutions, implementations on images or videos that contain different numbers of people, and advanced 3D human pose estimation techniques. Furthermore, the different kinds of algorithms are subdivided into sub-categories and compared in light of their different methodologies. To the best of our knowledge, this is the first such comprehensive survey of the recent progress of 3D human pose estimation, and it will hopefully facilitate the completion, refinement and applications of 3D human pose estimation.


Author(s):  
Wei Feng ◽  
Wentao Liu ◽  
Tong Li ◽  
Jing Peng ◽  
Chen Qian ◽  
...  

Human-object interaction (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human actions and the localization of their interacted objects provide guidance for pose estimation. In this paper, we propose a turbo learning framework to perform HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks, i.e., a pose-aware HOI recognition module and an HOI-guided pose estimation module. Then, the two modules form a closed loop to utilize the complementary information iteratively, and they can be trained in an end-to-end manner. The proposed method achieves state-of-the-art performance on two public benchmarks, the Verbs in COCO (V-COCO) and HICO-DET datasets.
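The closed-loop structure of turbo learning, where each module repeatedly refines its estimate using the other's latest output, can be sketched abstractly; the stand-in modules below are toy scalar updates, not the paper's networks:

```python
def turbo_loop(pose0, hoi0, refine_pose, refine_hoi, n_iters=3):
    """Closed-loop message passing: each module refines its estimate
    using the other's latest output, as in turbo learning."""
    pose, hoi = pose0, hoi0
    for _ in range(n_iters):
        hoi = refine_hoi(pose, hoi)    # pose-aware HOI recognition
        pose = refine_pose(hoi, pose)  # HOI-guided pose estimation
    return pose, hoi

# Toy stand-ins: each "module" nudges a scalar estimate toward agreement.
pose, hoi = turbo_loop(0.0, 0.0,
                       refine_pose=lambda h, p: p + 0.5 * (h - p),
                       refine_hoi=lambda p, h: h + 0.5 * (1.0 - h))
```

In the real framework, both modules are neural networks and the whole loop is unrolled and trained end-to-end, so gradients flow through every iteration.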


Author(s):  
Daniel Groos ◽  
Heri Ramampiaro ◽  
Espen AF Ihlen

Single-person human pose estimation facilitates markerless movement analysis in sports, as well as in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements of real-life applications. The proliferation of deep learning techniques has resulted in the development of many advanced approaches. However, with progress in the field, more complex and inefficient models have also been introduced, causing tremendous increases in computational demands. To cope with these complexity and inefficiency challenges, we propose a novel convolutional neural network architecture, called EfficientPose, which exploits the recently proposed EfficientNets to deliver efficient and scalable single-person pose estimation. EfficientPose is a family of models harnessing an effective multi-scale feature extractor and computationally efficient detection blocks built on mobile inverted bottleneck convolutions, while at the same time ensuring that the precision of the pose configurations is still improved. Owing to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost. The results of our experiments on the challenging MPII single-person benchmark show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in terms of both accuracy and computational efficiency. In particular, our top-performing model achieves state-of-the-art accuracy on single-person MPII with low-complexity ConvNets.
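One reason mobile inverted bottleneck (MBConv) blocks are cheap is that they build on depthwise-separable convolutions rather than standard ones. A back-of-the-envelope parameter count (a generic illustration, not EfficientPose's actual layer sizes):

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv followed by a pointwise 1 x 1 conv,
    the factorisation used inside MBConv blocks."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(128, 128, 3)   # 147456 weights
sep = separable_conv_params(128, 128, 3)  # 1152 + 16384 = 17536 weights
ratio = std / sep                         # roughly 8x fewer parameters
```

The saving grows with channel count, which is why such blocks dominate architectures aimed at edge devices.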

