GAS-GCN: Gated Action-Specific Graph Convolutional Networks for Skeleton-Based Action Recognition

Wensong Chan; Zhiqiang Tian; Yang Wu

doi:10.3390/s20123499

Sensors, Vol 20 (12), pp. 3499, 2020. Cited By ~ 3

Keyword(s): Action Recognition; Human Body; Adjacency Matrix; Large Scale; Redundant Information; Temporal Dimension; Convolutional Networks; Effective Operation; Public Datasets; Body Joints

Skeleton-based action recognition has made great advances with the development of graph convolutional networks (GCNs). Many existing GCN-based models use only a fixed, hand-crafted adjacency matrix to describe the connections between human body joints. This omits important implicit connections between joints, which carry discriminative information for different actions. In this paper, we propose an action-specific graph convolutional module, which extracts these implicit connections and properly balances them for each action. In addition, to filter out useless and redundant information in the temporal dimension, we propose a simple yet effective operation named gated temporal convolution. The effectiveness of these two contributions is demonstrated on three large-scale public datasets, NTU-RGB+D, Kinetics, and NTU-RGB+D 120, and is further supported by detailed ablation studies.
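The two ideas in this abstract can be illustrated with a small, hedged sketch. It is not the authors' GAS-GCN module: the abstract does not specify how the implicit connections are learned or balanced, so here the "balance" is assumed to be a single sigmoid gate (the hypothetical `gate_logit`, which a real model would predict per input sample) that mixes the hand-crafted adjacency with an implicit one, and the gated temporal convolution is assumed to follow the common GLU pattern (a feature convolution multiplied by a sigmoid gate convolution). All function names are illustrative, and everything is reduced to one feature channel.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def blend_adjacency(a_fixed, a_implicit, gate_logit):
    """Mix the hand-crafted adjacency with an implicit one via a scalar gate.

    Hypothetical: gate_logit stands in for a value that an action-specific
    module would predict from the input sample.
    """
    g = sigmoid(gate_logit)
    n = len(a_fixed)
    return [[(1.0 - g) * a_fixed[i][j] + g * a_implicit[i][j] for j in range(n)]
            for i in range(n)]


def graph_aggregate(adj, x):
    """Neighbourhood aggregation y_i = sum_j A_ij * x_j (one channel, no weights)."""
    n = len(adj)
    return [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]


def gated_temporal_conv(seq, w_feat, w_gate):
    """GLU-style gated 1D convolution over time with 'valid' padding.

    out_t = feat_t * sigmoid(gate_t); a gate near 0 suppresses that time step.
    """
    k = len(w_feat)
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        feat = sum(w * v for w, v in zip(w_feat, window))
        gate = sum(w * v for w, v in zip(w_gate, window))
        out.append(feat * sigmoid(gate))
    return out
```

For example, with a gate logit of 0 the two adjacency matrices are weighted equally, while a strongly negative gate logit in the temporal convolution drives the outputs toward zero, which is the filtering behaviour the abstract attributes to gating.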

Attention adjacency matrix based graph convolutional networks for skeleton-based action recognition

Neurocomputing, Vol 440, pp. 230-239, 2021. doi:10.1016/j.neucom.2021.02.001

Author(s): Jun Xie; Qiguang Miao; Ruyi Liu; Wentian Xin; Lei Tang; ...

Keyword(s): Action Recognition; Adjacency Matrix; Convolutional Networks

UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (03), pp. 2677-2684, 2020. doi:10.1609/aaai.v34i03.5653

Author(s): Marjaneh Safaei; Pooyan Balouchian; Hassan Foroosh

Keyword(s): Action Recognition; Human Body; Large Scale; Temporal Information; Training Data; Body Motion; Human Actions; Still Images; Still Image; Novel Approach

Action recognition in still images poses a great challenge due to (i) the limited training data available and (ii) the absence of temporal information. To address the first challenge, we introduce a dataset for STill image Action Recognition (STAR), containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. The key characteristics of UCF-STAR include (1) focusing on human body motion rather than relatively static human-object interaction categories, (2) collecting images from the wild to benefit from a varied set of action representations, (3) attaching multiple human-annotated labels per image rather than just the action label, and (4) including a rich, structured, and multi-modal set of metadata for each image. This departs from existing datasets, which typically provide a single annotation for a smaller number of images and categories, with no metadata. UCF-STAR exposes the intrinsic difficulty of action recognition through its realistic scene and action complexity. To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of “latent” motion information in recognizing human actions in still images, we present a novel approach that relies on predicting temporal information and yields higher accuracy on 5 widely used datasets.

Global Co-Occurrence Feature and Local Spatial Feature Learning for Skeleton-Based Action Recognition

Entropy, Vol 22 (10), pp. 1135, 2020. doi:10.3390/e22101135

Author(s): Jun Xie; Wentian Xin; Ruyi Liu; Qiguang Miao; Lijie Sheng; ...

Keyword(s): Spatial Structure; Action Recognition; Large Scale; Recent Progress; Feature Fusion; Model Performance; Feature Learning; Learning Model; Spatial Feature; Convolutional Networks

Recent progress on skeleton-based action recognition has been substantial, benefiting mostly from the explosive development of Graph Convolutional Networks (GCN). However, prevailing GCN-based methods may not effectively capture the global co-occurrence features among joints or the local spatial structure features composed of adjacent bones. They also ignore the effect that channels unrelated to action recognition have on model performance. To address these issues, we propose a Global Co-occurrence feature and Local Spatial feature learning model (GCLS) consisting of two branches. The first, the Vertex Attention Mechanism branch (VAM-branch), effectively captures the global co-occurrence features of actions; the second, the Cross-kernel Feature Fusion branch (CFF-branch), extracts local spatial structure features composed of adjacent bones and suppresses the channels unrelated to action recognition. Extensive experiments on two large-scale datasets, NTU-RGB+D and Kinetics, show that GCLS achieves the best performance among the mainstream approaches it is compared with.
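"Global co-occurrence" here means relations between any pair of joints, not only skeletal neighbours. A minimal sketch of that idea, assuming plain dot-product attention between all joints, is shown below; it is a generic stand-in, not the VAM-branch itself, whose exact formulation the abstract does not give. The function names are hypothetical.

```python
import math


def softmax(row):
    """Softmax over one row of scores (shifted by the max for stability)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def vertex_attention(features):
    """Dot-product attention between every pair of joints.

    features: one feature vector (list of floats) per joint.
    Returns (attn, out): attn[i][j] is how strongly joint i attends to joint j,
    over all joints rather than only skeletal neighbours; out[i] is the
    attention-weighted sum of all joint features.
    """
    n = len(features)
    dim = len(features[0])
    scores = [[sum(a * b for a, b in zip(features[i], features[j]))
               for j in range(n)] for i in range(n)]
    attn = [softmax(row) for row in scores]
    out = [[sum(attn[i][j] * features[j][d] for j in range(n))
            for d in range(dim)] for i in range(n)]
    return attn, out
```

Joints with similar features (for example, two hands moving together) receive larger mutual weights regardless of how far apart they are in the skeleton graph.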

Action Recognition Based on Motion Representing and Reconstructed Phase Spaces Matching of 3D Joint Positions

Applied Mechanics and Materials, Vol 333-335, pp. 675-679, 2013. doi:10.4028/www.scientific.net/amm.333-335.675

Author(s): Yan Tao Zhao; Bo Zhang; Xu Guang Zhang; Xiao Li Li; Mei Ling Fu; ...

Keyword(s): Action Recognition; Human Body; Nonlinear Dynamical Systems; Human Action Recognition; Human Action; Polar Space; Phase Portraits; Nonlinear Dynamical; Human Pose; Body Joints

This paper presents an efficient and novel framework for human action recognition based on representing the motion of human body joints and on the theory of nonlinear dynamical systems. Our work is motivated by the pictorial structures model and by advances in human pose estimation. Intuitively, a collective understanding of human joint movements, obtained through quantization in polar space, can lead to a better representation and understanding of any human action. We apply time-delay embedding to the time series describing how human body-joint variables evolve over time in order to reconstruct phase portraits. We then train SVM models for action recognition by comparing the distances between trajectories of human body-joint variables within the reconstructed phase portraits. The proposed framework is evaluated on the MSR-Action3D dataset, and the results are compared against several state-of-the-art methods.
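Time-delay embedding reconstructs a phase portrait from a single scalar series by stacking delayed copies of it. The sketch below shows the standard construction with embedding dimension `dim` and delay `tau`, plus one possible distance between two reconstructed trajectories. The distance (mean point-wise Euclidean distance) is an assumption for illustration; the abstract does not say which trajectory distance feeds the SVMs, and the polar-space quantization step is not shown.

```python
import math


def delay_embed(series, dim, tau):
    """Time-delay embedding of a scalar time series.

    Each reconstructed point is (x[t], x[t+tau], ..., x[t+(dim-1)*tau]).
    """
    span = (dim - 1) * tau
    if span >= len(series):
        raise ValueError("series too short for this dim/tau")
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(len(series) - span)]


def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between corresponding points of two
    reconstructed trajectories, truncated to the shorter one.
    (Hypothetical choice of distance.)"""
    n = min(len(traj_a), len(traj_b))
    return sum(math.dist(p, q) for p, q in zip(traj_a[:n], traj_b[:n])) / n
```

For instance, `delay_embed([0, 1, 2, 3, 4], 3, 1)` yields the points `(0, 1, 2)`, `(1, 2, 3)`, `(2, 3, 4)`. Distances computed this way between a query and training trajectories could then serve as input to a classifier.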

Localized Trajectories for 2D and 3D Action Recognition

Sensors, Vol 19 (16), pp. 3503, 2019. doi:10.3390/s19163503. Cited By ~ 4

Author(s): Konstantinos Papadopoulos; Girum Demisse; Enjie Ghorbel; Michel Antunes; Djamila Aouada; ...

Keyword(s): Action Recognition; Human Activity; Human Body; Image Plane; Bag Of Words; Dense Trajectories; Motion Trajectories; Body Joints; Background Motion; 2D And 3D

The Dense Trajectories concept is one of the most successful approaches in action recognition and is suitable for scenarios involving a significant amount of motion. However, due to noise and background motion, many of the generated trajectories are irrelevant to the actual human activity and can degrade performance. In this paper, we propose Localized Trajectories, an improved version of Dense Trajectories in which motion trajectories are clustered around human body joints provided by RGB-D cameras and then encoded by local Bag-of-Words. As a result, Localized Trajectories provide a more discriminative representation of actions. Moreover, we generalize Localized Trajectories to 3D by using the depth modality. One of the main advantages of 3D Localized Trajectories is that they describe radial displacements, which are perpendicular to the image plane. Extensive experiments and analysis were carried out on five different datasets.
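The clustering-then-encoding step can be sketched as follows, under simplifying assumptions that are not taken from the paper: each trajectory is assigned to the joint nearest its centroid, each trajectory descriptor is quantized to the nearest word of a single shared codebook, and one normalized histogram is built per joint. Real dense-trajectory descriptors (and the paper's exact assignment rule) are more elaborate; `localized_bow` and `nearest_index` are hypothetical names.

```python
import math


def nearest_index(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def localized_bow(trajectories, descriptors, joints, codebook):
    """Per-joint Bag-of-Words histograms.

    trajectories: list of point lists, each point an (x, y) tuple.
    descriptors: one descriptor vector per trajectory.
    joints: joint positions (x, y).
    codebook: visual-word centres in descriptor space.
    Returns one normalized histogram per joint.
    """
    hist = [[0] * len(codebook) for _ in joints]
    for traj, desc in zip(trajectories, descriptors):
        cx = sum(p[0] for p in traj) / len(traj)
        cy = sum(p[1] for p in traj) / len(traj)
        j = nearest_index((cx, cy), joints)  # cluster around the nearest joint
        w = nearest_index(desc, codebook)    # quantize to the nearest visual word
        hist[j][w] += 1
    out = []
    for h in hist:
        total = sum(h)
        out.append([c / total if total else 0.0 for c in h])
    return out
```

Concatenating the per-joint histograms gives a representation in which trajectories far from any joint (for example, background motion) can no longer dilute the statistics of a given body region.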

Whole and Part Adaptive Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition

Sensors, Vol 20 (24), pp. 7149, 2020. doi:10.3390/s20247149

Author(s): Qi Zuo; Lian Zou; Cien Fan; Dongqian Li; Hao Jiang; ...

Keyword(s): Action Recognition; Large Scale; Movement Patterns; Physical Structure; Human Skeleton; Spatiotemporal Features; Convolutional Networks; Skeleton Graph; Adaptive Fusion; Different Parts

Spatiotemporal graph convolution has made significant progress in skeleton-based action recognition in recent years. Most existing graph convolution methods model all the joints of the human skeleton as a single graph, ignoring differences in the movement patterns of different body parts and failing to capture the relationships between those parts. To capture both the distinctive features of each part of the skeleton and the correlations between parts, we propose two new graph convolution methods: the whole graph convolution network (WGCN) and the part graph convolution network (PGCN). WGCN learns whole-scale skeleton spatiotemporal features according to the movement patterns and physical structure of the human skeleton. PGCN divides the human skeleton graph into several subgraphs to learn part-scale spatiotemporal features. Moreover, we propose an adaptive fusion module that performs multiple complementary, adaptive fusions of the two kinds of features to obtain more effective skeleton features. By coupling these proposals, we build a whole and part adaptive fusion graph convolutional neural network (WPGCN) that outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
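Dividing the skeleton graph into part subgraphs and fusing whole-scale with part-scale features can be sketched as below. This is an illustrative reduction, not WPGCN: the part assignment `part_of` is supplied by hand, the subgraphs are obtained by simply dropping edges that cross part boundaries, and the fusion is assumed to be a softmax-weighted sum driven by two hypothetical learnable `logits`.

```python
import math


def part_adjacency(adj, part_of):
    """Split the whole-skeleton graph into part subgraphs by dropping every
    edge whose two joints belong to different parts."""
    n = len(adj)
    return [[adj[i][j] if part_of[i] == part_of[j] else 0 for j in range(n)]
            for i in range(n)]


def adaptive_fuse(whole_feat, part_feat, logits):
    """Weighted sum of whole-scale and part-scale features, with the two
    weights given by a softmax over hypothetical learnable logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    a, b = exps[0] / total, exps[1] / total
    return [a * w + b * p for w, p in zip(whole_feat, part_feat)]
```

On a four-joint chain split into two parts of two joints each, `part_adjacency` removes only the middle edge, so the part graph convolution sees two disconnected subgraphs while the whole graph keeps the full chain.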

Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition

Electronics, Vol 10 (18), pp. 2198, 2021. doi:10.3390/electronics10182198

Author(s): Chaoyue Li; Lian Zou; Cien Fan; Hao Jiang; Yifeng Liu

Keyword(s): Action Recognition; Large Scale; Feature Learning; Superior Performance; Sparse Graph; Convolutional Network; Convolutional Networks; Spatial Graph; Multi Stage; The Time Domain

Graph convolutional networks (GCNs), which model human actions as a series of spatial-temporal graphs, have recently achieved superior performance in skeleton-based action recognition. However, most existing methods use the physical connections between joints to construct the spatial graph, which limits the topological information available about the human skeleton. In addition, action features in the time domain have not been fully explored. To better extract spatial-temporal features, we propose a multi-stage attention-enhanced sparse graph convolutional network (MS-ASGCN) for skeleton-based action recognition. To capture richer joint dependencies, we propose a new strategy for constructing skeleton graphs that simulates bidirectional information flows between neighboring joints and pays greater attention to information transmission between sparse joints. In addition, a part attention mechanism is proposed to learn the weight of each part and enhance part-level feature learning. We introduce multiple streams from different stages and merge them at specific layers of the network to further improve the model's performance. Finally, the model is verified on two large-scale datasets, NTU-RGB+D and Skeleton-Kinetics, and experiments show that the proposed MS-ASGCN outperforms the previous state-of-the-art methods on both.
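A part attention mechanism that "learns the weight of each part" can be sketched as follows. The specifics are assumptions, not MS-ASGCN's design: features are average-pooled per part, each pooled value is scaled by a hypothetical learned scalar `score_w[p]`, the scores are normalized with a softmax across parts, and every joint's feature is rescaled by its part's weight. A single feature channel keeps the example short.

```python
import math


def part_attention(joint_feats, part_of, score_w):
    """Rescale joint features by a softmax weight computed per body part.

    joint_feats: one scalar feature per joint (single channel for brevity).
    part_of: part index (0..P-1) for each joint; every part must be non-empty.
    score_w: hypothetical learned scalar per part that turns the part's
             average-pooled feature into an attention score.
    Returns (part weights, reweighted joint features).
    """
    n_parts = max(part_of) + 1
    pooled = []
    for p in range(n_parts):
        vals = [f for f, q in zip(joint_feats, part_of) if q == p]
        pooled.append(sum(vals) / len(vals))
    scores = [w * v for w, v in zip(score_w, pooled)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weights, [f * weights[q] for f, q in zip(joint_feats, part_of)]
```

With equal scores every part receives the same weight; raising one part's score shifts weight toward that part, which is how such a mechanism can emphasize, say, the arms for a waving action.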

NPU RGB+D Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players

Applied Sciences, Vol 11 (10), pp. 4426, 2021. doi:10.3390/app11104426

Author(s): Chunyan Ma; Ji Fan; Jinghao Yao; Tao Zhang

Keyword(s): Action Recognition; Large Scale; Short Term Memory; Evaluation Criteria; Image Data; Basketball Player; Basketball Players; Convolutional Network; Atomic Actions; New Feature

Computer vision-based action recognition of basketball players in training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public benchmark datasets. In this study, we defined 32 kinds of atomic actions covering most of the complex actions of basketball players and built the NPU RGB+D dataset (a large-scale basketball action recognition dataset with RGB image data and depth data captured at Northwestern Polytechnical University), which covers 12 kinds of actions performed by 10 professional basketball players, with 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. By extracting spatial features from the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition, based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is very competitive with current action recognition algorithms and that our LSTM-DGCN outperforms state-of-the-art action recognition methods under various evaluation criteria on our dataset. Our action classification scheme and the NPU RGB+D dataset are valuable for basketball player action recognition techniques. The feature-enhanced LSTM-DGCN recognizes actions more accurately, improving the motion expression ability of the skeleton data.
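The distance and angle features mentioned above are simple geometric quantities over joint coordinates. The sketch below computes all pairwise Euclidean distances and the angle at a middle joint formed by two bones (for example shoulder, elbow, wrist). Which joint pairs and triples LSTM-DGCN actually uses is not stated in the abstract, so the choice is left to the caller; the function names are illustrative.

```python
import math


def pairwise_distances(joints):
    """Euclidean distance for every unordered pair of joints (i < j)."""
    n = len(joints)
    return [math.dist(joints[i], joints[j])
            for i in range(n) for j in range(i + 1, n)]


def joint_angle(a, b, c):
    """Angle in radians at joint b between the bones b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos)
```

A fully extended arm gives an elbow angle of about pi radians and a right-angle bend gives about pi/2; concatenating such angles with the pairwise distances yields a per-frame feature vector that could be fed to a sequence model.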

Effective Skeleton-Based Action Recognition by combining Graph Convolutional Networks and Fisher Vector Encoding

2020 15th IEEE International Conference on Signal Processing (ICSP), 2020. doi:10.1109/icsp48669.2020.9321006

Author(s): Jun Tang; Yanjiang Wang; Baodi Liu

Keyword(s): Action Recognition; Convolutional Networks; Fisher Vector; Fisher Vector Encoding

A New Multi-Person Pose Estimation Method Using the Partitioned CenterPose Network

Applied Sciences, Vol 11 (9), pp. 4241, 2021. doi:10.3390/app11094241

Author(s): Jiahua Wu; Hyo Jong Lee

Keyword(s): Pose Estimation; Human Body; State Of The Art; Estimation Method; Bottom Up; Center Point; Novel Approach; Body Joints

In bottom-up multi-person pose estimation, grouping detected joint candidates into the correct person instance is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To this end, we propose a novel approach called Partition Pose Representation (PPR), which integrates a person instance and its body joints based on joint offsets. PPR leverages the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To strengthen the relationships between body joints, we divide the human body into five parts and generate a sub-PPR for each part. Based on PPR, the PCP Network detects people and their body joints simultaneously, then groups all body joints according to their joint offsets. Moreover, an improved L1 loss is designed to measure joint offsets more accurately. Experiments on the COCO keypoints and CrowdPose datasets show that the proposed method performs on par with existing state-of-the-art bottom-up methods in terms of accuracy and speed.
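The center-plus-offsets idea behind PPR can be sketched in 2D as below. Several simplifications are assumptions, not the PCP Network: the body center is taken as the mean of the joints, a single representation is used instead of five per-part sub-PPRs, and grouping assigns each detected candidate `(joint_type, (x, y))` to the person whose predicted position for that joint type (center + offset) is nearest. Function names are hypothetical.

```python
import math


def encode_pose(joints):
    """Center-plus-offsets encoding; the center is the mean of the joints
    (an assumption for this sketch)."""
    n = len(joints)
    cx = sum(x for x, _ in joints) / n
    cy = sum(y for _, y in joints) / n
    return (cx, cy), [(x - cx, y - cy) for x, y in joints]


def decode_pose(center, offsets):
    """Recover joint positions as center + offset."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def group_candidates(candidates, people):
    """Assign each candidate (joint_type, (x, y)) to the person whose
    predicted position for that joint type is nearest.

    people: list of (center, offsets) pairs. Returns a person index per candidate.
    """
    assignment = []
    for k, pos in candidates:
        preds = [decode_pose(c, offs)[k] for c, offs in people]
        assignment.append(min(range(len(people)),
                              key=lambda i: math.dist(pos, preds[i])))
    return assignment
```

Because each person's joints are expressed relative to that person's own center, a candidate is matched to the instance whose offsets predict it best, rather than by a separate association step.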
