Whole and Part Adaptive Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition

Sensors, 2020, Vol 20 (24), pp. 7149
Author(s): Qi Zuo, Lian Zou, Cien Fan, Dongqian Li, Hao Jiang, ...

Spatiotemporal graph convolution has made significant progress in skeleton-based action recognition in recent years. Most existing graph convolution methods treat all the joints of the human skeleton as a single modeling graph, ignoring the differences in the movement patterns of different body parts and failing to model the relationships between those parts well. To capture both the unique features of different parts of the human skeleton and the correlations among them, we propose two new graph convolution methods: the whole graph convolution network (WGCN) and the part graph convolution network (PGCN). WGCN learns whole-scale skeleton spatiotemporal features according to the movement patterns and physical structure of the human skeleton. PGCN divides the human skeleton graph into several subgraphs to learn part-scale spatiotemporal features. Moreover, we propose an adaptive fusion module that combines the two features through multiple complementary adaptive fusions to obtain more effective skeleton features. By coupling these proposals, we build a whole and part adaptive fusion graph convolutional network (WPGCN) that outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
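A minimal sketch of how such an adaptive fusion step might look in PyTorch: a channel-wise gate is learned from the concatenated whole-scale and part-scale features and used to mix them. The tensor layout (batch, channels, frames, joints), module names, and gate design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse whole-scale and part-scale features with a learned gate.

    Inputs are assumed to have shape (N, C, T, V): batch, channels,
    frames, joints. Illustrative sketch, not the authors' code.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, whole_feat, part_feat):
        g = self.gate(torch.cat([whole_feat, part_feat], dim=1))  # (N, C, T, V) in [0, 1]
        return g * whole_feat + (1.0 - g) * part_feat


# Example: fuse two complementary feature maps of the same shape.
x_whole = torch.randn(8, 64, 30, 25)   # e.g. a 25-joint skeleton such as NTU RGB+D
x_part = torch.randn(8, 64, 30, 25)
fused = AdaptiveFusion(64)(x_whole, x_part)
print(fused.shape)  # torch.Size([8, 64, 30, 25])
```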

Author(s): Yu-Hui Wen, Lin Gao, Hongbo Fu, Fang-Lue Zhang, Shihong Xia

Hierarchical structure and the different semantic roles of joints in the human skeleton convey important information for action recognition. Conventional graph convolution methods for modeling skeleton structure consider only the physically connected neighbors of each joint and joints of the same type, thus failing to capture high-order information. In this work, we propose a novel model with motif-based graph convolution to encode hierarchical spatial structure, and a variable temporal dense block to exploit local temporal information over different ranges of human skeleton sequences. Moreover, we employ a non-local block to capture global dependencies in the temporal domain through an attention mechanism. Our model achieves improvements over state-of-the-art methods on two large-scale datasets.
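As a rough illustration of the temporal non-local idea, the sketch below applies self-attention between frames of a (batch, channels, frames, joints) feature map after pooling over joints; the embedding size and pooling choice are assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class TemporalNonLocal(nn.Module):
    """Self-attention over frames of a (N, C, T, V) feature map.

    Joints are average-pooled away, attention is computed between every
    pair of frames, and the result is added back as a residual that is
    broadcast over joints. Sizes are illustrative, not the paper's.
    """
    def __init__(self, channels, embed=None):
        super().__init__()
        embed = embed or channels // 2
        self.theta = nn.Conv1d(channels, embed, 1)
        self.phi = nn.Conv1d(channels, embed, 1)
        self.g = nn.Conv1d(channels, embed, 1)
        self.out = nn.Conv1d(embed, channels, 1)

    def forward(self, x):                           # x: (N, C, T, V)
        xt = x.mean(dim=3)                          # (N, C, T): pool over joints
        q = self.theta(xt).transpose(1, 2)          # (N, T, E)
        k = self.phi(xt)                            # (N, E, T)
        attn = torch.softmax(q @ k, dim=-1)         # (N, T, T) frame-to-frame weights
        val = self.g(xt).transpose(1, 2)            # (N, T, E)
        y = self.out((attn @ val).transpose(1, 2))  # (N, C, T)
        return x + y.unsqueeze(-1)                  # residual, broadcast over joints


x = torch.randn(4, 64, 50, 25)
print(TemporalNonLocal(64)(x).shape)  # torch.Size([4, 64, 50, 25])
```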


Sensors, 2020, Vol 20 (12), pp. 3499
Author(s): Wensong Chan, Zhiqiang Tian, Yang Wu

Skeleton-based action recognition has achieved great advances with the development of graph convolutional networks (GCNs). Many existing GCN-based models use only a fixed hand-crafted adjacency matrix to describe the connections between human body joints. This omits the important implicit connections between joints, which contain discriminative information for different actions. In this paper, we propose an action-specific graph convolutional module, which is able to extract the implicit connections and properly balance them for each action. In addition, to filter out useless and redundant information in the temporal dimension, we propose a simple yet effective operation named gated temporal convolution. These two major novelties ensure the superiority of our proposed method, as demonstrated on three large-scale public datasets, NTU-RGB+D, Kinetics, and NTU-RGB+D 120, and in detailed ablation studies.
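The gated temporal convolution can be pictured as a temporal convolution whose output is modulated by a sigmoid branch; the sketch below is a plausible minimal form, with kernel size and padding chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Temporal convolution whose output is modulated by a learned gate.

    Both branches convolve along the frame axis of a (N, C, T, V) tensor;
    the sigmoid branch can suppress frames judged uninformative.
    Kernel size and padding are illustrative assumptions.
    """
    def __init__(self, channels, kernel_size=9):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0))
        self.gate = nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0))

    def forward(self, x):
        return self.conv(x) * torch.sigmoid(self.gate(x))


x = torch.randn(2, 64, 100, 25)
print(GatedTemporalConv(64)(x).shape)  # torch.Size([2, 64, 100, 25])
```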


Entropy, 2020, Vol 22 (10), pp. 1135
Author(s): Jun Xie, Wentian Xin, Ruyi Liu, Qiguang Miao, Lijie Sheng, ...

Recent progress on skeleton-based action recognition has been substantial, benefiting mostly from the explosive development of Graph Convolutional Networks (GCN). However, prevailing GCN-based methods may not effectively capture the global co-occurrence features among joints and the local spatial structure features composed of adjacent bones. They also ignore the effect of channels unrelated to action recognition on model performance. Accordingly, to address these issues, we propose a Global Co-occurrence feature and Local Spatial feature learning model (GCLS) consisting of two branches. The first branch, based on the Vertex Attention Mechanism branch (VAM-branch), captures the global co-occurrence feature of actions effectively; the second, based on the Cross-kernel Feature Fusion branch (CFF-branch), extracts local spatial structure features composed of adjacent bones and restrains the channels unrelated to action recognition. Extensive experiments on two large-scale datasets, NTU-RGB+D and Kinetics, demonstrate that GCLS achieves the best performance when compared to the mainstream approaches.
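A generic sketch of a vertex (joint) attention step, offered as an illustration rather than the paper's VAM-branch: per-joint scores are learned from time-pooled features and used to reweight each joint.

```python
import torch
import torch.nn as nn

class VertexAttention(nn.Module):
    """Reweight joints with attention scores learned from pooled features.

    Features (N, C, T, V) are averaged over frames; a small bottleneck maps
    each joint's descriptor to a scalar weight in [0, 1] that scales that
    joint's features. Generic sketch, not the paper's exact mechanism.
    """
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (N, C, T, V)
        pooled = x.mean(dim=2)            # (N, C, V): average over frames
        w = self.fc(pooled)               # (N, 1, V): one weight per joint
        return x * w.unsqueeze(2)         # broadcast over channels and frames


x = torch.randn(2, 64, 50, 25)
print(VertexAttention(64)(x).shape)  # torch.Size([2, 64, 50, 25])
```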


Sensors, 2021, Vol 21 (2), pp. 452
Author(s): Wenjie Yang, Jianlin Zhang, Jingju Cai, Zhiyong Xu

Graph convolutional networks (GCNs) have brought considerable improvement to the skeleton-based action recognition task. Existing GCN-based methods usually use a fixed spatial graph size across all layers, which severely limits the model's ability to exploit global and semantic discriminative information because of restricted receptive fields. Furthermore, the fixed graph size causes many redundancies in the representation of actions, which is inefficient and can hinder the model from focusing on beneficial features. To address these issues, we propose a plug-and-play channel adaptive merging module (CAMM) specific to the human skeleton graph, which merges the vertices from the same part of the skeleton graph adaptively and efficiently. The merge weights differ across channels, so each channel has the flexibility to integrate joints in its own way. We then build a novel shallow graph convolutional network (SGCN) based on this module, which achieves state-of-the-art performance with less computational cost. Experimental results on NTU-RGB+D and Kinetics-Skeleton illustrate the superiority of our methods.
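A minimal sketch of channel-adaptive vertex merging: each body part's joints are collapsed into a single vertex with softmax weights that differ per channel. The five-part split of a 25-joint skeleton and the parameterization are assumptions for illustration, not the paper's exact CAMM.

```python
import torch
import torch.nn as nn

class ChannelAdaptiveMerge(nn.Module):
    """Merge the joints of each body part into one vertex, with mixing
    weights that differ per channel.

    `parts` maps each body part to a list of joint indices; learned logits
    are softmax-normalised within each part, so every channel can mix its
    joints differently. An illustrative sketch under assumed shapes.
    """
    def __init__(self, channels, parts):
        super().__init__()
        self.parts = parts
        self.logits = nn.ParameterList(
            [nn.Parameter(torch.zeros(channels, len(p))) for p in parts]
        )

    def forward(self, x):                          # x: (N, C, T, V)
        merged = []
        for joints, logit in zip(self.parts, self.logits):
            w = torch.softmax(logit, dim=-1)       # (C, |part|): per-channel mix
            xp = x[..., joints]                    # (N, C, T, |part|)
            merged.append(torch.einsum('nctj,cj->nct', xp, w))
        return torch.stack(merged, dim=-1)         # (N, C, T, num_parts)


# Example: collapse a 25-joint skeleton into 5 part-level vertices
# (assumed torso/arm/leg grouping, not the paper's configuration).
parts = [[0, 1, 2, 3, 20], [4, 5, 6, 7, 21, 22], [8, 9, 10, 11, 23, 24],
         [12, 13, 14, 15], [16, 17, 18, 19]]
x = torch.randn(2, 64, 50, 25)
print(ChannelAdaptiveMerge(64, parts)(x).shape)  # torch.Size([2, 64, 50, 5])
```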


2020, Vol 11 (6), pp. 65-73
Author(s): Tingwei Li, Ruiwen Zhang, Qing Li

Graph convolutional networks (GCNs) have been proven effective for processing structured data, since they can capture the features of related nodes and improve model performance. Increasing attention is being paid to employing GCNs in skeleton-based action recognition, but existing GCN-based methods face some challenges. First, the consistency of temporal and spatial features is ignored because features are extracted node by node and frame by frame. We design a generic representation of skeleton sequences for action recognition and propose a novel model called Temporal Graph Networks (TGN), which obtains spatiotemporal features simultaneously. Second, the adjacency matrix describing the relations between joints mostly depends on the physical connections between joints. We propose a multi-scale graph strategy to describe the relations between joints in the skeleton graph appropriately, adopting a full-scale graph, a part-scale graph, and a core-scale graph to capture the local features of each joint and the contour features of important joints. Extensive experiments are conducted on two large datasets, NTU RGB+D and Kinetics Skeleton, and the results show that TGN with our graph strategy outperforms other state-of-the-art methods.
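A rough sketch of the multi-scale graph idea: three adjacency matrices over the same joint set (full, part, and core scale) each drive their own graph convolution, and the results are summed. The edge lists, normalization, and fusion by summation are illustrative assumptions rather than the TGN definition.

```python
import torch
import torch.nn as nn

def normalized_adjacency(edges, num_nodes):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    a = torch.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d = a.sum(dim=1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class MultiScaleGCN(nn.Module):
    """One graph convolution per scale, summed over scales.

    Each scale supplies its own adjacency over the same joint set;
    an illustrative sketch under assumed (N, C, T, V) inputs.
    """
    def __init__(self, in_c, out_c, adjacencies):
        super().__init__()
        self.register_buffer('adjs', torch.stack(adjacencies))          # (S, V, V)
        self.convs = nn.ModuleList([nn.Conv2d(in_c, out_c, 1) for _ in adjacencies])

    def forward(self, x):                                                # x: (N, C, T, V)
        out = 0
        for conv, a in zip(self.convs, self.adjs):
            out = out + torch.einsum('nctv,vw->nctw', conv(x), a)
        return out


# Toy 5-joint skeleton: a full chain, a sparser "part" graph, and a "core" graph.
full = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], 5)
part = normalized_adjacency([(0, 2), (2, 4)], 5)
core = normalized_adjacency([(0, 4)], 5)
x = torch.randn(2, 3, 10, 5)
print(MultiScaleGCN(3, 16, [full, part, core])(x).shape)  # torch.Size([2, 16, 10, 5])
```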


2020
Author(s): Tingwei Li, Ruiwen Zhang, Qing Li

Graph convolutional networks (GCNs) can effectively capture the features of related nodes and improve model performance, and increasing attention is being paid to employing GCNs in skeleton-based action recognition. However, existing GCN-based methods have two problems. First, the consistency of temporal and spatial features is ignored because features are extracted node by node and frame by frame. To obtain spatiotemporal features simultaneously, we design a generic representation of skeleton sequences for action recognition and propose a novel model called Temporal Graph Networks (TGN). Second, the adjacency matrix describing the relations between joints mostly depends on the physical connections between joints. To describe the relations between joints in the skeleton graph appropriately, we propose a multi-scale graph strategy, adopting a full-scale graph, a part-scale graph, and a core-scale graph to capture the local features of each joint and the contour features of important joints. Experiments were carried out on two large datasets, and the results show that TGN with our graph strategy outperforms state-of-the-art methods.


Electronics, 2021, Vol 10 (18), pp. 2198
Author(s): Chaoyue Li, Lian Zou, Cien Fan, Hao Jiang, Yifeng Liu

Graph convolutional networks (GCNs), which model human actions as a series of spatial-temporal graphs, have recently achieved superior performance in skeleton-based action recognition. However, existing methods mostly use the physical connections of joints to construct a spatial graph, resulting in limited topological information about the human skeleton. In addition, action features in the time domain have not been fully explored. To better extract spatial-temporal features, we propose a multi-stage attention-enhanced sparse graph convolutional network (MS-ASGCN) for skeleton-based action recognition. To capture more abundant joint dependencies, we propose a new strategy for constructing skeleton graphs, which simulates bidirectional information flows between neighboring joints and pays greater attention to the information transmission between sparse joints. In addition, a part attention mechanism is proposed to learn the weight of each part and enhance part-level feature learning. We introduce multiple streams of different stages and merge them in specific layers of the network to further improve performance. Our model is verified on two large-scale datasets, namely NTU-RGB+D and Kinetics-Skeleton. Experiments demonstrate that the proposed MS-ASGCN outperforms previous state-of-the-art methods on both datasets.
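A minimal sketch of a part attention mechanism: each body part gets one learned weight, computed from its pooled features, which rescales all joints in that part. The part split and layer sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PartAttention(nn.Module):
    """Learn one attention weight per body part and rescale its joints.

    Part descriptors are obtained by pooling a (N, C, T, V) tensor over
    frames and over the joints of each part; a small MLP turns them into
    sigmoid weights. Illustrative sketch under assumed shapes.
    """
    def __init__(self, channels, parts, reduction=4):
        super().__init__()
        self.parts = parts
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (N, C, T, V)
        out = x.clone()
        for joints in self.parts:
            desc = x[..., joints].mean(dim=(2, 3))     # (N, C): part descriptor
            w = self.fc(desc).view(-1, 1, 1, 1)        # (N, 1, 1, 1): part weight
            out[..., joints] = x[..., joints] * w
        return out


parts = [[0, 1, 2], [3, 4], [5, 6, 7]]   # toy 8-joint split (assumed grouping)
x = torch.randn(2, 32, 20, 8)
print(PartAttention(32, parts)(x).shape)  # torch.Size([2, 32, 20, 8])
```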

