Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition

Chhavi Dhiman; Dinesh Kumar Vishwakarma; Paras Agarwal

doi:10.1145/3441628

Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3441628 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-24

Author(s):

Chhavi Dhiman ◽

Dinesh Kumar Vishwakarma ◽

Paras Agarwal

Keyword(s):

State Of The Art ◽

Recognition Performance ◽

Human Action Recognition ◽

Human Action ◽

Action Representation ◽

Left Hand ◽

Temporal Features ◽

Wide Range ◽

Spatio Temporal ◽

Ablation Study

Human activity recognition from skeleton data has attracted growing interest because skeleton data is easy to acquire and captures fine shape details. It still suffers from wide intra-class variation, inter-class similarity among actions, and view variation, which make the extraction of discriminative spatial and temporal features a challenging problem. To address this, we present a novel Residual Inception Attention Driven CNN (RIAC-Net), which visualizes the dynamics of the action in a part-wise manner. The complete skeleton is partitioned into five key parts: Head to Spine, Left Leg, Right Leg, Left Hand, and Right Hand. For each part, a Compact Action Skeleton Sequence (CASS) is defined. Part-wise skeleton-based motion dynamics highlight discriminative local features of the skeleton, which helps to overcome the challenges of inter-class similarity and intra-class variation and improves recognition performance. The RIAC-Net architecture is inspired by the concept of inception-residual representation and unifies Attention Driven Residues (ADR) with inception-based Spatio-Temporal Convolution Features (STCF) to learn efficient salient action features. An ablation study is also carried out to analyze the effect of ADR over simple residue-based action representation. The robustness of the proposed framework is evaluated through extensive experiments on four challenging datasets: UT Kinect Action 3D, Florence 3D action, MSR Daily Action3D, and NTU RGB-D. The results consistently demonstrate the superiority of the proposed method over other state-of-the-art methods.
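The five-part partition described in the abstract can be illustrated with a minimal sketch. The joint indices below are an assumption (a hypothetical 20-joint, Kinect-style layout), and `PARTS` and `split_into_parts` are illustrative names; the abstract does not specify how joints map to parts or how a CASS is built from them.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]              # all joints of the skeleton in one frame

# Hypothetical joint-index groups for a 20-joint skeleton.
# This index assignment is an assumption made for illustration only.
PARTS: Dict[str, List[int]] = {
    "head_to_spine": [3, 2, 1, 0],
    "left_hand": [4, 5, 6, 7],
    "right_hand": [8, 9, 10, 11],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def split_into_parts(sequence: Sequence[Frame]) -> Dict[str, List[List[Joint]]]:
    """Return, for each body part, a per-frame list of that part's joints."""
    return {
        name: [[frame[i] for i in indices] for frame in sequence]
        for name, indices in PARTS.items()
    }


if __name__ == "__main__":
    # Two synthetic frames with 20 joints each.
    seq = [[(float(j), float(t), 0.0) for j in range(20)] for t in range(2)]
    for name, frames in split_into_parts(seq).items():
        print(name, "frames:", len(frames), "joints per frame:", len(frames[0]))
```

Each part-wise sequence could then be encoded and fed to a network independently, which is one plausible reading of "part-wise" processing rather than a description of RIAC-Net itself.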

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

International Journal of Image and Graphics ◽

10.1142/s0219467822500516 ◽

2021 ◽

Author(s):

C. Indhumathi ◽

V. Murugan ◽

G. Muthulakshmii

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Regional Correlation ◽

Temporal Features ◽

Adaptive Motion ◽

Spatio Temporal ◽

Inter Frame ◽

Temporal Feature

Action recognition has recently gained increasing attention from the computer vision community. Recognizing human actions normally involves extracting spatial and temporal features, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, an adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Inter-frame temporal average pooling is used to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than those of other state-of-the-art methods.
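As a rough illustration of temporal average pooling, the sketch below averages per-frame feature vectors into a single mean feature. The cosine similarity between consecutive frames is a stand-in chosen here for an inter-frame correlation measure; it is an assumption and not the ACTF formulation, which the abstract does not detail. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def temporal_average_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors over time into one mean feature."""
    if not features:
        raise ValueError("at least one frame is required")
    n_frames = len(features)
    dim = len(features[0])
    return [sum(frame[d] for frame in features) / n_frames for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def inter_frame_similarity(features: Sequence[Sequence[float]]) -> List[float]:
    """Similarity between each pair of consecutive frames (assumed stand-in)."""
    return [
        cosine_similarity(features[t], features[t + 1])
        for t in range(len(features) - 1)
    ]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print("mean feature:", temporal_average_pool(frames))
    print("consecutive-frame similarity:", inter_frame_similarity(frames))
```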

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing ◽

10.1007/s11633-014-0831-4 ◽

2014 ◽

Vol 11 (5) ◽

pp. 500-509 ◽

Cited By ~ 12

Author(s):

Xiao-Fei Ji ◽

Qian-Qian Wu ◽

Zhao-Jie Ju ◽

Yang-Yang Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition Based on Spatio-temporal Features

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence ◽

10.1007/978-3-642-11164-8_58 ◽

2009 ◽

pp. 357-362

Author(s):

Nikhil Sawant ◽

K. K. Biswas

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features

Lecture Notes in Computer Science - Neural Information Processing. Models and Applications ◽

10.1007/978-3-642-17534-3_48 ◽

2010 ◽

pp. 391-398 ◽

Cited By ~ 1

Author(s):

Yanli Ji ◽

Atsushi Shimada ◽

Rin-ichiro Taniguchi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks

IEEE Access ◽

10.1109/access.2018.2817253 ◽

2018 ◽

Vol 6 ◽

pp. 17913-17922 ◽

Cited By ~ 24

Author(s):

Lei Wang ◽

Yangyang Xu ◽

Jun Cheng ◽

Haiying Xia ◽

Jianqin Yin ◽

...

Keyword(s):

Neural Networks ◽

Action Recognition ◽

Deep Neural Networks ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

An effective fusion scheme of spatio-temporal features for human action recognition in RGB-D video

2013 International Conference on Control, Automation and Information Sciences (ICCAIS) ◽

10.1109/iccais.2013.6720562 ◽

2013 ◽

Cited By ~ 1

Author(s):

Quang D. Tran ◽

Ngoc Q. Ly

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal ◽

Fusion Scheme

Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/136 ◽

2019 ◽

Cited By ~ 2

Author(s):

Haoze Wu ◽

Jiawei Liu ◽

Zheng-Jun Zha ◽

Zhenzhong Chen ◽

Xiaoyan Sun

Keyword(s):

Action Recognition ◽

Computational Cost ◽

Human Action Recognition ◽

Human Action ◽

Temporal Information ◽

Temporal Features ◽

Temporal Decomposition ◽

Spatio Temporal ◽

Different Order ◽

High Computational Cost

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or incur a high computational cost from spatio-temporal feature extraction. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both by exploiting the interaction of spatial and temporal information, and selectively emphasizes informative spatial appearance and temporal motion, while reducing structural complexity. Moreover, we design three types of MRSTs according to the order in which spatial and temporal information is enhanced, each of which contains a spatio-temporal decomposition unit, a mutually reinforced unit, and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, is also proposed based on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show that MRST-Net yields the best performance compared to state-of-the-art approaches.
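To give a feel for why decomposing a 3D convolution can reduce cost, the sketch below compares the weight count of a full 3D kernel with a generic factorization into a spatial convolution followed by a temporal one. This factorization is an assumption used for illustration; it is not the MRST design, whose units the abstract names but does not specify. The function names and channel sizes are hypothetical, and bias terms are ignored.

```python
def full_3d_params(c_in: int, c_out: int, k_t: int, k_s: int) -> int:
    """Weights in one full 3D convolution with a k_t x k_s x k_s kernel."""
    return c_in * c_out * k_t * k_s * k_s


def factorized_params(c_in: int, c_mid: int, c_out: int, k_t: int, k_s: int) -> int:
    """Weights in a 1 x k_s x k_s spatial conv followed by a k_t x 1 x 1 temporal conv."""
    spatial = c_in * c_mid * k_s * k_s   # spatial-only kernel
    temporal = c_mid * c_out * k_t       # temporal-only kernel
    return spatial + temporal


if __name__ == "__main__":
    # Hypothetical layer: 64 channels throughout, 3x3x3 receptive field.
    full = full_3d_params(64, 64, 3, 3)
    split = factorized_params(64, 64, 64, 3, 3)
    print("full 3D:", full, "factorized:", split, "ratio:", round(split / full, 3))
```

Under these assumed sizes the factorized layer uses fewer than half the weights of the full 3D kernel, which is the general motivation for spatio-temporal decomposition.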

Study of Human Action Recognition Based on Improved Spatio-Temporal Features

Human Motion Sensing and Recognition - Studies in Computational Intelligence ◽

10.1007/978-3-662-53692-6_11 ◽

2017 ◽

pp. 233-250 ◽

Cited By ~ 8

Author(s):

Honghai Liu ◽

Zhaojie Ju ◽

Xiaofei Ji ◽

Chee Seng Chan ◽

Mehdi Khoury

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition Based on Pose Spatio-Temporal Features

Journal of Computer-Aided Design & Computer Graphics ◽

10.3724/sp.j.1089.2018.16848 ◽

2018 ◽

Vol 30 (9) ◽

pp. 1615

Author(s):

Xiao Zheng ◽

Xiaodong Peng ◽

Jiaxuan Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Higher-level representation of local spatio-temporal features for human action recognition using Subspace Matching Kernels

2016 23rd International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr.2016.7900237 ◽

2016 ◽

Author(s):

Bisser Raytchev ◽

Hideaki Kawamoto ◽

Toru Tamaki ◽

Kazufumi Kaneda

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal
