Integrally Cooperative Spatio-Temporal Feature Representation of Motion Joints for Action Recognition

Xin Chao; Zhenjie Hou; Jiuzhen Liang; Tianjin Yang

doi:10.3390/s20185180

Integrally Cooperative Spatio-Temporal Feature Representation of Motion Joints for Action Recognition

Sensors ◽ 10.3390/s20185180 ◽ 2020 ◽ Vol 20 (18) ◽ pp. 5180

Author(s): Xin Chao ◽ Zhenjie Hou ◽ Jiuzhen Liang ◽ Tianjin Yang

Keyword(s): Action Recognition ◽ Motion Vector ◽ Human Action Recognition ◽ Human Action ◽ Subspace Learning ◽ Feature Representation ◽ Energy Fluctuation ◽ Cooperative Movement ◽ Spatio Temporal ◽ Inter Frame

In contemporary research on human action recognition, most methods consider the movement features of each joint separately. However, they ignore the fact that human action results from the integrally cooperative movement of all joints. To address this problem, this paper proposes two action feature representations: the Motion Collaborative Spatio-Temporal Vector (MCSTV) and the Motion Spatio-Temporal Map (MSTM). MCSTV comprehensively considers the integrity and cooperation of the motion joints: it accumulates the limbs' motion vectors with weights to form a new vector that accounts for the movement features of human action. To describe the action more comprehensively and accurately, we extract key motion energy through key information extraction based on inter-frame energy fluctuation, project the energy onto three orthogonal axes, and stitch the projections in temporal order to construct the MSTM. To combine the advantages of MSTM and MCSTV, we propose Multi-Target Subspace Learning (MTSL). MTSL projects MSTM and MCSTV into a common subspace so that they complement each other. The results on MSR-Action3D and UTD-MHAD show that our method achieves higher recognition accuracy than most existing human action recognition algorithms.
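
The weighted accumulation of limb motion vectors mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' MCSTV formulation: the joint names, the sample coordinates, and the choice of weighting each limb by its share of the total motion magnitude are all assumptions made here for illustration.

```python
import math

def motion_vector(joint_prev, joint_curr):
    """Displacement of one joint between two consecutive frames."""
    return tuple(c - p for p, c in zip(joint_prev, joint_curr))

def weighted_accumulate(vectors):
    """Sum 3D motion vectors, weighting each by its share of total motion.

    The magnitude-proportional weights are an illustrative assumption,
    not the weighting defined in the paper.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    total = sum(norms)
    if total == 0.0:
        # No limb moved, so the combined vector is zero.
        return (0.0, 0.0, 0.0)
    weights = [n / total for n in norms]
    return tuple(
        sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(3)
    )

# Hypothetical 3D positions of two limb end-joints in consecutive frames.
prev_frame = {"left_hand": (0.0, 0.0, 0.0), "right_hand": (1.0, 0.0, 0.0)}
curr_frame = {"left_hand": (3.0, 0.0, 0.0), "right_hand": (1.0, 1.0, 0.0)}

vectors = [motion_vector(prev_frame[k], curr_frame[k]) for k in prev_frame]
combined = weighted_accumulate(vectors)
print(combined)
```

In this toy input the left hand moves three times as far as the right hand, so it dominates the combined vector.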

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

International Journal of Image and Graphics ◽ 10.1142/s0219467822500516 ◽ 2021

Author(s): C. Indhumathi ◽ V. Murugan ◽ G. Muthulakshmii

Keyword(s): Action Recognition ◽ State Of The Art ◽ Human Action Recognition ◽ Human Action ◽ Regional Correlation ◽ Temporal Features ◽ Adaptive Motion ◽ Spatio Temporal ◽ Inter Frame ◽ Temporal Feature

Action recognition has been gaining attention from the computer vision community. To recognize human actions, spatial and temporal features are normally extracted, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, the Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Inter-frame temporal average pooling is used to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than those of the other state-of-the-art methods compared.
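
Temporal average pooling, as named in this abstract, reduces a sequence of per-frame feature vectors to a single clip-level vector. The sketch below shows only that generic operation. The feature values are hypothetical, and the consecutive-frame difference is a simple stand-in chosen here, not the regional correlation feature of ACTF.

```python
def temporal_average_pool(frame_features):
    """Average per-frame feature vectors over time into one vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def inter_frame_differences(frame_features):
    """Element-wise difference between each pair of consecutive frames.

    Assumption: a plain difference is used as a placeholder inter-frame
    signal; the paper's correlation feature is not reproduced here.
    """
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frame_features, frame_features[1:])
    ]

# Hypothetical 2-dimensional features for three frames.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

mean_feature = temporal_average_pool(features)
diff_mean = temporal_average_pool(inter_frame_differences(features))
print(mean_feature, diff_mean)
```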

A new spatial-temporal histograms of gradients descriptor and HOD-VLAD encoding for human action recognition

International Journal of Wavelets Multiresolution and Information Processing ◽ 10.1142/s0219691319400095 ◽ 2019 ◽ Vol 17 (02) ◽ pp. 1940009 ◽ Cited By ~ 2

Author(s): Bo Lin ◽ Bin Fang

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Recognition System ◽ Human Action ◽ Feature Representation ◽ Gaussian Kernel ◽ Feature Descriptor ◽ Local Descriptors ◽ Spatio Temporal ◽ Encoding Method

Automatic human action recognition is a core functionality of systems for video surveillance and human-object interaction. In the whole recognition system, feature description and encoding are two crucial steps, and a powerful action recognition framework requires both to perform reliably. In this paper, we propose a new human action feature descriptor called spatio-temporal histograms of gradients (SPHOG). SPHOG is based on the spatial and temporal derivative signal, which captures the gradient changes between consecutive frames. Compared with the traditional histograms of optical flow descriptor, the proposed SPHOG requires fewer computational resources. To incorporate the distribution information of local descriptors into the Vector of Locally Aggregated Descriptors (VLAD), a popular encoding approach for bag-of-features (BoF) representation, a Gaussian kernel is embedded to compute the weighted distance histograms of local descriptors. This makes the encoding scheme for BoF representation more effective. We validated the proposed algorithm for human action recognition on three publicly available datasets: KTH, UCF Sports, and HMDB51. The experimental results indicate that the proposed descriptor and encoding method improve both the efficiency and the recognition accuracy of human action recognition.
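
One way to picture a Gaussian kernel inside VLAD encoding is as soft assignment: each local descriptor contributes its residual to every codeword, weighted by a Gaussian of its distance to that codeword. The sketch below follows that reading. It is an assumption about the general idea, not the HOD-VLAD method itself, and the codebook, the descriptors, and `sigma` are hypothetical.

```python
import math

def gaussian_weights(descriptor, codebook, sigma=1.0):
    """Normalized Gaussian weights of one descriptor to every codeword."""
    sq_dists = [
        sum((d - c) ** 2 for d, c in zip(descriptor, word))
        for word in codebook
    ]
    # Shift by the smallest distance so the largest weight is exp(0) = 1,
    # which avoids a zero denominator when all distances are large.
    nearest = min(sq_dists)
    raw = [math.exp(-(s - nearest) / (2.0 * sigma ** 2)) for s in sq_dists]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_vlad(descriptors, codebook, sigma=1.0):
    """Accumulate Gaussian-weighted residuals per codeword, then L2-normalize."""
    dim = len(codebook[0])
    acc = [[0.0] * dim for _ in codebook]
    for desc in descriptors:
        weights = gaussian_weights(desc, codebook, sigma)
        for k, (w, word) in enumerate(zip(weights, codebook)):
            for i in range(dim):
                acc[k][i] += w * (desc[i] - word[i])
    flat = [x for row in acc for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat

# Hypothetical 2D codebook and local descriptors.
codebook = [[0.0, 0.0], [2.0, 0.0]]
descriptors = [[1.0, 0.0], [0.5, 0.5]]

weights = gaussian_weights([1.0, 0.0], codebook)
encoding = weighted_vlad(descriptors, codebook)
print(weights, encoding)
```

The encoding has one residual block per codeword, so its length is the codebook size times the descriptor dimension.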

Human action recognition based on spatio-temporal three-dimensional scattering transform descriptor and an improved VLAD feature encoding algorithm

Neurocomputing ◽ 10.1016/j.neucom.2018.05.121 ◽ 2019 ◽ Vol 348 ◽ pp. 145-157 ◽ Cited By ~ 1

Author(s): Bo Lin ◽ Bin Fang ◽ Weibin Yang ◽ Jiye Qian

Keyword(s): Action Recognition ◽ Three Dimensional ◽ Human Action Recognition ◽ Human Action ◽ Scattering Transform ◽ Feature Encoding ◽ Spatio Temporal

View-Robust Human Action Recognition Based on Spatio-Temporal Self Similarities

Journal of Mechanics of Continua and Mathematical Sciences ◽ 10.26782/jmcms.2020.01.00010 ◽ 2020 ◽ Vol 15 (1)

Author(s): K. Pradeep Reddy

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Spatio Temporal

Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos

MultiMedia Modeling - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-51811-4_30 ◽ 2016 ◽ pp. 365-378 ◽ Cited By ~ 13

Author(s): Ionut C. Duta ◽ Bogdan Ionescu ◽ Kiyoharu Aizawa ◽ Nicu Sebe

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Spatio Temporal

Spatio-temporal SRU with global context-aware attention for 3D human action recognition

Multimedia Tools and Applications ◽ 10.1007/s11042-019-08587-w ◽ 2020 ◽ Vol 79 (17-18) ◽ pp. 12349-12371

Author(s): Qingshan She ◽ Gaoyuan Mu ◽ Haitao Gan ◽ Yingle Fan

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Context Aware ◽ Global Context ◽ Spatio Temporal

Agglomerative Clustering and Residual-VLAD Encoding for Human Action Recognition

Applied Sciences ◽ 10.3390/app10124412 ◽ 2020 ◽ Vol 10 (12) ◽ pp. 4412

Author(s): Ammar Mohsin Butt ◽ Muhammad Haroon Yousaf ◽ Fiza Murtaza ◽ Saima Nazir ◽ Serestina Viriri ◽ ...

Keyword(s): Action Recognition ◽ Feature Vector ◽ Human Action Recognition ◽ Human Action ◽ Compact Representation ◽ Agglomerative Clustering ◽ Residual Vector ◽ Benchmark Datasets ◽ Codebook Generation ◽ Spatio Temporal

Human action recognition has attracted significant attention in recent years because of its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for the classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding the features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which aims to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, which provides a compact representation along with high-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB51 and UCF101. The proposed method achieves 72.6% and 96.2% accuracy on HMDB51 and UCF101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.
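
Agglomerative clustering for codebook generation can be sketched in its simplest form: start with one cluster per feature, repeatedly merge the two closest clusters, and use the surviving centroids as codewords. The version below uses centroid linkage on hypothetical 2D features. It is a generic illustration, and it does not model how the paper combines global and class-specific codebooks.

```python
def agglomerative_codebook(points, k):
    """Merge the two closest clusters until k remain; return their centroids."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")

    def centroid(cluster):
        n = len(cluster)
        return [sum(p[i] for p in cluster) / n for i in range(len(cluster[0]))]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        # Exhaustive search for the closest pair of cluster centroids.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

# Hypothetical features forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = agglomerative_codebook(points, 2)
print(codebook)
```

The pairwise search is quadratic per merge, which is fine for a toy example but would need a faster linkage implementation for real feature sets.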

Depth Sequential Information Entropy Maps and Multi-Label Subspace Learning for Human Action Recognition

IEEE Access ◽ 10.1109/access.2020.3006067 ◽ 2020 ◽ Vol 8 ◽ pp. 135118-135130

Author(s): Tianjin Yang ◽ Zhenjie Hou ◽ Jiuzhen Liang ◽ Yuwan Gu ◽ Xin Chao

Keyword(s): Action Recognition ◽ Information Entropy ◽ Human Action Recognition ◽ Human Action ◽ Subspace Learning ◽ Sequential Information

Feature Extraction and Representation for Distributed Multi-View Human Action Recognition

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ◽ 10.1109/jetcas.2013.2256824 ◽ 2013 ◽ Vol 3 (2) ◽ pp. 145-154 ◽ Cited By ~ 7

Author(s): Jiajia Luo ◽ Wei Wang ◽ Hairong Qi

Keyword(s): Action Recognition ◽ Approximation Error ◽ Human Action Recognition ◽ Human Action ◽ Base Station ◽ Feature Representation ◽ Superior Performance ◽ Feature Descriptor ◽ Testing Stage ◽ New Feature

Multi-view human action recognition has gained a lot of attention in recent years for its superior performance compared to single-view recognition. In this paper, we propose a new framework for the real-time realization of human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions, and computationally efficient. With the proposed Mltp-hist, the noninformative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos to be transmitted to the base station for classification. Due to the sparse representation of the extracted features, the approximation error is reduced. Finally, at the base station, a probability model is produced to fuse the information from the various views, and a class label is assigned accordingly. Compared to existing algorithms, the proposed framework has three advantages while placing fewer requirements on memory and bandwidth consumption: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) the positions and orientations of cameras do not need to be fixed. We further evaluate the proposed framework on the most popular multi-view action dataset, IXMAS. Experimental results indicate that the proposed framework repeatedly achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. In particular, our results remain satisfactory even when large misalignment exists between the training and testing samples.
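
The fusion step at the base station can be pictured with a simple probabilistic rule: multiply the class probabilities reported by each view and pick the most likely class. The sketch below makes that concrete. It assumes the views are conditionally independent, which is an assumption introduced here and not the paper's probability model, and the class names and probabilities are hypothetical.

```python
import math

def fuse_views(view_probs, eps=1e-12):
    """Fuse per-view class probabilities by summing log-probabilities.

    Assumes conditionally independent views (product rule); this is an
    illustrative choice, not the model described in the paper.
    """
    classes = list(view_probs[0].keys())
    log_scores = {
        c: sum(math.log(max(p[c], eps)) for p in view_probs) for c in classes
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    posterior = {c: v / total for c, v in unnorm.items()}
    label = max(posterior, key=posterior.get)
    return label, posterior

# Hypothetical class probabilities reported by three camera views.
views = [
    {"wave": 0.6, "kick": 0.4},
    {"wave": 0.3, "kick": 0.7},
    {"wave": 0.8, "kick": 0.2},
]

label, posterior = fuse_views(views)
print(label, posterior)
```

Because each view contributes only a short probability vector, a rule of this kind needs no communication among cameras, in line with the advantage the abstract lists.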

Spatio-temporal feature extraction and representation for RGB-D human action recognition

Pattern Recognition Letters ◽ 10.1016/j.patrec.2014.03.024 ◽ 2014 ◽ Vol 50 ◽ pp. 139-148 ◽ Cited By ~ 36

Author(s): Jiajia Luo ◽ Wei Wang ◽ Hairong Qi

Keyword(s): Feature Extraction ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Spatio Temporal ◽ Temporal Feature
