Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks

IEEE Access, 2018, Vol 6, pp. 17913-17922
Author(s):  
Lei Wang ◽  
Yangyang Xu ◽  
Jun Cheng ◽  
Haiying Xia ◽  
Jianqin Yin ◽  
...  
Author(s):  
Prof. Rajeshwari J. Kodulkar

Abstract: Human action detection is one of the most demanding and complex tasks for deep neural networks. Human gesture recognition is a form of human action recognition: a gesture is a series of bodily motions that communicate a message. Gestures are a natural and preferable way for humans to engage with computers, thereby bridging the gap between humans and machines, and human action recognition offers an effective communication platform for the deaf and dumb. In this work, we propose a hand gesture identification system that recognizes hand movements, extracts hand characteristics such as peak and angle calculations, and then converts gesture images into text. Index Terms: Human action recognition, Deaf and dumb, CNN.
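The abstract does not specify how the peak and angle calculations are performed; a minimal sketch of one plausible angle computation, assuming the detected peaks are fingertip points on the hand contour measured from the hand centroid (the helper name is hypothetical):

```python
import math

def angle_between(centroid, peak1, peak2):
    """Angle in degrees at the hand centroid between two contour peaks."""
    v1 = (peak1[0] - centroid[0], peak1[1] - centroid[1])
    v2 = (peak2[0] - centroid[0], peak2[1] - centroid[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_a))
```

Such angles between adjacent fingertip peaks could then be thresholded to distinguish gestures (e.g. spread versus closed fingers) before mapping a gesture to its text label.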


Author(s):  
C. Indhumathi ◽  
V. Murugan ◽  
G. Muthulakshmii

Nowadays, action recognition has gained more attention from the computer vision community. Human actions are normally recognized by extracting spatial and temporal features, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, an Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Inter-frame temporal average pooling is used to extract the inter-frame regional correlation feature and mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than other state-of-the-art methods.
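The ACTF module itself is not reproduced here, but the inter-frame mean feature it builds on reduces to averaging per-frame features over the time axis. A pure-Python sketch, assuming each frame has already been encoded as a flat feature vector (the function name is illustrative, not the paper's):

```python
def temporal_average_pool(frames):
    """Mean-pool a sequence of per-frame feature vectors over time.

    frames: list of T feature vectors (each a list of floats of equal length).
    Returns a single vector of the same dimensionality.
    """
    t = len(frames)
    dim = len(frames[0])
    # Average each feature dimension across all T frames.
    return [sum(frame[i] for frame in frames) / t for i in range(dim)]
```

In the actual network this pooling would operate on convolutional feature maps rather than flat vectors, with the correlation branch computed alongside the mean.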


Author(s):  
Haoze Wu ◽  
Jiawei Liu ◽  
Zheng-Jun Zha ◽  
Zhenzhong Chen ◽  
Xiaoyan Sun

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or suffer from the high computational cost of spatio-temporal feature extraction. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both by exploiting the interaction of spatial and temporal information, and selectively emphasizes informative spatial appearance and temporal motion, while reducing structural complexity. Moreover, we design three types of MRSTs according to the order in which spatial and temporal information is enhanced, each of which contains a spatio-temporal decomposition unit, a mutually reinforced unit, and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, is also proposed based on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show that MRST-Net yields the best performance compared to state-of-the-art approaches.
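The exact MRST design is not given in the abstract, but the complexity reduction from decomposing a 3D convolution into separate spatial and temporal convolutions can be illustrated by counting weights. A sketch with illustrative channel and kernel sizes, not the paper's actual configuration:

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of one full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw

def decomposed_params(c_in, c_mid, c_out, kt, kh, kw):
    """Weight count of a spatial (1 x kh x kw) convolution
    followed by a temporal (kt x 1 x 1) convolution."""
    return c_in * c_mid * kh * kw + c_mid * c_out * kt

# Example: 64 -> 64 channels with a 3x3x3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)             # 110592 weights
split = decomposed_params(64, 64, 64, 3, 3, 3)    # 49152 weights
```

With these sizes the factorized pair uses well under half the weights of the full 3D kernel, which is the kind of structural saving the abstract alludes to; the mutual-reinforcement units then reintroduce interaction between the two streams.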

