Temporal Modeling on Multi-Temporal-Scale Spatiotemporal Atoms for Action Recognition

Guangle Yao; Tao Lei; Xianyuan Liu; Ping Jiang

doi:10.3390/app8101835

Applied Sciences ◽

10.3390/app8101835 ◽

2018 ◽

Vol 8 (10) ◽

pp. 1835 ◽

Cited By ~ 1

Author(s):

Guangle Yao ◽

Tao Lei ◽

Xianyuan Liu ◽

Ping Jiang

Keyword(s):

Action Recognition ◽

Short Term Memory ◽

Temporal Evolution ◽

Recognition Performance ◽

Human Action Recognition ◽

Human Action ◽

Temporal Scale ◽

Research Attention ◽

Action Units ◽

Multi Temporal

As an important branch of video analysis, human action recognition has attracted extensive research attention in the computer vision and artificial intelligence communities. In this paper, we propose to model the temporal evolution of multi-temporal-scale atoms for action recognition. An action can be considered as a temporal sequence of action units. These action units, which we refer to as action atoms, capture the key semantic and characteristic spatiotemporal features of actions at different temporal scales. We first investigate Res3D, a powerful 3D CNN architecture, and create variants of Res3D for different temporal scales. At each temporal scale, we design practices to transfer the knowledge learned from RGB to optical flow (OF) and build RGB and OF streams to extract deep spatiotemporal information using Res3D. Then we propose an unsupervised method to mine action atoms in the deep spatiotemporal space. Finally, we use long short-term memory (LSTM) to model the temporal evolution of atoms for action recognition. The experimental results show that our proposed multi-temporal-scale spatiotemporal atoms modeling method achieves recognition performance comparable to that of state-of-the-art methods on two challenging action recognition datasets: UCF101 and HMDB51.

DMMs-Based Multiple Features Fusion for Human Action Recognition

International Journal of Multimedia Data Engineering and Management ◽

10.4018/ijmdem.2015100102 ◽

2015 ◽

Vol 6 (4) ◽

pp. 23-39 ◽

Cited By ~ 18

Author(s):

Mohammad Farhad Bulbul ◽

Yunsheng Jiang ◽

Jinwen Ma

Keyword(s):

Action Recognition ◽

Recognition Performance ◽

Recognition Task ◽

Human Action Recognition ◽

Fusion Rule ◽

Local Binary Patterns ◽

Human Action ◽

Decision Fusion ◽

Soft Decision ◽

Depth Sensors

Emerging cost-effective depth sensors have significantly facilitated the action recognition task. In this paper, the authors address the action recognition problem using depth video sequences by combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence, corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, the Logarithmic Opinion Pool (LOGP), is used to combine the classification outcomes from multiple classifiers, each with an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves better action recognition performance than using each feature individually.
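The decision-level fusion step can be illustrated with a minimal sketch of a logarithmic opinion pool. The abstract does not give the authors' weights or classifiers, so the function names `logp_fusion` and `predict`, the uniform default weights, and the example posteriors below are all assumptions made for illustration only.

```python
import math


def logp_fusion(posteriors, weights=None):
    """Fuse per-classifier class posteriors with a logarithmic opinion pool.

    posteriors: one probability vector per classifier, all of equal length.
    weights: one non-negative weight per classifier (assumed uniform if omitted).
    """
    n_classifiers = len(posteriors)
    if weights is None:
        weights = [1.0 / n_classifiers] * n_classifiers
    eps = 1e-12  # guards against log(0) for zero-probability classes
    n_classes = len(posteriors[0])
    # Weighted sum of log-probabilities for each class.
    log_scores = [
        sum(w * math.log(max(p[c], eps)) for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def predict(posteriors, weights=None):
    """Return the index of the class with the highest fused probability."""
    fused = logp_fusion(posteriors, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of three classifiers, one per feature type.
example = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.2, 0.6, 0.2]]
label = predict(example)
```

Because the pool multiplies (weighted) probabilities, a class that any single classifier rates near zero is strongly penalized, which is the main behavioral difference from averaging the scores.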

A Flexible High-Level Fusion for an Accurate Human Action Recognition System

Journal of Circuits System and Computers ◽

10.1142/s021812662050190x ◽

2020 ◽

Vol 29 (12) ◽

pp. 2050190

Author(s):

Amel Ben Mahjoub ◽

Mohamed Atri

Keyword(s):

Action Recognition ◽

Short Term Memory ◽

Recognition Rate ◽

Human Action Recognition ◽

Local Binary Patterns ◽

Low Complexity ◽

Recognition System ◽

Human Action ◽

Collaborative Representation ◽

Depth Video

Action recognition is an important task in computer vision. In the last few years, there has been growing interest in deep learning networks such as Long Short-Term Memory (LSTM) architectures, due to their efficiency in processing long-term time sequences. In light of these recent developments in deep neural networks, there is now considerable interest in developing an accurate action recognition approach with low complexity. This paper introduces a method for learning from depth activity videos based on LSTM and classification fusion. The first step consists of extracting compact depth video features. We start by calculating Depth Motion Maps (DMMs) from each sequence. Then we encode and concatenate the contour and texture characteristics of the DMMs using histogram-of-oriented-gradients and local-binary-patterns descriptors. The second step is depth video classification based on a naive Bayes fusion approach. Three classifiers, namely the collaborative representation classifier, the kernel-based extreme learning machine, and the LSTM, are trained separately to obtain classification scores. Finally, we fuse the classification score outputs of all classifiers with the naive Bayes method to get a final predicted label. Our proposed method achieves a significant improvement in recognition rate compared to previous work on the Kinect v2 and UTD-MHAD human action datasets.
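As a rough illustration of the first step, the sketch below computes a depth motion map for a single view. It assumes one common formulation, in which absolute differences between consecutive depth frames are accumulated per pixel; the abstract does not state the exact definition used, and the function name `depth_motion_map` and the `threshold` parameter are hypothetical.

```python
def depth_motion_map(frames, threshold=0.0):
    """Accumulate per-pixel absolute differences between consecutive frames.

    frames: list of 2D depth maps (lists of rows) with identical shapes.
    threshold: differences at or below this value are ignored (assumed 0).
    """
    if len(frames) < 2:
        raise ValueError("need at least two depth frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                diff = abs(curr[r][c] - prev[r][c])
                if diff > threshold:
                    dmm[r][c] += diff
    return dmm


# Tiny hypothetical 2x2 depth sequence with motion at two pixels.
sequence = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[3, 0], [0, 2]],
]
dmm = depth_motion_map(sequence)
```

Only the front view is covered here. Side and top views would require projecting each depth frame onto the other two orthogonal planes before accumulating, which is omitted from this sketch.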

Robust human action recognition via long short-term memory

The 2013 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2013.6706797 ◽

2013 ◽

Cited By ~ 36

Author(s):

Alexander Grushin ◽

Derek D. Monner ◽

James A. Reggia ◽

Ajay Mishra

Keyword(s):

Action Recognition ◽

Short Term Memory ◽

Human Action Recognition ◽

Human Action ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Multi-Temporal Convolutions for Human Action Recognition in Videos

10.1109/ijcnn52387.2021.9533515 ◽

2021 ◽

Author(s):

Alexandros Stergiou ◽

Ronald Poppe

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Multi Temporal

Human Action Recognition Based on Hybrid Features

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.373-375.1188 ◽

2013 ◽

Vol 373-375 ◽

pp. 1188-1191

Author(s):

Ju Zhong ◽

Hua Wen Liu ◽

Chun Li Lin

Keyword(s):

Action Recognition ◽

Time Domain ◽

Recognition Performance ◽

Human Action Recognition ◽

Human Action ◽

Extraction Methods ◽

Hybrid Features ◽

Motion Feature ◽

Feature Based ◽

Efficient Recognition

The extraction methods for both a shape feature based on Fourier descriptors and a motion feature in the time domain are introduced. These features are fused into a hybrid feature with higher discriminative ability. This combined representation is used for human action recognition. The experimental results show that the proposed hybrid feature achieves efficient recognition performance on the Weizmann action database.
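A shape feature based on Fourier descriptors can be sketched as follows. This is a textbook formulation, not necessarily the one used in the paper: boundary points are treated as complex numbers, the zero-frequency term is dropped for translation invariance, magnitudes are taken for rotation invariance, and values are divided by the first harmonic for scale invariance. The function name `fourier_descriptors` and the parameter `n_coeffs` are assumptions.

```python
import cmath
import math


def fourier_descriptors(contour, n_coeffs=8):
    """Return normalized Fourier descriptor magnitudes of a closed contour.

    contour: list of (x, y) boundary points in traversal order.
    n_coeffs: maximum number of descriptors to return (assumed default).
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    # Plain discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("degenerate contour: first harmonic is zero")
    # Skip k=0 (position) and k=1 (used as the scale reference).
    return [abs(coeffs[k]) / scale for k in range(2, min(n, n_coeffs + 2))]


# Hypothetical square contour, traversed counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# The same shape scaled by 3 and shifted should give the same descriptors.
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]
fd_square = fourier_descriptors(square)
fd_moved = fourier_descriptors(moved)
```

In a hybrid scheme like the one described, a vector such as `fd_square` would be concatenated with a motion feature before classification; how the paper weights or combines the two is not stated in the abstract.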

Lattice Long Short-Term Memory for Human Action Recognition

2017 IEEE International Conference on Computer Vision (ICCV) ◽

10.1109/iccv.2017.236 ◽

2017 ◽

Cited By ~ 29

Author(s):

Lin Sun ◽

Kui Jia ◽

Kevin Chen ◽

Dit Yan Yeung ◽

Bertram E. Shi ◽

...

Keyword(s):

Action Recognition ◽

Short Term Memory ◽

Human Action Recognition ◽

Human Action ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

3D Skeletal Human Action Recognition Using a CNN Fusion Model

Mathematical Problems in Engineering ◽

10.1155/2021/6650632 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Meng Li ◽

Qiumei Sun

Keyword(s):

Action Recognition ◽

Human Activity ◽

Recognition Performance ◽

Human Action Recognition ◽

Smart Homes ◽

Human Action ◽

Fusion Model ◽

Spatiotemporal Information ◽

Skeletal Activity ◽

Public Datasets

Smart homes have become central to the sustainability of buildings. Recognizing human activity in smart homes is a key tool for achieving home automation. Recently, two-stream Convolutional Neural Networks (CNNs) have shown promising performance for video-based human action recognition. However, such models cannot act directly on 3D skeletal sequences because they are limited to 2D image and video inputs. Considering the power of 3D skeletal data for describing human activity, in this study we present a novel method to recognize skeletal human activity in sustainable smart homes using a CNN fusion model. Our proposed method represents the spatiotemporal information of each 3D skeletal sequence as three images and three image sequences through gray value encoding, referred to as skeletal trajectory shape images (STSIs) and skeletal pose image (SPI) sequences, and builds a CNN fusion model with the three STSIs and three SPI sequences as input for skeletal activity recognition. The three STSIs and three SPI sequences are generated in three orthogonal planes, respectively, and complement each other. The proposed CNN fusion model allows the hierarchical learning of spatiotemporal features, offering better action recognition performance. Experimental results on three public datasets show that our method outperforms the state-of-the-art methods.

Deep Learning for Human Action Recognition with Convolution Neural Network

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206466 ◽

2020 ◽

pp. 376-380

Author(s):

S. Karthickkumar ◽

K. Kumar

Keyword(s):

Neural Network ◽

Health Care ◽

Deep Learning ◽

Action Recognition ◽

Human Activities ◽

Recognition Performance ◽

Human Action Recognition ◽

Human Action ◽

Two Dimensional ◽

Consumer Behavior Analysis

In recent years, deep learning for human action recognition has become one of the most popular research topics. It has a variety of applications, such as surveillance, health care, consumer behavior analysis, and robotics. In this paper, we propose a two-dimensional (2D) convolutional neural network for recognizing human activities. The WISDM dataset is used to train and test the model. It includes activities such as sitting, standing, downstairs, upstairs, and running. Our 2D-CNN-based method achieves a human activity recognition accuracy of 93.17%.

Convolutional Long Short-Term Memory Hybrid Networks for Skeletal Based Human Action Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8085.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 955-961

Keyword(s):

Time Series ◽

Action Recognition ◽

Short Term Memory ◽

Human Action Recognition ◽

Human Action ◽

Short Term ◽

Term Memory ◽

Temporal Modeling ◽

Long Short Term Memory ◽

Spatio Temporal

The objective is to develop a time series image representation of skeletal action data and use it for recognition through a convolutional long short-term deep learning framework. Kinect-captured human skeletal data is transformed into a Joint Change Distance Image (JCDI) descriptor, which maps the changes in the joints over time. Subsequently, JCDIs are decoded spatially with a convolutional neural network (CNN). Temporal decomposition is performed with a long short-term memory (LSTM) network on data changes along the x, y, and z position vectors of the skeleton. We propose a combination of CNN and LSTM that maps the spatio-temporal information to generate generalized time series features for recognition. Finally, scores from the spatially vibrant CNNs and temporally sound LSTMs are fused for action recognition. Publicly available action datasets such as NTU RGBD, MSR Action, UTKinect, and G3D were used as test inputs for experimentation. The results showed better performance than other state-of-the-art methods, owing to spatio-temporal modeling at both the representation and the recognition stages.
