Temporal Modeling on Multi-Temporal-Scale Spatiotemporal Atoms for Action Recognition

2018, Vol 8 (10), pp. 1835
Author(s): Guangle Yao, Tao Lei, Xianyuan Liu, Ping Jiang

As an important branch of video analysis, human action recognition has attracted extensive research attention in the computer vision and artificial intelligence communities. In this paper, we propose to model the temporal evolution of multi-temporal-scale atoms for action recognition. An action can be considered a temporal sequence of action units. These action units, which we refer to as action atoms, can capture the key semantic and characteristic spatiotemporal features of actions at different temporal scales. We first investigate Res3D, a powerful 3D CNN architecture, and create variants of Res3D for different temporal scales. At each temporal scale, we design several practices to transfer the knowledge learned from RGB to optical flow (OF), and we build RGB and OF streams that use Res3D to extract deep spatiotemporal information. Then we propose an unsupervised method to mine action atoms in the deep spatiotemporal space. Finally, we use long short-term memory (LSTM) to model the temporal evolution of the atoms for action recognition. Experimental results on two challenging action recognition datasets, UCF101 and HMDB51, show that the proposed multi-temporal-scale spatiotemporal atom modeling method achieves recognition performance comparable to that of state-of-the-art methods.
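The abstract does not say which unsupervised method is used to mine atoms. As a hedged illustration only, the sketch below swaps in plain k-means: clip-level feature vectors (assumed to come from the Res3D streams) are clustered into `k` centers treated as "atoms", and each video then becomes a sequence of atom indices that a sequence model such as an LSTM could consume. The function names `mine_atoms` and `to_atom_sequence` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mine_atoms(features, k, iters=20, seed=0):
    """Cluster clip-level features into k 'atoms' with plain k-means.

    Stand-in for the paper's unsupervised mining step; the actual method may differ.
    """
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        # Assignment step: each feature goes to its nearest center.
        groups = [[] for _ in range(k)]
        for f in features:
            j = min(range(k), key=lambda c: squared_distance(f, centers[c]))
            groups[j].append(f)
        # Update step: move each center to the mean of its group.
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def to_atom_sequence(video_features, centers):
    """Replace each clip feature of one video with the index of its nearest atom."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(f, centers[c]))
        for f in video_features
    ]


if __name__ == "__main__":
    clips = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
    atoms = mine_atoms(clips, k=2)
    print(to_atom_sequence(clips, atoms))  # two clips per atom
```

Which cluster receives index 0 depends on the random initialization, so only the grouping (first two clips together, last two together) is meaningful.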

Author(s): Mohammad Farhad Bulbul, Yunsheng Jiang, Jinwen Ma

Emerging cost-effective depth sensors have significantly facilitated the action recognition task. In this paper, the authors address the action recognition problem using depth video sequences by combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence, corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors use decision-level fusion, in which a soft decision-fusion rule, the Logarithmic Opinion Pool (LOGP), combines the classification outcomes of multiple classifiers, each trained on an individual feature set. Experimental results on two datasets show that the fusion scheme achieves better action recognition performance than using any single feature alone.
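A Logarithmic Opinion Pool combines classifiers by taking a weighted sum of their log-probabilities per class and renormalizing, i.e. the fused posterior is proportional to the product of the individual posteriors raised to their weights. The minimal sketch below assumes each classifier (for example the CT-HOG, LBP, and EOH classifiers) already outputs a probability vector over the same class order, and defaults to uniform weights `1/K`; how the authors actually set the weights is not stated in the abstract.

```python
import math


def logp_fusion(prob_lists, weights=None, eps=1e-12):
    """Fuse per-classifier class posteriors with a Logarithmic Opinion Pool.

    prob_lists: one list of class probabilities per classifier (same class order).
    weights: non-negative classifier weights; uniform 1/K by default (an assumption).
    Returns the normalized fused distribution.
    """
    k = len(prob_lists)
    if weights is None:
        weights = [1.0 / k] * k
    n_classes = len(prob_lists[0])
    # Weighted sum of log-probabilities per class; eps guards against log(0).
    log_scores = [
        sum(w * math.log(max(p[c], eps)) for w, p in zip(weights, prob_lists))
        for c in range(n_classes)
    ]
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    hog = [0.6, 0.3, 0.1]  # illustrative outputs, not results from the paper
    lbp = [0.4, 0.5, 0.1]
    eoh = [0.5, 0.2, 0.3]
    fused = logp_fusion([hog, lbp, eoh])
    print(fused, max(range(len(fused)), key=fused.__getitem__))
```

With uniform weights the fused score of each class is proportional to the geometric mean of the three classifiers' probabilities, so a class that any one classifier considers very unlikely is penalized strongly.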


2020, Vol 29 (12), pp. 2050190
Author(s): Amel Ben Mahjoub, Mohamed Atri

Action recognition is an important area of computer vision. In the last few years, there has been growing interest in deep learning networks such as Long Short-Term Memory (LSTM) architectures, owing to their efficiency in processing long time sequences. In light of these recent developments in deep neural networks, there is now considerable interest in developing an accurate action recognition approach with low complexity. This paper introduces a method for recognizing depth activity videos based on LSTM and classification fusion. The first step consists of extracting compact depth video features. We start by calculating Depth Motion Maps (DMMs) for each sequence. Then we encode and concatenate the contour and texture characteristics of the DMMs using the histogram-of-oriented-gradients and local-binary-patterns descriptors. The second step is depth video classification based on a naive Bayes fusion approach. Three classifiers, namely the collaborative representation classifier, the kernel-based extreme learning machine, and the LSTM, are trained separately to obtain classification scores. Finally, we fuse the classification score outputs of all classifiers with the naive Bayes method to obtain the final predicted label. Our proposed method achieves a significant improvement in recognition rate over previous work on the Kinect v2 and UTD-MHAD human action datasets.
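A Depth Motion Map for one projection view is commonly computed by accumulating the absolute differences between consecutive projected depth maps, and LBP then summarizes local texture by comparing each pixel with its 8 neighbours. The sketch below shows both steps on small 2D lists under stated assumptions: for the front view the projected maps are taken to be the depth frames themselves (side and top projections are not shown), `threshold` is a hypothetical noise floor, and the LBP is the basic 3x3, 256-bin variant rather than any specific variant the authors may have used.

```python
def depth_motion_map(maps, threshold=0.0):
    """Accumulate absolute differences between consecutive projected depth maps.

    maps: list of 2D lists (H x W) for one projection view.
    threshold: differences at or below this value are ignored (assumed noise floor).
    """
    h, w = len(maps[0]), len(maps[0][0])
    dmm = [[0.0] * w for _ in range(h)]
    for prev, cur in zip(maps, maps[1:]):
        for i in range(h):
            for j in range(w):
                diff = abs(cur[i][j] - prev[i][j])
                if diff > threshold:
                    dmm[i][j] += diff
    return dmm


# Neighbour offsets in clockwise order, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img):
    """Basic 3x3 Local Binary Pattern histogram (256 bins) over interior pixels."""
    hist = [0] * 256
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            code = 0
            for bit, (di, dj) in enumerate(OFFSETS):
                # A neighbour at least as bright as the center sets its bit.
                if img[i + di][j + dj] >= img[i][j]:
                    code |= 1 << bit
            hist[code] += 1
    return hist


if __name__ == "__main__":
    frames = [
        [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 7, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 7, 0], [0, 1, 0]],
    ]
    dmm = depth_motion_map(frames)
    print(dmm)
    print(lbp_histogram(dmm)[0])
```

In a full pipeline the resulting DMMs would be much larger images, and the LBP histogram would typically be computed per block and concatenated with HOG features before classification.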


2013, Vol 373-375, pp. 1188-1191
Author(s): Ju Zhong, Hua Wen Liu, Chun Li Lin

Methods for extracting both a shape feature based on Fourier descriptors and a motion feature in the time domain were introduced. These features were fused into a hybrid feature with higher discriminative ability, and this combined representation was used for human action recognition. Experimental results on the Weizmann action database show that the proposed hybrid feature achieves efficient recognition performance.
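Fourier descriptors treat a closed silhouette contour as a complex signal z(t) = x(t) + i·y(t) and describe its shape through the magnitudes of its discrete Fourier coefficients. The sketch below uses one common normalization, which is an assumption since the abstract does not specify the authors' variant: dropping F_0 removes translation, dividing by |F_1| removes scale, and keeping only magnitudes removes rotation and starting-point dependence. `n_coeffs` is a hypothetical parameter.

```python
import cmath
import math


def fourier_descriptors(contour, n_coeffs=8):
    """Translation-, scale- and rotation-invariant Fourier descriptors of a closed contour.

    contour: list of (x, y) boundary points in traversal order.
    Returns |F_k| / |F_1| for k = 2 .. n_coeffs + 1 (capped at the contour length).
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # F_0 (the centroid term) is skipped; |F_1| serves as the scale reference.
    scale = abs(coeffs[1])
    return [abs(coeffs[k]) / scale for k in range(2, min(n_coeffs + 2, n))]


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    # Scale by 3, rotate by 90 degrees and translate: descriptors should not change.
    moved = [((3j * complex(x, y) + (5 - 2j)).real, (3j * complex(x, y) + (5 - 2j)).imag)
             for x, y in square]
    print(fourier_descriptors(square))
    print(fourier_descriptors(moved))
```

The naive O(n²) DFT keeps the example dependency-free; for real silhouettes with hundreds of boundary points an FFT would be the usual choice.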


2021, Vol 2021, pp. 1-11
Author(s): Meng Li, Qiumei Sun

Smart homes have become central to the sustainability of buildings. Recognizing human activity in smart homes is the key tool for achieving home automation. Recently, two-stream Convolutional Neural Networks (CNNs) have shown promising performance for video-based human action recognition. However, such models cannot operate directly on 3D skeletal sequences because they are limited to 2D image or video inputs. Considering the power of 3D skeletal data for describing human activity, in this study we present a novel method to recognize skeletal human activity in sustainable smart homes using a CNN fusion model. Our method represents the spatiotemporal information of each 3D skeletal sequence as three images and three image sequences through gray value encoding, referred to as skeletal trajectory shape images (STSIs) and skeletal pose image (SPI) sequences, and builds a CNN fusion model that takes the three STSIs and three SPI sequences as input for skeletal activity recognition. The three STSIs and the three SPI sequences are generated in three orthogonal planes, so they complement each other. The proposed CNN fusion model allows hierarchical learning of spatiotemporal features, offering better action recognition performance. Experimental results on three public datasets show that our method outperforms state-of-the-art methods.
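The abstract does not give the exact gray value encoding, so the following is a hypothetical sketch of the general idea behind a trajectory image: joint positions from every frame are projected onto one orthogonal plane, normalized onto a `size x size` grid, and each pixel's gray value encodes the time at which a joint passed through it (later frames are brighter). Repeating this for the xy, yz, and xz planes yields three complementary images. The names `trajectory_image` and `three_plane_images` and the time-to-gray mapping are assumptions, not the authors' definitions.

```python
def trajectory_image(sequence, plane=(0, 1), size=32):
    """Encode a 3D skeleton sequence as a gray-value trajectory image on one plane.

    sequence: list of frames; each frame is a list of (x, y, z) joint positions.
    plane: indices of the two coordinates kept, e.g. (0, 1) for the xy plane.
    Gray value encodes time: later frames get brighter pixels (hypothetical scheme).
    """
    a, b = plane
    us = [joint[a] for frame in sequence for joint in frame]
    vs = [joint[b] for frame in sequence for joint in frame]
    u0, v0 = min(us), min(vs)
    # Guard against a zero range (e.g. no movement along one axis).
    su = (max(us) - u0) or 1.0
    sv = (max(vs) - v0) or 1.0
    img = [[0] * size for _ in range(size)]
    n = len(sequence)
    for t, frame in enumerate(sequence):
        gray = 1 + round(254 * t / max(n - 1, 1))  # 1..255; 0 means "never visited"
        for joint in frame:
            col = round((joint[a] - u0) / su * (size - 1))
            row = round((joint[b] - v0) / sv * (size - 1))
            img[row][col] = max(img[row][col], gray)
    return img


def three_plane_images(sequence, size=32):
    """Trajectory images on the xy, yz and xz planes (orthogonal projections)."""
    return [trajectory_image(sequence, plane, size) for plane in ((0, 1), (1, 2), (0, 2))]


if __name__ == "__main__":
    # One joint moving diagonally over two frames.
    seq = [[(0.0, 0.0, 0.0)], [(1.0, 1.0, 1.0)]]
    for im in three_plane_images(seq, size=4):
        print(im)
```

Each resulting image could be fed to its own CNN stream; how the streams are fused is outside the scope of this sketch.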


Author(s): S. Karthickkumar, K. Kumar

In recent years, deep learning for human action recognition has become one of the most popular research topics. It has a variety of applications, such as surveillance, health care, consumer behavior analysis, and robotics. This paper proposes a two-dimensional (2D) convolutional neural network (CNN) for recognizing human activities. The WISDM dataset is used for training and testing; it includes activities such as sitting, standing, going downstairs, going upstairs, and running. Our 2D-CNN-based method achieves a human activity recognition accuracy of 93.17%.
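WISDM provides continuous tri-axial accelerometer readings rather than images, so a 2D CNN typically needs the stream cut into fixed-length windows, each forming a `window x 3` matrix. The abstract does not describe the authors' preprocessing; the sketch below assumes a window of 80 samples (about 4 s if the sampling rate is 20 Hz), 50% overlap, and majority-vote labels. All of these values and the name `make_windows` are hypothetical.

```python
from collections import Counter


def make_windows(samples, labels, window=80, step=40):
    """Cut a tri-axial accelerometer stream into overlapping 2D windows.

    samples: list of (x, y, z) readings; labels: activity label per reading.
    Each window is a window x 3 matrix, usable as a single-channel 2D CNN input.
    window=80 and step=40 (50% overlap) are assumed values.
    Returns (windows, window_labels), labelling each window by majority vote.
    """
    windows, window_labels = [], []
    for start in range(0, len(samples) - window + 1, step):
        windows.append([list(s) for s in samples[start:start + window]])
        window_labels.append(Counter(labels[start:start + window]).most_common(1)[0][0])
    return windows, window_labels


if __name__ == "__main__":
    samples = [(i, 0, 0) for i in range(200)]  # synthetic readings
    labels = ["Sitting"] * 110 + ["Standing"] * 90
    wins, wlabels = make_windows(samples, labels)
    print(len(wins), wlabels)
```

Trailing samples that do not fill a complete window are dropped, which is a common but not universal choice.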


The objective is to develop a time-series image representation of skeletal action data and to use it for recognition through a convolutional long short-term memory deep learning framework. To this end, Kinect-captured human skeletal data are transformed into a Joint Change Distance Image (JCDI) descriptor, which maps the changes of the joints over time. The JCDIs are then decoded spatially by a convolutional neural network (CNN). Temporal decomposition is performed by a long short-term memory (LSTM) network on the changes along the x, y, and z position vectors of the skeleton. We propose a combination of CNN and LSTM that maps the spatiotemporal information to generate generalized time-series features for recognition. Finally, the scores of the spatially strong CNNs and the temporally strong LSTMs are fused for action recognition. Publicly available action datasets such as NTU RGBD, MSR Action, UTKinect, and G3D were used for the experiments. The results showed better performance than other state-of-the-art methods, owing to spatiotemporal modeling at both the representation and the recognition stages.
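The abstract states that CNN and LSTM scores are fused but does not give the fusion rule. As a hedged sketch, the snippet below assumes simple late fusion: each stream's raw class scores are converted to probabilities with a softmax and combined by a weighted average, where the CNN weight `alpha` is a hypothetical parameter. The authors' actual rule may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(cnn_logits, lstm_logits, alpha=0.5):
    """Late fusion: weighted average of CNN and LSTM class probabilities.

    alpha is the CNN weight (hypothetical; the abstract does not give the rule).
    Returns (fused probabilities, predicted class index).
    """
    p_cnn, p_lstm = softmax(cnn_logits), softmax(lstm_logits)
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(p_cnn, p_lstm)]
    return fused, max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    cnn = [2.0, 1.0, 0.0]   # illustrative scores, not results from the paper
    lstm = [0.0, 3.0, 0.0]
    print(fuse_scores(cnn, lstm))             # the confident LSTM tips the vote to class 1
    print(fuse_scores(cnn, lstm, alpha=1.0))  # CNN only: class 0
```

Averaging probabilities rather than raw scores keeps the two streams on a comparable scale, since an LSTM and a CNN can produce logits with very different magnitudes.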

