Action Recognition Using a Spatial-Temporal Network for Wild Felines

Liqi Feng; Yaqin Zhao; Yichao Sun; Wenxuan Zhao; Jiaxi Tang

doi:10.3390/ani11020485

Animals ◽

10.3390/ani11020485 ◽

2021 ◽

Vol 11 (2) ◽

pp. 485

Author(s):

Liqi Feng ◽

Yaqin Zhao ◽

Yichao Sun ◽

Wenxuan Zhao ◽

Jiaxi Tang

Keyword(s):

Behavior Analysis ◽

Action Recognition ◽

Video Clip ◽

Human Action Recognition ◽

Human Action ◽

Knee Joints ◽

Temporal Part ◽

Stream Network ◽

Temporal Features ◽

Static Action

Behavior analysis of wild felines is significant for the protection of grassland ecological environments. Compared with human action recognition, fewer researchers have focused on feline behavior analysis. This paper proposes a novel two-stream architecture that incorporates spatial and temporal networks for wild feline action recognition. The spatial part outlines the object region extracted by a Mask region-based convolutional neural network (Mask R-CNN) and builds a Tiny Visual Geometry Group (VGG) network for static action recognition. Compared with VGG16, the Tiny VGG network reduces the number of network parameters and avoids overfitting. The temporal part presents a novel skeleton-based action recognition model based on the fluctuation amplitude of the bending angle of the knee joints in a video clip. Due to its temporal features, the model can effectively distinguish between different upright actions, such as standing, ambling, and galloping, particularly when the felines are occluded by objects such as plants and fallen trees. The experimental results showed that the proposed two-stream network model can effectively outline the wild feline targets in captured images and can significantly improve the performance of wild feline action recognition due to its spatial and temporal features.
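The temporal feature described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the code below assumes that the knee bending angle is the angle at the knee keypoint between the hip and ankle keypoints, and that the fluctuation amplitude is the difference between the largest and smallest angle in a clip. The function names and the 2D keypoint layout are hypothetical and are not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("degenerate joint: two keypoints coincide")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def angle_fluctuation_amplitude(frames):
    """Assumed definition: max minus min knee angle over a clip.

    `frames` is a sequence of (hip, knee, ankle) 2D keypoints, one per frame.
    """
    angles = [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]
    return max(angles) - min(angles)


# Hypothetical clip: a straight leg in one frame, a right-angle bend in the next.
clip = [
    ((0, 2), (0, 1), (0, 0)),
    ((0, 2), (0, 1), (1, 1)),
]
amplitude = angle_fluctuation_amplitude(clip)
```

Under these assumptions, a near-zero amplitude would suggest a static pose such as standing, while larger amplitudes would suggest ambling or galloping; the thresholds that separate the actions would have to come from the paper or from data.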

Human action recognition to human behavior analysis

2017 International Conference on Information and Digital Technologies (IDT) ◽

10.1109/dt.2017.8012111 ◽

2017 ◽

Author(s):

Neziha Jaouedi ◽

Noureddine Boujnah ◽

Oumayma Htiwich ◽

Med Salim Bouhlel

Keyword(s):

Behavior Analysis ◽

Action Recognition ◽

Human Behavior ◽

Human Action Recognition ◽

Human Action ◽

Human Behavior Analysis

I3D-Shufflenet Based Human Action Recognition

Algorithms ◽

10.3390/a13110301 ◽

2020 ◽

Vol 13 (11) ◽

pp. 301

Author(s):

Guocheng Liu ◽

Caixia Zhang ◽

Qingyang Xu ◽

Ruoshi Cheng ◽

Yong Song ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Recognition Algorithm ◽

Convolution Kernel ◽

Histogram Of Oriented Gradients ◽

Temporal Features ◽

Convolution Kernels

Optical-flow-based human action recognition is difficult to apply because of its large computational cost. To address this, a human action recognition model, I3D-shufflenet, is proposed that combines the advantages of the I3D neural network and the lightweight shufflenet model. The 5 × 5 convolution kernel of I3D is replaced by two 3 × 3 convolution kernels, which reduces the amount of calculation. A shuffle layer is adopted to achieve feature exchange. Human actions are recognized and classified with the trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel, which promotes the utilization of useful information. Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which significantly improves the ability to express human actions and reduces the calculation needed for feature extraction. I3D-shufflenet is tested on the UCF101 dataset and compared with other models. The final result shows that I3D-shufflenet reaches an accuracy of 96.4%, higher than that of the original I3D.
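The saving from the kernel replacement, and the effect of a shuffle layer, can be checked with a short sketch. It is not the authors' implementation: it counts weights for 2D kernels as written in the abstract, assumes the same number of channels before, between, and after the two 3 × 3 layers, ignores biases, and uses the channel shuffle operation from ShuffleNet on a plain list of channel indices. All function names are hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of one 2D convolution layer, biases ignored."""
    return kernel * kernel * c_in * c_out


def receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernels)


def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten)."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]


c = 64  # assumed channel count, for illustration only
single_5x5 = conv_weights(5, c, c)      # 25 * c * c weights
double_3x3 = 2 * conv_weights(3, c, c)  # 18 * c * c weights
same_field = receptive_field([3, 3]) == receptive_field([5])
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

With equal channel counts, the two 3 × 3 layers cover the same 5 × 5 receptive field with 18/25 = 72% of the weights. After the shuffle, each group of output channels contains channels from every input group, which is how information is exchanged between groups.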

Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling

Data ◽

10.3390/data5040104 ◽

2020 ◽

Vol 5 (4) ◽

pp. 104

Author(s):

Ashok Sarabu ◽

Ajit Kumar Santra

Keyword(s):

Action Recognition ◽

Data Augmentation ◽

Main Idea ◽

Human Action Recognition ◽

Human Action ◽

Great Success ◽

Temporal Modeling ◽

Convolutional Networks ◽

Temporal Features ◽

Augmentation Techniques

The two-stream convolutional neural network (CNN) has proven a great success in action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and to combine their two scores to obtain the final scores. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar types of sub-action in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced to the proposed architecture to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show that the proposed architecture achieves a significant performance increase and outperforms the existing methods.
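The score combination described in this abstract can be sketched as follows. This is an illustration under assumptions, not the paper's method: per-segment class scores are averaged within each stream (one common consensus choice for segment-based models), and the two streams are then fused with a weighted sum. The function names, the fusion weights, and the example scores are all hypothetical.

```python
def segment_consensus(segment_scores):
    """Average class scores over the segments of one video."""
    n = len(segment_scores)
    return [sum(column) / n for column in zip(*segment_scores)]


def fuse_streams(spatial, temporal, w_spatial=1.0, w_temporal=1.0):
    """Weighted late fusion of the two streams' class scores."""
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores: two classes, two segments per stream.
spatial_video = segment_consensus([[0.7, 0.3], [0.5, 0.5]])
temporal_video = segment_consensus([[0.2, 0.8], [0.2, 0.8]])
fused = fuse_streams(spatial_video, temporal_video)
label = predict(fused)
```

In this made-up example the spatial stream alone favors class 0, but the temporal stream is confident enough that the fused prediction is class 1, which is the kind of disagreement that late fusion is meant to resolve.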

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing ◽

10.1007/s11633-014-0831-4 ◽

2014 ◽

Vol 11 (5) ◽

pp. 500-509 ◽

Cited By ~ 12

Author(s):

Xiao-Fei Ji ◽

Qian-Qian Wu ◽

Zhao-Jie Ju ◽

Yang-Yang Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition from Motion Trajectory using Fourier Temporal Features of Skeleton Joints

2018 International Conference on Advances in Computing and Communication Engineering (ICACCE) ◽

10.1109/icacce.2018.8441712 ◽

2018 ◽

Author(s):

Naresh Kumar ◽

Nagarajan Sukavanam

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Trajectory ◽

Temporal Features

Human Action Recognition Based on Spatio-temporal Features

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence ◽

10.1007/978-3-642-11164-8_58 ◽

2009 ◽

pp. 357-362

Author(s):

Nikhil Sawant ◽

K. K. Biswas

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

MSST-RT: Multi-Stream Spatial-Temporal Relative Transformer for Skeleton-Based Action Recognition

Sensors ◽

10.3390/s21165339 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5339

Author(s):

Yan Sun ◽

Yixin Shen ◽

Liyan Ma

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Spatial And Temporal Patterns ◽

Relay Nodes ◽

Dynamic Information ◽

Final Model ◽

Temporal Features ◽

Temporal Dimensions ◽

Short Range Correlations

Skeleton-based human action recognition has made great progress, especially with the development of the graph convolution network (GCN). The most important work is ST-GCN, which automatically learns both spatial and temporal patterns from skeleton sequences. However, this method still has a shortcoming: only short-range correlations are captured, due to the limited receptive field of graph convolution, whereas long-range dependence is essential for recognizing human action. In this work, we propose the use of a spatial-temporal relative transformer (ST-RT) to overcome this defect. By introducing relay nodes, ST-RT prevents the transformer architecture from breaking the inherent skeleton topology in the spatial dimension and the order of the skeleton sequence in the temporal dimension. Furthermore, we mine the dynamic information contained in motion at different scales. Finally, four ST-RTs, which extract spatial-temporal features from four kinds of skeleton sequence, are fused to form the final model, the multi-stream spatial-temporal relative transformer (MSST-RT), to enhance performance. Extensive experiments evaluate the proposed methods on three benchmarks for skeleton-based action recognition: NTU RGB+D, NTU RGB+D 120 and UAV-Human. The results demonstrate that MSST-RT is on par with state-of-the-art (SOTA) methods in terms of performance.

Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features

Lecture Notes in Computer Science - Neural Information Processing. Models and Applications ◽

10.1007/978-3-642-17534-3_48 ◽

2010 ◽

pp. 391-398 ◽

Cited By ~ 1

Author(s):

Yanli Ji ◽

Atsushi Shimada ◽

Rin-ichiro Taniguchi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

International Journal of Image and Graphics ◽

10.1142/s0219467822500516 ◽

2021 ◽

Author(s):

C. Indhumathi ◽

V. Murugan ◽

G. Muthulakshmii

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Regional Correlation ◽

Temporal Features ◽

Adaptive Motion ◽

Spatio Temporal ◽

Inter Frame ◽

Temporal Feature

Action recognition has recently gained more attention from the computer vision community. Recognizing human actions normally involves extracting spatial and temporal features, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, the Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Inter-frame temporal average pooling is used to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than those of other state-of-the-art methods.

Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks

IEEE Access ◽

10.1109/access.2018.2817253 ◽

2018 ◽

Vol 6 ◽

pp. 17913-17922 ◽

Cited By ~ 24

Author(s):

Lei Wang ◽

Yangyang Xu ◽

Jun Cheng ◽

Haiying Xia ◽

Jianqin Yin ◽

...

Keyword(s):

Neural Networks ◽

Action Recognition ◽

Deep Neural Networks ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal
