Non-Rigid Objects Recognition: Automatic Human Action Recognition in Video Sequences

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images that can aggregate the action spatial and temporal information simultaneously at different temporal scales. We firstly project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with the rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of hierarchical dynamic depth projected difference images representation, we construct a hierarchical dynamic depth projected difference images–based action recognition framework where hierarchical dynamic depth projected difference images in three views are fed into three identical pretrained convolutional neural networks independently for finely retuning. We design three classification schemes in the framework and different schemes utilize different convolutional neural network layers to compare their effects on action recognition. Three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.

Download Full-text

Human action recognition in crowded surveillance video sequences by using features taken from key-point trajectories

CVPR 2011 WORKSHOPS ◽

10.1109/cvprw.2011.5981713 ◽

2011 ◽

Cited By ~ 10

Author(s):

Masaki Takahashi ◽

Masahide Naemura ◽

Mahito Fujii ◽

Shin'ichi Satoh

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Surveillance Video ◽

Point Trajectories

Download Full-text

Patient Monitoring by Abnormal Human Activity Recognition Based on CNN Architecture

Electronics ◽

10.3390/electronics9121993 ◽

2020 ◽

Vol 9 (12) ◽

pp. 1993

Author(s):

Malik Ali Gul ◽

Muhammad Haroon Yousaf ◽

Shah Nawaz ◽

Zaka Ur Rehman ◽

HyungWon Kim

Keyword(s):

Activity Recognition ◽

Action Recognition ◽

Human Activity ◽

Patient Monitoring ◽

Human Action Recognition ◽

Confidence Score ◽

Human Action ◽

Human Activity Recognition ◽

Video Sequences ◽

Human Actions

Human action recognition has emerged as a challenging research domain for video understanding and analysis. Subsequently, extensive research has been conducted to achieve the improved performance for recognition of human actions. Human activity recognition has various real time applications, such as patient monitoring in which patients are being monitored among a group of normal people and then identified based on their abnormal activities. Our goal is to render a multi class abnormal action detection in individuals as well as in groups from video sequences to differentiate multiple abnormal human actions. In this paper, You Look only Once (YOLO) network is utilized as a backbone CNN model. For training the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patient’s positions. We retrained the back-bone CNN model with 23,040 labeled images of patient’s actions for 32 epochs. Across each frame, the proposed model allocated a unique confidence score and action label for video sequences by finding the recurrent action label. The present study shows that the accuracy of abnormal action recognition is 96.8%. Our proposed approach differentiated abnormal actions with improved F1-Score of 89.2% which is higher than state-of-the-art techniques. The results indicate that the proposed framework can be beneficial to hospitals and elder care homes for patient monitoring.

Download Full-text

Discriminative Action Recognition Using Supervised Latent Topic Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.190-191.1125 ◽

2012 ◽

Vol 190-191 ◽

pp. 1125-1128

Author(s):

Huan Xin Zou ◽

Hao Sun ◽

Ke Feng Ji

Keyword(s):

Action Recognition ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Discriminative Learning ◽

Action Categorization ◽

Latent Topic ◽

Topic Structure

We present a discriminative learning method for human action recognition from video sequences. Our model combines a bag-of-words component with supervised latent topic models. The supervised latent Dirichlet allocation (sLDA) topic model, which employs discriminative learning using labeled data under a generative framework, is introduced to discover the latent topic structure which is most relevant to action categorization. We test our algorithm on two challenging datasets. Experimental results demonstrate the effectiveness of our algorithm.

Download Full-text

Human Action Recognition in Video Sequences Using Deep Belief Networks

Traitement du signal ◽

10.18280/ts.370105 ◽

2020 ◽

Vol 37 (1) ◽

pp. 37-44

Author(s):

Mehrez Abdellaoui ◽

Ali Douik

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Belief Networks ◽

Deep Belief Networks

Download Full-text

Human Action Recognition Based on a Spatio-Temporal Video Autoencoder

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001420400017 ◽

2020 ◽

Vol 34 (11) ◽

pp. 2040001

Author(s):

Anderson Carlos Sousa e Santos ◽

Helio Pedrini

Keyword(s):

Action Recognition ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Computer Vision Applications ◽

Recognition Systems ◽

Spatio Temporal ◽

Human Operators ◽

Accuracy Rates

Due to rapid advances in the development of surveillance cameras with high sampling rates, low cost, small size and high resolution, video-based action recognition systems have become more commonly used in various computer vision applications. Human operators can be supported with the aid of such systems to detect events of interest in video sequences, improving recognition results and reducing failure cases. In this work, we propose and evaluate a method to learn two-dimensional (2D) representations from video sequences based on an autoencoder framework. Spatial and temporal information is explored through a multi-stream convolutional neural network in the context of human action recognition. Experimental results on the challenging UCF101 and HMDB51 datasets demonstrate that our representation is capable of achieving competitive accuracy rates when compared to other approaches available in the literature.

Download Full-text

Human action recognition based on multi-scale feature maps from depth video sequences

Multimedia Tools and Applications ◽

10.1007/s11042-021-11193-4 ◽

2021 ◽

Author(s):

Chang Li ◽

Qian Huang ◽

Xing Li ◽

Qianhan Wu

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

Depth Video

Download Full-text

Fast and reliable human action recognition in video sequences by sequential analysis

2017 IEEE International Conference on Image Processing (ICIP) ◽

10.1109/icip.2017.8297028 ◽

2017 ◽

Author(s):

Hui Fang ◽

Jeyarajan Thiyagalingam ◽

Nik Bessis ◽

Eran Edirisinghe

Keyword(s):

Action Recognition ◽

Sequential Analysis ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences

Download Full-text

Histogram of Oriented Gradient-Based Fusion of Features for Human Action Recognition in Action Video Sequences

Sensors ◽

10.3390/s20247299 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7299

Author(s):

Chirag I. Patel ◽

Dileep Labana ◽

Sharnil Pandya ◽

Kirit Modi ◽

Hemant Ghayvat ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Moving Object ◽

Video Sequences ◽

Feature Descriptor ◽

Histogram Of Oriented Gradient ◽

Fusion Technique

Human Action Recognition (HAR) is the classification of an action performed by a human. The goal of this study was to recognize human actions in action video sequences. We present a novel feature descriptor for HAR that involves multiple features and combining them using fusion technique. The major focus of the feature descriptor is to exploits the action dissimilarities. The key contribution of the proposed approach is to built robust features descriptor that can work for underlying video sequences and various classification models. To achieve the objective of the proposed work, HAR has been performed in the following manner. First, moving object detection and segmentation are performed from the background. The features are calculated using the histogram of oriented gradient (HOG) from a segmented moving object. To reduce the feature descriptor size, we take an averaging of the HOG features across non-overlapping video frames. For the frequency domain information we have calculated regional features from the Fourier hog. Moreover, we have also included the velocity and displacement of moving object. Finally, we use fusion technique to combine these features in the proposed work. After a feature descriptor is prepared, it is provided to the classifier. Here, we have used well-known classifiers such as artificial neural networks (ANNs), support vector machine (SVM), multiple kernel learning (MKL), Meta-cognitive Neural Network (McNN), and the late fusion methods. The main objective of the proposed approach is to prepare a robust feature descriptor and to show the diversity of our feature descriptor. Though we are using five different classifiers, our feature descriptor performs relatively well across the various classifiers. The proposed approach is performed and compared with the state-of-the-art methods for action recognition on two publicly available benchmark datasets (KTH and Weizmann) and for cross-validation on the UCF11 dataset, HMDB51 dataset, and UCF101 dataset. Results of the control experiments, such as a change in the SVM classifier and the effects of the second hidden layer in ANN, are also reported. The results demonstrate that the proposed method performs reasonably compared with the majority of existing state-of-the-art methods, including the convolutional neural network-based feature extractors.

Download Full-text