3D Skeletal Human Action Recognition Using a CNN Fusion Model

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Meng Li ◽  
Qiumei Sun

Smart homes have become central to building sustainability, and recognizing human activity in smart homes is the key enabler of home automation. Recently, two-stream Convolutional Neural Networks (CNNs) have shown promising performance for video-based human action recognition. However, such models cannot act directly on 3D skeletal sequences because they are limited to 2D image and video inputs. Given the power of 3D skeletal data for describing human activity, in this study we present a novel method to recognize skeletal human activity in sustainable smart homes using a CNN fusion model. Our method encodes the spatiotemporal information of each 3D skeletal sequence into three images and three image sequences through gray-value encoding, referred to as skeletal trajectory shape images (STSIs) and skeletal pose image (SPI) sequences, and builds a CNN fusion model that takes the three STSIs and three SPI sequences as input for skeletal activity recognition. The three STSIs and three SPI sequences are generated in three orthogonal planes, making them complementary to each other. The proposed CNN fusion model allows hierarchical learning of spatiotemporal features, offering better action recognition performance. Experimental results on three public datasets show that our method outperforms state-of-the-art methods.
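As a rough illustration of the gray-value encoding idea, the sketch below projects a 3D skeletal sequence onto the three orthogonal planes and encodes temporal order as pixel brightness. This is an assumption-laden simplification, not the authors' exact STSI construction; the function name, image size, and brightness range are hypothetical.

```python
import numpy as np

def trajectory_shape_images(seq, size=32):
    """Project a 3D skeletal sequence onto the three orthogonal planes
    and gray-value encode each projection into an image (a simplified
    sketch of the STSI idea; the exact encoding details are assumed).

    seq: array of shape (frames, joints, 3) with x, y, z coordinates.
    Returns three (size, size) uint8 images for the xy, yz, and xz planes.
    """
    images = []
    for a, b in [(0, 1), (1, 2), (0, 2)]:        # xy, yz, xz planes
        pts = seq[:, :, [a, b]].reshape(-1, 2)
        # normalize coordinates into pixel indices
        mn, mx = pts.min(axis=0), pts.max(axis=0)
        idx = ((pts - mn) / (mx - mn + 1e-8) * (size - 1)).astype(int)
        img = np.zeros((size, size), dtype=np.uint8)
        # encode temporal order as gray value: later frames are brighter
        t = np.repeat(np.linspace(64, 255, seq.shape[0]), seq.shape[1])
        img[idx[:, 1], idx[:, 0]] = t.astype(np.uint8)
        images.append(img)
    return images
```

Because the three planes are orthogonal, a trajectory that self-occludes in one projection usually remains distinguishable in another, which is why the abstract describes them as complementary.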

Author(s):  
Mohammad Farhad Bulbul ◽  
Yunsheng Jiang ◽  
Jinwen Ma

The emerging cost-effective depth sensors have facilitated the action recognition task significantly. In this paper, the authors address the action recognition problem using depth video sequences combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, Logarithmic Opinion Pool (LOGP), is used to combine the classification outcomes from multiple classifiers each with an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves superior action recognition performance over the situations when using each feature individually.
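The LOGP rule itself is a standard soft-fusion scheme: per-classifier posterior probabilities are combined by a weighted geometric mean. A minimal sketch, assuming uniform weights (the abstract does not give the authors' weighting):

```python
import numpy as np

def logp_fusion(prob_list, weights=None):
    """Logarithmic Opinion Pool: fuse posterior probabilities from
    several classifiers via a weighted geometric mean.

    prob_list: list of (n_samples, n_classes) probability arrays,
               one per classifier/feature set.
    Returns fused (n_samples, n_classes) probabilities and predictions.
    """
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    # sum of weighted log-probabilities == log of weighted geometric mean
    log_fused = sum(w * np.log(p + 1e-12) for w, p in zip(weights, prob_list))
    fused = np.exp(log_fused)
    fused /= fused.sum(axis=1, keepdims=True)   # renormalize rows
    return fused, fused.argmax(axis=1)
```

Compared with an arithmetic (sum-rule) pool, the geometric mean lets any single confident classifier veto a class it assigns near-zero probability, which is often desirable when the feature sets are complementary.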


2017 ◽  
Vol 2017 ◽  
pp. 1-6
Author(s):  
Shirui Huo ◽  
Tianrui Hu ◽  
Ce Li

Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. An improved collaborative representation classifier (ICRC) based on l2 regularization is presented for human action recognition, which maximizes the likelihood that a test sample belongs to each class; theoretical investigation into the ICRC shows that it obtains the final classification by computing the likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
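The l2-regularized collaborative representation underlying such classifiers has a closed-form coding step, after which a test sample is assigned to the class with the smallest reconstruction residual. The sketch below shows the standard CRC decision rule only; the paper's probabilistic ICRC refinement is not reproduced, and `crc_classify` is a hypothetical name:

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Standard l2-regularized collaborative representation classifier.

    D: (d, n) dictionary whose columns are training features,
    labels: (n,) class label per column, y: (d,) test feature.
    Returns the predicted class label.
    """
    # closed-form coding: alpha = (D^T D + lam I)^{-1} D^T y
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # reconstruct y using only this class's atoms and coefficients
        residual = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        if residual < best_res:
            best, best_res = c, residual
    return best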


The present situation poses many challenges for the security and surveillance applications of Human Action Recognition (HAR). HAR spans many subfields and techniques for implementing action recognition in modern systems. We have studied multiple parameters and techniques used in HAR and compiled the outcomes and drawbacks of each technique reported in different studies. This paper surveys the complete process of recognizing human activity, covering different Motion History Image (MHI) methods as well as model-based, multi-view, and multiple-feature-extraction-based recognition methods.


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1993
Author(s):  
Malik Ali Gul ◽  
Muhammad Haroon Yousaf ◽  
Shah Nawaz ◽  
Zaka Ur Rehman ◽  
HyungWon Kim

Human action recognition has emerged as a challenging research domain for video understanding and analysis, and extensive research has been conducted to improve recognition performance. Human activity recognition has various real-time applications, such as patient monitoring, in which patients are monitored among a group of people and identified based on their abnormal activities. Our goal is to perform multi-class abnormal action detection, in individuals as well as in groups, from video sequences in order to differentiate multiple abnormal human actions. In this paper, the You Only Look Once (YOLO) network is utilized as the backbone CNN model. For training the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patient's positions. We retrained the backbone CNN model with 23,040 labeled images of patient actions for 32 epochs. For each frame, the proposed model assigns a confidence score and an action label, and the video-level label is obtained as the most recurrent frame label. The present study shows that the accuracy of abnormal action recognition is 96.8%. Our proposed approach differentiated abnormal actions with an improved F1-score of 89.2%, which is higher than state-of-the-art techniques. The results indicate that the proposed framework can be beneficial to hospitals and elder care homes for patient monitoring.
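Aggregating per-frame detections into one video-level action by taking the most recurrent label can be sketched as below. The confidence-based tie-break is an assumption added for determinism; the abstract only states that the recurrent label is chosen.

```python
from collections import Counter

def video_action_label(frame_detections):
    """Pick a video-level action from per-frame (label, confidence)
    pairs: the most frequent label wins, ties broken by summed
    confidence (the tie-break is an assumed detail).
    """
    counts = Counter()
    conf_sum = Counter()
    for label, conf in frame_detections:
        counts[label] += 1
        conf_sum[label] += conf
    return max(counts, key=lambda l: (counts[l], conf_sum[l]))
```

This majority-vote step also smooths out isolated per-frame misdetections, which helps explain why the video-level accuracy can exceed raw frame-level accuracy.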


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5699
Author(s):  
Vijeta Sharma ◽  
Manjari Gupta ◽  
Ajai Kumar ◽  
Deepti Mishra

Human action recognition in videos has become a popular research area in artificial intelligence (AI) technology. In the past few years, this research has accelerated in areas such as sports, daily activities, kitchen activities, etc., due to developments in the benchmark datasets proposed for human action recognition in these areas. However, there is little benchmarking research for human activity recognition in educational environments. Therefore, we developed a dataset of teacher and student activities to expand the research in the education domain. This paper proposes a new dataset, called EduNet, as a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes, containing around 7851 manually annotated clips extracted from YouTube videos and recorded in an actual classroom environment. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring of both teacher and student activities. It is also a challenging action dataset owing to its large number of clips and their unconstrained nature. We compared the performance of the EduNet dataset with the benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy. The development of a new benchmark dataset for the education domain will benefit future research on classroom monitoring systems. The EduNet dataset is a collection of classroom activities from schools covering standards (grades) 1 to 12.


Author(s):  
Rohan Munshi

Given a sequence of images, i.e., a video, the task of action recognition is to identify the most similar action among the action sequences learned by the system. Such human action recognition is based on evidence gathered from videos. It has many applications, including surveillance, video indexing, biometrics, telehealth, and human-computer interaction. Vision-based human activity recognition is plagued by numerous challenges due to viewpoint changes, occlusion, variation in execution rate, camera motion, and background clutter. In this survey, we provide an overview and report of existing methods based on their ability to handle these challenges, as well as how these methods can be generalized and their ability to detect abnormal actions. Such a systematic classification can help researchers identify the appropriate methods available to address each of the challenges faced, along with their limitations. In addition, we identify the public datasets and the challenges posed by them. From this survey, we draw conclusions regarding how well each challenge has been resolved, and we identify potential research areas that require further work.


2019 ◽  
Vol 10 (2) ◽  
pp. 34-47 ◽  
Author(s):  
Bagavathi Lakshmi ◽  
S.Parthasarathy

Discovering human activities on mobile devices is a challenging task for human action recognition. The ability of a device to recognize its user's activity is important because it enables context-aware applications and behavior. Recently, machine learning algorithms have been increasingly used for human action recognition. During the past few years, principal component analysis and support vector machines have been widely used for robust human activity recognition. However, with global dynamic tendencies and complex tasks involved, robust human activity recognition (HAR) suffers from error and complexity. To deal with this problem, a machine learning algorithm is proposed and its application to HAR is explored. In this article, a Max Pool Convolutional Neural Network based on Nearest Neighbor (MPCNN-NN) is proposed to perform efficient and effective HAR using smartphone sensors by exploiting their inherent characteristics. The MPCNN-NN framework for HAR consists of three steps. In the first step, for each activity, the features of interest in the foreground frame are detected using median background subtraction. The second step organizes the features (i.e., postures) by selecting, via max pooling, those that are the strongest generic discriminators. The third and final step performs HAR with a nearest-neighbor rule that selects the posture maximizing the probability. Experiments demonstrate the superiority of the proposed MPCNN-NN framework on a human action dataset, KARD (Kinect Activity Recognition Dataset).
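The first step, median background subtraction, is a standard technique and can be sketched in a few lines. The threshold value below is an assumption for illustration, not a parameter from the article:

```python
import numpy as np

def foreground_masks(frames, thresh=25):
    """Median background subtraction: model the static background as
    the per-pixel median over time and flag pixels that deviate from
    it by more than a threshold.

    frames: (n, h, w) grayscale video stack.
    Returns (n, h, w) boolean foreground masks.
    """
    background = np.median(frames, axis=0)          # static background model
    diff = np.abs(frames.astype(np.int16) - background.astype(np.int16))
    return diff > thresh
```

The per-pixel median is robust to a moving subject as long as each pixel shows the background in at least half of the frames, which makes it a common choice over the per-pixel mean.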


2018 ◽  
Vol 8 (10) ◽  
pp. 1835 ◽  
Author(s):  
Guangle Yao ◽  
Tao Lei ◽  
Xianyuan Liu ◽  
Ping Jiang

As an important branch of video analysis, human action recognition has attracted extensive research attention in the computer vision and artificial intelligence communities. In this paper, we propose to model the temporal evolution of multi-temporal-scale atoms for action recognition. An action can be considered a temporal sequence of action units. These action units, which we refer to as action atoms, capture the key semantic and characteristic spatiotemporal features of actions at different temporal scales. We first investigate Res3D, a powerful 3D CNN architecture, and create variants of Res3D for different temporal scales. At each temporal scale, we devise practices to transfer the knowledge learned from RGB to optical flow (OF) and build RGB and OF streams to extract deep spatiotemporal information using Res3D. Then we propose an unsupervised method to mine action atoms in the deep spatiotemporal space. Finally, we use long short-term memory (LSTM) to model the temporal evolution of atoms for action recognition. The experimental results show that our proposed multi-temporal-scale spatiotemporal atom modeling method achieves recognition performance comparable to that of state-of-the-art methods on two challenging action recognition datasets: UCF101 and HMDB51.
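The abstract does not specify the unsupervised atom-mining method, so the sketch below uses plain k-means over deep features purely as an illustrative stand-in: snippets whose deep spatiotemporal features cluster together would be treated as instances of the same atom.

```python
import numpy as np

def mine_atoms(features, k=4, iters=20, seed=0):
    """Cluster deep spatiotemporal features into candidate action atoms
    (plain k-means; an assumed stand-in for the paper's unsupervised
    mining method, which the abstract does not describe).

    features: (n, d) array of per-snippet deep features.
    Returns (k, d) atom centroids and (n,) atom assignments.
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest centroid
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        assign = d.argmin(axis=1)
        # move each centroid to the mean of its assigned features
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = features[assign == j].mean(axis=0)
    return centroids, assign
```

Each clip can then be represented as the temporal sequence of its snippets' atom assignments, which is the kind of sequence an LSTM can model for the final classification.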


2013 ◽  
Vol 373-375 ◽  
pp. 1188-1191
Author(s):  
Ju Zhong ◽  
Hua Wen Liu ◽  
Chun Li Lin

Extraction methods for both a shape feature based on Fourier descriptors and a motion feature in the time domain were introduced. These features were fused into a hybrid feature with higher discriminative ability, and this combined representation was used for human action recognition. The experimental results show that the proposed hybrid feature achieves efficient recognition performance on the Weizmann action database.
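Fourier descriptors of a closed contour are a standard construction: the boundary points are treated as a complex signal, and normalized magnitudes of its Fourier coefficients describe the shape. The sketch below uses one common normalization; the paper's exact variant is assumed.

```python
import numpy as np

def fourier_descriptors(contour, k=10):
    """Translation- and scale-insensitive Fourier descriptors of a
    closed contour (one standard normalization; the paper's exact
    construction is assumed).

    contour: (n, 2) ordered boundary points.
    Returns the first k normalized coefficient magnitudes.
    """
    z = contour[:, 0] + 1j * contour[:, 1]     # complex boundary signal
    coeffs = np.fft.fft(z)
    coeffs[0] = 0                              # drop DC -> translation invariance
    mags = np.abs(coeffs)
    mags = mags / (mags[1] + 1e-12)            # divide by first harmonic -> scale invariance
    return mags[1:k + 1]
```

Using magnitudes discards phase, which additionally makes the descriptor insensitive to the choice of starting point along the contour.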


Author(s):  
S. Karthickkumar ◽  
K. Kumar

In recent years, deep learning for human action recognition has been one of the most popular research topics. It has a variety of applications such as surveillance, health care, consumer behavior analysis, and robotics. In this paper, we propose a Two-Dimensional (2D) Convolutional Neural Network for recognizing human activities. The WISDM dataset is used to train and test the model; it includes activities such as sitting, standing, walking downstairs, walking upstairs, and running. Our 2D-CNN-based method achieves a human activity recognition accuracy of 93.17%.
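Before a 2D CNN can consume a tri-axial accelerometer stream such as WISDM's, the signal is typically segmented into fixed-size windows that serve as 2D inputs (time × axes). The window and step sizes below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def sliding_windows(signal, win=128, step=64):
    """Segment a tri-axial sensor stream into overlapping fixed-size
    windows, the usual preprocessing before a 2D CNN (window and step
    sizes are assumed values).

    signal: (n, 3) array of x, y, z accelerometer readings.
    Returns a (num_windows, win, 3) array.
    """
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])
```

A 50% overlap (step = win / 2) is a common choice because it increases the number of training windows without letting any activity transition fall entirely between windows.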

