EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment

Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5699
Author(s):  
Vijeta Sharma ◽  
Manjari Gupta ◽  
Ajai Kumar ◽  
Deepti Mishra

Human action recognition in videos has become a popular research area in artificial intelligence (AI) technology. In the past few years, this research has accelerated in areas such as sports, daily activities, and kitchen activities, owing to the benchmarks proposed for human action recognition datasets in these areas. However, there is little research on benchmark datasets for human activity recognition in educational environments. Therefore, we developed a dataset of teacher and student activities to expand research in the education domain. This paper proposes a new dataset, called EduNet, as a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes, containing around 7851 manually annotated clips extracted from YouTube videos and recorded in actual classroom environments. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring of both teacher and student activities. It is also a challenging action dataset, owing to its large number of clips and their unconstrained nature. We compared the performance of the EduNet dataset with the benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy. The development of a new benchmark dataset for the education domain will benefit future research on classroom monitoring systems. The EduNet dataset is a collection of classroom activities from schools covering standards 1 to 12.
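As a rough illustration of the clip-level evaluation reported above (top-1 accuracy of an I3D-style model over annotated clips), the following is a minimal sketch assuming a PyTorch model and a dataloader yielding (clip, label) batches; `model` and `loader` are placeholders, not the authors' released code.

```python
# Minimal sketch: clip-level top-1 accuracy for an I3D-style video model.
# Assumes `model` maps a (B, C, T, H, W) clip tensor to (B, num_classes) logits.
import torch

def evaluate_clip_accuracy(model, loader, device="cuda"):
    """Return top-1 accuracy over all clips in `loader`."""
    model.eval().to(device)
    correct, total = 0, 0
    with torch.no_grad():
        for clips, labels in loader:
            logits = model(clips.to(device))         # (B, num_classes)
            preds = logits.argmax(dim=1).cpu()       # predicted class per clip
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / max(total, 1)
```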

2021 ◽  
Vol 40 ◽  
pp. 03014
Author(s):  
Ritik Pandey ◽  
Yadnesh Chikhale ◽  
Ritik Verma ◽  
Deepali Patil

Human action recognition has become an important research area in the fields of computer vision, image processing, and human-machine or human-object interaction due to its large number of real-time applications. Action recognition is the identification of different actions from video clips (sequences of 2D frames) in which the action is performed. This generalizes the image classification task to multiple frames, collecting the predictions from each frame. Different approaches have been proposed in the literature to improve recognition accuracy. In this paper we propose a deep-learning-based model for recognition, with the main focus on a CNN model for image classification. The action videos are converted into frames and pre-processed before being sent to our model, which recognizes the different actions accurately.
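A minimal sketch of the per-frame scheme described above, assuming a hypothetical 2D CNN `frame_model` (PyTorch) that returns class logits for a batch of frames; the video-level prediction is obtained by averaging per-frame probabilities.

```python
# Minimal sketch: classify a video by averaging per-frame CNN predictions.
import torch
import torch.nn.functional as F

def classify_video(frame_model, frames):
    """frames: (T, C, H, W) tensor of preprocessed video frames."""
    frame_model.eval()
    with torch.no_grad():
        logits = frame_model(frames)          # (T, num_classes) per-frame logits
        probs = F.softmax(logits, dim=1)      # per-frame class probabilities
        video_probs = probs.mean(dim=0)       # aggregate across frames
    return video_probs.argmax().item()        # video-level class index
```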


The present situation poses many challenges for the security and surveillance aspects of human action recognition (HAR). HAR spans many fields, with many techniques available for modern, technical action implementation. We have studied multiple parameters and techniques used in HAR and compiled the outcomes and drawbacks of each technique reported in different studies. This paper presents a survey of the complete process of human activity recognition, covering different Motion History Image (MHI) methods as well as model-based, multi-view, and multiple-feature-extraction-based recognition methods.


2021 ◽  
Author(s):  
Akila.K

Abstract
Background: Human action recognition encompasses the automatic analysis of ongoing events from video and has varied applications across many fields. Recognizing and understanding human actions from videos remains a difficult problem because of the massive variations in human appearance, posture, and body size within the same category.
Objective: This paper focuses on a specific issue related to inter-class variation in human action recognition.
Approach: To discriminate human actions within a category, a novel approach based on wavelet packet transformation is used for feature extraction. As we concentrate on classifying similar actions, the non-linearity among the features is analyzed and discriminated by Deterministic Normalized Linear Discriminant Analysis (DN-LDA). However, the major part of the recognition system relies on the classification stage, and the dynamic feeds are classified by a Hidden Markov Model at the final stage based on a rule set.
Conclusion: Experimental results have shown that the proposed approach is discriminative for similar human actions and well adapted to inter-class variation.
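A minimal sketch of wavelet-packet feature extraction of the kind the abstract names, using PyWavelets on a 1D signal. The per-node energy used here is a common hand-crafted descriptor chosen for illustration; the authors' exact features, wavelet, and depth are not specified, so `db4` and level 3 are assumptions.

```python
# Minimal sketch: energy of wavelet packet leaf nodes as a feature vector.
import numpy as np
import pywt

def wavelet_packet_energy(signal, wavelet="db4", level=3):
    """Return the energy of each leaf node of a 1D wavelet packet tree."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    leaves = wp.get_level(level, order="freq")     # frequency-ordered leaves
    return np.array([np.sum(node.data ** 2) for node in leaves])
```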


2020 ◽  
Vol 6 (6) ◽  
pp. 46
Author(s):  
Mahmoud Al-Faris ◽  
John Chiverton ◽  
David Ndzi ◽  
Ahmed Isam Ahmed

Human action recognition targets recognizing different actions from a sequence of observations under different environmental conditions. A wide range of applications builds on vision-based action recognition research, including video surveillance, tracking, health care, and human–computer interaction. However, accurate and effective vision-based recognition systems remain a challenging area of research in the field of computer vision. This review introduces the most recent human action recognition systems and presents the advances of state-of-the-art methods. To this end, the research is organized from hand-crafted representation-based methods, including holistic and local representation methods with various sources of data, to deep learning technology, including discriminative and generative models and multi-modality-based methods. Next, the most common datasets for human action recognition are presented. This review offers several analyses, comparisons, and recommendations that help identify the direction of future research.


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1993
Author(s):  
Malik Ali Gul ◽  
Muhammad Haroon Yousaf ◽  
Shah Nawaz ◽  
Zaka Ur Rehman ◽  
HyungWon Kim

Human action recognition has emerged as a challenging research domain for video understanding and analysis. Subsequently, extensive research has been conducted to achieve improved performance for recognition of human actions. Human activity recognition has various real-time applications, such as patient monitoring, in which patients are monitored among a group of normal people and then identified based on their abnormal activities. Our goal is to render multi-class abnormal action detection in individuals as well as in groups from video sequences, to differentiate multiple abnormal human actions. In this paper, the You Only Look Once (YOLO) network is utilized as the backbone CNN model. For training the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patient's positions. We retrained the backbone CNN model with 23,040 labeled images of patient actions for 32 epochs. For each frame, the proposed model assigns a unique confidence score and action label, and labels video sequences by finding the recurrent action label. The present study shows that the accuracy of abnormal action recognition is 96.8%. Our proposed approach differentiates abnormal actions with an improved F1-score of 89.2%, which is higher than state-of-the-art techniques. The results indicate that the proposed framework can be beneficial to hospitals and elder-care homes for patient monitoring.
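A minimal sketch of the "recurrent action label" step described above: each frame yields a (label, confidence) pair from the detector, and the sequence-level label is the most frequent one. The `detections` structure and the 0.5 confidence threshold are assumptions for illustration, not YOLO's actual output API.

```python
# Minimal sketch: sequence-level label as the most frequent per-frame label.
from collections import Counter

def clip_action_label(detections, min_conf=0.5):
    """detections: list of (action_label, confidence) pairs, one per frame."""
    labels = [label for label, conf in detections if conf >= min_conf]
    if not labels:
        return None                               # no confident detection
    return Counter(labels).most_common(1)[0][0]   # recurrent action label
```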


2016 ◽  
Vol 63 (2) ◽  
pp. 160-171 ◽  
Author(s):  
Satyabrata Maity ◽  
Debotosh Bhattacharjee ◽  
Amlan Chakrabarti

Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2886 ◽  
Author(s):  
Junwoo Lee ◽  
Bummo Ahn

Human action recognition is an important research area in the field of computer vision that can be applied in surveillance, assisted living, and robotic systems interacting with people. Although various approaches have been widely used, recent studies have mainly focused on deep-learning networks using the Kinect camera, which can easily generate skeleton-joint data from depth data, and have achieved satisfactory performance. However, these models are deep and complex in order to achieve a higher recognition score; therefore, they cannot be applied to a mobile robot platform using a Kinect camera. To overcome these limitations, we suggest a method to classify human actions in real time using a single RGB camera, which can be applied to the mobile robot platform as well. We integrated two open-source libraries, i.e., OpenPose and 3D-baseline, to extract skeleton joints from RGB images, and classified the actions using convolutional neural networks. Finally, we set up the mobile robot platform, including an NVIDIA JETSON XAVIER embedded board and a tracking algorithm, to monitor a person continuously. We achieved an accuracy of 70% on the NTU-RGBD training dataset, and the whole process ran at an average of 15 frames per second (FPS) on the embedded board system.
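A minimal sketch of the three-stage pipeline described above, under the assumption that the stages can be modeled as callables: 2D pose estimation (standing in for OpenPose), 2D-to-3D lifting (standing in for 3D-baseline), and a skeleton classifier. All three functions are hypothetical placeholders, not the authors' released code or the libraries' actual APIs.

```python
# Minimal sketch: per-frame skeleton-based action recognition pipeline.
def recognize_action(frame, pose2d_fn, lift3d_fn, classifier_fn):
    """frame: one RGB image; the three *_fn callables are placeholder stages."""
    joints_2d = pose2d_fn(frame)       # e.g. OpenPose -> (num_joints, 2) array
    if joints_2d is None:              # no person detected in this frame
        return None
    joints_3d = lift3d_fn(joints_2d)   # e.g. 3D-baseline -> (num_joints, 3)
    return classifier_fn(joints_3d)    # CNN over the skeleton -> action label
```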


2018 ◽  
Vol 27 (08) ◽  
pp. 1850030 ◽  
Author(s):  
Wanjun Chen ◽  
Erhu Zhang ◽  
Yan Zhang

This work presents a novel approach to multimodal human action recognition that jointly uses visual RGB and depth (including skeleton joint position) data captured by a depth camera. For depth feature extraction, the Local Surface Geometric Feature (LSGF) is adopted to capture geometric appearance and posture cues. Simultaneously, the Improved Dense Trajectory (IDT) feature is extracted from the RGB modality to jointly characterize the motion, visual appearance, and trajectory shape information. These features from different modalities are complementary to each other. A two-stage integration scheme is then proposed, which incorporates the probability weights of each classifier for action recognition. The proposed approach is evaluated on four publicly available human action databases: NJUST RGB-D Action, MSR-ActionPairs, MSR-DailyAct3D, and UTD-MHAD. Experimental results demonstrate that the proposed approach outperforms or is comparable to state-of-the-art methods.
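A minimal sketch of probability-weighted late fusion in the spirit of the two-stage scheme described: each modality's classifier outputs class probabilities, which are combined with per-classifier weights. The equal default weights are an assumption for illustration; the paper derives its own weighting.

```python
# Minimal sketch: weighted late fusion of per-modality class probabilities.
import numpy as np

def fuse_predictions(probs_depth, probs_rgb, w_depth=0.5, w_rgb=0.5):
    """probs_*: (num_classes,) probability vectors from each modality."""
    fused = w_depth * np.asarray(probs_depth) + w_rgb * np.asarray(probs_rgb)
    return int(np.argmax(fused))    # fused class decision
```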


2019 ◽  
Vol 10 (2) ◽  
pp. 34-47 ◽  
Author(s):  
Bagavathi Lakshmi ◽  
S.Parthasarathy

Discovering human activities on mobile devices is a challenging task for human action recognition. The ability of a device to recognize its user's activity is important because it enables context-aware applications and behavior. Recently, machine learning algorithms have been increasingly used for human action recognition. During the past few years, principal component analysis and support vector machines have been widely used for robust human activity recognition. However, with global dynamic tendencies and complex tasks involved, such robust human activity recognition (HAR) suffers from error and complexity. To deal with this problem, a machine learning algorithm is proposed and its application to HAR explored. In this article, a Max Pool Convolution Neural Network based on Nearest Neighbor (MPCNN-NN) is proposed to perform efficient and effective HAR using smartphone sensors by exploiting their inherent characteristics. The MPCNN-NN framework for HAR consists of three steps. In the first step, for each activity, the features of interest in the foreground frame are detected using median background subtraction. The second step organizes the features (i.e., postures) that represent the strongest generic discriminating features, based on max pooling. The third and final step is HAR based on a nearest-neighbor rule that selects the posture which maximizes the probability. Experiments have been conducted to demonstrate the superiority of the proposed MPCNN-NN framework on the human action dataset KARD (Kinect Activity Recognition Dataset).
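A minimal sketch of median background subtraction, the first step named above: the background is estimated as the per-pixel median over the frame stack, and foreground is whatever deviates from it. The threshold value is an assumption for illustration.

```python
# Minimal sketch: foreground masks via per-pixel median background subtraction.
import numpy as np

def median_foreground_masks(frames, threshold=30.0):
    """frames: (T, H, W) uint8 grayscale stack; returns (T, H, W) bool masks."""
    background = np.median(frames, axis=0)               # per-pixel median image
    diff = np.abs(frames.astype(np.float32) - background)  # deviation per frame
    return diff > threshold                              # True where foreground
```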


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Meng Li ◽  
Qiumei Sun

Smart homes have become central to the sustainability of buildings, and recognizing human activity in smart homes is the key tool to achieve home automation. Recently, two-stream Convolutional Neural Networks (CNNs) have shown promising performance for video-based human action recognition. However, such models cannot act directly on 3D skeletal sequences because they are limited to 2D image and video inputs. Considering the power of 3D skeletal data for describing human activity, in this study we present a novel method to recognize skeletal human activity in sustainable smart homes using a CNN fusion model. Our proposed method represents the spatiotemporal information of each 3D skeletal sequence as three images and three image sequences through gray-value encoding, referred to as skeletal trajectory shape images (STSIs) and skeletal pose image (SPI) sequences, and builds a CNN fusion model with the three STSIs and three SPI sequences as input for skeletal activity recognition. The three STSIs and three SPI sequences are generated in the three orthogonal planes, respectively, and are complementary to each other. The proposed CNN fusion model allows hierarchical learning of spatiotemporal features, offering better action recognition performance. Experimental results on three public datasets show that our method outperforms the state-of-the-art methods.
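A minimal sketch of gray-value encoding of a skeletal sequence into one image per orthogonal plane, in the spirit of the STSI/SPI idea: each plane keeps two of the three joint coordinates, normalized to [0, 255]. The exact encoding in the paper may differ; this projection and normalization scheme is an assumption for illustration.

```python
# Minimal sketch: encode a 3D skeleton sequence as three gray images,
# one per orthogonal plane (xy, yz, xz).
import numpy as np

def encode_plane_images(skeleton):
    """skeleton: (T, J, 3) array of 3D joint positions over T frames."""
    planes = [(0, 1), (1, 2), (0, 2)]   # coordinate pairs per orthogonal plane
    images = []
    for a, b in planes:
        proj = skeleton[:, :, [a, b]].reshape(skeleton.shape[0], -1)  # (T, 2J)
        lo, hi = proj.min(), proj.max()
        gray = (proj - lo) / (hi - lo + 1e-8) * 255.0   # scale to [0, 255]
        images.append(gray.astype(np.uint8))
    return images   # three (T, 2J) gray-value images, one per plane
```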

