Challenges and Limitations in Human Action Recognition on Unmanned Aerial Vehicles: A Comprehensive Survey

Nashwan Adnan Othman; Ilhan Aydin

doi:10.18280/ts.380515

Traitement du signal ◽

10.18280/ts.380515 ◽

2021 ◽

Vol 38 (5) ◽

pp. 1403-1411

Author(s):

Nashwan Adnan Othman ◽

Ilhan Aydin

Keyword(s):

Action Recognition ◽

Smart Cities ◽

Large Angle ◽

Human Action Recognition ◽

Human Action ◽

Human Detection ◽

Long Distance ◽

Benchmark Datasets ◽

Comprehensive Survey ◽

Aerial Vehicle

An Unmanned Aerial Vehicle (UAV), commonly called a drone, is an aircraft without a human pilot aboard. UAVs that can accurately detect individuals on the ground are very important for various applications, such as searching for people and surveillance. Integrating UAVs into smart cities is challenging, however, because of concerns such as privacy, safety, and ethical and legal use. UAVs equipped with human action recognition can make use of modern technologies, which makes such recognition essential for the future development of the applications mentioned above. UAV-based human activity recognition is the process of assigning action labels to image sequences. This paper offers a comprehensive study of UAV-based human action recognition techniques. Furthermore, we conduct empirical studies to assess several factors that may influence the performance of human detection and action recognition techniques on UAVs. Benchmark datasets commonly used for UAV-based human action recognition are briefly described. Our findings reveal that existing human action recognition methods can identify human actions from UAVs, but with limitations related to range, altitude, long distances, and large angles of depression.

Exploring 3D Human Action Recognition Using STACOG on Multi-View Depth Motion Maps Sequences

Sensors ◽

10.3390/s21113642 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3642

Author(s):

Mohammad Farhad Bulbul ◽

Sadiya Tabussum ◽

Hazrat Ali ◽

Wenli Zheng ◽

Mi Young Lee ◽

...

Keyword(s):

Action Recognition ◽

Depth Map ◽

Human Action Recognition ◽

Human Action ◽

Collaborative Representation ◽

Auto Correlation ◽

Time Operation ◽

Real Time Operation ◽

Benchmark Datasets ◽

Depth Motion Maps

This paper proposes an action recognition framework for depth map sequences based on the 3D Space-Time Auto-Correlation of Gradients (STACOG) algorithm. First, each depth map sequence is split into two sets of sub-sequences, each set using a different frame length. Second, a number of Depth Motion Map (DMM) sequences are generated from every set and fed into STACOG to obtain an auto-correlation feature vector. The two sets of sub-sequences thus yield two auto-correlation feature vectors, each of which is passed in turn to an L2-regularized Collaborative Representation Classifier (L2-CRC) to compute a set of residual values, giving a pair of residual sets. Next, the Logarithmic Opinion Pool (LOGP) rule combines the two L2-CRC outcomes and assigns an action label to the depth map sequence. Finally, the proposed framework is evaluated on three benchmark datasets: MSR Action 3D, DHA, and UTD-MHAD. We compare the experimental results of the proposed framework with state-of-the-art approaches to demonstrate its effectiveness. The computational efficiency of the framework is also analyzed on all the datasets to assess whether it is suitable for real-time operation.
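The classification and fusion stages described above can be illustrated with a minimal Python sketch, assuming feature vectors are plain lists of floats. It is not the authors' implementation: STACOG feature extraction is omitted, the function names (`crc_residuals`, `logp_fuse`, `solve`) are hypothetical, and the residual is the plain class-wise reconstruction error (some CRC variants additionally divide it by the norm of the class coefficients). The LOGP step assumed here converts each residual set into class probabilities with a softmax over negative residuals and combines them as a weighted sum of log-probabilities, with equal weights by default.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def crc_residuals(train, labels, y, lam=1e-3):
    """L2-CRC sketch: code y over all training vectors, then score each class.

    Solves (X^T X + lam I) alpha = X^T y, where the columns of X are the
    training feature vectors, and returns ||y - X_c alpha_c|| per class c.
    """
    k = len(train)
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(k)]
            for i in range(k)]
    alpha = solve(gram, [dot(t, y) for t in train])
    residuals = {}
    for c in sorted(set(labels)):
        recon = [0.0] * len(y)
        for coef, vec, lab in zip(alpha, train, labels):
            if lab == c:
                recon = [r + coef * v for r, v in zip(recon, vec)]
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return residuals


def logp_fuse(residual_sets, weights=None):
    """LOGP sketch: weighted sum of log-posteriors, posterior = softmax(-residual)."""
    classes = sorted(residual_sets[0])
    if weights is None:
        weights = [1.0 / len(residual_sets)] * len(residual_sets)
    score = {c: 0.0 for c in classes}
    for w, res in zip(weights, residual_sets):
        log_z = math.log(sum(math.exp(-res[c]) for c in classes))
        for c in classes:
            score[c] += w * (-res[c] - log_z)
    return max(classes, key=score.get)


if __name__ == "__main__":
    # Toy dictionary: two hypothetical classes with two training vectors each.
    train = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1]]
    labels = ["wave", "wave", "walk", "walk"]
    res = crc_residuals(train, labels, [1.0, 0.05, 0.0])
    print(res, logp_fuse([res, res]))
```

In the framework described above, the two residual sets passed to `logp_fuse` would come from the two feature vectors of the two sub-sequence sets; the toy call simply reuses one set to show the interface.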

Agglomerative Clustering and Residual-VLAD Encoding for Human Action Recognition

Applied Sciences ◽

10.3390/app10124412 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4412

Author(s):

Ammar Mohsin Butt ◽

Muhammad Haroon Yousaf ◽

Fiza Murtaza ◽

Saima Nazir ◽

Serestina Viriri ◽

...

Keyword(s):

Action Recognition ◽

Feature Vector ◽

Human Action Recognition ◽

Human Action ◽

Compact Representation ◽

Agglomerative Clustering ◽

Residual Vector ◽

Benchmark Datasets ◽

Codebook Generation ◽

Spatio Temporal

Human action recognition has gathered significant attention in recent years due to its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for classifying action videos. The proposed scheme builds a discriminative codebook and a hybrid feature vector by encoding features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which aims to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, which provides a compact representation along with higher-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB51 and UCF101, on which the proposed method achieves 72.6% and 96.2% accuracy, respectively. We conclude that the proposed scheme improves recognition accuracy for human action recognition.
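R-VLAD is the authors' own variant and is not reproduced here. As a point of reference, the standard VLAD encoding it builds on can be sketched in Python as follows, assuming the codebook centers are already available (for example, from the agglomerative clustering step) and descriptors are plain lists of floats. The function names `nearest_center` and `vlad_encode` are hypothetical, and the signed square-root (power) normalization followed by L2 normalization is a common VLAD post-processing choice, not necessarily the one used in the paper.

```python
import math


def nearest_center(x, centers):
    """Index of the codebook center closest to descriptor x (squared Euclidean)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])))


def vlad_encode(descriptors, centers):
    """Standard VLAD: accumulate residuals x - c_k at each nearest center, then normalize."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for x in descriptors:
        k = nearest_center(x, centers)
        acc[k] = [s + (a - b) for s, a, b in zip(acc[k], x, centers[k])]
    v = [val for row in acc for val in row]  # concatenate the K x D residual sums
    v = [math.copysign(math.sqrt(abs(val)), val) for val in v]  # power normalization
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0 else v


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
    print(vlad_encode(descriptors, centers))
```

The output length is the number of centers times the descriptor dimension, which is why VLAD-style encodings stay compact relative to the raw set of descriptors while still keeping first-order (residual) statistics per codeword.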

A Comprehensive Survey on Human Action Recognition

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3933.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 902-908

Keyword(s):

Feature Extraction ◽

Action Recognition ◽

Human Activity ◽

Human Action Recognition ◽

Present Situation ◽

Human Action ◽

Multiple Parameters ◽

Multiple Feature ◽

Comprehensive Survey ◽

Complete Process

The present situation poses many challenges for human action recognition (HAR) in security and surveillance. HAR spans many fields and draws on many techniques for recognizing human actions. We have studied multiple parameters and techniques used in HAR and compiled the outcomes and drawbacks of each technique reported in different studies. This paper surveys the complete process of human activity recognition and reviews Motion History Image (MHI) methods, model-based methods, multi-view methods, and multiple-feature-extraction-based recognition methods.
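For readers unfamiliar with the Motion History Image (MHI) methods that the survey covers, the classic MHI update can be sketched in Python as follows. This is a minimal illustration, assuming grayscale frames given as nested lists and simple frame differencing as the motion detector; the function name `update_mhi` and the parameters `tau`, `decay`, and `diff_threshold` are hypothetical choices, not values taken from the survey.

```python
def update_mhi(mhi, prev_frame, curr_frame, tau=255, decay=1, diff_threshold=30):
    """One MHI step: moving pixels are set to tau, all others fade by `decay`."""
    return [
        [
            tau if abs(c - p) > diff_threshold else max(0, h - decay)
            for h, p, c in zip(mhi_row, prev_row, curr_row)
        ]
        for mhi_row, prev_row, curr_row in zip(mhi, prev_frame, curr_frame)
    ]


if __name__ == "__main__":
    frames = [[[0, 0]], [[100, 0]], [[100, 0]]]
    mhi = [[0, 0]]
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, curr)
        print(mhi)
```

Recent motion keeps the highest values and older motion fades step by step, so a single MHI encodes both where and how recently movement occurred.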

Action Recognition Using Undecimated Dual Tree Complex Wavelet Transform from Depth Motion Maps / Depth Sequences

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w12-203-2019 ◽

2019 ◽

Vol XLII-2/W12 ◽

pp. 203-209

Author(s):

B. H. Shekar ◽

P. Rathnakara Shetty ◽

M. Sharmila Kumari ◽

L. Mestetsky

Keyword(s):

Wavelet Transform ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Feature Descriptor ◽

Motion Information ◽

Complex Wavelet Transform ◽

Benchmark Datasets ◽

Complex Wavelet ◽

Learning Machine

Accumulating motion information from a video sequence is one of the most challenging and significant phases in human action recognition. To achieve this, the research community has proposed several classical and compact representations with proven applicability. In this paper, we propose a compact Depth Motion Map (DMM) based representation with hasty striding that concisely accumulates motion information. We extract Undecimated Dual Tree Complex Wavelet Transform features from the proposed DMM to form an efficient feature descriptor. We use a Sequential Extreme Learning Machine to classify human action sequences on the benchmark MSR Action 3D and DHA datasets. We empirically demonstrate the feasibility of our method under standard protocols.
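The basic Depth Motion Map that this representation builds on accumulates absolute differences between depth frames projected onto a view. A minimal Python sketch for a single (front-view) projection follows, assuming depth frames are nested lists of numbers. The function name `depth_motion_map` is hypothetical, and the `stride` parameter is only an assumed reading of "striding" (differencing frames `stride` steps apart instead of consecutive frames); the paper's exact striding scheme, the side and top projections, and the wavelet and Extreme Learning Machine stages are not reproduced.

```python
def depth_motion_map(frames, stride=1, threshold=0.0):
    """Front-view DMM: per-pixel sum over t of |frame[t + stride] - frame[t]|.

    Differences at or below `threshold` are ignored to suppress sensor noise.
    """
    height, width = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * width for _ in range(height)]
    for t in range(0, len(frames) - stride, stride):
        earlier, later = frames[t], frames[t + stride]
        for i in range(height):
            for j in range(width):
                diff = abs(later[i][j] - earlier[i][j])
                if diff > threshold:
                    dmm[i][j] += diff
    return dmm


if __name__ == "__main__":
    # A 1x1 "depth video" whose single pixel moves away and comes back.
    frames = [[[0]], [[5]], [[0]]]
    print(depth_motion_map(frames), depth_motion_map(frames, stride=2))
```

With a larger stride, fewer difference maps are accumulated, which reduces computation but can also skip short back-and-forth motion, as the toy sequence shows.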

A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector

The Visual Computer ◽

10.1007/s00371-015-1066-2 ◽

2015 ◽

Vol 32 (3) ◽

pp. 289-306 ◽

Cited By ~ 67

Author(s):

Debapratim Das Dawn ◽

Soharab Hossain Shaikh

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Interest Point ◽

Comprehensive Survey ◽

Spatio Temporal

Segmentation and Selective Feature Extraction for Human Detection to the Direction of Action Recognition

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.147 ◽

2021 ◽

Vol 15 ◽

pp. 1371-1386

Author(s):

Lakhyadeep Konwar ◽

Anjan Kumar Talukdar ◽

Kandarpa Kumar Sarma ◽

Navajit Saikia ◽

Subhash Chandra Rajbangshi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Noise Removal ◽

Human Action ◽

Graph Cut ◽

Human Detection ◽

Classification Task ◽

Automation System ◽

Detection And Tracking ◽

Multiple Human Detection

Detecting and classifying different objects is a challenging task in machine vision applications. As with other object detection and classification tasks, human detection plays a major role in advancing the design of automatic visual surveillance systems (AVSS). If future automation systems for AVSS can include human detection and tracking, human action recognition, and recognition of usual as well as unusual events, this will be a great success in a transforming world. In this paper, we propose a human detection and tracking technique for human action recognition aimed at the design of AVSS. We use a median filter for noise removal, graph cut to segment human images, mathematical morphology to refine the segmentation mask, Histogram of Oriented Gradients (HOG) to extract selective feature points, a Support Vector Machine (SVM) with a polynomial kernel to classify human objects, and finally a particle filter to track the detected humans. Owing to this combination, our system is independent of variations in lighting conditions, color, shape, size, clothing, etc., and can handle occlusion. Our system can detect and track humans in different indoor and outdoor environments, with an automatic multiple human detection rate of 97.61% and an overall multiple human detection and tracking accuracy of about 92% for AVSS. Because HOG features are extracted after the graph cut segmentation, our system requires less memory to store the trained data; therefore, its processing speed and its detection and tracking accuracy are better than those of other techniques, which makes it suitable for action classification tasks.
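The final tracking stage relies on a particle filter. A minimal bootstrap particle filter for the 2D centroid of a single detected person could look like the Python sketch below. It assumes a random-walk motion model, a Gaussian likelihood centered on each per-frame detection, and multinomial resampling; all names and default values (`track_centroid`, `motion_std`, `obs_std`, `n_particles`) are hypothetical and not taken from the paper.

```python
import math
import random


def track_centroid(observations, n_particles=500, motion_std=0.5, obs_std=1.0, seed=0):
    """Bootstrap particle filter over (x, y); returns one estimate per observation."""
    rng = random.Random(seed)
    x0, y0 = observations[0]
    # Initialize particles around the first detection.
    particles = [(x0 + rng.gauss(0, 2.0), y0 + rng.gauss(0, 2.0)) for _ in range(n_particles)]
    estimates = []
    for ox, oy in observations:
        # Predict: random-walk motion model.
        particles = [(px + rng.gauss(0, motion_std), py + rng.gauss(0, motion_std))
                     for px, py in particles]
        # Update: Gaussian likelihood of the detected centroid.
        weights = [math.exp(-((px - ox) ** 2 + (py - oy) ** 2) / (2 * obs_std ** 2))
                   for px, py in particles]
        total = sum(weights)
        if total == 0.0:  # every particle is far from the detection: use uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        # Estimate: weighted mean of the particle cloud.
        estimates.append((sum(w * px for w, (px, _) in zip(weights, particles)),
                          sum(w * py for w, (_, py) in zip(weights, particles))))
        # Resample: multinomial, proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    print(track_centroid([(5.0, 5.0)] * 10)[-1])
```

In a full pipeline of the kind described above, each observation would be the centroid of an SVM-confirmed HOG detection; during occlusion, the update step could be skipped so that the filter coasts on its motion model.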

A Fine Grained Research Over Human Action Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4677.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 5376-5384

Keyword(s):

Action Recognition ◽

Video Retrieval ◽

Human Action Recognition ◽

Recognition System ◽

Human Action ◽

Visual Surveillance ◽

Benchmark Datasets ◽

Active Research ◽

The Individual ◽

Individual Strategies

Human action recognition from videos has been an active research area in computer vision due to its significant applicability in various real-time applications such as video retrieval, human-robot interaction, and visual surveillance. Although there are many surveys on human action recognition, they are limited in scope, for example by focusing on methods from only a few perspectives. Unlike earlier surveys, this paper is organized according to the basic working methodology of a human action recognition system. First, it gives a detailed description of various standard benchmark datasets. Then, following this methodology, the survey proceeds in two phases: a survey of feature extraction approaches and a survey of action classification approaches. Within each phase, a fine-grained survey is further organized according to the individual strategies.

An Architecture for Human Action Recognition in Smart Cities Video Surveillance Systems

Research and Innovation Forum 2020 - Springer Proceedings in Complexity ◽

10.1007/978-3-030-62066-0_5 ◽

2021 ◽

pp. 51-56

Author(s):

J. M. Llaurado-Fons ◽

Ana Martinez ◽

Francisco A. Pujol-López ◽

Higinio Mora

Keyword(s):

Video Surveillance ◽

Action Recognition ◽

Smart Cities ◽

Human Action Recognition ◽

Human Action ◽

Surveillance Systems

Progress of Human Action Recognition Research in the Last Ten Years: A Comprehensive Survey

Archives of Computational Methods in Engineering ◽

10.1007/s11831-021-09681-9 ◽

2021 ◽

Author(s):

Pawan Kumar Singh ◽

Soumalya Kundu ◽

Titir Adhikary ◽

Ram Sarkar ◽

Debotosh Bhattacharjee

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Comprehensive Survey

A Novel Parameter Initialization Technique Using RBM-NN for Human Action Recognition

Computational Intelligence and Neuroscience ◽

10.1155/2020/8852404 ◽

2020 ◽

Vol 2020 ◽

pp. 1-30

Author(s):

Deepika Roselind Johnson ◽

V.Rhymend Uthariaraj

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Recognition Rate ◽

Human Action Recognition ◽

Activation Function ◽

Human Action ◽

Learning Technologies ◽

Svm Classifier ◽

Global Features ◽

Benchmark Datasets

Human action recognition is a trending topic in computer vision and its allied fields. The goal of human action recognition is to identify any human action that takes place in an image or video dataset; examples include walking, running, jumping, and throwing. Existing human action recognition techniques have their own limitations in terms of model accuracy and flexibility. To overcome these limitations, deep learning technologies have been applied. In the deep learning approach, a model learns by itself to improve its recognition accuracy while avoiding problems such as exploding gradients, overfitting, and underfitting. In this paper, we propose a novel parameter initialization technique using the Maxout activation function. First, human actions are detected and tracked in the video dataset to learn spatio-temporal features. Second, the extracted feature descriptors are trained using the RBM-NN. Third, the local features are encoded into global features using an integrated forward and backward propagation process via the RBM-NN. Finally, an SVM classifier recognizes the human actions in the video dataset. Experimental analysis on various benchmark datasets showed an improved recognition rate compared with other state-of-the-art learning models.
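The Maxout activation named above outputs, for each hidden unit, the maximum over k affine "pieces" of the input, h_i(x) = max_j (w_ij · x + b_ij). A minimal Python sketch follows; the function name `maxout` and the nested-list weight layout are assumptions for illustration, and the paper's RBM-based parameter initialization is not reproduced.

```python
def maxout(x, weights, biases):
    """Maxout layer: unit i returns max_j (weights[i][j] . x + biases[i][j]).

    weights[i][j] is the weight vector of piece j of unit i; biases[i][j] is its bias.
    """
    outputs = []
    for unit_weights, unit_biases in zip(weights, biases):
        pieces = [sum(w * v for w, v in zip(piece_w, x)) + b
                  for piece_w, b in zip(unit_weights, unit_biases)]
        outputs.append(max(pieces))
    return outputs


if __name__ == "__main__":
    x = [1.0, -2.0]
    # Unit 0: pieces (w, 0) reproduce ReLU(w . x); unit 1: pieces (w, -w) give |w . x|.
    weights = [[[1.0, 1.0], [0.0, 0.0]], [[1.0, 1.0], [-1.0, -1.0]]]
    biases = [[0.0, 0.0], [0.0, 0.0]]
    print(maxout(x, weights, biases))
```

The two toy units illustrate why Maxout is described as a learned piecewise-linear activation: with suitable pieces, it can represent a ReLU or an absolute-value unit.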