Convolutional and Recurrent Neural Network for Human Action Recognition: Application on American Sign Language

2019 ◽  
Author(s):  
Hernandez Vincent ◽  
Suzuki Tomoya ◽  
Venture Gentiane

Abstract: Human Action Recognition (HAR) is an important and difficult topic because of the large variability between repetitions of a task by the same subject and between subjects. This work is motivated by the need for time-series signal classification with robust validation and test procedures. This study proposes to classify 60 American Sign Language signs from data provided by the Leap Motion sensor, using a combined approach of a Convolutional Neural Network (ConvNet) and a Recurrent Neural Network with Long Short-Term Memory cells (LSTM), called ConvNet-LSTM. Moreover, a complete kinematic model of the right and left forearm/hand/fingers/thumb is proposed, along with a simple data augmentation technique to improve the generalization of the neural networks. Results showed an accuracy of 89.3% on a user-independent test set with data augmentation when using the ConvNet-LSTM, while the LSTM alone provided an accuracy of 85.0% on the same test set. Without data augmentation, the results dropped to 85.9% and 81.4%, respectively.
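The abstract mentions a "simple data augmentation technique" for the kinematic time-series but does not specify it here. A common choice for such signals is additive Gaussian jitter; the sketch below is a hypothetical illustration of that idea, not the authors' exact method, and the array shapes are made up for the example.

```python
import numpy as np

def jitter(batch, sigma=0.01, copies=4, seed=0):
    """Augment a batch of time-series samples with additive Gaussian noise.

    batch  : array of shape (n_samples, n_timesteps, n_channels)
    sigma  : noise standard deviation (relative to the signal scale)
    copies : number of noisy copies generated per original sample
    Returns the originals concatenated with the noisy copies.
    """
    rng = np.random.default_rng(seed)
    noisy = [batch + rng.normal(0.0, sigma, batch.shape) for _ in range(copies)]
    return np.concatenate([batch] + noisy, axis=0)

# Example: 10 recordings, 100 time steps, 26 hypothetical joint-angle channels
x = np.zeros((10, 100, 26))
x_aug = jitter(x)      # 10 originals + 4 * 10 noisy copies
print(x_aug.shape)     # (50, 100, 26)
```

Augmenting only the training split (never the user-independent test set) is what lets this kind of technique improve generalization without inflating the reported accuracy.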

Author(s):  
Anantha Prabha P ◽  
Srimathi R ◽  
Srividhya R ◽  
Sowmiya T G

Human Action Recognition has been an active research topic since the early 1980s due to its promising applications in many domains, such as video indexing, surveillance, gesture recognition, video retrieval, and human-computer interaction, where actions are recognized from videos or sensor data. Extracting relevant features from the video streams is the most challenging part. With the emergence of advanced artificial intelligence techniques, deep learning methods are adopted to achieve this goal. The proposed system presents a Recurrent Neural Network (RNN) methodology for Human Action Recognition using the star skeleton as a representative descriptor of human posture. The star skeleton is formed by joining the gross contour extremes of a body to its centroid. To use the star skeleton as a feature for action recognition, it is defined as a five-dimensional vector in star fashion, because the head and four limbs are usually local extremes of the human body. An action is assumed to be composed of a series of star skeletons over time, so time-sequential images expressing a human action are transformed into a feature vector sequence. The feature vector sequence is then transformed into a symbol sequence so that the RNN can model the action. An RNN is used because the extracted features are time-dependent.
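The star-skeleton descriptor described above can be sketched as follows: compute the contour centroid, then take the local maxima of the contour-to-centroid distance as the gross extremes (head and four limbs). This is a minimal NumPy illustration under simplified assumptions; the original method additionally smooths the distance signal to cope with noisy contours.

```python
import numpy as np

def star_skeleton(contour):
    """Return the (dx, dy) vectors from the contour centroid to its
    local distance maxima (the 'star' extremes: head and limbs).

    contour : array of shape (n_points, 2), ordered along the boundary.
    """
    centroid = contour.mean(axis=0)
    d = np.linalg.norm(contour - centroid, axis=1)
    # Local maxima of the centroid distance, treating the contour as circular
    is_peak = (d > np.roll(d, 1)) & (d > np.roll(d, -1))
    return contour[is_peak] - centroid

# Synthetic 5-pointed star: radius modulated by cos(5 * theta)
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
r = 1.0 + 0.5 * np.cos(5 * theta)
star = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
print(len(star_skeleton(star)))  # 5 extremes -> the five-dimensional descriptor
```

On a real silhouette the number of detected extremes can vary between frames, which is one reason the feature sequence is quantized into symbols before the RNN models it.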


2021 ◽  
Vol 167 ◽  
pp. 114403
Author(s):  
C.K.M. Lee ◽  
Kam K.H. Ng ◽  
Chun-Hsien Chen ◽  
H.C.W. Lau ◽  
S.Y. Chung ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4720
Author(s):  
Yujia Zhang ◽  
Lai-Man Po ◽  
Jingjing Xiong ◽  
Yasar Abbas Ur REHMAN ◽  
Kwok-Wai Cheung

Human action recognition methods for videos based on deep convolutional neural networks usually use random cropping or its variants for data augmentation. However, this traditional data augmentation approach may generate many non-informative samples (video patches covering only a small part of the foreground, or only the background) that are not related to the specific action. These samples can be regarded as noisy samples with incorrect labels, which reduces overall action recognition performance. In this paper, we attempt to mitigate the impact of noisy samples by proposing an Auto-augmented Siamese Neural Network (ASNet). In this framework, we propose backpropagating salient patches and randomly cropped samples in the same iteration, performing gradient compensation to alleviate the adverse gradient effects of non-informative samples. Salient patches are samples containing critical information for human action recognition. The generation of salient patches is formulated as a Markov decision process, and a reinforcement learning agent called SPA (Salient Patch Agent) is introduced to extract patches in a weakly supervised manner without extra labels. Extensive experiments were conducted on two well-known datasets, UCF-101 and HMDB-51, to verify the effectiveness of the proposed SPA and ASNet.
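The paper's SPA is a reinforcement-learning agent and is too involved to reproduce here. As a hypothetical stand-in that conveys the same idea, informative crops can be distinguished from non-informative ones by how much temporal change they contain; the sketch below scores candidate patches by frame-difference energy and keeps the most informative one. This heuristic is only an illustration of "salient vs. non-informative" crops, not the paper's method.

```python
import numpy as np

def most_informative_crop(video, size):
    """Pick the crop position with the highest motion energy.

    video : array of shape (T, H, W), grayscale frames
    size  : side length of the square crop
    Returns (row, col) of the top-left corner of the best crop.
    Note: a simple heuristic proxy for saliency, not the paper's RL agent.
    """
    motion = np.abs(np.diff(video, axis=0)).sum(axis=0)  # (H, W) energy map
    t, h, w = video.shape
    best, best_pos = -1.0, (0, 0)
    for row in range(0, h - size + 1, size):
        for col in range(0, w - size + 1, size):
            energy = motion[row:row + size, col:col + size].sum()
            if energy > best:
                best, best_pos = energy, (row, col)
    return best_pos

# Synthetic clip: a square moving only inside the top-left quadrant
video = np.zeros((8, 32, 32))
for t in range(8):
    video[t, 2:8, t:t + 6] = 1.0
print(most_informative_crop(video, 16))  # (0, 0)
```

A random crop of this clip would land on the static background three times out of four, which is exactly the kind of label-noise sample the ASNet framework tries to compensate for.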


Teknik ◽  
2021 ◽  
Vol 42 (2) ◽  
pp. 137-148
Author(s):  
Vincentius Abdi Gunawan ◽  
Leonardus Sandy Ade Putra

Communication is essential in conveying information from one individual to another. However, not all individuals can communicate verbally. According to the WHO, 466 million people globally have disabling hearing loss, 34 million of whom are children. It is therefore necessary to have a non-verbal language learning method for people with hearing impairments. The purpose of this study is to build a system that can identify non-verbal language in real time so that it can be easily understood. A high success rate requires an appropriate method, such as machine learning supported by wavelet feature extraction combined with different classification methods from image processing. Machine learning was applied because of its ability to recognize gestures and to compare the classification results of four different methods. The four classifiers compared for recognizing hand gestures from American Sign Language are the Multi-Class SVM, the Backpropagation Neural Network, K-Nearest Neighbor (K-NN), and Naïve Bayes. Simulation tests of the four classification methods obtained success rates of 99.3%, 98.28%, 97.7%, and 95.98%, respectively. It can therefore be concluded that the Multi-Class SVM has the highest success rate in recognizing American Sign Language, reaching 99.3%. The whole system was designed and tested using MATLAB as supporting software for data processing.
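The wavelet feature-extraction stage mentioned above can be illustrated with a one-level 2D Haar decomposition. The abstract does not name the wavelet family or implementation the authors used, so this averaging-based Haar variant in NumPy is only a representative example: the approximation band gives a compact feature image, while the detail bands capture the edges of the hand gesture.

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar decomposition (averaging variant).

    img : array of shape (H, W) with even H and W.
    Returns (LL, LH, HL, HH): the approximation band and the
    horizontal/vertical/diagonal detail bands, each (H//2, W//2).
    """
    # Pairwise average / difference along columns (horizontal pass)
    a = (img[:, ::2] + img[:, 1::2]) / 2.0
    d = (img[:, ::2] - img[:, 1::2]) / 2.0
    # Same along rows (vertical pass) on each intermediate band
    LL = (a[::2] + a[1::2]) / 2.0
    LH = (a[::2] - a[1::2]) / 2.0
    HL = (d[::2] + d[1::2]) / 2.0
    HH = (d[::2] - d[1::2]) / 2.0
    return LL, LH, HL, HH

# A flat image has all of its energy in the approximation band
LL, LH, HL, HH = haar2d(np.full((8, 8), 3.0))
print(LL[0, 0], LH.max(), HL.max(), HH.max())  # 3.0 0.0 0.0 0.0
```

The flattened band coefficients (or statistics computed from them) would then form the feature vector fed to the SVM, K-NN, neural network, and Naïve Bayes classifiers being compared.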

