Learning Long-Term Temporal Features With Deep Neural Networks for Human Action Recognition

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 1840-1850 ◽  
Author(s):  
Sheng Yu ◽  
Li Xie ◽  
Lin Liu ◽  
Daoxun Xia
IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 17913-17922 ◽  
Author(s):  
Lei Wang ◽  
Yangyang Xu ◽  
Jun Cheng ◽  
Haiying Xia ◽  
Jianqin Yin ◽  
...  

Author(s):  
Prof. Rajeshwari. J. Kodulkar

Abstract: Human action detection is one of the most demanding and complex tasks for deep neural networks. Human gesture recognition is treated here as equivalent to human action recognition. A gesture is defined as a series of bodily motions that communicate a message. Gestures are a more natural and preferable way for humans to engage with computers, thereby bridging the gap between humans and robots. Human action recognition is also a valuable communication platform for deaf and mute people. In this work we propose a system for hand gesture identification that recognizes hand movements and hand characteristics, using peak calculation and angle calculation, and then converts gesture images into text. Index Terms: Human action recognition, Deaf and dumb, CNN.


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 301
Author(s):  
Guocheng Liu ◽  
Caixia Zhang ◽  
Qingyang Xu ◽  
Ruoshi Cheng ◽  
Yong Song ◽  
...  

Because optical-flow-based human action recognition is difficult to apply in practice owing to its heavy computation, a human action recognition model, I3D-shufflenet, is proposed that combines the advantages of the I3D neural network and the lightweight model ShuffleNet. The 5 × 5 convolution kernel of I3D is replaced by two stacked 3 × 3 convolution kernels, which reduces the amount of computation. A shuffle layer is adopted to exchange features between channels. Human actions are recognized and classified with the trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel, which promotes the use of informative features. Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which significantly improves the representation of human actions and reduces the cost of feature extraction. I3D-shufflenet is tested on the UCF101 dataset and compared with other models. The final result shows that I3D-shufflenet reaches an accuracy of 96.4%, higher than that of the original I3D.
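The two ideas in the abstract above, replacing one 5 × 5 kernel with two 3 × 3 kernels and shuffling channels between groups, can be illustrated with a minimal sketch. It is not the authors' implementation. The helper names `conv_weights` and `channel_shuffle` are hypothetical, the weight count assumes 2D kernels with equal input and output channel counts and no biases, and the shuffle follows the usual ShuffleNet reshape-transpose-flatten pattern applied to a plain list of channel labels.

```python
def conv_weights(kernel_size, channels, layers=1):
    """Weights in `layers` stacked k x k convolutions (biases ignored).

    Assumption: every layer maps `channels` inputs to `channels` outputs.
    """
    return layers * kernel_size * kernel_size * channels * channels


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the list as (groups, per_group), transposing,
    and flattening, so each output group mixes all input groups.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    c = 64  # illustrative channel count, not taken from the paper
    print(conv_weights(5, c))            # one 5 x 5 layer: 102400
    print(conv_weights(3, c, layers=2))  # two 3 x 3 layers: 73728
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

Under these assumptions, two 3 × 3 layers cover the same 5 × 5 receptive field with 18/25 of the weights, which is the saving the abstract refers to.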


Data ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 104
Author(s):  
Ashok Sarabu ◽  
Ajit Kumar Santra

The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and then combine their two scores to obtain the final score. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture that uses different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to capture long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to improve the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show a significant performance increase, and the proposed architecture outperforms existing methods.
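The pipeline described above (sparse sampling over segments, per-stream consensus, then score fusion) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the sampler deterministically picks the centre frame of each segment (TSN typically samples randomly within segments during training), consensus is a plain average, and the fusion weights are placeholders rather than values from the paper.

```python
def segment_indices(num_frames, num_segments):
    """Pick the centre frame index of each of `num_segments` equal segments."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    return [((2 * i + 1) * num_frames) // (2 * num_segments)
            for i in range(num_segments)]


def average_scores(snippet_scores):
    """Segment consensus: average per-class scores over all snippets."""
    n = len(snippet_scores)
    return [sum(col) / n for col in zip(*snippet_scores)]


def fuse_scores(spatial, temporal, w_spatial=1.0, w_temporal=1.0):
    """Late fusion: weighted average of the two streams' class scores.

    The default weights are assumptions chosen for illustration.
    """
    total = w_spatial + w_temporal
    return [(w_spatial * s + w_temporal * t) / total
            for s, t in zip(spatial, temporal)]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    print(segment_indices(30, 3))  # [5, 15, 25]
    # Hypothetical per-snippet class scores for a two-class problem.
    spatial = average_scores([[0.2, 0.8], [0.4, 0.6]])
    temporal = average_scores([[0.7, 0.3], [0.5, 0.5]])
    print(predict(fuse_scores(spatial, temporal)))
```

Because fusion happens on scores only, the two streams can use different CNN backbones, as the abstract proposes, as long as both produce scores over the same classes.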

