Ensuring Computers Understand Manual Operations in Production: Deep-Learning-Based Action Recognition in Industrial Workflows

Zeyu Jiao; Guozhu Jia; Yingjie Cai

doi:10.3390/app10030966

Applied Sciences ◽

10.3390/app10030966 ◽

2020 ◽

Vol 10 (3) ◽

pp. 966

Author(s):

Zeyu Jiao ◽

Guozhu Jia ◽

Yingjie Cai

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Spatial Information ◽

Human Action ◽

Video Data ◽

Temporal Information ◽

Superior Performance ◽

Learning Networks ◽

Industrial Environment ◽

Convolutional Networks

In this study, we consider fully automated action recognition based on deep learning in the industrial environment. Most existing methods either rely on professional knowledge to construct complex hand-crafted features or use only basic deep-learning methods, such as convolutional neural networks (CNNs), to extract information from images in the production process. In contrast, we propose a novel and effective method that integrates multiple deep-learning networks, including CNNs, spatial transformer networks (STNs), and graph convolutional networks (GCNs), to process video data in industrial workflows. The proposed method extracts both spatial and temporal information from video data. Spatial information is extracted by estimating the human pose in each frame, which yields a skeleton image of the human body per frame. The multi-frame skeleton images are then processed by a GCN to obtain temporal information, so that action recognition results are predicted automatically. After training on a large human action dataset, Kinetics, we apply the proposed method to a real-world industrial environment and achieve superior performance compared with existing methods.
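The skeleton-then-GCN idea in this abstract can be illustrated with a small sketch. It is not the authors' network: the 5-joint skeleton, the edge list, the weight values, and the function names are all hypothetical, the adjacency normalization D^-1/2 (A + I) D^-1/2 is the common GCN propagation rule and is assumed here, and a plain average over frames stands in for whatever temporal modeling the paper actually uses.

```python
# Hypothetical 5-joint skeleton: 0 head, 1 neck, 2 left hand, 3 right hand, 4 hip.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list (self-loops included)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def graph_conv(a_hat, features, weight):
    """One spatial graph convolution on a single frame: ReLU(A_hat X W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def clip_descriptor(frames, a_hat, weight):
    """Apply graph_conv per frame, then average over frames and joints.

    The averaging is an assumed stand-in for temporal modeling.
    """
    per_frame = [graph_conv(a_hat, f, weight) for f in frames]
    t, v, c = len(per_frame), len(per_frame[0]), len(per_frame[0][0])
    return [sum(per_frame[i][j][k] for i in range(t) for j in range(v)) / (t * v)
            for k in range(c)]


A_HAT = normalized_adjacency(NUM_JOINTS, EDGES)
# Three frames of made-up (x, y) joint coordinates, e.g. from a pose estimator.
FRAMES = [
    [[0.0, 1.0], [0.0, 0.8], [-0.3, 0.5], [0.3, 0.5], [0.0, 0.4]],
    [[0.0, 1.0], [0.0, 0.8], [-0.3, 0.6], [0.3, 0.7], [0.0, 0.4]],
    [[0.0, 1.0], [0.0, 0.8], [-0.3, 0.7], [0.3, 0.9], [0.0, 0.4]],
]
WEIGHT = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # 2 input -> 3 output channels
DESCRIPTOR = clip_descriptor(FRAMES, A_HAT, WEIGHT)
print(DESCRIPTOR)
```

In a trained model the weights would be learned and the clip descriptor would feed a classifier over action labels; here it only shows how joint features are mixed along skeleton edges before being pooled over time.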

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks

International Journal of Advanced Robotic Systems ◽

10.1177/1729881418825093 ◽

2019 ◽

Vol 16 (1) ◽

pp. 172988141882509 ◽

Cited By ~ 3

Author(s):

Hanbo Wu ◽

Xin Ma ◽

Yibin Li

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Information ◽

Superior Performance ◽

Video Sequences ◽

Depth Video ◽

Difference Images

Temporal information plays a significant role in video-based human action recognition. Effectively extracting the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new, effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate spatial and temporal action information simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we construct an action recognition framework based on it, in which the hierarchical dynamic depth projected difference images in three views are fed independently into three identical pretrained convolutional neural networks for fine-tuning. We design three classification schemes in the framework; the schemes use different convolutional neural network layers so that their effects on action recognition can be compared. The three views are combined in each classification scheme to describe the actions more comprehensively. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method achieves better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.
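The first step of this abstract, projecting a depth frame onto three orthogonal views and differencing consecutive projections, can be sketched as follows. This is a simplified illustration under stated assumptions: depth values are small integers with 0 meaning background, the side and top views are binary occupancy maps, and the function names are hypothetical. The rank pooling and hierarchical temporal scales described in the abstract are not implemented here.

```python
def project_three_views(depth, max_depth):
    """Project an H x W depth frame onto front, side, and top views.

    depth: nested list of ints in [0, max_depth); 0 is treated as background.
    Returns (front, side, top) with shapes H x W, H x max_depth, max_depth x W.
    """
    h, w = len(depth), len(depth[0])
    front = [row[:] for row in depth]           # x-y plane: the depth map itself
    side = [[0] * max_depth for _ in range(h)]  # y-z plane occupancy
    top = [[0] * w for _ in range(max_depth)]   # z-x plane occupancy
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            if z > 0:
                side[y][z] = 1
                top[z][x] = 1
    return front, side, top


def abs_difference(view_a, view_b):
    """Element-wise absolute difference between two same-sized views."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(view_a, view_b)]


# Two tiny made-up 2 x 2 depth frames: a point moves and changes depth.
FRAME_0 = [[0, 2], [0, 0]]
FRAME_1 = [[0, 0], [0, 3]]
VIEWS_0 = project_three_views(FRAME_0, max_depth=4)
VIEWS_1 = project_three_views(FRAME_1, max_depth=4)
# One difference image per view, capturing motion between the two frames.
DIFFS = [abs_difference(a, b) for a, b in zip(VIEWS_0, VIEWS_1)]
```

Each of the three difference images could then be aggregated over time and passed to its own CNN, which is the role the three identical pretrained networks play in the abstract.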

Learning Temporal Information from Spatial Information Using CapsNets for Human Action Recognition

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2019.8683720 ◽

2019 ◽

Cited By ~ 4

Author(s):

Abdullah M. Algamdi ◽

Victor Sanchez ◽

Chang-Tsun Li

Keyword(s):

Action Recognition ◽

Spatial Information ◽

Human Action Recognition ◽

Human Action ◽

Temporal Information

Deep Learning for Human Action Recognition Survey

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i10.323328 ◽

2018 ◽

Vol 6 (10) ◽

pp. 323-328

Author(s):

K. Kiruba ◽

D. Shiloah Elizabeth ◽

C Sunil Retmin Raj

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action

Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Applied Sciences ◽

10.3390/app11114940 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4940

Author(s):

Jinsoo Kim ◽

Jeongho Cho

Keyword(s):

Embedded System ◽

Real Time ◽

Action Recognition ◽

Processing Speed ◽

Recognition Accuracy ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Feature Maps

Research on video data faces the difficulty of extracting temporal as well as spatial features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, some limitations on real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that make up the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging action recognition benchmark UCF-101 showed higher accuracy than the HAR model using long short-term memory with a small number of video frames and confirmed the possibility of real-time operation through fast data processing. In addition, the performance of the proposed weighted mean-based HAR model was verified by testing it on a Jetson Nano, confirming the possibility of using it in low-cost GPU-based embedded systems.
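The weighted-averaging step described in this abstract can be sketched in a few lines. The abstract does not give the exact formula, so the choices below are assumptions: frame change is measured as the mean absolute pixel difference between consecutive frames, the weights are those changes normalized to sum to one, and each weight is applied to the feature vector of the later frame in the pair. The function names and toy values are hypothetical.

```python
def frame_change(prev, curr):
    """Mean absolute pixel difference between two flattened grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def weighted_spatiotemporal_feature(frames, features):
    """Average per-frame feature vectors, weighted by frame change.

    frames:   T flattened frames (T >= 2).
    features: T feature vectors, e.g. pooled CNN feature maps (assumed given).
    The first frame has no predecessor, so its features are not used.
    """
    changes = [frame_change(frames[t - 1], frames[t])
               for t in range(1, len(frames))]
    total = sum(changes)
    if total == 0:
        # Static clip: fall back to a uniform average.
        weights = [1.0 / len(changes)] * len(changes)
    else:
        weights = [c / total for c in changes]
    dim = len(features[0])
    return [sum(w * features[t + 1][k] for t, w in enumerate(weights))
            for k in range(dim)]


# Made-up data: three 2-pixel frames and their 2-dimensional features.
FRAMES = [[0, 0], [0, 2], [0, 8]]
FEATURES = [[9.0, 9.0], [4.0, 0.0], [0.0, 4.0]]
FUSED = weighted_spatiotemporal_feature(FRAMES, FEATURES)
print(FUSED)  # frame changes 1.0 and 3.0 give weights 0.25 and 0.75
```

The fused vector is what would be passed to the multilayer perceptron. Because it is a single fixed-size vector per clip, the classifier stays small, which is consistent with the low-complexity goal stated in the abstract.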

Deep Learning for Human Action Recognition

2021 6th International Conference for Convergence in Technology (I2CT) ◽

10.1109/i2ct51068.2021.9418080 ◽

2021 ◽

Author(s):

R. U. Shekokar ◽

S. N. Kale

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action

Online Human Action Recognition Using Deep Learning for Indoor Smart Mobile Robots

2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) ◽

10.1109/icccis51004.2021.9397242 ◽

2021 ◽

Author(s):

Jih-Tang Hsieh ◽

Meng-Lin Chiang ◽

Chiung-Yao Fang ◽

Sei-Wang Chen

Keyword(s):

Deep Learning ◽

Mobile Robots ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Smart Mobile

Human Action Recognition based on Hybrid Deep Learning Model and Shearlet Transform

2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE) ◽

10.1109/icitee49829.2020.9271687 ◽

2020 ◽

Author(s):

Nemir Ahmed Al-Azzawi

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Learning Model ◽

Shearlet Transform ◽

Deep Learning Model

Human Action Recognition for Pose-based Attention: Methods on the Framework of Image Processing and Deep Learning

2021 56th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST) ◽

10.1109/icest52640.2021.9483503 ◽

2021 ◽

Author(s):

Desislava Nikolova ◽

Ivaylo Vladimirov ◽

Zornitsa Terneva

Keyword(s):

Image Processing ◽

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action

Study on Machine Learning and Deep Learning Methods for Human Action Recognition

10.20944/preprints202005.0146.v1 ◽

2020 ◽

Author(s):

Gopika Rajendran ◽

Ojus Thomas Lee ◽

Arya Gopi ◽

Jais Jose ◽

Neha Gautham

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Action Recognition ◽

Human Body ◽

Human Performance ◽

Human Action Recognition ◽

Human Action ◽

Human Robot Interaction ◽

Learning Approach ◽

Action Sequence

With the evolution of computing technology in many applications, such as human-robot interaction, human-computer interaction, and health-care systems, 3D human body models and their dynamic motions have gained popularity. Human performance accompanies human body shapes and their relative motions. Research on human activity recognition is structured around how the complex movement of a human body is identified and analyzed. Vision-based action recognition from video is one such task, in which actions are inferred by observing the complete action sequence performed by a human. Many techniques have been revised over recent decades to develop a robust and effective framework for action recognition. In this survey, we summarize recent advances in human action recognition, namely the machine learning approach, the deep learning approach, and the evaluation of these approaches.

Deep Learning Based Spatiotemporal Human Action Recognition and Localization System

10.1109/icaicta53211.2021.9640274 ◽

2021 ◽

Author(s):

Jesslyn Nathania ◽

Nugraha Priya Utama

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Localization System
