TUHAD: Taekwondo Unit Technique Human Action Dataset with Key Frame-Based CNN Action Recognition

Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4871
Author(s):  
Jinkue Lee ◽  
Hoeryong Jung

In taekwondo, poomsae (i.e., form) competitions have no quantitative scoring standards, unlike gyeorugi (i.e., full-contact sparring) in the Olympics. Consequently, there are diverse fairness issues regarding poomsae evaluation, and the demand for quantitative evaluation tools is increasing. Action recognition is a promising approach, but the extreme and rapid actions of taekwondo complicate its application. This study established the Taekwondo Unit technique Human Action Dataset (TUHAD), which consists of multimodal image sequences of poomsae actions. TUHAD contains 1936 action samples of eight unit techniques performed by 10 experts and captured by two camera views. A key frame-based convolutional neural network architecture was developed for taekwondo action recognition, and its accuracy was validated for various input configurations. A correlation analysis of the input configuration and accuracy demonstrated that the proposed model achieved a recognition accuracy of up to 95.833% (lowest accuracy of 74.49%). This study contributes to the research and development of taekwondo action recognition.
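The abstract does not specify how key frames are chosen or combined; as a minimal sketch (assuming uniform temporal sampling and channel-wise stacking, with hypothetical function names, not the paper's actual pipeline), the input preparation for a key frame-based CNN might look like:

```python
import numpy as np

def select_key_frames(sequence, num_key_frames=4):
    """Uniformly sample key frames from an action clip of shape (T, H, W, C)."""
    t = sequence.shape[0]
    idx = np.linspace(0, t - 1, num_key_frames).astype(int)
    return sequence[idx]

def stack_key_frames(key_frames):
    """Stack K key frames channel-wise: (K, H, W, C) -> (H, W, K*C),
    yielding a single tensor a 2D CNN can consume."""
    k, h, w, c = key_frames.shape
    return key_frames.transpose(1, 2, 0, 3).reshape(h, w, k * c)
```

For a 30-frame clip of 64x64 RGB images, four key frames stacked this way give a single 64x64x12 input, so a standard image CNN can be reused without a temporal dimension.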

2021 ◽  
Vol 11 (1) ◽  
pp. 28
Author(s):  
Ana Bárbara Cardoso ◽  
Bruno Martins ◽  
Jacinto Estima

This article describes a novel approach to toponym resolution with deep neural networks. Instead of matching references in the text against entries in a gazetteer, the proposed approach directly predicts geo-spatial coordinates. The neural network architecture considers multiple inputs (e.g., the surrounding words in combination with the toponym to be disambiguated), using pre-trained contextual word embeddings (i.e., ELMo or BERT) as well as bi-directional Long Short-Term Memory units, both of which are regularly used for modeling textual data. The intermediate representations are then used to predict a probability distribution over possible geo-spatial regions, and finally the coordinates for the input toponym. The proposed model was tested on three datasets used in previous toponym resolution studies, namely the (i) War of the Rebellion, (ii) Local–Global Lexicon, and (iii) SpatialML corpora. Moreover, we evaluated the effect of using (i) geophysical terrain properties as external information, including information on elevation and terrain development, among others, and (ii) additional data collected from Wikipedia articles, to further help with the training of the model. The obtained results show improvements with the proposed method over previous approaches, specifically when BERT embeddings and additional data are involved.
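The final prediction step described above (region distribution, then coordinates) can be sketched as follows; the softmax-over-regions and centroid-weighting scheme is an illustrative assumption, not the authors' exact implementation:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over region logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def predict_coordinates(region_logits, region_centroids):
    """Map a distribution over geo-spatial regions to expected (lat, lon).

    region_logits: (R,) scores produced by the network's region head.
    region_centroids: (R, 2) representative coordinates per region.
    """
    p = softmax(region_logits)      # probability per region
    return p @ region_centroids     # probability-weighted coordinates
```

When the distribution is sharply peaked, the predicted point collapses to that region's centroid; otherwise it interpolates between candidate regions.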


2021 ◽  
Author(s):  
Hai-Hong Phan ◽  
Trung Tin Nguyen ◽  
Ngo Huu Phuc ◽  
Nguyen Huu Nhan ◽  
Do Minh Hieu ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1825 ◽  
Author(s):  
Huy Hieu Pham ◽  
Houssam Salmane ◽  
Louahdi Khoudour ◽  
Alain Crouzil ◽  
Sergio A. Velastin ◽  
...  

We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from monocular RGB sensors. The approach proceeds in two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel locations of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map the detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that models the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performs action recognition. Experiments on the Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that by using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems.
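The image-based intermediate representation of 3D pose sequences mentioned above can be sketched as follows; the per-axis min-max normalization scheme is an assumption for illustration, not the paper's exact encoding:

```python
import numpy as np

def poses_to_image(pose_seq, eps=1e-8):
    """Encode a 3D pose sequence as an RGB-like image.

    pose_seq: (T, J, 3) array of T frames, J joints, (x, y, z) coordinates.
    Each coordinate axis is min-max scaled to [0, 255] and used as one
    color channel; rows index frames and columns index joints, so the
    spatio-temporal evolution becomes a texture a 2D CNN can process.
    """
    img = np.empty(pose_seq.shape)
    for c in range(3):
        ch = pose_seq[..., c]
        lo, hi = ch.min(), ch.max()
        img[..., c] = 255.0 * (ch - lo) / (hi - lo + eps)
    return img.astype(np.uint8)
```

Encoding poses as images is what lets the second stage reuse image-classification architectures (and ENAS search spaces designed for them) for action recognition.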


2015 ◽  
Vol 713-715 ◽  
pp. 2152-2155 ◽  
Author(s):  
Shao Ping Zhu

To address the problem of achieving robust human action recognition from image sequences in computer vision, an improved Multiple Instance Learning (MIL) method, guided by the Iterative Querying Heuristic algorithm, is proposed for human action recognition in video image sequences. Experiments on the Weizmann database show that the new method recognizes human actions quickly and achieves high recognition rates, validating our analysis.
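The abstract does not detail the improved MIL formulation; as a minimal sketch of the standard MIL decision rule it builds on (a video is a bag of instance segments, positive if any instance is positive; function names are hypothetical):

```python
import numpy as np

def bag_score(instance_scores):
    """Standard MIL assumption: a bag (video) is positive if at least one
    instance (e.g., a short segment) is positive, so the bag score is the
    maximum over instance scores."""
    return float(np.max(instance_scores))

def classify_bag(instance_scores, threshold=0.5):
    """Label the whole video from its per-segment scores."""
    return bag_score(instance_scores) >= threshold
```

The max-pooling rule means a single confidently recognized segment suffices to label the whole sequence, which is why MIL suits weakly labeled video where the action occupies only part of the clip.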


Author(s):  
Karthika Gidijala ◽  
◽  
Mansa Devi Pappu ◽  
Manasa Vavilapalli ◽  
Mahesh Kothuru ◽  
...  

Many different models of Convolutional Neural Networks exist in deep learning studies. The merit of these algorithms becomes apparent only when they are implemented on strong datasets. Histopathological images of breast cancer contain a large number of haphazard structures and textures, and dealing with such images is a challenging issue in deep learning. Much research, building on wet-lab work and its results, has been published with novel annotations. In this paper, we present a model that works efficiently on raw images of different resolutions and alleviates the problems posed by these structures and textures. The proposed model achieves considerably good results, useful for decision making in cancer diagnosis.


2021 ◽  
Vol 11 (18) ◽  
pp. 8628
Author(s):  
Kang-Moon Park ◽  
Donghoon Shin ◽  
Sung-Do Chi

This paper proposes a deep neural network structuring methodology through a genetic algorithm (GA) using chromosome non-disjunction. The proposed model includes methods for generating and tuning the neural network architecture without the aid of human experts. Since the original neural architecture search (henceforth, NAS) was announced, NAS techniques such as NASBot, NASGBO and CoDeepNEAT have been widely adopted in order to improve cost- and/or time-effectiveness for human experts. In these models, evolutionary algorithms (EAs) are employed to effectively enhance the accuracy of the neural network architecture. In particular, CoDeepNEAT uses a constructive GA starting from a minimal architecture, which works quickly only if the solution architecture is small. The proposed methodology instead utilizes chromosome non-disjunction as a new genetic operation. Our approach differs from previous methodologies in that it includes a destructive approach as well as a constructive one, similar to pruning methodologies, which enables tuning of an existing neural network architecture. A case study applied to the sentence word-ordering problem and to AlexNet on CIFAR-10 illustrates the applicability of the proposed methodology. We show from the simulation studies that the accuracy of the model was improved by 0.7% compared to the conventional model, without the aid of human experts.
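Under the constructive-plus-destructive interpretation above, a chromosome non-disjunction operator can be sketched as follows; the list-of-layer-genes representation and function names are assumptions for illustration, not the paper's encoding:

```python
import random

def non_disjunction(chromosome, rng=None):
    """Chromosome non-disjunction sketch: during 'division', one offspring
    receives an extra copy of a gene (constructive: the architecture grows)
    while its sibling is missing that gene (destructive: it shrinks).

    chromosome: list of layer genes, e.g. ["conv3x3", "pool", "dense"].
    Returns (grown, shrunk) offspring.
    """
    rng = rng or random.Random(0)
    i = rng.randrange(len(chromosome))
    grown = chromosome[:i + 1] + [chromosome[i]] + chromosome[i + 1:]
    shrunk = chromosome[:i] + chromosome[i + 1:]
    return grown, shrunk
```

Pairing a growing and a shrinking offspring in one operation is what lets the search both extend and prune an existing architecture, rather than only building up from a minimal one as in a purely constructive GA.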


Author(s):  
Huifeng Guo ◽  
Ruiming TANG ◽  
Yunming Ye ◽  
Zhenguo Li ◽  
Xiuqiang He

Learning sophisticated feature interactions behind user behaviors is critical in maximizing click-through rate (CTR) for recommender systems. Despite great progress, existing methods have a strong bias towards low- or high-order interactions, or require expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need for feature engineering besides raw features. Comprehensive experiments demonstrate the effectiveness and efficiency of DeepFM over existing models for CTR prediction, on both benchmark data and commercial data.
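The FM component's second-order interaction term can be computed in linear time via the well-known reformulation 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]; the sketch below (with hypothetical names, and the deep tower reduced to a precomputed scalar) illustrates the shared-input combination DeepFM describes:

```python
import numpy as np

def fm_second_order(x, V):
    """O(nk) FM interaction term, mathematically equal to the pairwise
    sum over i < j of <v_i, v_j> * x_i * x_j.

    x: (n,) raw feature vector; V: (n, k) latent factor matrix.
    """
    s = V.T @ x                  # (k,): sum_i v_i * x_i per factor
    sq = (V ** 2).T @ (x ** 2)   # (k,): sum_i v_i^2 * x_i^2 per factor
    return 0.5 * float(np.sum(s ** 2 - sq))

def deepfm_logit(x, w0, w, V, deep_out):
    """DeepFM-style combination: bias + linear + FM interactions + deep
    tower output, all computed from the same (shared) raw-feature input."""
    return w0 + float(w @ x) + fm_second_order(x, V) + deep_out
```

The linear and FM parts capture low-order interactions while the deep tower captures high-order ones; because both read the same raw features, no manually crossed features are needed.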

