Deep Learning-Based Violin Bowing Action Recognition

Shih-Wei Sun; Bao-Yun Liu; Pao-Chi Chang

doi:10.3390/s20205732

Deep Learning-Based Violin Bowing Action Recognition

Sensors ◽

10.3390/s20205732 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5732

Author(s):

Shih-Wei Sun ◽

Bao-Yun Liu ◽

Pao-Chi Chang

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Data Augmentation ◽

Inertial Sensors ◽

Three Dimensional ◽

Recognition System ◽

Depth Camera ◽

Decision Level ◽

Average Accuracy ◽

Level Fusion

We propose a violin bowing action recognition system that can accurately recognize distinct bowing actions in classical violin performance. This system can recognize bowing actions by analyzing signals from a depth camera and from inertial sensors that are worn by a violinist. The contribution of this study is threefold: (1) a dataset comprising violin bowing actions was constructed from data captured by a depth camera and multiple inertial sensors; (2) data augmentation was achieved for depth-frame data through rotation in three-dimensional world coordinates and for inertial sensing data through yaw, pitch, and roll angle transformations; and, (3) bowing action classifiers were trained using different modalities, to compensate for the strengths and weaknesses of each modality, based on deep learning methods with a decision-level fusion process. In experiments, large external motions and subtle local motions produced from violin bow manipulations were both accurately recognized by the proposed system (average accuracy > 80%).

Download Full-text

A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/3 ◽

2020 ◽

Vol 17 (3) ◽

pp. 299-305 ◽

Cited By ~ 1

Author(s):

Riaz Ahmad ◽

Saeeda Naz ◽

Muhammad Afzal ◽

Sheikh Rashid ◽

Marcus Liwicki ◽

...

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Data Augmentation ◽

Short Term Memory ◽

Recognition System ◽

Learning Approach ◽

Arabic Text ◽

Data Set ◽

Processing Step ◽

Handwritten Arabic

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT data-set consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects i.e., (1) pre-processing, (2) deep learning based approach, and (3) data-augmentation. The pre-processing step includes pruning of white extra spaces plus de-skewing the skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflammation. The data-augmentation with a deep learning approach proves to achieve better and promising improvement in results by gaining 80.02% Character Recognition (CR) over 75.08% as baseline.

Download Full-text

Improving Human Action Recognition Using Fusion of Depth Camera and Inertial Sensors

IEEE Transactions on Human-Machine Systems ◽

10.1109/thms.2014.2362520 ◽

2015 ◽

Vol 45 (1) ◽

pp. 51-61 ◽

Cited By ~ 134

Author(s):

Chen Chen ◽

Roozbeh Jafari ◽

Nasser Kehtarnavaz

Keyword(s):

Action Recognition ◽

Inertial Sensors ◽

Human Action Recognition ◽

Human Action ◽

Depth Camera

Download Full-text

Ani-GIFs: A Benchmark Dataset for Domain Generalization of Action Recognition from GIFs

10.22541/au.162464907.76209032/v1 ◽

2021 ◽

Author(s):

Shoumik Majumdar ◽

Shubhangi Jain ◽

Isidora Chara Tourni ◽

Arsenii Mustafin ◽

Diala Lteif ◽

...

Keyword(s):

Deep Learning ◽

Data Acquisition ◽

Action Recognition ◽

Data Augmentation ◽

Learning Models ◽

Visual Content ◽

Lack Of Information ◽

Temporal Features ◽

Synthetic Video ◽

Multiple Domains

Deep learning models perform remarkably well for the same task under the assumption that data is always coming from the same distribution. However, this is generally violated in practice, mainly due to the differences in the data acquisition techniques and the lack of information about the underlying source of new data. Domain Generalization targets the ability to generalize to test data of an unseen domain; while this problem is well-studied for images, such studies are significantly lacking in spatiotemporal visual content – videos and GIFs. This is due to (1) the challenging nature of misalignment of temporal features and the varying appearance/motion of actors and actions in different domains, and (2) spatiotemporal datasets being laborious to collect and annotate for multiple domains. We collect and present the first synthetic video dataset of Animated GIFs for domain generalization, Ani-GIFs, that is used to study domain gap of videos vs. GIFs, and animated vs. real GIFs, for the task of action recognition. We provide a training and testing setting for Ani-GIFs, and extend two domain generalization baseline approaches, based on data augmentation and explainability, to the spatiotemporal domain to catalyze research in this direction.

Download Full-text

Multi-Modality Emotion Recognition Model with GAT-Based Multi-Head Inter-Modality Attention

Sensors ◽

10.3390/s20174894 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4894 ◽

Cited By ~ 1

Author(s):

Changzeng Fu ◽

Chaoran Liu ◽

Carlos Toshinori Ishi ◽

Hiroshi Ishiguro

Keyword(s):

Emotion Recognition ◽

Research Question ◽

Weighted Average ◽

Artificial Agents ◽

Fusion Method ◽

Recognition Model ◽

Decision Level ◽

Average Accuracy ◽

Audio Sample ◽

Level Fusion

Emotion recognition has been gaining attention in recent years due to its applications on artificial agents. To achieve a good performance with this task, much research has been conducted on the multi-modality emotion recognition model for leveraging the different strengths of each modality. However, a research question remains: what exactly is the most appropriate way to fuse the information from different modalities? In this paper, we proposed audio sample augmentation and an emotion-oriented encoder-decoder to improve the performance of emotion recognition and discussed an inter-modality, decision-level fusion method based on a graph attention network (GAT). Compared to the baseline, our model improved the weighted average F1-scores from 64.18 to 68.31% and the weighted average accuracy from 65.25 to 69.88%.

Download Full-text

Data Augmentation in Deep Learning-Based Fusion of Depth and Inertial Sensing for Action Recognition

IEEE Sensors Letters ◽

10.1109/lsens.2018.2878572 ◽

2019 ◽

Vol 3 (1) ◽

pp. 1-4 ◽

Cited By ~ 16

Author(s):

Neha Dawar ◽

Sarah Ostadabbas ◽

Nasser Kehtarnavaz

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Data Augmentation ◽

Inertial Sensing

Download Full-text

Baseball Player Behavior Classification System Using Long Short-Term Memory with Multimodal Features

Sensors ◽

10.3390/s19061425 ◽

2019 ◽

Vol 19 (6) ◽

pp. 1425 ◽

Cited By ~ 3

Author(s):

Shih-Wei Sun ◽

Ting-Chen Mou ◽

Chih-Chieh Fang ◽

Pao-Chi Chang ◽

Kai-Lung Hua ◽

...

Keyword(s):

Deep Learning ◽

Classification System ◽

Short Term Memory ◽

Inertial Sensors ◽

Depth Camera ◽

Baseball Player ◽

Baseball Players ◽

Behavior System ◽

Vector Projection ◽

Multimodal Features

In this paper, a preliminary baseball player behavior classification system is proposed. By using multiple IoT sensors and cameras, the proposed method accurately recognizes many of baseball players’ behaviors by analyzing signals from heterogeneous sensors. The contribution of this paper is threefold: (i) signals from a depth camera and from multiple inertial sensors are obtained and segmented, (ii) the time-variant skeleton vector projection from the depth camera and the statistical features extracted from the inertial sensors are used as features, and (iii) a deep learning-based scheme is proposed for training behavior classifiers. The experimental results demonstrate that the proposed deep learning behavior system achieves an accuracy of greater than 95% compared to the proposed dataset.

Download Full-text

A Vision-based Human Action Recognition System for Moving Cameras Through Deep Learning

Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning ◽

10.1145/3372806.3372815 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ming-Jen Chang ◽

Jih-Tang Hsieh ◽

Chiung-Yao Fang ◽

Sei-Wang Chen

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Recognition System ◽

Human Action

Download Full-text

Crop Type Mapping from Optical and Radar Time Series Using Attention-Based Deep Learning

Remote Sensing ◽

10.3390/rs13224668 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4668

Author(s):

Stella Ofori-Ampofo ◽

Charlotte Pelletier ◽

Stefan Lang

Keyword(s):

Time Series ◽

Deep Learning ◽

Decision Level ◽

Type Mapping ◽

Learning Techniques ◽

Crop Type ◽

Single Sensor ◽

Decision Level Fusion ◽

Level Fusion ◽

Sentinel 2

Crop maps are key inputs for crop inventory production and yield estimation and can inform the implementation of effective farm management practices. Producing these maps at detailed scales requires exhaustive field surveys that can be laborious, time-consuming, and expensive to replicate. With a growing archive of remote sensing data, there are enormous opportunities to exploit dense satellite image time series (SITS), temporal sequences of images over the same area. Generally, crop type mapping relies on single-sensor inputs and is solved with the help of traditional learning algorithms such as random forests or support vector machines. Nowadays, deep learning techniques have brought significant improvements by leveraging information in both spatial and temporal dimensions, which are relevant in crop studies. The concurrent availability of Sentinel-1 (synthetic aperture radar) and Sentinel-2 (optical) data offers a great opportunity to utilize them jointly; however, optimizing their synergy has been understudied with deep learning techniques. In this work, we analyze and compare three fusion strategies (input, layer, and decision levels) to identify the best strategy that optimizes optical-radar classification performance. They are applied to a recent architecture, notably, the pixel-set encoder–temporal attention encoder (PSE-TAE) developed specifically for object-based classification of SITS and based on self-attention mechanisms. Experiments are carried out in Brittany, in the northwest of France, with Sentinel-1 and Sentinel-2 time series. Input and layer-level fusion competitively achieved the best overall F-score surpassing decision-level fusion by 2%. On a per-class basis, decision-level fusion increased the accuracy of dominant classes, whereas layer-level fusion improves up to 13% for minority classes. Against single-sensor baseline, multi-sensor fusion strategies identified crop types more accurately: for example, input-level outperformed Sentinel-2 and Sentinel-1 by 3% and 9% in F-score, respectively. We have also conducted experiments that showed the importance of fusion for early time series classification and under high cloud cover condition.

Download Full-text

Multimodal Biometric Recognition System Using Face and Finger Vein Biometric Traits with Feature and Decision Level Fusion Techniques

International Journal of Computer Theory and Engineering ◽

10.7763/ijcte.2021.v13.1300 ◽

2021 ◽

Vol 13 (4) ◽

pp. 123-128

Author(s):

Arjun B. C. ◽

◽

H. N. Prakash

Keyword(s):

Recognition System ◽

Biometric Recognition ◽

Decision Level ◽

Finger Vein ◽

Decision Level Fusion ◽

Multimodal Biometric Recognition ◽

Level Fusion

Download Full-text

An Online Cursive Handwritten Medical Words Recognition System for Busy Doctors in Developing Countries for Ensuring Efficient Healthcare Service Delivery

10.21203/rs.3.rs-992698/v1 ◽

2021 ◽

Author(s):

Shaira Tabassum ◽

Md Mahmudur Rahman ◽

Nuren Abedin ◽

Md Moshiur Rahman ◽

Mostafa Taufiq Ahmed ◽

...

Keyword(s):

Developing Countries ◽

Data Augmentation ◽

Recognition System ◽

Average Accuracy ◽

Recognition Result ◽

Machine Learning Approach ◽

Bidirectional Lstm ◽

Data Expansion ◽

Handwritten Recognition ◽

Multiple Languages

Abstract Doctors in developing countries are too busy to write digital prescriptions. Ninety-seven percent of Bangladeshi doctors write handwritten prescriptions, the majority of which lack legibility. Prescriptions are harder to read as they contain multiple languages. This paper proposes a machine learning approach to recognize doctors' handwriting to create digital prescriptions. A ‘Handwritten Medical Term Corpus’ dataset is developed containing 17,431 samples of 480 medical terms. In order to improve the recognition efficiency, this paper introduces a data augmentation technique to widen the variety and increase the sample size. A sequence of line data is extracted from the augmented images of 1,591,100 samples and fed to a Bidirectional LSTM. Data augmentation includes pattern Rotating, Shifting and Stretching (RSS). Eight different combinations are applied to evaluate the strength of the proposed method. The result shows 93.0% average accuracy (max: 94.5%, min: 92.1%) using Bidirectional LSTM and RSS data augmentation. This accuracy is 19.6% higher than the recognition result with no data expansion. The proposed handwritten recognition technology can be installed in a smartpen for busy doctors which will recognize the writings and digitize them in real-time. It is expected that the smartpen will contribute to reduce medical errors, save medical costs and ensure healthy living in developing countries.

Download Full-text