Few-Shot Learning with a Novel Voronoi Tessellation-Based Image Augmentation Method for Facial Palsy Detection

Electronics ◽

10.3390/electronics10080978 ◽

2021 ◽

Vol 10 (8) ◽

pp. 978

Author(s):

Olusola Oluwakemi Abayomi-Alli ◽

Robertas Damaševičius ◽

Rytis Maskeliūnas ◽

Sanjay Misra

Keyword(s):

Deep Learning ◽

Facial Palsy ◽

Data Augmentation ◽

Voronoi Tessellation ◽

Class Imbalance ◽

Misclassification Rate ◽

Voronoi Cells ◽

Face Images ◽

Functional Consequences ◽

Voronoi Decomposition

Face palsy has adverse effects on the appearance of a person and has negative social and functional consequences for the patient. Deep learning methods can improve the face palsy detection rate, but their efficiency is limited by insufficient data, class imbalance, and a high misclassification rate. Data augmentation methods can be used to alleviate the lack of data and improve the performance of deep learning models for facial palsy detection. In this paper, we propose a novel Voronoi decomposition-based random region erasing (VDRRE) image augmentation method, which partitions images into randomly defined Voronoi cells as an alternative to the rectangle-based random erasing method. The proposed method augments the image dataset with new images, which are used to train the deep neural network. We achieved an accuracy of 99.34% using two-shot learning with VDRRE augmentation on palsy faces from the YouTube Face Palsy (YFP) dataset, with normal faces taken from the Caltech Face Database. Our model shows an improvement over state-of-the-art methods in the detection of facial palsy from a small dataset of face images.
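
As a rough illustration of the idea in this abstract, the sketch below partitions an image into Voronoi cells around random seed points and erases a random subset of those cells. The function name `voronoi_random_erase`, its parameters (`n_cells`, `erase_frac`, `fill`), and the nearest-seed cell assignment are assumptions made for illustration; this is not the authors' VDRRE implementation.

```python
import numpy as np

def voronoi_random_erase(image, n_cells=20, erase_frac=0.2, fill=0, rng=None):
    """Erase a random subset of Voronoi cells from an H x W (x C) image.

    Hypothetical helper: returns the augmented copy and the boolean erase mask.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    # Random seed points (row, col); each one defines a Voronoi cell.
    seeds = np.stack([rng.uniform(0, h, n_cells), rng.uniform(0, w, n_cells)], axis=1)
    rows, cols = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to every seed: shape (h, w, n_cells).
    d2 = (rows[..., None] - seeds[:, 0]) ** 2 + (cols[..., None] - seeds[:, 1]) ** 2
    labels = d2.argmin(axis=-1)  # nearest seed index = Voronoi cell id
    # Pick which cells to erase (at least one).
    n_erase = max(1, int(round(erase_frac * n_cells)))
    erased = rng.choice(n_cells, size=n_erase, replace=False)
    mask = np.isin(labels, erased)
    out = image.copy()
    out[mask] = fill  # the original image is left untouched
    return out, mask
```

Calling the function several times with different seeds on the same face image would yield several differently erased training samples, which is the sense in which such a method enlarges a small dataset.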

Differential Biases and Variabilities of Deep Learning–Based Artificial Intelligence and Human Experts in Clinical Diagnosis: Retrospective Cohort and Survey Study (Preprint)

10.2196/preprints.33049 ◽

2021 ◽

Author(s):

Dongchul Cha ◽

Chongwon Pae ◽

Se A Lee ◽

Gina Na ◽

Young Kyun Hur ◽

...

Keyword(s):

Artificial Intelligence ◽

Deep Learning ◽

Data Augmentation ◽

Class Imbalance ◽

Classification Model ◽

Kappa Statistics ◽

Test Set ◽

Diagnostic Characteristics ◽

Test Sets ◽

The Given

BACKGROUND Deep learning (DL)–based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis. As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias to DL than clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative usage of DL in clinical application.

OBJECTIVE This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians exemplified by the class imbalance problem and guide clinicians when utilizing decision support systems.

METHODS We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models.

RESULTS Our ML models had consistently high accuracies (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%), equivalent to those of otolaryngologists (balanced: mean 71.17%, SD 3.37%; imbalanced: mean 72.84%, SD 6.41%) and far better than those of nonotolaryngologists (balanced: mean 45.63%, SD 7.89%; imbalanced: mean 44.08%, SD 15.83%). However, ML models suffered from class imbalance problems (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%). This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07).

CONCLUSIONS Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation.
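
The abstract reports agreement using kappa statistics but does not say which variant was used. As a hedged illustration only, the sketch below computes Cohen's kappa between two raters (for example, two models or two physicians labeling the same images); the function name `cohen_kappa` and the choice of Cohen's variant are assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters always use the same single label
    return (p_o - p_e) / (1.0 - p_e)
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance.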

Face Image Age Estimation Based on Data Augmentation and Lightweight Convolutional Neural Network

Symmetry ◽

10.3390/sym12010146 ◽

2020 ◽

Vol 12 (1) ◽

pp. 146 ◽

Cited By ~ 6

Author(s):

Xinhua Liu ◽

Yao Zou ◽

Hailan Kuang ◽

Xiaolin Ma

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Convolutional Neural Networks ◽

Age Estimation ◽

Data Augmentation ◽

Rapid Development ◽

Estimation Methods ◽

Face Images

Face images contain many important biological characteristics. Research on face images mainly covers age estimation, gender classification, and facial expression recognition. Taking age estimation as an example, algorithmic estimation of age from face images can be widely used in biometrics, intelligent monitoring, human-computer interaction, and personalized services. With the rapid development of computer technology, the processing speed and storage capacity of electronic devices have greatly increased, allowing deep learning to dominate the field of artificial intelligence. Traditional age estimation methods first design features manually, then extract those features, and finally perform age estimation. Convolutional neural networks (CNNs) have clear advantages in processing image features, and practice has shown that their accuracy in estimating age from face images is far superior to that of traditional methods. However, as networks become deeper, larger, and more complex, they become difficult to deploy on mobile terminals. This paper proposes an improved ShuffleNetV2 network with a mixed attention mechanism (MA-SFV2: Mixed Attention-ShuffleNetV2), built on a lightweight convolutional neural network, that transforms the output layer, merges classification and regression age estimation methods, and highlights important features through image preprocessing and data augmentation. The influence of noise vectors such as environmental information unrelated to faces in the image is reduced, so that the final age estimation accuracy is comparable to the state of the art.

GAN-Based Data Augmentation and Anonymization for Mask Classification

10.5121/csit.2021.112315 ◽

2021 ◽

Author(s):

Mustafa Çelik ◽

Ahmet Haydar Örnek

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Augmentation ◽

Synthetic Data ◽

Personal Data ◽

Original Data ◽

Classification Model ◽

Generative Adversarial Networks ◽

Face Images ◽

Learning Classifiers

Deep learning methods, especially convolutional neural networks (CNNs), have made a major contribution to computer vision. However, deep learning classifiers need large-scale annotated datasets to be trained without overfitting, and models trained on highly diverse data generalize better. Collecting such a large-scale dataset remains challenging. Furthermore, it is important for researchers to protect subjects' confidentiality when using personal data such as face images. In this paper, we propose a deep learning approach based on Generative Adversarial Networks (GANs) that generates synthetic samples for our mask classification model. The synthetic images provide a two-fold contribution. First, GAN models can be used as an anonymization tool when the subjects' confidentiality matters. Second, the generated masked/unmasked face images boost the performance of the mask classification model when used as a form of data augmentation. In our work, the classification accuracy using only traditional data augmentations is 93.71%. Using both synthetic and original data with traditional data augmentations, the result is 95.50%. This shows that the GAN-generated synthetic data boosts the performance of deep learning classifiers.

NIR Reflection Augmentation for DeepLearning-Based NIR Face Recognition

Symmetry ◽

10.3390/sym11101234 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1234 ◽

Cited By ~ 1

Author(s):

Jo ◽

Kim

Keyword(s):

Deep Learning ◽

Face Recognition ◽

Near Infrared ◽

Data Augmentation ◽

Recognition Rate ◽

Learning Approaches ◽

Training Set ◽

Simple Method ◽

Practical Applications ◽

Face Images

Face recognition using a near-infrared (NIR) sensor is widely used in practical applications such as mobile unlocking and access control. However, in contrast to RGB-based recognition, few deep learning approaches have been studied for NIR face recognition. We conducted comparative experiments on the application of deep learning to NIR face recognition. To this end, we gathered five public databases and trained two deep learning architectures. In our experiments, we found that a simple architecture can achieve competitive performance on NIR face databases that are mostly composed of frontal face images. Furthermore, we propose a data augmentation method for training the architectures to improve recognition of users who wear glasses. With this augmented training set, the recognition rate for users who wear glasses increased by up to 16%. This result implies that the difficulty of recognizing those who wear glasses can be overcome using this simple method without constructing an additional training set. Furthermore, the model that uses augmented data has symmetry with those trained with real glasses-wearing data regarding the recognition of people who wear glasses.

Data Augmentation for Audio–Visual Emotion Recognition with an Efficient Multimodal Conditional GAN

Applied Sciences ◽

10.3390/app12010527 ◽

2022 ◽

Vol 12 (1) ◽

pp. 527

Author(s):

Fei Ma ◽

Yang Li ◽

Shiguang Ni ◽

Shaolun Huang ◽

Lin Zhang

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Data Augmentation ◽

Class Imbalance ◽

Real Data ◽

Training Data ◽

Visual Modality ◽

Emotional States ◽

Generative Adversarial Network ◽

Multimodal Data

Audio–visual emotion recognition is the research of identifying human emotional states by combining the audio modality and the visual modality simultaneously, which plays an important role in intelligent human–machine interactions. With the help of deep learning, previous works have made great progress for audio–visual emotion recognition. However, these deep learning methods often require a large amount of data for training. In reality, data acquisition is difficult and expensive, especially for the multimodal data with different modalities. As a result, the training data may be in the low-data regime, which cannot be effectively used for deep learning. In addition, class imbalance may occur in the emotional data, which can further degrade the performance of audio–visual emotion recognition. To address these problems, we propose an efficient data augmentation framework by designing a multimodal conditional generative adversarial network (GAN) for audio–visual emotion recognition. Specifically, we design generators and discriminators for audio and visual modalities. The category information is used as their shared input to make sure our GAN can generate fake data of different categories. In addition, the high dependence between the audio modality and the visual modality in the generated multimodal data is modeled based on Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. In this way, we relate different modalities in the generated data to approximate the real data. Then, the generated data are used to augment our data manifold. We further apply our approach to deal with the problem of class imbalance. To the best of our knowledge, this is the first work to propose a data augmentation strategy with a multimodal conditional GAN for audio–visual emotion recognition. We conduct a series of experiments on three public multimodal datasets, including eNTERFACE’05, RAVDESS, and CMEW. The results indicate that our multimodal conditional GAN has high effectiveness for data augmentation of audio–visual emotion recognition.
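
For reference, the HGR maximal correlation named in this abstract is usually defined as follows for two random variables $X$ and $Y$. The symbols $f$, $g$, and $\rho$ are notation chosen here for illustration and do not come from the abstract, which does not state how the quantity enters the GAN objective.

```latex
\rho(X, Y) \;=\; \sup_{f,\, g} \; \mathbb{E}\!\left[ f(X)\, g(Y) \right]
\quad \text{subject to} \quad
\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0, \qquad
\mathbb{E}\!\left[f(X)^2\right] = \mathbb{E}\!\left[g(Y)^2\right] = 1 .
```

The value lies in $[0, 1]$ and equals $0$ exactly when $X$ and $Y$ are independent, which is why it is a natural measure of dependence between two modalities.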

Integrating Improved U-Net and Continuous Maximum Flow Algorithm for 3D Brain Tumor Image Segmentation

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2020.64.4.040412 ◽

2020 ◽

Vol 64 (4) ◽

pp. 40412-1-40412-11

Author(s):

Kexin Bai ◽

Qiang Li ◽

Ching-Hsin Wang

Keyword(s):

Brain Tumor ◽

Data Augmentation ◽

A Priori ◽

Class Imbalance ◽

Maximum Flow ◽

Magnetic Resonance Images ◽

Tumor Segmentation ◽

Similarity Coefficients ◽

Segmentation Algorithms ◽

Flow Algorithm

To address the issues of the relatively small size of brain tumor image datasets, severe class imbalance, and low precision in existing segmentation algorithms for brain tumor images, this study proposes a two-stage segmentation algorithm integrating convolutional neural networks (CNNs) and conventional methods. Four modalities of the original magnetic resonance images were first preprocessed separately. Next, preliminary segmentation was performed using an improved U-Net CNN containing deep monitoring, residual structures, dense connection structures, and dense skip connections. The authors adopted a multiclass Dice loss function to deal with class imbalance and successfully prevented overfitting using data augmentation. The preliminary segmentation results subsequently served as the a priori knowledge for a continuous maximum flow algorithm for fine segmentation of target edges. Experiments revealed that the mean Dice similarity coefficients of the proposed algorithm in whole tumor, tumor core, and enhancing tumor segmentation were 0.9072, 0.8578, and 0.7837, respectively. The proposed algorithm presents higher accuracy and better stability in comparison with some of the more advanced segmentation algorithms for brain tumor images.
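
The abstract does not give the formula for its multiclass Dice loss. The sketch below shows one common soft formulation, averaged over classes, as an assumption for illustration; the function name `multiclass_dice_loss`, the `(C, ...)` array layout, and the smoothing term `eps` are hypothetical choices rather than the authors' definition.

```python
import numpy as np

def multiclass_dice_loss(probs, onehot, eps=1e-6):
    """Return 1 minus the mean per-class soft Dice coefficient.

    probs:  predicted class probabilities, shape (C, ...).
    onehot: one-hot ground truth with the same shape.
    """
    probs = np.asarray(probs, dtype=float)
    onehot = np.asarray(onehot, dtype=float)
    axes = tuple(range(1, probs.ndim))  # sum over everything except the class axis
    inter = (probs * onehot).sum(axis=axes)
    denom = probs.sum(axis=axes) + onehot.sum(axis=axes)
    # Per-class Dice; eps keeps the ratio defined when a class is absent.
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()
```

Because each class contributes equally to the mean regardless of how many voxels it covers, a loss of this shape is less dominated by the large background class than a plain per-voxel loss.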

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented with conventional SMILES randomization when used for training baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
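
The abstract does not spell out the pairing procedure, so the following is only a sketch of the underlying string metric. It computes the standard Levenshtein edit distance and uses a hypothetical helper, `closest_variant`, to pick from several candidate strings the one most similar to a target; the helper and the example strings are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def closest_variant(candidates, target):
    """Hypothetical helper: the candidate string with the smallest distance to target."""
    return min(candidates, key=lambda s: levenshtein(s, target))
```

For instance, given two equivalent spellings of a reactant such as `"OCC"` and `"CCO"` and a product string `"CCOC"`, the helper selects `"CCO"`, the spelling that shares the longer common sub-sequence with the product.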

A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/3 ◽

2020 ◽

Vol 17 (3) ◽

pp. 299-305 ◽

Cited By ~ 1

Author(s):

Riaz Ahmad ◽

Saeeda Naz ◽

Muhammad Afzal ◽

Sheikh Rashid ◽

Marcus Liwicki ◽

...

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Data Augmentation ◽

Short Term Memory ◽

Recognition System ◽

Learning Approach ◽

Arabic Text ◽

Data Set ◽

Processing Step ◽

Handwritten Arabic

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning of extra white spaces and de-skewing of skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflammation. Data augmentation combined with the deep learning approach yields a promising improvement, achieving 80.02% Character Recognition (CR) against a baseline of 75.08%.

Increasing Accuracy of Stock Price Pattern Prediction through Data Augmentation for Deep Learning

The Korea Journal of BigData ◽

10.36498/kbigdt.2019.4.2.1 ◽

2019 ◽

Vol 4 (2) ◽

pp. 1-12

Author(s):

김영준 ◽

이인선 ◽

Hong Joo Lee ◽

김여정

Keyword(s):

Deep Learning ◽

Stock Price ◽

Data Augmentation ◽

Pattern Prediction

Application of Deep Learning in Integrated Pest Management: A Real-Time System for Detection and Diagnosis of Oilseed Rape Pests

Mobile Information Systems ◽

10.1155/2019/4570808 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

Yong He ◽

Hong Zeng ◽

Yangyang Fan ◽

Shuaisheng Ji ◽

Jianjian Wu

Keyword(s):

Deep Learning ◽

Integrated Pest Management ◽

Pest Management ◽

Real Time ◽

Oilseed Rape ◽

Data Augmentation ◽

Low Cost ◽

Response Speed ◽

Original Model ◽

Real Time System

In this paper, we propose a deep learning approach to detect oilseed rape pests that improves the mean average precision (mAP) to 77.14%, an increase of 9.7% over the original model. We deploy this model on a mobile platform so that every farmer can use the program, which diagnoses pests in real time and provides suggestions on pest control. We designed an oilseed rape pest imaging database with 12 typical oilseed rape pests and compared the performance of five models; SSD w/Inception was chosen as the optimal model. Moreover, to achieve a high mAP, we used data augmentation (DA) and added a dropout layer. The experiments were performed on the Android application we developed, and the results show that our approach clearly surpasses the original model and is helpful for integrated pest management. Compared with past works, this application has improved environmental adaptability, response speed, and accuracy, and it has the advantages of low cost and simple operation, which make it suitable for pest monitoring by drones and the Internet of Things (IoT).
