scholarly journals Data Augmentation for Human Keypoint Estimation Deep Learning based Sign Language Translation

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1257
Author(s):  
Chan-Il Park ◽  
Chae-Bong Sohn

Deep learning technology has developed constantly and is applied in many fields. In order to correctly apply deep learning techniques, sufficient learning must be preceded. Various conditions are necessary for sufficient learning. One of the most important conditions is training data. Collecting sufficient training data is fundamental, because if the training data are insufficient, deep learning will not be done properly. Many types of training data are collected, but not all of them. So, we may have to collect them directly. Collecting takes a lot of time and hard work. To reduce this effort, the data augmentation method is used to increase the training data. Data augmentation has some common methods, but often requires different methods for specific data. For example, in order to recognize sign language, video data processed with openpose are used. In this paper, we propose a new data augmentation method for sign language data used for learning translation, and we expect to improve the learning performance, according to the proposed method.

Author(s):  
HyeonJung Park ◽  
Youngki Lee ◽  
JeongGil Ko

In this work we present SUGO, a depth video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offer benefits such as being less privacy-invasive compared to using RGB videos, it introduces new challenges which include dealing with low video resolutions and the sensors' sensitiveness towards user motion. We overcome these challenges by diversifying our sign language video dataset to be robust to various usage scenarios via data augmentation and design a set of schemes to emphasize human gestures from the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) to classify a sequence of video frames as a pre-trained word. Furthermore, the overall operations are designed to be light-weight so that sign language translation takes place in real-time using only the resources available on a smartphone, with no help from cloud servers nor external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, summing up to a dataset of ~5,000 sign gestures and collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Comprehensively, our extensive evaluations show that SUGO can properly classify sign words with an accuracy of up to 91% and also suggest that the system is suitable (in terms of resource usage, latency, and environmental robustness) to enable a fully mobile solution for sign language translation.


2020 ◽  
Vol 10 (21) ◽  
pp. 7755 ◽  
Author(s):  
Liangliang Chen ◽  
Ning Yan ◽  
Hongmai Yang ◽  
Linlin Zhu ◽  
Zongwei Zheng ◽  
...  

Deep learning technology is outstanding in visual inspection. However, in actual industrial production, the use of deep learning technology for visual inspection requires a large number of training data with different acquisition scenarios. At present, the acquisition of such datasets is very time-consuming and labor-intensive, which limits the further development of deep learning in industrial production. To solve the problem of image data acquisition difficulty in industrial production with deep learning, this paper proposes a data augmentation method for deep learning based on multi-degree of freedom (DOF) automatic image acquisition and designs a multi-DOF automatic image acquisition system for deep learning. By designing random acquisition angles and random illumination conditions, different acquisition scenes in actual production are simulated. By optimizing the image acquisition path, a large number of accurate data can be obtained in a short time. In order to verify the performance of the dataset collected by the system, the fabric is selected as the research object after the system is built, and the dataset comparison experiment is carried out. The dataset comparison experiment confirms that the dataset obtained by the system is rich and close to the real application environment, which solves the problem of dataset insufficient in the application process of deep learning to a certain extent.


2019 ◽  
Vol 2019 ◽  
pp. 1-13 ◽  
Author(s):  
Yunong Tian ◽  
Guodong Yang ◽  
Zhe Wang ◽  
En Li ◽  
Zize Liang

Plant disease is one of the primary causes of crop yield reduction. With the development of computer vision and deep learning technology, autonomous detection of plant surface lesion images collected by optical sensors has become an important research direction for timely crop disease diagnosis. In this paper, an anthracnose lesion detection method based on deep learning is proposed. Firstly, for the problem of insufficient image data caused by the random occurrence of apple diseases, in addition to traditional image augmentation techniques, Cycle-Consistent Adversarial Network (CycleGAN) deep learning model is used in this paper to accomplish data augmentation. These methods effectively enrich the diversity of training data and provide a solid foundation for training the detection model. In this paper, on the basis of image data augmentation, densely connected neural network (DenseNet) is utilized to optimize feature layers of the YOLO-V3 model which have lower resolution. DenseNet greatly improves the utilization of features in the neural network and enhances the detection result of the YOLO-V3 model. It is verified in experiments that the improved model exceeds Faster R-CNN with VGG16 NET, the original YOLO-V3 model, and other three state-of-the-art networks in detection performance, and it can realize real-time detection. The proposed method can be well applied to the detection of anthracnose lesions on apple surfaces in orchards.


2021 ◽  
Vol 15 ◽  
Author(s):  
Yu Pei ◽  
Zhiguo Luo ◽  
Ye Yan ◽  
Huijiong Yan ◽  
Jing Jiang ◽  
...  

The quality and quantity of training data are crucial to the performance of a deep-learning-based brain-computer interface (BCI) system. However, it is not practical to record EEG data over several long calibration sessions. A promising time- and cost-efficient solution is artificial data generation or data augmentation (DA). Here, we proposed a DA method for the motor imagery (MI) EEG signal called brain-area-recombination (BAR). For the BAR, each sample was first separated into two ones (named half-sample) by left/right brain channels, and the artificial samples were generated by recombining the half-samples. We then designed two schemas (intra- and adaptive-subject schema) corresponding to the single- and multi-subject scenarios. Extensive experiments using the classifier of EEGnet were conducted on two public datasets under various training set sizes. In both schemas, the BAR method can make the EEGnet have a better performance of classification (p < 0.01). To make a comparative investigation, we selected two common DA methods (noise-added and flipping), and the BAR method beat them (p < 0.05). Further, using the proposed BAR for augmentation, EEGnet achieved up to 8.3% improvement than a typical decoding algorithm CSP-SVM (p < 0.01), note that both the models were trained on the augmented dataset. This study shows that BAR usage can significantly improve the classification ability of deep learning to MI-EEG signals. To a certain extent, it may promote the development of deep learning technology in the field of BCI.


2021 ◽  
Vol 9 (1) ◽  
pp. 115
Author(s):  
Faisal Dharma Adhinata ◽  
Diovianto Putra Rakhmadani ◽  
Merlinda Wibowo ◽  
Akhmad Jayadi

The use of masks on the face in public places is an obligation for everyone because of the Covid-19 pandemic, which claims victims. Indonesia made 3M policies, one of which is to use masks to prevent coronavirus transmission. Currently, several researchers have developed a masked or non-masked face detection system. One of them is using deep learning techniques to classify a masked or non-masked face. Previous research used the MobileNetV2 transfer learning model, which resulted in an F-Measure value below 0.9. Of course, this result made the detection system not accurate enough. In this research, we propose a model with more parameters, namely the DenseNet201 model. The number of parameters of the DenseNet201 model is five times more than that of the MobileNetV2 model. The results obtained from several up to 30 epochs show that the DenseNet201 model produces 99% accuracy when training data. Then, we tested the matching feature on video data, the DenseNet201 model produced an F-Measure value of 0.98, while the MobileNetV2 model only produced an F-measure value of 0.67. These results prove the masked or non-masked face detection system is more accurate using the DenseNet201 model.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1052
Author(s):  
Leang Sim Nguon ◽  
Kangwon Seo ◽  
Jung-Hyun Lim ◽  
Tae-Jun Song ◽  
Sung-Hyun Cho ◽  
...  

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.


2021 ◽  
Vol 11 (15) ◽  
pp. 7148
Author(s):  
Bedada Endale ◽  
Abera Tullu ◽  
Hayoung Shi ◽  
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions: in both civilian and military sectors. Many of these missions demand UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifice in terms of time and computational resources. Collecting big input data, pre-training processes, such as labeling training data, and the need for a high performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission specific input data augmentation techniques and the design of light-weight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.


2021 ◽  
Vol 11 (11) ◽  
pp. 4753
Author(s):  
Gen Ye ◽  
Chen Du ◽  
Tong Lin ◽  
Yan Yan ◽  
Jack Jiang

(1) Background: Deep learning has become ubiquitous due to its impressive performance in various domains, such as varied as computer vision, natural language and speech processing, and game-playing. In this work, we investigated the performance of recent deep learning approaches on the laryngopharyngeal reflux (LPR) diagnosis task. (2) Methods: Our dataset is composed of 114 subjects with 37 pH-positive cases and 77 control cases. In contrast to prior work based on either reflux finding score (RFS) or pH monitoring, we directly take laryngoscope images as inputs to neural networks, as laryngoscopy is the most common and simple diagnostic method. The diagnosis task is formulated as a binary classification problem. We first tested a powerful backbone network that incorporates residual modules, attention mechanism and data augmentation. Furthermore, recent methods in transfer learning and few-shot learning were investigated. (3) Results: On our dataset, the performance is the best test classification accuracy is 73.4%, while the best AUC value is 76.2%. (4) Conclusions: This study demonstrates that deep learning techniques can be applied to classify LPR images automatically. Although the number of pH-positive images used for training is limited, deep network can still be capable of learning discriminant features with the advantage of technique.


2018 ◽  
Author(s):  
Uri Shaham

AbstractBiological measurements often contain systematic errors, also known as “batch effects”, which may invalidate downstream analysis when not handled correctly. The problem of removing batch effects is of major importance in the biological community. Despite recent advances in this direction via deep learning techniques, most current methods may not fully preserve the true biological patterns the data contains. In this work we propose a deep learning approach for batch effect removal. The crux of our approach is learning a batch-free encoding of the data, representing its intrinsic biological properties, but not batch effects. In addition, we also encode the systematic factors through a decoding mechanism and require accurate reconstruction of the data. Altogether, this allows us to fully preserve the true biological patterns represented in the data. Experimental results are reported on data obtained from two high throughput technologies, mass cytometry and single-cell RNA-seq. Beyond good performance on training data, we also observe that our system performs well on test data obtained from new patients, which was not available at training time. Our method is easy to handle, a publicly available code can be found at https://github.com/ushaham/BatchEffectRemoval2018.


Sign in / Sign up

Export Citation Format

Share Document