FI-Net: A Speech Emotion Recognition Framework with Feature Integration and Data Augmentation

Adversarial Data Augmentation Network for Speech Emotion Recognition

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.1109/apsipaasc47483.2019.9023347 ◽

2019 ◽

Cited By ~ 1

Author(s):

Lu Yi ◽

Man-Wai Mak

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition

Speech Emotion Recognition Using 2D-CNN with Data Augmentation

10.1109/icsecs52883.2021.00130 ◽

2021 ◽

Author(s):

Auliya Mujaddidurrahman ◽

Ferda Ernawan ◽

Adi Wibowo ◽

Eko Adi Sarwoko ◽

Aris Sugiharto ◽

...

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition

Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments

Sensors ◽

10.3390/s18113744 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3744 ◽

Cited By ~ 4

Author(s):

Jaehun Bang ◽

Taeho Hur ◽

Dohyeong Kim ◽

Thien Huynh-The ◽

Jongwon Lee ◽

...

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Imbalanced Data ◽

Sampling Technique ◽

Cold Start ◽

Training Model ◽

Small Sample ◽

Speech Emotion Recognition ◽

Target User ◽

Cold Start Problem

Personalized emotion recognition provides an individual training model for each target user in order to mitigate the accuracy loss that occurs when using general training models built from data collected from multiple users. Existing personalized speech emotion recognition research has a cold-start problem: it requires a large amount of emotionally-balanced data samples from the target user when creating the personalized training model. Such research is difficult to apply in real environments because it is hard to collect large amounts of target user speech data with emotionally-balanced labels. Therefore, we propose the Robust Personalized Emotion Recognition Framework with the Adaptive Data Boosting Algorithm to solve the cold-start problem. The proposed framework incrementally provides a customized training model for the target user by reinforcing the dataset: it combines the acquired target user speech with speech from other users and then applies SMOTE (Synthetic Minority Over-sampling Technique)-based data augmentation. Iterative experiments using the IEMOCAP (Interactive Emotional Dyadic Motion Capture) database showed that the proposed method adapts to small target user datasets and emotionally-imbalanced data environments.
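
The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. This is generic SMOTE-style interpolation in plain Python, not the authors' Adaptive Data Boosting Algorithm; the function name `smote_oversample`, the toy feature vectors, and the parameters `n_new`, `k`, and `seed` are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented emotion class.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_features, n_new=6, k=2)
```

Each synthetic point lies on the line segment between two real minority samples, so the augmented class stays inside the region already covered by that class.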

Improving Speech Emotion Recognition With Adversarial Data Augmentation Network

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3027600 ◽

2020 ◽

pp. 1-13

Author(s):

Lu Yi ◽

Man-Wai Mak

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition

Speech Emotion Recognition using Data Augmentation Method by Cycle-Generative Adversarial Networks

10.20944/preprints202104.0651.v1 ◽

2021 ◽

Author(s):

Arash Shilandari ◽

Hossein Marvi ◽

Hossein Khosravi

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Speech Processing ◽

Data Augmentation ◽

Generative Adversarial Networks ◽

Speech Emotion Recognition ◽

Support Vector ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks

With the mechanization of life, speech processing has become crucial for interaction between humans and machines. Deep neural networks require a database with enough data for training. The more features are extracted from the speech signal, the more samples are needed to train these networks. Adequate training can be ensured only when sufficient and varied data are available in each class. If there is not enough data, data augmentation methods can be used to obtain a database with enough samples. One of the obstacles to developing speech emotion recognition systems is the data sparsity problem in each class for neural network training. The current study focuses on building a cycle generative adversarial network for data augmentation in a speech emotion recognition system. For each of the five emotions employed, an adversarial generating network is designed to generate data that is very similar to the main data in that class and that can be differentiated from the emotions of the other classes. These networks are trained adversarially to produce feature vectors resembling each class in the main feature space, and the generated vectors are then added to the existing training sets in the database to train the classifier network. To avoid the vanishing gradient problem, Wasserstein divergence is used instead of the common cross-entropy error to train the generative adversarial networks and produce high-quality artificial samples. The suggested network was tested on speech emotion recognition using EMODB for the training, testing, and evaluation sets, and the quality of the artificial data was evaluated using two classifiers, a Support Vector Machine (SVM) and a Deep Neural Network (DNN). Moreover, by extracting and reproducing high-level features from acoustic features, speech emotion recognition separating five primary emotions was achieved with acceptable accuracy.
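
As a rough illustration of why a Wasserstein-type objective is attractive for training generators, the sketch below computes the Wasserstein-1 distance between two equal-size one-dimensional samples. It is not the training objective used in this paper; the function name `wasserstein_1d` and the toy `real` and `fake` samples are assumptions made for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples in one dimension, the optimal transport plan
    matches sorted values, so the distance is the mean absolute difference
    between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


real = [0.0, 1.0, 2.0, 3.0]
for shift in (1.0, 5.0, 10.0):
    fake = [x + shift for x in real]  # "generated" sample, shifted away from real
    print(shift, wasserstein_1d(real, fake))
```

In this toy case the distance equals the shift, so it keeps changing smoothly even when the two samples no longer overlap, which is the property that helps a generator receive a useful training signal.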

CycleGAN-Based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition

10.21437/interspeech.2019-2293 ◽

2019 ◽

Cited By ~ 1

Author(s):

Fang Bao ◽

Michael Neumann ◽

Ngoc Thang Vu

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition ◽

Style Transfer

Speech emotion recognition using data augmentation

International Journal of Speech Technology ◽

10.1007/s10772-021-09883-3 ◽

2021 ◽

Author(s):

V. M. Praseetha ◽

P. P. Joby

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition ◽

Using Data

CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation

10.21437/smm.2018-5 ◽

2018 ◽

Cited By ~ 9

Author(s):

Caroline Etienne ◽

Guillaume Fidanza ◽

Andrei Petrovskii ◽

Laurence Devillers ◽

Benoit Schmauch

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Speech Emotion Recognition

Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder

Electronics ◽

10.3390/electronics10172086 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2086

Author(s):

Yangwei Ying ◽

Yuanwu Tu ◽

Hong Zhou

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Feature Learning ◽

Human Potential ◽

Speech Emotion Recognition ◽

Unsupervised Feature Learning ◽

Learning Techniques ◽

Speech Data ◽

Data Division ◽

Speech Features

Speech signals contain abundant information on personal emotions, which plays an important part in the representation of human potential characteristics and expressions. However, the shortage of emotional speech data hinders the development of speech emotion recognition (SER) and limits improvements in recognition accuracy. Currently, the most effective approach is to use unsupervised feature learning techniques to extract speech features from available speech data and build emotion classifiers with these features. In this paper, we propose implementing autoencoders such as a denoising autoencoder (DAE) and an adversarial autoencoder (AAE) to extract features from LibriSpeech for model pre-training, and we then conduct classification experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. Considering the imbalanced data distribution in IEMOCAP, we developed a novel data augmentation approach that optimizes the overlap shift between consecutive segments, and we redesigned the data division. The best classification accuracy reached 78.67% (weighted accuracy, WA) and 76.89% (unweighted accuracy, UA) with AAE. Compared with the state-of-the-art results known to us (76.18% WA and 76.36% UA with a supervised learning method), we achieved a slight advantage. This suggests that unsupervised learning benefits the development of SER and provides a new approach to eliminate the problem of data scarcity.
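
The idea of adjusting the overlap shift between consecutive segments can be sketched as follows. The abstract does not say how the authors choose the shift, so the sketch only assumes that a smaller shift is used for under-represented classes; the function name `segment`, the toy `utterance`, and the chosen `seg_len` and `shift` values are hypothetical.

```python
def segment(signal, seg_len, shift):
    """Split a sequence into fixed-length segments whose starts are `shift` apart."""
    if seg_len <= 0 or shift <= 0:
        raise ValueError("seg_len and shift must be positive")
    last_start = len(signal) - seg_len
    return [signal[s:s + seg_len] for s in range(0, last_start + 1, shift)]


# Toy "utterance" of 10 frames; real inputs would be audio frames or features.
utterance = list(range(10))

# Assumption: majority-class utterances use a larger shift (less overlap),
# minority-class utterances use a smaller shift (more overlap, more segments).
majority_segments = segment(utterance, seg_len=4, shift=2)
minority_segments = segment(utterance, seg_len=4, shift=1)

print(len(majority_segments), len(minority_segments))
```

With the same utterance length, halving the shift here yields 7 segments instead of 4, which is how a class-dependent shift can rebalance the number of training segments per emotion.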

Multi-Conditioning and Data Augmentation Using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9053581 ◽

2020 ◽

Author(s):

Upasana Tiwari ◽

Meet Soni ◽

Rupayan Chakraborty ◽

Ashish Panda ◽

Sunil Kumar Kopparapu

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Noise Model ◽

Speech Emotion Recognition ◽

Noisy Conditions
