Speech emotion recognition with unsupervised feature learning

Zheng-wei Huang; Wen-tao Xue; Qi-rong Mao

doi:10.1631/fitee.1400323

Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder

Electronics ◽

10.3390/electronics10172086 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2086

Author(s):

Yangwei Ying ◽

Yuanwu Tu ◽

Hong Zhou

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Feature Learning ◽

Human Potential ◽

Speech Emotion Recognition ◽

Unsupervised Feature Learning ◽

Learning Techniques ◽

Speech Data ◽

Data Division ◽

Speech Features

Speech signals contain abundant information on personal emotions, which plays an important part in representing human potential characteristics and expressions. However, the scarcity of emotional speech data hinders the development of speech emotion recognition (SER) and limits improvements in recognition accuracy. Currently, the most effective approach is to use unsupervised feature learning techniques to extract speech features from available speech data and to build emotion classifiers on these features. In this paper, we propose using autoencoders, such as a denoising autoencoder (DAE) and an adversarial autoencoder (AAE), to extract features from LibriSpeech for model pre-training, and then conduct classification experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. Considering the imbalanced data distribution in IEMOCAP, we developed a novel data augmentation approach that optimizes the overlap shift between consecutive segments, and we redesigned the data division. The best classification accuracy reached 78.67% (weighted accuracy, WA) and 76.89% (unweighted accuracy, UA) with AAE. Compared with the state-of-the-art results known to us (76.18% WA and 76.36% UA with a supervised learning method), this is a slight advantage. This suggests that unsupervised learning benefits the development of SER and provides a new way to address the problem of data scarcity.
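The abstract reports both weighted accuracy (WA) and unweighted accuracy (UA) without defining them. The sketch below assumes the convention common in SER work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls, so every emotion class counts equally regardless of its size. The function names and the toy labels are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (large classes dominate)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (every class has equal weight)."""
    total = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)   # correctly recognised utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Hypothetical, deliberately imbalanced toy labels: 6 neutral, 2 angry, 2 sad.
y_true = ["neu"] * 6 + ["ang"] * 2 + ["sad"] * 2
y_pred = ["neu"] * 6 + ["ang", "neu"] + ["sad", "neu"]

wa = weighted_accuracy(y_true, y_pred)    # 8 of 10 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # recalls 1.0, 0.5, 0.5 averaged
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On imbalanced data such as this toy set, a classifier that favours the majority class scores a higher WA than UA, which is why both figures are usually reported side by side.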


Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-030-75765-6_10 ◽

2021 ◽

pp. 118-130

Author(s):

Huan Zhao ◽

Yingxue Gao ◽

Yufeng Xiao

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Local Feature ◽

Speech Emotion Recognition


Discriminative Feature Learning for Speech Emotion Recognition

Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series - Lecture Notes in Computer Science ◽

10.1007/978-3-030-30490-4_17 ◽

2019 ◽

pp. 198-210

Author(s):

Yuying Zhang ◽

Yuexian Zou ◽

Junyi Peng ◽

Danqing Luo ◽

Dongyan Huang

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Speech Emotion Recognition ◽

Discriminative Feature


Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives

Frontiers in Neurorobotics ◽

10.3389/fnbot.2021.784514 ◽

2021 ◽

Vol 15 ◽

Author(s):

Shiqing Zhang ◽

Ruixin Liu ◽

Xin Tao ◽

Xiaoming Zhao

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Feature Learning ◽

Learning Ability ◽

Speech Emotion Recognition ◽

Practical Applications ◽

Learning Techniques ◽

Challenges And Opportunities ◽

Comprehensive Survey ◽

Cross Language

Automatic speech emotion recognition (SER) is a challenging component of human-computer interaction (HCI). The existing literature mainly focuses on evaluating SER performance by training and testing on a single corpus in a single language. However, in many practical applications, the training corpus and the testing corpus differ greatly. Because speech emotion corpora and languages are so diverse, most previous SER methods do not perform well in real-world cross-corpus or cross-language scenarios. Inspired by the powerful feature learning ability of recently emerged deep learning techniques, various advanced deep learning models have increasingly been adopted for cross-corpus SER. This paper aims to provide an up-to-date and comprehensive survey of cross-corpus SER, with particular attention to deep learning techniques based on supervised, unsupervised and semi-supervised learning. In addition, this paper highlights the challenges and opportunities of cross-corpus SER tasks and points out future trends.


Deep Residual Local Feature Learning for Speech Emotion Recognition

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-63830-6_21 ◽

2020 ◽

pp. 241-252

Author(s):

Sattaya Singkul ◽

Thakorn Chatchaisathaporn ◽

Boontawee Suntisrivaraporn ◽

Kuntpong Woraratpanya

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Local Feature ◽

Speech Emotion Recognition


Speech Emotion Recognition with Discriminative Feature Learning

10.21437/interspeech.2020-2237 ◽

2020 ◽

Author(s):

Huan Zhou ◽

Kai Liu

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Speech Emotion Recognition ◽

Discriminative Feature


Unsupervised Feature Learning for EEG-based Emotion Recognition

2017 International Conference on Cyberworlds (CW) ◽

10.1109/cw.2017.19 ◽

2017 ◽

Author(s):

Zirui Lan ◽

Olga Sourina ◽

Lipo Wang ◽

Reinhold Scherer ◽

Gernot Muller-Putz

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Unsupervised Feature Learning


Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition

Communications in Computer and Information Science - Pattern Recognition ◽

10.1007/978-981-10-3005-5_53 ◽

2016 ◽

pp. 645-651 ◽

Cited By ~ 2

Author(s):

Shiqing Zhang ◽

Xiaoming Zhao ◽

Yuelong Chuang ◽

Wenping Guo ◽

Ying Chen

Keyword(s):

Emotion Recognition ◽

Feature Learning ◽

Deep Belief Network ◽

Speech Emotion Recognition ◽

Belief Network


Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽

10.2478/aoa-2013-0055 ◽

2013 ◽

Vol 38 (4) ◽

pp. 465-470 ◽

Cited By ~ 11

Author(s):

Jingjie Yan ◽

Xiaolan Wang ◽

Weiyi Gu ◽

LiLi Ma

Keyword(s):

Dimensionality Reduction ◽

Emotion Recognition ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Speech Emotion Recognition ◽

Least Squares Regression ◽

Computer Science Pedagogy ◽

Reduction Methods ◽

Analysis Computer

Speech emotion recognition is regarded as a meaningful yet difficult problem across a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With SPLSR, the components of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. A number of tests on the Berlin database reveal that the recognition rate of the SPLSR method can reach up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
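The idea of shrinking uninformative features to zero can be illustrated with a simplified sketch. It assumes a single, centered response variable and computes only the first sparse direction vector by soft-thresholding the feature-response covariances, which is one common formulation of sparse PLS. It is not the authors' implementation, and the names `soft_threshold`, `sparse_direction`, `eta`, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_direction(X, y, eta):
    """First sparse PLS-style direction for a centered X (rows) and centered y.

    eta in [0, 1) sets the cutoff relative to the largest absolute covariance.
    """
    n_features = len(X[0])
    # Covariance (up to a constant factor) between each feature and y.
    cov = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_features)]
    cutoff = eta * max(abs(c) for c in cov)
    w = [soft_threshold(c, cutoff) for c in cov]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w] if norm > 0 else w


# Hypothetical toy data: 4 samples, 3 centered features, centered response.
X = [
    [-1.0, 0.5, 0.2],
    [-0.5, -0.5, -0.2],
    [0.5, 0.5, -0.2],
    [1.0, -0.5, 0.2],
]
y = [-1.0, -0.6, 0.6, 1.0]

w = sparse_direction(X, y, eta=0.5)
selected = [j for j, v in enumerate(w) if v != 0.0]
print("direction:", w)
print("selected feature indices:", selected)
```

Features whose weight is exactly zero drop out of the projection, so feature selection and dimensionality reduction happen in the same step. A full SPLSR procedure would extract further components after deflating the data, which this sketch omits.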


Classifier fusion for speech emotion recognition based on improved queuing voting algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.00381 ◽

2009 ◽

Vol 29 (2) ◽

pp. 381-385 ◽

Cited By ~ 1

Author(s):

Li-qin FU ◽

Xia MAO ◽

Li-jiang CHEN

Keyword(s):

Emotion Recognition ◽

Classifier Fusion ◽

Speech Emotion Recognition
