Statistical Evaluation of Speech Features for Emotion Recognition

Speech Emotion Recognition Framework based on User Self-referential Speech Features

2018 IEEE 7th Global Conference on Consumer Electronics (GCCE) ◽

10.1109/gcce.2018.8574676 ◽

2018 ◽

Cited By ~ 1

Author(s):

Kyoungju Noh ◽

Seungeun Chung ◽

Jiyoun Lim ◽

Gague Kim ◽

Hyuntae Jeong

Keyword(s):

Emotion Recognition ◽

Speech Emotion Recognition ◽

Speech Features

A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition

2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming ◽

10.1109/paap.2012.31 ◽

2012 ◽

Cited By ~ 11

Author(s):

Tsang-Long Pao ◽

Chun-Hsiang Wang ◽

Yu-Ji Li

Keyword(s):

Emotion Recognition ◽

Speech Emotion Recognition ◽

Speech Features

Mobile-based Human Emotion Recognition based on Speech and Heart rate

Journal of Engineering ◽

10.31026/j.eng.2019.11.05 ◽

2019 ◽

Vol 25 (11) ◽

pp. 55-66

Author(s):

Huda Majed Swadi ◽

Hamid Mohammed Ali

Keyword(s):

Neural Network ◽

Heart Rate ◽

Emotion Recognition ◽

Speech Signal ◽

Rate Measurement ◽

Human Speech ◽

Human Emotion ◽

Server Side ◽

The Neural Network ◽

Speech Features

Mobile-based human emotion recognition is a very challenging subject. Most of the approaches proposed and built in this field use various contexts derived from external sensors and the smartphone, but these approaches face a number of obstacles. The proposed system integrates the human speech signal and heart rate in one system to improve the accuracy of human emotion recognition. It is designed to recognize four human emotions: angry, happy, sad and normal. In this system, the smartphone records the user's speech and sends it to a server. A smartwatch worn on the user's wrist measures the user's heart rate while the user is speaking and sends it via Bluetooth to the smartphone, which in turn sends it to the server. On the server side, speech features are extracted from the speech signal and classified by a neural network. To reduce misclassification by the neural network, the heart rate measurement is used to direct the extracted speech features either to the excited (angry and happy) neural network or to the calm (sad and normal) neural network. Despite the challenges associated with the system, it achieved 96.49% for known speakers and 79.05% for unknown speakers.
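
The heart-rate routing step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the threshold value, the function names, and the toy stand-in classifiers are all assumptions, since the abstract does not say how the excited/calm decision is made from the heart rate.

```python
from typing import Callable, Sequence

# Hypothetical threshold in beats per minute; the abstract gives no value.
EXCITED_HR_THRESHOLD_BPM = 100.0

Classifier = Callable[[Sequence[float]], str]


def route_by_heart_rate(
    features: Sequence[float],
    heart_rate_bpm: float,
    excited_net: Classifier,
    calm_net: Classifier,
    threshold_bpm: float = EXCITED_HR_THRESHOLD_BPM,
) -> str:
    """Send the speech features to the excited or the calm classifier."""
    if heart_rate_bpm >= threshold_bpm:
        return excited_net(features)  # expected to return "angry" or "happy"
    return calm_net(features)  # expected to return "sad" or "normal"


# Toy stand-ins so the sketch runs; real ones would be trained networks.
def toy_excited_net(features: Sequence[float]) -> str:
    return "angry" if features[0] > 0.5 else "happy"


def toy_calm_net(features: Sequence[float]) -> str:
    return "sad" if features[0] > 0.5 else "normal"


label = route_by_heart_rate([0.9], 120.0, toy_excited_net, toy_calm_net)
```

Splitting the four-class problem this way means each network only has to separate two emotions, which is presumably why the routing reduces misclassification.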

IMPROVED SPEAKER-INDEPENDENT EMOTION RECOGNITION FROM SPEECH USING TWO-STAGE FEATURE REDUCTION

Journal of Information and Communication Technology ◽

10.32890/jict2015.14.0.8156 ◽

2015 ◽

Author(s):

Hasrul Mohd Nazid ◽

Hariharan Muthusamy ◽

Vikneswaran Vijean ◽

Sazali Yaacob

Keyword(s):

Emotion Recognition ◽

Principal Component ◽

Feature Reduction ◽

Speech Emotion Recognition ◽

Emotional Speech ◽

Two Stage ◽

Linear Discriminant ◽

Speaker Independent ◽

Speech Features ◽

And Gender

In recent years, researchers have focused on improving the accuracy of speech emotion recognition. High accuracies have generally been obtained for two-class emotion recognition, but multi-class emotion recognition remains a challenging task. The main aim of this work is to propose a two-stage feature reduction using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve the accuracy of the speech emotion recognition (ER) system. Short-term speech features were extracted from the emotional speech signals. Experiments were carried out using four different supervised classifiers with two different emotional speech databases. The experimental results indicate that the proposed method provides better accuracies: 87.48% for the speaker-dependent (SD) and gender-dependent (GD) ER experiment, 85.15% for the speaker-independent (SI) ER experiment, and 87.09% for the gender-independent (GI) experiment.
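
As a rough illustration of two-stage reduction (PCA followed by LDA), the sketch below uses only the Python standard library. Several things here are assumptions rather than details from the abstract: the data are synthetic, the problem is simplified to two classes, PCA keeps exactly two components (found by power iteration), and the LDA stage is the two-class Fisher discriminant. All function names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, center):
    """Sum of outer products of (row - center); proportional to the covariance."""
    d = len(center)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def top_two_components(matrix, iters=200):
    """Two leading eigenvectors via power iteration with deflation."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    components = []
    for k in range(2):
        v = [1.0 / (i + 1 + k) for i in range(d)]  # fixed, arbitrary start vector
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in m]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, m))
        components.append(v)
        for i in range(d):
            for j in range(d):
                m[i][j] -= eigenvalue * v[i] * v[j]  # deflate the found direction
    return components


def fit_pca_then_lda(class0, class1):
    """Stage 1: PCA to 2 dimensions. Stage 2: two-class Fisher LDA to 1 dimension."""
    rows = class0 + class1
    mu = mean_vector(rows)
    components = top_two_components(scatter(rows, mu))

    def project(row):
        return [sum((xi - mi) * ci for xi, mi, ci in zip(row, mu, comp)) for comp in components]

    z0 = [project(r) for r in class0]
    z1 = [project(r) for r in class1]
    m0, m1 = mean_vector(z0), mean_vector(z1)
    s0, s1 = scatter(z0, m0), scatter(z1, m1)
    # Within-class scatter Sw = s0 + s1 (2x2), inverted in closed form.
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [(d * dm[0] - b * dm[1]) / det, (a * dm[1] - c * dm[0]) / det]

    def score(row):
        z = project(row)
        return w[0] * z[0] + w[1] * z[1]

    return score


random.seed(0)  # synthetic 4-D "features", purely for illustration
class0 = [[random.gauss(0.0, 0.5) for _ in range(4)] for _ in range(20)]
class1 = [[random.gauss(3.0 if j < 2 else 0.0, 0.5) for j in range(4)] for _ in range(20)]
score = fit_pca_then_lda(class0, class1)
```

The returned `score` function maps a raw feature vector to a single discriminant value; a classifier would then operate on that reduced representation. In practice a library implementation of PCA and LDA would normally be used instead of hand-written linear algebra.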

Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Applied Sciences ◽

10.3390/app11177967 ◽

2021 ◽

Vol 11 (17) ◽

pp. 7967

Author(s):

Sung-Woo Byun ◽

Ju-Hee Kim ◽

Seok-Pil Lee

Keyword(s):

Emotion Recognition ◽

Acoustic Feature ◽

Natural Interaction ◽

Text Data ◽

Feature Vectors ◽

Proposed Model ◽

Accurate Performance ◽

Speech Features ◽

Personal Assistants ◽

Deep Learning Model

Recently, intelligent personal assistants, chat-bots and AI speakers have been used more broadly as communication interfaces, and the demand for more natural interaction has increased as well. Humans can express emotions in various ways, such as through voice tone or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, drawing on the strengths of each. We extracted 43 feature vectors, such as spectral features, harmonic features and MFCC, from speech datasets. In addition, 256 embedding vectors were extracted from transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into their respective deep learning models, each of which produced a probability for the predicted output classes. The results show that the proposed model performed more accurately than previous research.
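
The abstract says each model outputs class probabilities but does not state how the two outputs are combined. One common option, shown here purely as an assumption, is a weighted average of the two probability vectors (late fusion). The function names, the weight, and the label set are hypothetical.

```python
from typing import List, Sequence


def fuse_probabilities(
    acoustic_probs: Sequence[float],
    text_probs: Sequence[float],
    acoustic_weight: float = 0.5,  # assumed equal weighting of the two models
) -> List[float]:
    """Weighted average of two per-class probability vectors."""
    if len(acoustic_probs) != len(text_probs):
        raise ValueError("both models must score the same classes")
    text_weight = 1.0 - acoustic_weight
    return [acoustic_weight * a + text_weight * t
            for a, t in zip(acoustic_probs, text_probs)]


def predict_label(labels: Sequence[str], fused: Sequence[float]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


labels = ["angry", "happy", "sad", "neutral"]  # illustrative label set
fused = fuse_probabilities([0.4, 0.3, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1])
prediction = predict_label(labels, fused)
```

Here the acoustic model alone would pick "angry", but the text model's strong vote shifts the fused decision to "happy".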

Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition

2014 8th International Conference on Signal Processing and Communication Systems (ICSPCS) ◽

10.1109/icspcs.2014.7021120 ◽

2014 ◽

Cited By ~ 3

Author(s):

Melissa N Stolar ◽

Margaret Lech ◽

Ian S Burnett

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Deep Neural Network ◽

Graphical Representation ◽

Speech Features

Dimensional speech emotion recognition from speech features and word embeddings by using multitask learning

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2020.14 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 1

Author(s):

Bagus Tris Atmaja ◽

Masato Akagi

Keyword(s):

Emotion Recognition ◽

Multitask Learning ◽

Speech Emotion Recognition ◽

Word Embeddings ◽

Concordance Correlation ◽

Acoustic Networks ◽

Overall Evaluation ◽

Speech Features ◽

Two Parameters ◽

Emotion Labels

The majority of research in speech emotion recognition (SER) aims to recognize emotion categories. Recognizing dimensional emotion attributes is also important, however, and it has several advantages over categorical emotion. In this research, we investigate dimensional SER using both speech features and word embeddings. The concatenation network joins acoustic networks and text networks from bimodal features. We demonstrate that these bimodal features, both extracted from speech, improve the performance of dimensional SER over unimodal SER using either acoustic features or word embeddings. Adding word embeddings to the SER system yields a significant improvement on the valence dimension, while the arousal and dominance dimensions also improve. We propose a multitask learning (MTL) approach for the prediction of all emotional attributes. This MTL maximizes the concordance correlation between predicted emotion degrees and true emotion labels simultaneously. The findings suggest that MTL with two parameters is better than the other evaluated methods at representing the interrelation of emotional attributes. In the unimodal results, speech features attain higher performance on arousal and dominance, while word embeddings are better for predicting valence. The overall evaluation uses the concordance correlation coefficient score of the three emotional attributes. We also discuss some differences between categorical and dimensional emotion results from psychological and engineering perspectives.
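
For reference, the concordance correlation coefficient (CCC) mentioned above can be computed as in the sketch below. The `ccc` function follows the standard definition. The multitask loss is an assumption: the abstract only says the MTL uses two parameters, so the weighting `alpha`, `beta`, `1 - alpha - beta` over valence, arousal and dominance is one plausible reading, not a confirmed detail.

```python
from typing import Dict, Sequence


def ccc(pred: Sequence[float], true: Sequence[float]) -> float:
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def mtl_ccc_loss(
    preds: Dict[str, Sequence[float]],
    trues: Dict[str, Sequence[float]],
    alpha: float,
    beta: float,
) -> float:
    """Weighted sum of (1 - CCC) over three attributes (assumed weighting)."""
    weights = {"valence": alpha, "arousal": beta, "dominance": 1.0 - alpha - beta}
    return sum(w * (1.0 - ccc(preds[k], trues[k])) for k, w in weights.items())


perfect = ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = ccc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])  # same shape, constant offset
```

Unlike Pearson correlation, CCC penalizes the constant offset in the second example, which is why it is a common target when predictions must match the scale of the labels as well as their trend.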

Multiclass SVM-based language-independent emotion recognition using selective speech features

2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2014.6968337 ◽

2014 ◽

Cited By ~ 2

Author(s):

Kokane Amol T. ◽

Ram Mohana Reddy Guddeti

Keyword(s):

Emotion Recognition ◽

Speech Features ◽

Multiclass Svm

Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder

Electronics ◽

10.3390/electronics10172086 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2086

Author(s):

Yangwei Ying ◽

Yuanwu Tu ◽

Hong Zhou

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Feature Learning ◽

Human Potential ◽

Speech Emotion Recognition ◽

Unsupervised Feature Learning ◽

Learning Techniques ◽

Speech Data ◽

Data Division ◽

Speech Features

Speech signals contain abundant information on personal emotions, which plays an important part in the representation of human potential characteristics and expressions. However, the shortage of emotional speech data hinders the development of speech emotion recognition (SER) and limits improvements in recognition accuracy. Currently, the most effective approach is to use unsupervised feature learning techniques to extract speech features from available speech data and build emotion classifiers with these features. In this paper, we propose using autoencoders, such as a denoising autoencoder (DAE) and an adversarial autoencoder (AAE), to extract features from LibriSpeech for model pre-training, and then conduct experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset for classification. Considering the imbalance of the data distribution in IEMOCAP, we developed a novel data augmentation approach that optimizes the overlap shift between consecutive segments, and we redesigned the data division. The best classification accuracy reached 78.67% (weighted accuracy, WA) and 76.89% (unweighted accuracy, UA) with AAE. Compared with the state-of-the-art results known to us (76.18% WA and 76.36% UA with a supervised learning method), this is a slight advantage. This suggests that unsupervised learning benefits the development of SER and provides a new approach to addressing the problem of data scarcity.
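
The two metrics reported above can be illustrated with a short sketch. It assumes the convention common in SER work, where weighted accuracy (WA) is the overall fraction of correct predictions and unweighted accuracy (UA) is the mean of per-class recalls; the abstract itself does not define them. The labels and predictions are made up for illustration.

```python
from collections import defaultdict
from typing import Sequence


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Imbalanced toy example: 8 "neutral" samples and 2 "angry" samples.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]
wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
```

On imbalanced data such as this toy example, WA stays high while UA drops, which is why both numbers are usually reported together.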

Neural Architecture Search of SPD Manifold Networks

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/413 ◽

2021 ◽

Author(s):

Rhea Sanjay Sukthanker ◽

Zhiwu Huang ◽

Suryansh Kumar ◽

Erik Goron Endsjo ◽

Yan Wu ◽

...

Keyword(s):

Emotion Recognition ◽

Statistical Evaluation ◽

State Of The Art ◽

Search Space ◽

Positive Definite ◽

The State ◽

Cell Design ◽

Training Process ◽

Neural Architecture ◽

Symmetric Positive Definite

In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks, aiming to automate the design of SPD neural architectures. To address this problem, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem with a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and traditional NAS algorithms. Empirical results show that our algorithm excels at discovering better-performing SPD network designs and provides models that are more than three times lighter than those found by the state-of-the-art NAS algorithms.
