Speech emotion recognition: A framework for measuring emotional states from voice characteristics

Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

International Journal of Synthetic Emotions

10.4018/ijse.2016010105

2016

Vol 7 (1)

pp. 58-68

Cited By ~ 4

Author(s):

Imen Trabelsi

Med Salim Bouhlel

Keyword(s):

Emotion Recognition

Linear Prediction

Recognition Rate

Gaussian Mixture

Speech Emotion Recognition

Support Vector

Emotional States

Wide Range

Leibler Divergence

Perceptual Linear Prediction

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances, which are classified by a combination of Gaussian Mixture Models (GMM) and Support Vector Machines (SVM) based on a Kullback-Leibler divergence kernel. The study comparatively investigates the effect of feature type and feature dimension. The best results are obtained with 12-coefficient MFCCs. With the proposed features, a recognition rate of 84% was achieved, which is close to human performance on this database.
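The abstract names a GMM/SVM combination with a Kullback-Leibler divergence kernel but not its exact form. The following is a minimal sketch assuming the common GMM-supervector formulation: each utterance is modelled by a GMM adapted from a shared background model (same weights and diagonal variances, different means). The KL divergence between two such GMMs is bounded above by a weighted sum of per-component KL divergences, and that bound induces a linear kernel on scaled mean supervectors. All function names are hypothetical, and the authors' implementation may differ.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def supervector(weights, means, variances):
    """Stack sqrt(w_i) * Sigma_i^(-1/2) * mu_i over all mixture components.

    Assumes every utterance GMM shares the background model's weights and
    diagonal variances, so only the adapted means differ between utterances.
    """
    sv = []
    for w, mu, var in zip(weights, means, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv


def kl_kernel(sv_a, sv_b):
    """Linear kernel on supervectors, usable as a precomputed SVM kernel."""
    return sum(a * b for a, b in zip(sv_a, sv_b))


def kl_upper_bound(weights, means_a, means_b, variances):
    """sum_i w_i * KL(N(mu_a_i, Sigma_i) || N(mu_b_i, Sigma_i)).

    With shared variances this equals 0.5 * ||sv_a - sv_b||^2, which is why
    the inner product above is called a KL divergence kernel.
    """
    return sum(
        w * kl_diag_gauss(ma, var, mb, var)
        for w, ma, mb, var in zip(weights, means_a, means_b, variances)
    )
```

In use, the kernel would be evaluated for every pair of training utterances and the resulting matrix handed to an SVM as a precomputed kernel.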

Call Redistribution for a Call Center Based on Speech Emotion Recognition

Applied Sciences

10.3390/app10134653

2020

Vol 10 (13)

pp. 4653

Cited By ~ 2

Author(s):

Milana Bojanić

Vlado Delić

Alexey Karpov

Keyword(s):

Health Care

Emotion Recognition

Elderly People

Waiting Time

Call Center

Call Centers

Speech Emotion Recognition

Emotional States

Short Period

Care Support

Call center operators communicate with callers in different emotional states (anger, anxiety, fear, stress, joy, etc.). Sometimes a number of calls arriving within a short period of time have to be answered and processed. When all call center operators are busy, the system puts an incoming call on hold, regardless of its urgency. This research aims to improve the functionality of call centers by recognizing call urgency and redistributing calls in the queue. This could benefit call centers providing health care support for elderly people, as well as emergency call centers. The proposed recognition of call urgency, and the consequent call ranking and redistribution, is based on emotion recognition in speech: calls featuring emotions such as fear, anger, and sadness are given greater priority, and calls featuring neutral speech and happiness are given lower priority. Experimental results obtained in a simulated call center show a significant reduction in waiting time for calls estimated as more urgent, especially calls featuring fear and anger.
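The abstract describes ranking queued calls by the emotion recognized in the caller's speech. Below is a minimal sketch of such a queue, assuming a simple two-tier policy read off the abstract (fear, anger, and sadness are urgent; neutral speech and happiness are not) with first-come-first-served order within each tier. The paper's actual ranking rules and the emotion recognizer that produces the labels are not reproduced here, and `CallQueue` and `URGENT` are hypothetical names.

```python
import heapq
import itertools

# Hypothetical two-tier policy based on the abstract. Any label not listed
# here (e.g. "neutral", "happiness", or an unrecognised emotion) is non-urgent.
URGENT = {"fear", "anger", "sadness"}


class CallQueue:
    """Priority queue of waiting calls: urgent tier first, then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, call_id, emotion):
        tier = 0 if emotion in URGENT else 1
        heapq.heappush(self._heap, (tier, next(self._arrival), call_id))

    def pop(self):
        """Remove and return the next call to hand to a free operator."""
        _tier, _arrival, call_id = heapq.heappop(self._heap)
        return call_id

    def __len__(self):
        return len(self._heap)
```

For example, if calls arrive labelled neutral, fear, happiness, anger, and sadness, the fear, anger, and sadness calls are served first (in arrival order), followed by the neutral and happiness calls.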

Speech emotion recognition: a framework for measuring emotional states from voice characteristics

PsycEXTRA Dataset

10.1037/e578442014-088

2011

Author(s):

Jarek Krajewski

Keyword(s):

Emotion Recognition

Speech Emotion Recognition

Emotional States

Emotion Recognition of Manipuri Speech using Convolution Neural Network.

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.f9896.059120

2020

Vol 9 (1)

pp. 2364-2366

Keyword(s):

Emotion Recognition

Consumer Electronics

Speech Emotion Recognition

Emotional States

Human Machine Interaction

Interaction Field

Testing Dataset

Voice Interaction

Machine Communication

Mel Frequency Cepstral Coefficient

Over recent years, much progress has been made in artificial intelligence, machine learning, human-machine interaction, and related fields. Interacting with machines by voice, or giving them spoken commands to perform specific tasks, is increasingly popular, and many consumer electronics devices integrate assistants such as Siri, Alexa, Cortana, and Google Assistant. However, machines are limited in that they cannot interact with a person the way a human conversational partner does: they cannot recognize human emotions and react to them. Emotion recognition from speech is a cutting-edge research topic in the human-machine interaction field. Because machines are indispensable in our lives, there is demand for more robust human-machine communication systems, and many researchers are currently working on speech emotion recognition (SER) to improve human-machine interaction. To achieve this goal, a computer should be able to recognize emotional states and react to them as humans do. The effectiveness of an SER system depends on the quality of the extracted features and the type of classifier used. In this paper, we try to identify four basic emotions from speech: anger, sadness, neutral, and happiness. Audio files of short Manipuri speech taken from movies are used as the training and testing datasets. A convolutional neural network (CNN) is used to identify the four emotions, with Mel Frequency Cepstral Coefficients (MFCC) as the feature extraction technique.
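The abstract uses MFCCs as the CNN's input features but gives no extraction settings. Below is a minimal, dependency-free sketch of the usual MFCC recipe for a single frame: Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II. The FFT size, the 26 filters, the 13 coefficients, and the HTK-style mel formula are assumptions, and the naive DFT is used for clarity rather than speed. The CNN itself is not shown.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """Hamming-window the frame, zero-pad to n_fft, return |DFT|^2 / n_fft for bins 0..n_fft/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    windowed += [0.0] * (n_fft - n)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges evenly spaced on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_fft=512, n_filters=26, n_coeffs=13):
    spectrum = power_spectrum(frame, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Log filterbank energies; the floor avoids log(0) for silent frames.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10)) for row in bank]
    # DCT-II of the log energies, keeping the first n_coeffs coefficients.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]
```

In a full pipeline, `mfcc` would be applied to overlapping frames (for instance, 400-sample frames at 16 kHz), and the per-frame vectors would be stacked into a two-dimensional time-by-coefficient matrix as the CNN's input.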

Creation of speech corpus for emotion analysis in Gujarati language and its evaluation by various speech parameters

International Journal of Electrical and Computer Engineering (IJECE)

10.11591/ijece.v10i5.pp4752-4758

2020

Vol 10 (5)

pp. 4752

Author(s):

Vishal P. Tank

S. K. Hadia

Keyword(s):

Artificial Intelligence

Facial Expression

Emotion Recognition

Speech Emotion Recognition

Emotional States

Emotional Speech

Speech Corpus

Speech Database

Machine Communication

Gujarati Language

In the last couple of years, emotion recognition has proven its significance in artificial intelligence and man-machine communication. Emotion recognition can be performed from speech or from images (facial expressions); this paper deals only with speech emotion recognition (SER). An emotional speech database is essential for emotion recognition, and in this paper we propose an emotional database developed in Gujarati, one of the official languages of India. The proposed speech corpus covers six emotional states: sadness, surprise, anger, disgust, fear, and happiness. To observe the effect of different emotions, the proposed Gujarati speech database is analyzed in MATLAB using efficient speech parameters such as pitch, energy, and MFCC.
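The abstract analyzes the corpus using pitch, energy, and MFCC in MATLAB without stating the algorithms. Below is a minimal Python sketch of two of those measures. It assumes short-time energy is computed as the sum of squared samples in a frame, and it uses a plain autocorrelation pitch estimator with a 75 to 400 Hz search range. Both choices are assumptions; MATLAB's own pitch functions may use different methods.

```python
import math


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def autocorr_pitch(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate F0 as the lag with the largest autocorrelation in [f_min, f_max].

    Returns 0.0 when no lag has positive autocorrelation (e.g. silence),
    which is treated here as an unvoiced frame.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

Computing both measures for every frame of an utterance gives pitch and energy contours, whose means and ranges can then be compared across the six emotions.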

Speech Emotion Recognition: Framework for Measuring Emotional States From Voice Characteristics

PsycEXTRA Dataset

10.1037/e705042011-001

2011

Author(s):

Jarek Krajewski

Sebastian Schnieder

Tom Laufenberg

Keyword(s):

Emotion Recognition

Speech Emotion Recognition

Emotional States

Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks

Soft Computing

10.1007/s00500-020-05501-7

2021

Vol 25 (3)

pp. 1717-1730

Author(s):

Esma Mansouri-Benssassi

Juan Ye

Keyword(s):

Facial Expression

Emotion Recognition

Affective Computing

Third Party

Speech Emotion Recognition

Expression Recognition

Emotional States

Learning Approaches

Feature Representations

The Face

Emotion recognition through facial expression and non-verbal speech is an important area in affective computing. It has been studied extensively, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness: in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation: when a model is trained on one dataset, can it be used to make inferences on another? To directly address these challenges, we first propose applying a spiking neural network (SNN) to predict emotional states from facial expression and speech data, and then investigate and compare its accuracy when facing degraded data or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it with state-of-the-art techniques. Our approach is robust to noise. For facial expression recognition (FER), it achieves an accuracy of 56.2%, compared with 22.64% for CNN and 14.10% for SVM, when input images are degraded with a noise intensity of 0.5. For speech emotion recognition (SER), it achieves the highest accuracy, 74.3%, compared with 21.95% for CNN and 14.75% for SVM, when audio white noise is applied. For generalisation, our approach achieves consistently high cross-dataset accuracy of 89% for FER and 70% for SER. This suggests that it learns more effective feature representations, which lead to good generalisation of facial features and vocal characteristics across subjects.
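The abstract does not say which spiking neuron model the SNN uses. As a minimal illustration of the basic unit of such networks, the sketch below assumes the widely used leaky integrate-and-fire (LIF) model with forward-Euler integration and converts an input current sequence into spike times. A full SNN would connect many such neurons and learn their weights, which is not shown, and the parameter values are illustrative only.

```python
def lif_spikes(inputs, tau=20.0, v_rest=0.0, v_thresh=1.0, dt=1.0):
    """Leaky integrate-and-fire neuron: tau * dv/dt = -(v - v_rest) + I(t).

    `inputs` is the input current at each time step. Returns the indices of
    the time steps at which the neuron spikes; after each spike the membrane
    potential is reset to v_rest.
    """
    v = v_rest
    spikes = []
    for step, current in enumerate(inputs):
        v += dt / tau * (-(v - v_rest) + current)  # forward Euler update
        if v >= v_thresh:
            spikes.append(step)
            v = v_rest
    return spikes
```

With these settings, a constant input below the threshold never makes the neuron fire, while a stronger constant input produces regularly spaced spikes. Larger inputs fire more often, and this spike-rate code is how an SNN represents feature intensity.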

Study of Speech Emotion Recognition Based on Prosodic Parameters and Facial Expression Features

Applied Mechanics and Materials

10.4028/www.scientific.net/amm.241-244.1677

2012

Vol 241-244

pp. 1677-1681

Author(s):

Yu Tai Wang

Jie Han

Xiao Qing Jiang

Jing Zou

Hui Zhao

Keyword(s):

Facial Expression

Emotion Recognition

Recognition Rate

Single Mode

Gaussian Mixture

Speech Emotion Recognition

Emotional States

Prosodic Features

Single Model

Model Recognition

The paper introduces the present status of speech emotion recognition. Emotional databases of Chinese speech and facial expressions were established, using noise stimuli and movies to evoke the subjects' emotions. For different emotional states, we analyzed single-mode emotion recognition based on prosodic features of speech and on geometric features of facial expressions. We then discussed bimodal emotion recognition using a Gaussian Mixture Model (GMM). The experimental results show that the bimodal recognition rate, which combines facial expression with speech, is about 6% higher than the single-mode recognition rate using prosodic features alone.
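The abstract says the bimodal system uses Gaussian Mixture Models but not how the two modalities are combined. The minimal sketch below assumes one common option, score-level (late) fusion. Each emotion has one diagonal-covariance GMM for prosodic speech features and one for facial geometric features, and their log-likelihoods are mixed with a weight `alpha`, which is a hypothetical parameter. Feature-level fusion into a single GMM would be an equally plausible reading of the abstract.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fuse_and_classify(speech_x, face_x, speech_models, face_models, alpha=0.5):
    """Pick the emotion maximising alpha * speech score + (1 - alpha) * face score."""
    scores = {
        emo: alpha * gmm_loglik(speech_x, speech_models[emo])
        + (1 - alpha) * gmm_loglik(face_x, face_models[emo])
        for emo in speech_models
    }
    return max(scores, key=scores.get)
```

Setting `alpha=1.0` reduces this to the single-mode, speech-only classifier, so the same code can measure how much the facial channel adds.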

Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

Cognitive Analytics

10.4018/978-1-7998-2460-2.ch015

2020

pp. 283-293

Author(s):

Imen Trabelsi

Med Salim Bouhlel

Keyword(s):

Emotion Recognition

Linear Prediction

Recognition Rate

Gaussian Mixture Models

Gaussian Mixture

Recognition System

Speech Emotion Recognition

Support Vector

Emotional States

Perceptual Linear Prediction

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances, which are classified by a combination of Gaussian Mixture Models (GMM) and Support Vector Machines (SVM) based on a Kullback-Leibler divergence kernel. The study comparatively investigates the effect of feature type and feature dimension. The best results are obtained with 12-coefficient MFCCs. With the proposed features, a recognition rate of 84% was achieved, which is close to human performance on this database.

Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics

10.2478/aoa-2013-0055

2013

Vol 38 (4)

pp. 465-470

Cited By ~ 11

Author(s):

Jingjie Yan

Xiaolan Wang

Weiyi Gu

LiLi Ma

Keyword(s):

Dimensionality Reduction

Emotion Recognition

Least Squares

Partial Least Squares

Partial Least Squares Regression

Speech Emotion Recognition

Least Squares Regression

Computer Science Pedagogy

Reduction Methods

Analysis Computer

Speech emotion recognition is regarded as a meaningful yet challenging problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use SPLSR to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With SPLSR, the components corresponding to redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. Tests on the Berlin database show that the SPLSR method reaches a recognition rate of up to 79.23%, outperforming the other dimensionality reduction methods compared.
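The abstract does not give the SPLSR formulation. This is a minimal sketch of the mechanism it describes, simplified to a single response variable and only the first latent direction. On centred data, the PLS direction is proportional to X^T y. Soft-thresholding that direction at a fraction `eta` of its largest entry sets the weights of weakly related features exactly to zero, which is what performs the feature selection. For multi-class emotion labels one would typically use dummy-coded responses and several components. Those steps, and the choice of `eta`, are omitted, and the function names are hypothetical.

```python
import math


def sparse_pls_direction(X, y, eta=0.5):
    """First sparse PLS direction for a single response.

    X: list of n samples, each a list of p features; y: list of n responses.
    eta in [0, 1): entries smaller than eta * max|z| are shrunk to zero.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # z = Xc^T yc: covariance (up to 1/n) of each feature with the response.
    z = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    thresh = eta * max(abs(v) for v in z)
    # Soft-thresholding: shrink every entry towards zero by thresh.
    w = [math.copysign(max(abs(v) - thresh, 0.0), v) for v in z]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w


def selected_features(w):
    """Indices of features whose sparse weight is non-zero."""
    return [j for j, v in enumerate(w) if v != 0.0]
```

With `eta=0` every feature that covaries with the response is kept, and raising `eta` keeps only the most strongly related ones.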
