A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Gintautas Tamulevičius; Gražina Korvel; Anil Bora Yayak; Povilas Treigys; Jolita Bernatavičienė; Bożena Kostek

doi:10.3390/electronics9101725

Electronics ◽

10.3390/electronics9101725 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1725

Author(s):

Gintautas Tamulevičius ◽

Gražina Korvel ◽

Anil Bora Yayak ◽

Povilas Treigys ◽

Jolita Bernatavičienė ◽

...

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Audio Signal ◽

Speech Emotion Recognition ◽

Two Dimensional ◽

Acoustic Feature ◽

Emotional Information ◽

Feature Spaces ◽

Speaker Independent ◽

Acoustic Representation

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional speech data in six languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation only. The assumption is that the speech audio signal carries sufficient emotional information to detect and retrieve it. Several two-dimensional acoustic feature spaces, such as cochleagrams, spectrograms, mel-cepstrograms, and a fractal dimension-based space, are employed as representations of speech emotional features. A convolutional neural network (CNN) is used as a classifier. The results show the superiority of cochleagrams over the other feature spaces utilized. In the CNN-based speaker-independent cross-linguistic speech emotion recognition (SER) experiment, an accuracy of over 90% is achieved, which is close to that of the monolingual SER case.
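To make the idea of a two-dimensional acoustic feature space concrete, the following is a minimal sketch of a log-power spectrogram (one of the feature spaces named in the abstract) computed from a one-dimensional signal. It is not the authors' pipeline: the function names, the frame length of 64 samples, the hop of 32 samples, the Hann window, and the toy sine input are all assumptions chosen for illustration, and a naive DFT stands in for the FFT that a practical system would use.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def spectrogram(signal, frame_len=64, hop=32):
    """Return a time-by-frequency grid of log-power values.

    Each row is one windowed frame; each column is one DFT bin
    from 0 up to frame_len // 2 (the non-negative frequencies).
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame (assumption: clarity over speed).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small constant avoids log(0) for empty bins.
            row.append(math.log(abs(acc) ** 2 + 1e-10))
        rows.append(row)
    return rows

# Toy input (hypothetical): a pure tone that falls exactly on bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The resulting grid can be treated as a single-channel image, which is the form a CNN classifier expects as input.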


Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network

Sensors ◽

10.3390/s20216008 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6008 ◽

Cited By ~ 1

Author(s):

Misbah Farooq ◽

Fawad Hussain ◽

Naveed Khan Baloch ◽

Fawad Riasat Raja ◽

Heejung Yu ◽

...

Keyword(s):

Neural Network ◽

Feature Selection ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Deep Convolutional Neural Network ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional Speech ◽

Human Machine Interaction ◽

Speaker Independent

Speech emotion recognition (SER) plays a significant role in human–machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS in speaker-dependent SER experiments. Moreover, our method yields the best results for speaker-independent SER when compared with existing handcrafted feature-based SER approaches.
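The abstract does not specify which correlation-based selection procedure is used, so the following is only a simplified sketch of the general idea: prefer features that correlate with the class label and discard features that are nearly duplicates of ones already kept. The greedy rule, the `max_redundancy` threshold of 0.9, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(columns, labels, max_redundancy=0.9):
    """Greedy filter over feature columns.

    Features are visited in order of decreasing |correlation with the label|.
    A feature is kept only if it is relevant (non-zero correlation) and its
    |correlation| with every already-kept feature is below max_redundancy.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    kept = []
    for j in order:
        if relevance[j] == 0.0:
            continue  # carries no linear information about the label
        if all(abs(pearson(columns[j], columns[k])) < max_redundancy for k in kept):
            kept.append(j)
    return kept

# Toy data (hypothetical): six utterances, two emotion classes.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],  # feature 0: informative
    [0.2, 0.4, 0.2, 1.8, 2.0, 1.6],  # feature 1: scaled copy of feature 0
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # feature 2: constant
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # feature 3: weakly related to the label
]
selected = select_features(columns, labels)
```

On this toy input, only one of the two perfectly correlated features survives and the constant feature is dropped; the indices in `selected` would then be used to slice the feature matrix before training a classifier.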


Improving speech emotion recognition based on acoustic words emotion dictionary

Natural Language Engineering ◽

10.1017/s1351324920000339 ◽

2020 ◽

pp. 1-15

Author(s):

Wang Wei ◽

Xinyi Cao ◽

He Li ◽

Lingjie Shen ◽

Yaqin Feng ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotion Classification ◽

Acoustic Features ◽

Emotional Information ◽

Average Recall

To improve speech emotion recognition, a U-acoustic words emotion dictionary (U-AWED) features model is proposed, based on an acoustic words emotion dictionary (AWED). The method models emotional information at the acoustic-word level in different emotion classes. The top-list words in each emotion are selected to generate the AWED vector. Then, the U-AWED model is constructed by combining utterance-level acoustic features with the AWED features. A support vector machine and a convolutional neural network are employed as the classifiers in our experiment. The results show that the proposed method provides significant improvement in unweighted average recall in all four emotion classification tasks.
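Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The sketch below illustrates the computation; the function name and the toy labels are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy example (hypothetical): an imbalanced set where the classifier
# always predicts the majority class.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here plain accuracy is 0.8 while UAR is 0.5, which shows why UAR is often preferred for emotion corpora with uneven class sizes.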


Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition

10.21437/interspeech.2017-562 ◽

2017 ◽

Cited By ~ 3

Author(s):

Ya-Tse Wu ◽

Hsuan-Yu Chen ◽

Yu-Hsien Liao ◽

Li-Wei Kuo ◽

Chi-Chun Lee

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Neural Responses


Robust Speech Emotion Recognition for Sindhi Language based on Deep Convolutional Neural Network

2021 International Conference on Communications, Information System and Computer Engineering (CISCE) ◽

10.1109/cisce52179.2021.9445883 ◽

2021 ◽

Author(s):

Muddasar Laghari ◽

Muhammad Junaid Tahir ◽

Abdullah Azeem ◽

Waqar Riaz ◽

Yi Zhou

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Deep Convolutional Neural Network ◽

Speech Emotion Recognition


Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Multimedia Tools and Applications ◽

10.1007/s11042-020-10329-2 ◽

2021 ◽

Author(s):

Manju D. Pawar ◽

Rajendra D. Kokate

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Convolution Neural Network ◽

Speech Emotion Recognition


A DFC taxonomy of Speech emotion recognition based on convolutional neural network from speech signal

2020 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications (CITISIA) ◽

10.1109/citisia50690.2020.9371841 ◽

2020 ◽

Author(s):

Surendra Malla ◽

Abeer Alsadoon ◽

Simi Kamini Bajaj

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Speech Signal ◽

Speech Emotion Recognition


Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04212-7_6 ◽

2018 ◽

pp. 62-71 ◽

Cited By ~ 1

Author(s):

Linjuan Zhang ◽

Longbiao Wang ◽

Jianwu Dang ◽

Lili Guo ◽

Haotian Guan

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Perceptual Features


Attention gated tensor neural network architectures for speech emotion recognition

Biomedical Signal Processing and Control ◽

10.1016/j.bspc.2021.103173 ◽

2022 ◽

Vol 71 ◽

pp. 103173

Author(s):

Sandeep Kumar Pandey ◽

Hanumant Singh Shekhawat ◽

S.R.M Prasanna

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Network Architectures ◽

Neural Network Architectures


Emotion Recognition From Speech Using Perceptual Filter and Neural Network

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing ◽

10.4018/978-1-7998-1159-6.ch004 ◽

2020 ◽

pp. 78-91 ◽

Cited By ~ 2

Author(s):

Revathi A. ◽

Sasikaladevi N.

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Vector Quantization ◽

Group Performance ◽

Back Propagation ◽

Critical Band ◽

Emotion Classification ◽

Back Propagation Algorithm ◽

Propagation Algorithm ◽

Speaker Independent

This chapter on multi-speaker-independent emotion recognition uses perceptual features, with filters spaced on the equivalent rectangular bandwidth (ERB) and Bark scales, together with a vector quantization (VQ) classifier for classifying groups and an artificial neural network trained with the back-propagation algorithm for emotion classification within a group. Performance can be improved by using a large amount of data for each pertinent emotion to train the system adequately. With the limited set of data, the proposed system has provided consistently better accuracy for the perceptual feature with critical band analysis done on the ERB scale.
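As an illustration of what spacing filters on a perceptual scale means, the sketch below places filter center frequencies at equal steps on the ERB-rate scale and also converts a frequency to Bark. The chapter summary does not state which formulas it uses; the ERB-rate formula of Glasberg and Moore and the Bark approximation of Traunmüller are assumed here, and the function names, the 100 Hz to 4000 Hz range, and the filter count of 8 are hypothetical.

```python
import math

def hz_to_erb_rate(f):
    """ERB-rate scale (Glasberg and Moore approximation), f in Hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def hz_to_bark(f):
    """Bark scale (Traunmueller approximation), f in Hz."""
    return 26.81 * f / (1960.0 + f) - 0.53

def erb_spaced_centers(f_low, f_high, n_filters):
    """Center frequencies in Hz, equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]

# Hypothetical filterbank layout: 8 filters between 100 Hz and 4 kHz.
centers = erb_spaced_centers(100.0, 4000.0, 8)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

Because the scale is roughly logarithmic at higher frequencies, the gaps between neighboring centers grow with frequency, so low frequencies receive denser coverage than high ones.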
