A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Zou Cairong ◽  
Zhang Xinran ◽  
Zha Cheng ◽  
Zhao Li

Feature fusion from separate sources is a key technical difficulty in cross-corpus speech emotion recognition. The purpose of this paper is to use the emotional information hidden in the speech spectrum diagram (spectrogram) as image features, based on Deep Belief Nets (DBN) from deep learning, and then fuse them with traditional emotion features. First, based on spectrogram analysis with the STB/Itti model, new spectrogram features are extracted from the color, brightness, and orientation channels, respectively; then two alternative DBN models fuse the traditional and spectrogram features, which increases the scale of the feature subset and its ability to characterize emotion. In experiments on the ABC database and Chinese corpora, the new feature subset distinctly improves cross-corpus recognition by 8.8% compared with traditional speech emotion features. The proposed method provides a new idea for feature fusion in emotion recognition.
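The pipeline the abstract describes (spectrogram as image, image-style statistics, early fusion with traditional features) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the STB/Itti saliency model is replaced here by simple brightness and gradient-orientation statistics, and the DBN fusion stage is approximated by plain concatenation; all function names and sizes are hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a simple windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len//2 + 1)

def image_style_features(spec):
    """Brightness and orientation statistics of the spectrogram 'image'
    (a crude stand-in for the Itti-style channels in the paper)."""
    img = np.log1p(spec)
    gy, gx = np.gradient(img)                    # local orientation field
    orientation = np.arctan2(gy, gx)
    hist, _ = np.histogram(orientation, bins=8, range=(-np.pi, np.pi),
                           weights=np.hypot(gx, gy))
    brightness = np.array([img.mean(), img.std()])
    return np.concatenate([brightness, hist / (hist.sum() + 1e-12)])

def fuse(traditional_feats, spec_feats):
    """Early fusion: concatenate both views into one enlarged feature subset
    (the paper instead learns the fusion with two alternative DBN models)."""
    return np.concatenate([traditional_feats, spec_feats])

rng = np.random.default_rng(0)
sig = rng.standard_normal(16000)                 # 1 s of synthetic audio at 16 kHz
spec = spectrogram(sig)
fused = fuse(rng.standard_normal(32), image_style_features(spec))
print(fused.shape)                               # (42,)
```

The point of the sketch is the data flow: two feature views of the same utterance are brought into one vector, which a classifier can then consume.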

2020 ◽  
Vol 17 (8) ◽  
pp. 3786-3789
Author(s):  
P. Gayathri ◽  
P. Gowri Priya ◽  
L. Sravani ◽  
Sandra Johnson ◽  
Visanth Sampath

Recognition of emotions is an aspect of speech recognition that is gaining increasing attention, and the need for it is growing enormously. Although there are methods to identify emotion using machine learning techniques, we assume in this paper that calculating deltas and delta-deltas for customized features not only preserves effective emotional information but also reduces the impact of emotionally irrelevant factors, leading to fewer misclassifications. Furthermore, Speech Emotion Recognition (SER) often suffers from silent frames and emotionally irrelevant frames. Meanwhile, the attention mechanism has demonstrated exceptional performance in learning task-relevant feature representations. Inspired by this, we propose an Attention-based Convolutional Recurrent Neural Network (ACRNN) to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input. Finally, experimental results show the feasibility of the proposed method, which attains state-of-the-art performance in terms of unweighted average recall.
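The delta and delta-delta computation mentioned above is a standard regression over neighboring frames; a minimal sketch is below. The Mel-spectrogram itself is replaced by random data here, and the `width=2` window is an assumption (the common HTK-style default), not a parameter taken from the paper.

```python
import numpy as np

def deltas(feats, width=2):
    """Regression-based delta features over a window of +/- `width` frames."""
    # pad at the edges so the output keeps the same number of frames
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    return sum(n * (padded[width + n:len(feats) + width + n]
                    - padded[width - n:len(feats) + width - n])
               for n in range(1, width + 1)) / denom

rng = np.random.default_rng(1)
mel = rng.standard_normal((100, 40))        # stand-in Mel-spectrogram: 100 frames x 40 bands
d = deltas(mel)                             # first-order dynamics
dd = deltas(d)                              # second-order dynamics
stacked = np.stack([mel, d, dd], axis=-1)   # 3-channel input, as the abstract describes
print(stacked.shape)                        # (100, 40, 3)
```

The three channels (static, delta, delta-delta) then feed the convolutional front end of the ACRNN as an image-like tensor.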


2014 ◽  
Vol 543-547 ◽  
pp. 2192-2195 ◽  
Author(s):  
Chen Chen Huang ◽  
Wei Gong ◽  
Wen Long Fu ◽  
Dong Yu Feng

As the most important medium of communication in human life, speech carries abundant emotional information. In recent years, how to automatically recognize a speaker's emotional state from speech has attracted extensive attention from researchers in various fields. In this paper, we study a method for speech emotion recognition. We collected a total of 360 sentences from four speakers uttering emotional statements of happiness, anger, surprise, and sadness, and extracted eight emotional characteristics from these voice data. A contribution analysis method is proposed to determine the values of the emotion characteristic parameters. We also use weighted Euclidean distance template matching to identify the speech emotion, obtaining an average emotion recognition rate of more than 80%.
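The weighted Euclidean distance template matching described above can be sketched in a few lines: each emotion gets a template vector of the eight characteristics, and a sample is assigned to the nearest template under a per-feature weighting. The templates, weights, and sample values below are made-up placeholders (in the paper the weights would come from the contribution analysis), used only to show the classification rule.

```python
import numpy as np

def weighted_euclidean(x, template, weights):
    """Euclidean distance with per-feature weights."""
    return np.sqrt(np.sum(weights * (x - template) ** 2))

def classify(x, templates, weights):
    """Assign x to the emotion whose template is nearest under the weighted metric."""
    return min(templates, key=lambda e: weighted_euclidean(x, templates[e], weights))

# hypothetical 8-dimensional templates, one per emotion class in the paper
templates = {
    "happiness": np.full(8, 1.0),
    "anger":     np.full(8, 3.0),
    "surprise":  np.full(8, 5.0),
    "sadness":   np.full(8, 7.0),
}
weights = np.ones(8) / 8            # equal weights as a placeholder
sample = np.full(8, 2.8)            # a made-up feature vector to classify
print(classify(sample, templates, weights))  # anger
```

The design choice worth noting is that the weights let features that contribute more to emotional discrimination dominate the distance, which is exactly what the contribution analysis is meant to supply.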

