Random Deep Belief Networks for Recognizing Emotions from Speech Signals

Computational Intelligence and Neuroscience, Vol 2017, pp. 1-9 (2017). DOI: 10.1155/2017/1945630. Cited by ~19.

Author(s): Guihua Wen, Huihui Li, Jubing Huang, Danyang Li, Eryang Xun

Keyword(s): Emotion Recognition, Speech Signal, Majority Voting, Speech Emotion Recognition, Speech Signals, Belief Networks, Deep Belief Networks, Emotion Label, Emotion Labels

Human emotions can now be recognized from speech signals using machine learning methods; however, these methods still achieve relatively low recognition accuracy in real applications because they lack rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To exploit this capability fully, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. The method first extracts low-level features from the input speech signal and then uses them to construct many random subspaces. Each random subspace is fed to a DBN, which yields higher-level features that serve as input to a classifier that outputs an emotion label. All output emotion labels are then fused by majority voting to determine the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition.
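The RDBN pipeline described in this abstract (random feature subspaces, one model per subspace, majority voting over the predicted labels) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: training a DBN is out of scope here, so each subspace model is a hypothetical nearest-centroid classifier standing in for the DBN-plus-classifier stage, and the names `RandomSubspaceEnsemble`, `n_subspaces`, and `subspace_size`, as well as the toy feature vectors, are illustrative only.

```python
import random
from collections import Counter


def random_subspaces(n_features, n_subspaces, subspace_size, seed=0):
    """Draw n_subspaces random subsets of feature indices (no repeats within a subset)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]


def project(x, idx):
    """Keep only the feature dimensions listed in idx."""
    return [x[i] for i in idx]


class NearestCentroid:
    """Hypothetical stand-in for the per-subspace DBN + classifier stage."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {lab: [v / counts[lab] for v in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class RandomSubspaceEnsemble:
    """One base model per random subspace; the final label is decided by majority vote."""

    def __init__(self, n_subspaces=11, subspace_size=2, seed=0):
        self.n_subspaces = n_subspaces
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        self.subspaces = random_subspaces(len(X[0]), self.n_subspaces,
                                          self.subspace_size, self.seed)
        self.members = [NearestCentroid().fit([project(x, idx) for x in X], y)
                        for idx in self.subspaces]
        return self

    def predict(self, x):
        votes = [m.predict(project(x, idx))
                 for m, idx in zip(self.members, self.subspaces)]
        # Majority voting; on a tie, the label that was voted first wins.
        return Counter(votes).most_common(1)[0][0]


# Toy low-level feature vectors (illustrative values, not real speech features).
X = [[5.0, 5.2, 4.9, 5.1], [4.8, 5.0, 5.1, 5.2],
     [0.1, 0.0, 0.2, -0.1], [-0.2, 0.1, 0.0, 0.1]]
y = ["angry", "angry", "neutral", "neutral"]
model = RandomSubspaceEnsemble(n_subspaces=7, subspace_size=2, seed=42).fit(X, y)
print(model.predict([4.9, 5.1, 5.0, 4.8]))  # angry
```

Using an odd number of subspaces reduces the chance of tied votes in two-class problems; with more emotion classes, ties remain possible and the tie-breaking rule matters.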

Feature Fusion Methods for Robust Speech Emotion Recognition Based on Deep Belief Networks

Proceedings of the Fifth International Conference on Network, Communication and Computing - ICNCC '16 (2016). DOI: 10.1145/3033288.3033295. Cited by ~1.

Author(s): Ao Wu, Yongming Huang, Guobao Zhang

Keyword(s): Emotion Recognition, Feature Fusion, Speech Emotion Recognition, Belief Networks, Deep Belief Networks, Fusion Methods

Speech emotion recognition based on Bi-directional LSTM architecture and deep belief networks

Materials Today Proceedings (2021). DOI: 10.1016/j.matpr.2021.12.246.

Author(s): N. Senthilkumar, S. Karpakam, M. Gayathri Devi, R. Balakumaresan, P. Dhilipkumar

Keyword(s): Emotion Recognition, Speech Emotion Recognition, Belief Networks, Deep Belief Networks

Speech Emotion Recognition Based on Deep Belief Networks and Wavelet Packet Cepstral Coefficients

International Journal of Simulation Systems Science & Technology (2016). DOI: 10.5013/ijssst.a.17.28.28.

Author(s): Special Issues Editor

Keyword(s): Emotion Recognition, Wavelet Packet, Speech Emotion Recognition, Belief Networks, Deep Belief Networks, Cepstral Coefficients

Development of speech emotion recognition system using deep belief networks in Malayalam language

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2017). DOI: 10.1109/icacci.2017.8125919. Cited by ~1.

Author(s): Athira Chandran, D. Pravena, D. Govind

Keyword(s): Emotion Recognition, Recognition System, Speech Emotion Recognition, Belief Networks, Deep Belief Networks, Malayalam Language

Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition

Journal of Ambient Intelligence and Humanized Computing, Vol 10 (5), pp. 1787-1798 (2017). DOI: 10.1007/s12652-017-0644-8. Cited by ~19.

Author(s): Yongming Huang, Kexin Tian, Ao Wu, Guobao Zhang

Keyword(s): Emotion Recognition, Feature Fusion, Speech Emotion Recognition, Belief Networks, Deep Belief Networks, Noise Condition, Fusion Methods

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Sensors, Vol 21 (5), pp. 1579 (2021). DOI: 10.3390/s21051579. Cited by ~1.

Author(s): Kyoung Ju Noh, Chi Yoon Jeong, Jiyoun Lim, Seungeun Chung, Gague Kim, ...

Keyword(s): Emotion Recognition, Short Term Memory, Domain Adaptation, Classification Model, Speech Emotion Recognition, Target Domain, Model Generalization, Speech Database, Emotion Labels, Temporal Feature

Speech emotion recognition (SER) is a natural way to recognize individual emotions in everyday life. To deploy SER models in real-world applications, several key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish), and it is trained simultaneously on multiple losses that reflect the association between emotion labels in the discrete and dimensional models. To evaluate MPGLN SER on multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, was constructed, and the English-language Interactive Emotional Dyadic Motion Capture database (IEMOCAP) was used. In the multi-domain adaptation and domain generalization evaluations, MPGLN SER improved the F1 score by 3.7% and 3.5%, respectively, compared with a baseline SER model that uses a temporal feature generator. We show that MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
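The abstract says MPGLN is trained on several losses tied to discrete and dimensional emotion labels, but the exact group-loss formulation is not given here. The following is therefore only a generic sketch of the underlying multi-task idea, not the paper's loss: a weighted sum of a cross-entropy term for a discrete emotion class and a mean-squared-error term for dimensional targets (for example, valence and arousal). The function names and the weights `w_cls` and `w_dim` are assumptions for illustration.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(pred, target):
    """Mean squared error over the dimensional outputs (e.g. valence, arousal)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(class_logits, class_target, dim_pred, dim_target,
               w_cls=1.0, w_dim=1.0):
    """Hypothetical multi-task objective: weighted sum of a discrete and a dimensional loss."""
    return (w_cls * cross_entropy(class_logits, class_target)
            + w_dim * mse(dim_pred, dim_target))


# One sample: logits over three discrete emotions, plus a two-value dimensional prediction.
loss = joint_loss([2.0, 0.5, -1.0], 0, [0.3, 0.7], [0.5, 0.5])
```

Subtracting the maximum logit before exponentiating keeps the softmax term numerically stable; the relative weights would in practice be tuned on validation data.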

Speech Emotional Features Extraction Based on Electroglottograph

Neural Computation, Vol 25 (12), pp. 3294-3317 (2013). DOI: 10.1162/neco_a_00523. Cited by ~7.

Author(s): Lijiang Chen, Xia Mao, Pengfei Wei, Angelo Compare

Keyword(s): Emotion Recognition, Speech Signal, Vocal Tract, Vocal Folds, Distribution Coefficients, Speech Emotion Recognition, Support Vector, Power Law Distribution, Transform Coefficients

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced-segment duration, pitch-rise duration, and pitch-down duration are computed to capture vocal-fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signals are computed to capture vocal-tract modulation. Two experiments are carried out. The first compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The second compares emotion recognition performance using a support vector machine. The results show that the proposed features outperform commonly used features in speaker-independent and content-independent speech emotion recognition.
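The two feature families in this abstract can be illustrated with a short sketch. It assumes (the abstract does not specify either point) that a "power-law distribution coefficient" is the exponent estimated with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)) over durations x_i >= x_min, and that the spectrum is normalized by its peak before a type-II DCT is taken. The function names and the `x_min` and `n_coeffs` parameters are hypothetical.

```python
import math


def power_law_exponent(durations, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over x >= x_min."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in durations if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need durations strictly above x_min")
    return 1.0 + len(tail) / log_sum


def dct_ii(x):
    """Unnormalized type-II DCT: X[m] = sum_k x[k] * cos(pi / N * (k + 0.5) * m)."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi / n * (k + 0.5) * m) for k in range(n))
            for m in range(n)]


def spectrum_dct_features(magnitude_spectrum, n_coeffs):
    """Peak-normalize a magnitude spectrum, then keep the first n_coeffs DCT-II coefficients."""
    peak = max(magnitude_spectrum)
    if peak <= 0:
        raise ValueError("spectrum must contain a positive value")
    normalized = [v / peak for v in magnitude_spectrum]
    return dct_ii(normalized)[:n_coeffs]
```

The DCT-II is written out directly to keep the sketch dependency-free; in practice a library routine such as `scipy.fft.dct` would normally be used instead of the O(N^2) loop.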
