A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms

2021 ◽  
Vol 11 (4) ◽  
pp. 1890
Author(s):  
Sung-Woo Byun ◽  
Seok-Pil Lee

The goal of the human interface is to recognize the user's emotional state precisely. In speech emotion recognition research, the most important issue is the effective parallel use of proper speech-feature extraction and an appropriate classification engine. Well-defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate the acoustic features that can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect the emotion conveyed in speech. We used a recurrent neural network model to classify emotions from speech. The results show that the proposed system achieves higher accuracy than previous studies.
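As a rough illustration (not the authors' code), the pipeline this abstract describes — frame-level acoustic features such as F0 and MFCCs fed to a recurrent classifier — can be sketched with a minimal vanilla RNN in NumPy. All dimensions, weights, and feature values below are placeholder assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def rnn_classify(features, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a (T, D) sequence of frame-level
    acoustic features (e.g. F0 + MFCCs) and return emotion-class
    probabilities computed from the final hidden state."""
    h = np.zeros(Whh.shape[0])
    for x in features:                       # one acoustic frame per step
        h = np.tanh(Wxh @ x + Whh @ h + bh)  # recurrent state update
    return softmax(Why @ h + by)             # distribution over emotions

rng = np.random.default_rng(0)
T, D, H, C = 50, 14, 32, 5  # frames, feature dim, hidden size, emotion classes
feats = rng.standard_normal((T, D))  # stand-in for F0/MFCC/spectral features
probs = rnn_classify(
    feats,
    rng.standard_normal((H, D)) * 0.1,  # input-to-hidden weights
    rng.standard_normal((H, H)) * 0.1,  # hidden-to-hidden weights
    rng.standard_normal((C, H)) * 0.1,  # hidden-to-output weights
    np.zeros(H), np.zeros(C),
)
```

In practice the features would come from a real extractor and the weights from training; the sketch only shows how a sequence of per-frame feature vectors maps to a single emotion distribution.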

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5212 ◽  
Author(s):  
Tursunov Anvarjon ◽  
Mustaqeem ◽  
Soonil Kwon

Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, a speech emotion recognition (SER) system evaluates the emotional state of the speaker by investigating his/her speech signal. Emotion recognition is a challenging task for a machine, and making a machine smart enough to recognize emotions efficiently is equally challenging. The speech signal is hard to examine using signal-processing methods because it consists of different frequencies and features that vary according to emotions such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms are being developed for SER, success rates remain low and vary with the language, the emotion set, and the database. In this paper, we propose a new lightweight, effective SER model with low computational complexity and high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features, using a plain rectangular filter with a modified pooling strategy that provides more discriminative power for SER. The proposed CNN model was trained on the frequency features extracted from the speech data and was then tested to predict the emotions. The proposed SER model was evaluated on two benchmarks, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and the Berlin Emotional Speech Database (EMO-DB) speech datasets, achieving 77.01% and 92.02% recognition accuracy, respectively. The experimental results demonstrate that the proposed CNN-based SER system achieves better recognition performance than state-of-the-art SER systems.
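A minimal sketch (with assumed shapes, not the paper's implementation) of the core operation the abstract names — a plain rectangular convolution filter applied to a time–frequency spectrogram, followed by pooling:

```python
import numpy as np

def conv2d_valid(spec, kernel):
    """Valid 2D convolution of an (F, T) spectrogram with a
    rectangular kernel, e.g. tall in frequency, narrow in time."""
    kh, kw = kernel.shape
    F, T = spec.shape
    out = np.empty((F - kh + 1, T - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(spec[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, ph, pw):
    """Non-overlapping max pooling; trims edges that do not fit."""
    F, T = x.shape
    F2, T2 = F // ph, T // pw
    return x[:F2 * ph, :T2 * pw].reshape(F2, ph, T2, pw).max(axis=(1, 3))

rng = np.random.default_rng(1)
spec = rng.standard_normal((40, 100))  # 40 frequency bins x 100 time frames
kernel = rng.standard_normal((9, 3))   # rectangular filter: 9 (freq) x 3 (time)
fmap = conv2d_valid(spec, kernel)      # feature map: (32, 98)
pooled = max_pool(fmap, 2, 2)          # pooled map: (16, 49)
```

The rectangular (frequency-elongated) kernel emphasizes spectral structure across frequency bins at each time step; the specific kernel sizes and the paper's "modified pooling strategy" are not specified here, so standard max pooling stands in for it.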


2021 ◽  
Vol 193 (12) ◽  
Author(s):  
Salar Valizadeh Moghadam ◽  
Ahmad Sharafati ◽  
Hajar Feizi ◽  
Seyed Mohammad Saeid Marjaie ◽  
Seyed Babak Haji Seyed Asadollah ◽  
...  

Author(s):  
C. Fernando Mugarra Gonzalez ◽  
Stanisław Jankowski ◽  
Jacek J. Dusza ◽  
Vicente Carrilero López ◽  
Javier M. Duart Clemente
