Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

2021 ◽  
Vol 11 (17) ◽  
pp. 7967
Author(s):  
Sung-Woo Byun ◽  
Ju-Hee Kim ◽  
Seok-Pil Lee

Recently, intelligent personal assistants, chat-bots, and AI speakers have been utilized more broadly as communication interfaces, and the demand for more natural interaction has increased as well. Humans can express emotions in various ways, such as through voice tone or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that delivers higher accuracy by using both speech and text data, exploiting the strengths of each. We extracted 43 acoustic features, such as spectral features, harmonic features, and MFCCs, from speech datasets. In addition, we extracted 256-dimensional embedding vectors from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into separate deep learning models, each producing a probability for the predicted output classes. The results show that the proposed model exhibits more accurate performance than previous research.
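The fusion step described above (per-modality models each emit class probabilities, which are then combined into one prediction) can be sketched as follows; the weighted-average rule and the four-class setup are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def late_fusion(speech_probs, text_probs, w_speech=0.5):
    """Weighted average of per-modality class probabilities."""
    speech_probs = np.asarray(speech_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    fused = w_speech * speech_probs + (1.0 - w_speech) * text_probs
    return fused / fused.sum()  # renormalize to a probability vector

# Hypothetical outputs of the two sub-models for four emotion classes
speech = [0.6, 0.2, 0.1, 0.1]
text = [0.3, 0.5, 0.1, 0.1]
fused = late_fusion(speech, text)
predicted = int(np.argmax(fused))
```

The speech model is more confident in class 0 and the text model in class 1; with equal weights, class 0 wins.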

Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1307
Author(s):  
Haoriqin Wang ◽  
Huaji Zhu ◽  
Huarui Wu ◽  
Xiaomin Wang ◽  
Xiao Han ◽  
...  

In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of rice-related Chinese questions are newly added every day. The rapid detection of questions with the same semantics is the key to the success of a rice-related intelligent Q&A system. To allow fast and automatic detection of rice-related questions with the same semantics, we propose a new method based on Coattention-DenseGRU (Gated Recurrent Unit). According to the characteristics of rice-related questions, we applied Word2vec with the TF-IDF (Term Frequency–Inverse Document Frequency) method to process and analyze the text data, and compared it with the Word2vec, GloVe, and TF-IDF methods alone. Combined with an agricultural word segmentation dictionary, applying Word2vec with the TF-IDF method effectively solves the problem of high-dimensional, sparse data in rice-related text. Each network layer employs the features of all previous recurrent layers' hidden states. To alleviate the growth in feature vector size caused by dense splicing, an autoencoder was used after dense concatenation. The experimental results show that rice-related question similarity matching based on Coattention-DenseGRU can improve the utilization of text features, reduce the loss of features, and achieve fast and accurate similarity matching on the rice-related question dataset. The precision and F1 values of the proposed model were 96.3% and 96.9%, respectively. Compared with seven other question similarity matching models, we present a new state-of-the-art method on our rice-related question dataset.
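The Word2vec-with-TF-IDF representation described above (word vectors averaged with TF-IDF weights) might be sketched as follows; the toy corpus, the 2-D word vectors, and the smoothed-IDF formula are illustrative assumptions, not the paper's exact setup:

```python
import math
import numpy as np

def tfidf_weights(doc_tokens, corpus):
    """TF-IDF weight per token in one document, over a toy corpus."""
    n_docs = len(corpus)
    weights = {}
    for tok in set(doc_tokens):
        tf = doc_tokens.count(tok) / len(doc_tokens)
        df = sum(1 for d in corpus if tok in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        weights[tok] = tf * idf
    return weights

def weighted_sentence_vector(doc_tokens, corpus, vectors):
    """TF-IDF-weighted average of word vectors for one question."""
    w = tfidf_weights(doc_tokens, corpus)
    num = sum(w[t] * vectors[t] for t in doc_tokens)
    den = sum(w[t] for t in doc_tokens)
    return num / den

corpus = [["rice", "blast", "control"], ["rice", "seed", "rate"], ["wheat", "rust"]]
vectors = {"rice": np.array([1.0, 0.0]), "blast": np.array([0.0, 1.0])}
vec = weighted_sentence_vector(["rice", "blast"], corpus, vectors)
```

Because "blast" is rarer in the corpus than "rice", its higher IDF pulls the sentence vector toward the "blast" word vector.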


Author(s):  
Miao Cheng ◽  
Ah Chung Tsoi

As a general means of expression, audio analysis and recognition have attracted much attention for their wide applications in the real world. Audio emotion recognition (AER) attempts to understand the emotional state of a human from a given utterance signal, and has been widely studied for its role in developing friendly human–machine interfaces. Though several state-of-the-art auditory methods have been devised for audio recognition, most of them focus on the discriminative use of acoustic features, while the efficiency demands of recognition are ignored. This constrains the practical application of AER, where rapid learning of emotion patterns is desired. To make prediction of audio emotion practical, the speaker-dependent patterns of audio emotions are learned with multiresolution analysis, and fractal dimension (FD) features are calculated for acoustic feature extraction. The method is thus able to efficiently learn the intrinsic characteristics of auditory emotions, with the utterance features learned from the FDs of each sub-band. Experimental results show the proposed method provides comparable performance for AER.
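The abstract does not specify which fractal dimension estimator is used; a common choice for 1-D signals such as audio sub-bands is Higuchi's method, sketched here purely as an illustration:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi's estimate of the fractal dimension of a 1-D signal.
    FD is the slope of log curve-length vs. log(1/k) over scales k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, kmax + 1):
        lm = []
        for m in range(k):  # one subsampled curve per offset m
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            length = np.abs(np.diff(x[idx])).sum()
            # Higuchi normalization for this offset and scale
            length *= (n - 1) / ((len(idx) - 1) * k * k)
            lm.append(length)
        log_k.append(np.log(1.0 / k))
        log_l.append(np.log(np.mean(lm)))
    slope, _ = np.polyfit(log_k, log_l, 1)
    return slope
```

A smooth ramp has FD close to 1, while white noise approaches 2, which is why FD can separate smooth from rough (e.g., agitated) signal textures.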


Author(s):  
Ajeet Ram Pathak ◽  
Somesh Bhalsing ◽  
Shivani Desai ◽  
Monica Gandhi ◽  
Pranathi Patwardhan

2021 ◽  
Author(s):  
Harisu Abdullahi Shehu ◽  
William Browne ◽  
Hedwig Eisenbarth

Emotion recognition has become an increasingly important area of research due to the growing number of CCTV cameras in the past few years. Deep network-based methods have made impressive progress on emotion recognition tasks, achieving high performance on many datasets and their related competitions, such as the ImageNet challenge. However, deep networks are vulnerable to adversarial attacks: because of their homogeneous representation of knowledge across all images, a small change to the input image made by an adversary might cause a large decrease in the algorithm's accuracy. By detecting heterogeneous facial landmarks using the machine learning library Dlib, we hypothesize we can build robustness to such attacks. The residual neural network (ResNet) model was used as an example of a deep learning model. While the accuracy achieved by ResNet decreased by up to 22% under attack, our proposed approach showed strong resistance, with only a small (<0.3%) or no decrease in accuracy when the attack was launched on the data. Furthermore, the proposed approach required considerably less execution time than the ResNet model.
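One plausible reading of the landmark-based approach is to derive scale-invariant geometric features from Dlib's 68-point landmark layout; the particular key points and the inter-ocular normalization below are illustrative assumptions, not the paper's exact features:

```python
import numpy as np

def landmark_features(landmarks):
    """Scale- and translation-invariant features from 68 (x, y) facial
    landmarks: pairwise distances between a few key points, normalized
    by inter-ocular distance (Dlib's 68-point indexing assumed)."""
    pts = np.asarray(landmarks, dtype=float)
    left_eye = pts[36:42].mean(axis=0)   # left-eye landmark cluster
    right_eye = pts[42:48].mean(axis=0)  # right-eye landmark cluster
    iod = np.linalg.norm(left_eye - right_eye)  # inter-ocular distance
    key = pts[[30, 48, 54, 8]]  # nose tip, mouth corners, chin
    # upper-triangular pairwise distances between the key points
    d = [np.linalg.norm(key[i] - key[j])
         for i in range(len(key)) for j in range(i + 1, len(key))]
    return np.array(d) / iod
```

Because every distance and the normalizer scale together, small pixel-level adversarial perturbations that barely move the detected landmarks barely move these features.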


Recently, plant phenotyping has gained the attention of many researchers because it plays a vital role in enhancing agricultural productivity. The Indian economy depends heavily on agriculture, which elevates the importance of early disease detection in crops within agricultural fields. To address this problem, several researchers have proposed computer vision and pattern recognition mechanisms that attempt to identify infected crops in the early stages. In this scenario, convolutional neural network (CNN) architectures have demonstrated exceptional performance compared with state-of-the-art mechanisms. This paper introduces an enhanced recurrent convolutional neural network (RCNN) architecture that improves prediction accuracy when detecting crop diseases in their early stages. Simulation studies show that the proposed model outperforms CNN and other state-of-the-art mechanisms.
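A recurrent convolutional layer, the building block an RCNN adds over a plain CNN, can be sketched as follows; the single-channel NumPy implementation and the three refinement steps are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D cross-correlation (what deep learning
    frameworks call convolution), single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def recurrent_conv_layer(x, k_ff, k_rec, steps=3):
    """Recurrent convolutional layer: the feed-forward response is
    refined over a few time steps by a recurrent convolution of the
    layer's own previous output."""
    h = np.maximum(conv2d_same(x, k_ff), 0.0)  # initial ReLU response
    for _ in range(steps):
        h = np.maximum(conv2d_same(x, k_ff) + conv2d_same(h, k_rec), 0.0)
    return h
```

With a zero recurrent kernel the layer degenerates to a plain convolutional layer; the recurrence is what lets each unit integrate growing spatial context, which is the usual motivation for RCNNs over CNNs.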


2021 ◽  
pp. 1-12
Author(s):  
Irfan Javid ◽  
Ahmed Khalaf Zager Alsaedi ◽  
Rozaida Binti Ghazali ◽  
Yana Mazwin ◽  
Muhammad Zulqarnain

In previous studies, various machine-driven decision support systems based on recurrent neural networks (RNNs) were ordinarily proposed for the detection of cardiovascular disease. However, the majority of these approaches are restricted to feature preprocessing. In this paper, we concentrate on both feature refinement and the removal of the predictive model's problems, e.g., underfitting and overfitting. By avoiding overfitting and underfitting, the model will demonstrate good performance on both the training and testing datasets. Overfitting of the training data is often triggered by inadequate network configuration and inappropriate features. We advocate using the Chi2 statistical model to remove irrelevant features while searching for the best-configured gated recurrent unit (GRU) with an exhaustive search strategy. The suggested hybrid technique, called Chi2 GRU, is tested against traditional ANN and GRU models, as well as different progressive machine learning models and previously reported strategies for cardiopathy prediction. The prediction accuracy of the proposed model is 92.17%. In contrast to formerly stated approaches, the obtained outcomes are promising. The study's results indicate that medical practitioners can use the proposed diagnostic method to reliably predict heart disease.
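The Chi2 feature-selection step can be sketched as follows for categorical features; the contingency-table formulation below is the standard chi-squared score, not necessarily the paper's exact variant:

```python
import numpy as np

def chi2_score(feature, labels):
    """Chi-squared statistic between one categorical feature column
    and the class label, from their contingency table."""
    f_vals, l_vals = np.unique(feature), np.unique(labels)
    obs = np.array([[np.sum((feature == f) & (labels == l)) for l in l_vals]
                    for f in f_vals], dtype=float)
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    exp = row @ col / obs.sum()          # expected counts if independent
    return ((obs - exp) ** 2 / exp).sum()

def select_top_k(X, y, k):
    """Keep the k feature columns with the highest chi-squared score."""
    scores = np.array([chi2_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```

A feature perfectly aligned with the label scores high; one independent of the label scores near zero, so irrelevant columns drop out before the GRU configuration search.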


2021 ◽  
Vol 19 (02) ◽  
pp. 2150006
Author(s):  
Fatemeh Nazem ◽  
Fahimeh Ghasemi ◽  
Afshin Fassihi ◽  
Alireza Mehri Dehnavi

Binding site prediction for new proteins is important in structure-based drug design. The identified binding sites may be helpful in the development of treatments for new viral outbreaks when there is no information available about their pockets, with COVID-19 being a case in point. Identification of the pockets using computational methods, as an alternative approach, has recently attracted much interest. In this study, binding site prediction is viewed as a semantic segmentation problem. An improved 3D version of the U-Net model based on the Dice loss function is utilized to predict the binding sites accurately. The performance of the proposed model on the independent test datasets and SARS-CoV-2 shows that the segmentation model can predict binding sites with a more accurate shape than the recently published deep learning model DeepSite. Therefore, the model may help predict the binding sites of proteins and could be used in drug design for novel proteins.
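The Dice loss that drives the segmentation model has a standard soft form; this NumPy sketch shows the binary case (the paper's 3-D U-Net would apply the same formula over a predicted volume):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for a binary segmentation mask or volume.
    pred: predicted probabilities; target: {0, 1} ground truth."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    inter = (pred * target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    return 1.0 - dice
```

A perfect prediction gives loss near 0 and a fully disjoint one near 1; unlike voxel-wise cross-entropy, the overlap ratio is insensitive to the huge non-pocket background, which is why Dice suits sparse binding-site masks.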


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 145 ◽  
Author(s):  
Zhenglong Xiang ◽  
Xialei Dong ◽  
Yuanxiang Li ◽  
Fei Yu ◽  
Xing Xu ◽  
...  

Most of the existing research papers study the emotion recognition of Minnan songs from the perspectives of music analysis theory and music appreciation. However, these investigations do not explore any possibility of carrying out automatic emotion recognition of Minnan songs. In this paper, we propose a model that consists of four main modules to classify the emotion of Minnan songs by using bimodal data—song lyrics and audio. In the proposed model, an attention-based Long Short-Term Memory (LSTM) neural network is applied to extract lyrical features, and a Convolutional Neural Network (CNN) is used to extract the audio features from the spectrum. Then, the two kinds of extracted features are combined by multimodal compact bilinear pooling, and finally, the combined features are input to the classifying module to determine the song emotion. We designed three experiment groups to investigate the classification performance of combinations of the four main parts, the comparison of the proposed model with current approaches, and the influence of a few key parameters on the performance of emotion recognition. The results show that the proposed model exhibits better performance than all other experimental groups. The accuracy, precision, and recall of the proposed model exceed 0.80 with an appropriate combination of parameters.
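Multimodal compact bilinear pooling approximates the outer product of the two modality vectors via count sketches combined in the frequency domain; the sketch dimension of 64 and the fixed seed below are illustrative assumptions:

```python
import numpy as np

def count_sketch(v, h, s, d):
    """Project v into d dimensions using hash indices h and signs s."""
    out = np.zeros(d)
    np.add.at(out, h, s * v)  # scatter-add signed components
    return out

def mcb_pool(a, b, d=64, seed=0):
    """Multimodal compact bilinear pooling of two feature vectors:
    circular convolution of their count sketches, computed via FFT."""
    rng = np.random.default_rng(seed)
    ha = rng.integers(0, d, size=len(a))
    sa = rng.choice([-1.0, 1.0], size=len(a))
    hb = rng.integers(0, d, size=len(b))
    sb = rng.choice([-1.0, 1.0], size=len(b))
    fa = np.fft.rfft(count_sketch(a, ha, sa, d))
    fb = np.fft.rfft(count_sketch(b, hb, sb, d))
    return np.fft.irfft(fa * fb, n=d)
```

The result is bilinear in its inputs (doubling one modality's features doubles the pooled vector), mirroring the outer product it approximates at a fraction of its dimensionality.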

