Rapid Assessment of Non-Verbal Auditory Perception in Normal-Hearing Participants and Cochlear Implant Users

Agathe Pralus; Ruben Hermann; Fanny Cholvy; Pierre-Emmanuel Aguera; Annie Moulin; Pascal Barone; Nicolas Grimault; Eric Truy; Barbara Tillmann; Anne Caclin

doi:10.3390/jcm10102093

Journal of Clinical Medicine ◽

10.3390/jcm10102093 ◽

2021 ◽

Vol 10 (10) ◽

pp. 2093

Author(s):

Agathe Pralus ◽

Ruben Hermann ◽

Fanny Cholvy ◽

Pierre-Emmanuel Aguera ◽

Annie Moulin ◽

...

Keyword(s):

Change Detection ◽

Emotion Recognition ◽

Auditory Perception ◽

Visual Cues ◽

Short Term Memory ◽

Rapid Assessment ◽

Normal Hearing ◽

Stream Segregation ◽

Pitch Change ◽

Small Pitch

In the case of hearing loss, cochlear implants (CI) allow for the restoration of hearing. Despite the advantages of CIs for speech perception, CI users still complain about their poor perception of their auditory environment. Aiming to assess non-verbal auditory perception in CI users, we developed five listening tests. These tests measure pitch change detection, pitch direction identification, pitch short-term memory, auditory stream segregation, and emotional prosody recognition, along with perceived intensity ratings. In order to test the potential benefit of visual cues for pitch processing, the three pitch tests included half of the trials with visual indications to perform the task. We tested 10 normal-hearing (NH) participants with material being presented as original and vocoded sounds, and 10 post-lingually deaf CI users. With the vocoded sounds, the NH participants had reduced scores for the detection of small pitch differences, and reduced emotion recognition and streaming abilities compared to the original sounds. Similarly, the CI users had deficits for small differences in the pitch change detection task and emotion recognition, as well as a decreased streaming capacity. Overall, this assessment allows for the rapid detection of specific patterns of non-verbal auditory perception deficits. The current findings also open new perspectives about how to enhance pitch perception capacities using visual cues.

Supplemental Material for Pitch-Change Detection and Pitch-Direction Discrimination in Children

Psychomusicology Music Mind and Brain ◽

10.1037/a0033301.supp ◽

2013 ◽

Keyword(s):

Change Detection ◽

Pitch Change ◽

Direction Discrimination ◽

Pitch Direction

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Sensors ◽

10.3390/s21051579 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1579 ◽

Cited By ~ 1

Author(s):

Kyoung Ju Noh ◽

Chi Yoon Jeong ◽

Jiyoun Lim ◽

Seungeun Chung ◽

Gague Kim ◽

...

Keyword(s):

Emotion Recognition ◽

Short Term Memory ◽

Domain Adaptation ◽

Classification Model ◽

Speech Emotion Recognition ◽

Target Domain ◽

Model Generalization ◽

Speech Database ◽

Emotion Labels ◽

Temporal Feature

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
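
The abstract above reports its gains as F1 scores. As a reminder of how that metric is computed, here is a minimal sketch in plain Python. The abstract does not say which averaging scheme was used, so the macro average below is an assumption, and the function names and toy labels are purely illustrative.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one emotion label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example with hypothetical labels, not data from the study.
truth = ["happy", "sad", "happy", "angry"]
guess = ["happy", "happy", "happy", "angry"]
score = macro_f1(truth, guess)
```

With macro averaging, every emotion class counts equally regardless of how many samples it has, which matters for emotion datasets with unbalanced labels.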

A EEG-based emotion recognition model with rhythm and time characteristics

Brain Informatics ◽

10.1186/s40708-019-0100-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 7

Author(s):

Jianzhuo Yan ◽

Shangbin Chen ◽

Sinuo Deng

Keyword(s):

Emotion Recognition ◽

Time Scales ◽

Time Scale ◽

Short Term Memory ◽

Recognition Model ◽

Memory Network ◽

Long Short Term Memory ◽

Valence And Arousal ◽

Memory Characteristics

As an advanced function of the human brain, emotion has a significant influence on human study, work, and other aspects of life. Artificial intelligence has played an important role in recognizing human emotion correctly. EEG-based emotion recognition (ER), one application of the brain-computer interface (BCI), has become more popular in recent years. However, due to the ambiguity of human emotions and the complexity of EEG signals, an EEG-ER system that recognizes emotions with high accuracy is not easy to achieve. Based on the time scale, this paper chooses the recurrent neural network as the breakthrough point of the screening model. According to the rhythmic characteristics and temporal memory characteristics of EEG, this research proposes a Rhythmic Time EEG Emotion Recognition Model (RT-ERM) based on the valence and arousal of a Long Short-Term Memory (LSTM) network. With this model, the classification results differ across rhythms and time scales. The optimal rhythm and time scale of the RT-ERM model are obtained from the classification accuracy at different rhythms and time scales. Then, emotional EEG is classified using the best time scale for each rhythm. Finally, a comparison with other existing emotional EEG classification methods shows that the rhythm and time scale of the model can contribute to the accuracy of RT-ERM.

Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks

Remote Sensing ◽

10.3390/rs10111827 ◽

2018 ◽

Vol 10 (11) ◽

pp. 1827 ◽

Cited By ~ 24

Author(s):

Ahram Song ◽

Jaewan Choi ◽

Youkyung Han ◽

Yongil Kim

Keyword(s):

Deep Learning ◽

Change Detection ◽

Spatial Information ◽

Short Term Memory ◽

Hyperspectral Images ◽

Convolutional Network ◽

Ground Truth Data ◽

Fully Convolutional Network ◽

Training Samples ◽

Multi Temporal

Hyperspectral change detection (CD) can be effectively performed using deep-learning networks. Although these approaches require qualified training samples, it is difficult to obtain ground-truth data in the real world. Preserving spatial information during training is difficult due to structural limitations. To solve such problems, our study proposed a novel CD method for hyperspectral images (HSIs), including sample generation and a deep-learning network, called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merged the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM). Principal component analysis (PCA) and the spectral correlation angle (SCA) were used to generate training samples with high probabilities of being changed or unchanged. The strategy assisted in training fewer samples of representative feature expression. The Re3FCN was mainly comprised of spectral–spatial and temporal modules. Particularly, a spectral–spatial module with a 3D convolutional layer extracts the spectral–spatial features from the HSIs simultaneously, whilst a temporal module with ConvLSTM records and analyzes the multi-temporal HSI change information. The study first proposed a simple and effective method to generate samples for network training. This method can be applied effectively to cases with no training samples. Re3FCN can perform end-to-end detection for binary and multiple changes. Moreover, Re3FCN can receive multi-temporal HSIs directly as input without learning the characteristics of multiple changes. Finally, the network could extract joint spectral–spatial–temporal features and it preserved the spatial structure during the learning process through the fully convolutional structure. This study was the first to use a 3D FCN and a ConvLSTM for the remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments. Results revealed that the Re3FCN outperformed the other conventional methods, such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers-fully connected LSTM.
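
The abstract names the spectral correlation angle (SCA) as one ingredient of sample generation but does not define it. The sketch below uses a commonly cited definition, the arccosine of the Pearson correlation between two spectra rescaled to [0, 1]. Whether the paper uses exactly this form is an assumption, and the function name and toy spectra are illustrative only.

```python
import math


def spectral_correlation_angle(x, y):
    """SCA between two pixel spectra, in radians (0 = same spectral shape).

    Assumed definition: acos((r + 1) / 2), where r is the Pearson
    correlation coefficient between the two spectra.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("spectra must have the same length (>= 2 bands)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a constant spectrum has no defined correlation")
    # Clamp to guard against rounding slightly outside [-1, 1].
    r = max(-1.0, min(1.0, cov / (norm_x * norm_y)))
    return math.acos((r + 1.0) / 2.0)


# Toy spectra: a brightness-scaled copy versus a reversed spectral shape.
unchanged = spectral_correlation_angle([1, 2, 3, 4], [2, 4, 6, 8])
changed = spectral_correlation_angle([1, 2, 3, 4], [4, 3, 2, 1])
```

Under this definition, a pixel whose spectrum only scales in brightness between two dates gets an angle near 0, while a pixel whose spectral shape reverses approaches π/2, so thresholding the angle gives candidate unchanged and changed samples.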

Effect of sinusoidally amplitude modulated broadband noise stimuli on stream segregation in individuals with sensorineural hearing loss

Auditory and Vestibular Research ◽

10.18502/avr.v29i4.4640 ◽

2020 ◽

Author(s):

Jawahar Antony P ◽

Animesh Barman

Keyword(s):

Hearing Loss ◽

Sensorineural Hearing Loss ◽

Amplitude Modulation ◽

Modulation Frequency ◽

Study Groups ◽

Normal Hearing ◽

Broadband Noise ◽

Sensorineural Hearing ◽

Stream Segregation ◽

Temporal Cues

Background and Aim: Auditory stream segregation is a phenomenon that splits sounds into different streams. The temporal cues that contribute to stream segregation have been previously studied in normal hearing people. In people with sensorineural hearing loss (SNHL), the cues for temporal envelope coding are not usually affected, while the temporal fine structure cues are affected. These two temporal cues depend on the amplitude modulation frequency. The present study aimed to evaluate the effect of sinusoidal amplitude modulated (SAM) broadband noises on stream segregation in individuals with SNHL. Methods: Thirty normal hearing subjects and 30 subjects with mild to moderate bilateral SNHL participated in the study. Two experiments were performed; in the first experiment, the AB sequence of broadband SAM stimuli was presented, while in the second experiment, only the B sequence was presented. A low (16 Hz) and a high (256 Hz) standard modulation frequency were used in these experiments. The subjects were asked to find the irregularities in the rhythmic sequence. Results: Both study groups identified the irregularities similarly in both experiments. The minimum cumulative delay was slightly higher in the SNHL group. Conclusion: It is suggested that the temporal cues provided by the broadband SAM noises for low and high standard modulation frequencies were not used for stream segregation by either normal hearing subjects or those with SNHL. Keywords: Stream segregation; sinusoidal amplitude modulation; sensorineural hearing loss
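
For readers unfamiliar with the stimulus type, sinusoidally amplitude modulated noise multiplies a broadband noise carrier by the envelope 1 + m·sin(2π·f_m·t), where f_m is the modulation frequency and m the modulation depth. The sketch below illustrates this in plain Python. The sample rate, duration, depth, and the use of uniform white noise as the carrier are all assumptions chosen for illustration, not the parameters of the study.

```python
import math
import random


def sam_noise(duration_s, fm_hz, depth=1.0, sample_rate=44100, seed=0):
    """Return samples of sinusoidally amplitude modulated white noise.

    Each sample is carrier * (1 + depth * sin(2*pi*fm*t)), with the
    carrier drawn uniformly from [-1, 1] (an assumed broadband carrier).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        carrier = rng.uniform(-1.0, 1.0)
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * fm_hz * t)
        samples.append(envelope * carrier)
    return samples


# A 100 ms burst modulated at a low rate such as 16 Hz.
burst = sam_noise(0.1, 16.0, sample_rate=8000)
# With depth 0 the envelope is flat and the output is the plain carrier.
flat = sam_noise(0.1, 16.0, depth=0.0, sample_rate=8000)
```

With full modulation depth (m = 1) the envelope swings between 0 and 2, so the noise periodically fades to silence at the modulation rate.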

Do change detection measures underestimate the capacity of visual short-term memory?

Journal of Vision ◽

10.1167/7.9.663 ◽

2010 ◽

Vol 7 (9) ◽

pp. 663-663 ◽

Cited By ~ 2

Author(s):

J.-F. Delvenne ◽

A. Cleeremans ◽

C. Laloyaux

Keyword(s):

Change Detection ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Visual Short Term Memory

Bimodal Emotion Recognition Model for Minnan Songs

Information ◽

10.3390/info11030145 ◽

2020 ◽

Vol 11 (3) ◽

pp. 145 ◽

Cited By ~ 1

Author(s):

Zhenglong Xiang ◽

Xialei Dong ◽

Yuanxiang Li ◽

Fei Yu ◽

Xing Xu ◽

...

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Short Term Memory ◽

Music Appreciation ◽

Research Papers ◽

Audio Features ◽

Analysis Theory ◽

Proposed Model ◽

Song Lyrics ◽

Long Short Term Memory

Most of the existing research papers study the emotion recognition of Minnan songs from the perspectives of music analysis theory and music appreciation. However, these investigations do not explore the possibility of automatic emotion recognition for Minnan songs. In this paper, we propose a model that consists of four main modules to classify the emotion of Minnan songs using bimodal data: song lyrics and audio. In the proposed model, an attention-based Long Short-Term Memory (LSTM) neural network is applied to extract lyrical features, and a Convolutional Neural Network (CNN) is used to extract the audio features from the spectrum. Then, the two kinds of extracted features are concatenated by multimodal compact bilinear pooling, and finally, the concatenated features are input to the classifying module to determine the song emotion. We designed three experiment groups to investigate the classification performance of combinations of the four main parts, the comparison of the proposed model with current approaches, and the influence of a few key parameters on the performance of emotion recognition. The results show that the proposed model performs better than all other experimental groups. The accuracy, precision, and recall of the proposed model exceed 0.80 with an appropriate combination of parameters.

Decoding feature information in human auditory cortex—A comparison of auditory perception, short-term memory and imagery

The Journal of the Acoustical Society of America ◽

10.1121/1.4708771 ◽

2012 ◽

Vol 131 (4) ◽

pp. 3386-3386

Author(s):

Annika Carola Linke ◽

Rhodri Cusack

Keyword(s):

Auditory Cortex ◽

Auditory Perception ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Feature Information ◽

Human Auditory Cortex

Comparing word and emotion recognition by listeners with normal hearing using unprocessed and vocoded speech stimuli

The Journal of the Acoustical Society of America ◽

10.1121/1.5146813 ◽

2020 ◽

Vol 148 (4) ◽

pp. 2465-2465

Author(s):

Shae D. Morgan

Keyword(s):

Emotion Recognition ◽

Normal Hearing ◽

Speech Stimuli ◽

Vocoded Speech

Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks

Applied Sciences ◽

10.3390/app112411738 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11738

Author(s):

Thomas Teixeira ◽

Éric Granger ◽

Alessandro Lameiras Koerich

Keyword(s):

Neural Networks ◽

Facial Expression ◽

Emotion Recognition ◽

Facial Expressions ◽

Convolutional Neural Networks ◽

Affective Computing ◽

Spatial Information ◽

Short Term Memory ◽

State Of The Art ◽

Fine Tuning

Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals as well as cross-cultural and demographic aspects. Nevertheless, recognizing facial expressions is a difficult task, even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) to deal with long video sequences captured in the wild for continuous emotion recognition. To this end, several 2D CNN models that were designed to model spatial information are extended to allow spatiotemporal representation learning from videos, considering a complex and multi-dimensional emotion space, where continuous values of valence and arousal must be predicted. We have developed and evaluated convolutional recurrent neural networks, combining 2D CNNs and long short-term memory units, and inflated 3D CNN models, which are built by inflating the weights of a pre-trained 2D CNN model during fine-tuning, using application-specific videos. Experimental results on the challenging SEWA-DB dataset have shown that these architectures can effectively be fine-tuned to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on such a dataset.
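
The abstract mentions inflating the weights of a pre-trained 2D CNN into a 3D model without giving the procedure. One widely used convention is to repeat each 2D kernel along the new time axis and divide by the temporal depth, so that a video of identical frames produces the same response as the original 2D filter on a single frame. The sketch below shows that convention on nested lists. Whether the paper applies this exact rescaling is an assumption, and `inflate_kernel` is a hypothetical helper name.

```python
def inflate_kernel(kernel_2d, depth):
    """Inflate a 2D kernel (rows x cols) into a 3D kernel (depth x rows x cols).

    The 2D weights are copied `depth` times along the time axis and scaled
    by 1 / depth (assumed convention), so summing over time recovers the
    original 2D kernel.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [
        [[w / depth for w in row] for row in kernel_2d]
        for _ in range(depth)
    ]


# Toy 2x2 kernel inflated to a temporal depth of 4.
kernel = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_kernel(kernel, 4)

# Summing the time slices recovers the original 2D weights.
recovered = [
    [sum(kernel_3d[t][r][c] for t in range(4)) for c in range(2)]
    for r in range(2)
]
```

Starting from weights that reproduce the 2D model on static input lets fine-tuning on videos begin from the pre-trained spatial features rather than from random 3D filters.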
