Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners

Author(s):  
Dongmei Wang ◽  
Philipos C. Loizou ◽  
John H. L. Hansen
2021 ◽  
Vol 40 (1) ◽  
pp. 849-864
Author(s):  
Nasir Saleem ◽  
Muhammad Irfan Khattak ◽  
Mu’ath Al-Hasan ◽  
Atif Jan

Speech enhancement is an important problem in many speech processing applications. Recently, supervised speech enhancement methods that use deep learning to estimate a time-frequency mask have shown remarkable performance gains. In this paper, we propose a time-frequency masking-based supervised speech enhancement method for improving the intelligibility and quality of noisy speech. We believe a large performance gain can be achieved if deep neural networks (DNNs) are pre-trained layer-wise by stacking Gaussian-Bernoulli Restricted Boltzmann Machines (GB-RBMs). The resulting DNN, called a Gaussian-Bernoulli Deep Belief Network (GB-DBN), is optimized by minimizing the error between the estimated and pre-defined masks, using a non-linear Mel-scale weighted mean square error (LMW-MSE) loss function as the training criterion. We examine the performance of the proposed pre-training scheme using different DNNs built on three time-frequency masks: the ideal amplitude mask (IAM), the ideal ratio mask (IRM), and the phase-sensitive mask (PSM). Results under a range of noisy conditions show that DNNs pre-trained with the proposed scheme provide a consistent gain in perceived speech intelligibility and quality, and that the pre-training scheme remains effective and robust with noisy training data.
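
For reference, the three training targets named above have standard definitions in the mask-estimation literature. The NumPy sketch below computes each from the clean and noisy STFTs; variable names and clipping ranges are illustrative assumptions, and the exact variants used in the paper may differ.

```python
import numpy as np

def compute_masks(S, Y, beta=0.5):
    """Standard time-frequency training targets from the clean STFT S and
    the noisy STFT Y (complex arrays of equal shape). Definitions follow
    common forms in the mask-estimation literature."""
    N = Y - S                                       # noise component (additive model)
    irm = (np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-8)) ** beta
    iam = np.abs(S) / (np.abs(Y) + 1e-8)            # ideal amplitude (spectral magnitude) mask
    psm = iam * np.cos(np.angle(S) - np.angle(Y))   # phase-sensitive mask
    # IAM can exceed 1, PSM can be negative; clipping bounds are a common choice.
    return irm, np.clip(iam, 0.0, 10.0), np.clip(psm, 0.0, 1.0)
```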


2021 ◽  
Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

Abstract In real environments, speech is easily corrupted by external interference, which can destroy important features. Deep learning has become the mainstream approach to speech enhancement because of its strength in complex nonlinear mapping problems. However, existing models struggle to learn important information from previous time steps and to capture long-term event dependencies. Because Deep Neural Networks (DNNs), a typical deep model for speech signals, lack correlation within the same layer, it is difficult for them to capture long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on fused features from a deep neural network and a gated recurrent unit (GRU) network. The method exploits the strengths of both networks to reduce the number of parameters while simultaneously improving speech quality and intelligibility. First, a DNN with multiple hidden layers learns the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Second, the LPS features estimated by the DNN are fused with the noisy speech and used as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping from the fused LPS features to the LPS features of clean speech. Experimental results show that, under matched noise conditions, the proposed algorithm improves PESQ, SSNR, and STOI by 30.72%, 39.84%, and 5.53%, respectively, relative to the noisy signal; under unmatched noise conditions, PESQ and STOI improve by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information in the features to suppress noise in both matched and unmatched noise cases, and it outperforms other common speech enhancement methods.
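
A minimal PyTorch sketch of the two-stage idea described above, assuming illustrative layer sizes and names rather than the authors' configuration: a feed-forward DNN maps the noisy log-power spectrum (LPS) to a first clean-LPS estimate, which is concatenated with the noisy LPS and fed to a GRU that produces the final estimate.

```python
import torch
import torch.nn as nn

class DNNGRUEnhancer(nn.Module):
    """Illustrative two-stage enhancer: frame-wise DNN mapping followed by a
    GRU operating on fused (noisy + DNN-estimated) LPS features."""
    def __init__(self, n_freq=257, hidden=512, gru_hidden=256):
        super().__init__()
        self.dnn = nn.Sequential(                   # stage 1: frame-wise LPS mapping
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )
        self.gru = nn.GRU(2 * n_freq, gru_hidden, batch_first=True)  # stage 2: fused features
        self.out = nn.Linear(gru_hidden, n_freq)

    def forward(self, noisy_lps):                   # noisy_lps: (batch, frames, n_freq)
        dnn_est = self.dnn(noisy_lps)               # first clean-LPS estimate
        fused = torch.cat([noisy_lps, dnn_est], dim=-1)
        h, _ = self.gru(fused)
        return self.out(h)                          # final clean-LPS estimate

# usage sketch: 4 utterances of 100 frames each
model = DNNGRUEnhancer()
clean_est = model(torch.randn(4, 100, 257))
```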


2013 ◽  
Vol 133 (5) ◽  
pp. 3383-3383
Author(s):  
Toshie Matsui ◽  
Ryota Shimokura ◽  
Tadashi Nishimura ◽  
Hiroshi Hosoi ◽  
Seiji Nakagawa

2014 ◽  
Vol 912-914 ◽  
pp. 1391-1394
Author(s):  
Yu Xiang Yang ◽  
Jian Fen Ma

To improve the intelligibility of noisy speech, a novel speech enhancement algorithm using distortion control is proposed. Current speech enhancement algorithms fail to improve speech intelligibility because they aim to minimize the overall distortion of the enhanced speech, whereas different types of speech distortion contribute differently to intelligibility: distortion in excess of 6.02 dB has the most detrimental effect. During noise reduction, the type of speech distortion can be determined from the signal distortion ratio, and distortion exceeding 6.02 dB can then be controlled by tuning the gain function of the enhancement algorithm. Experimental results show that the proposed algorithm improves the intelligibility of noisy speech considerably.
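
The 6.02 dB figure corresponds to an enhanced magnitude of twice the clean magnitude (20·log10(2) ≈ 6.02 dB). Below is a minimal NumPy sketch of the gain-capping idea, assuming an estimate of the clean magnitude is available; the paper's actual signal-distortion-ratio test and gain update may differ.

```python
import numpy as np

def constrain_gain(G, noisy_mag, clean_mag_est):
    """Cap the enhancement gain so amplification distortion does not exceed
    6.02 dB, i.e. the enhanced magnitude never exceeds twice the (estimated)
    clean magnitude. Illustrative sketch only."""
    enhanced_mag = G * noisy_mag
    limit = 2.0 * clean_mag_est                 # 20*log10(2) ~= 6.02 dB
    over = enhanced_mag > limit                 # T-F units exceeding the 6.02 dB bound
    G = G.copy()
    G[over] = limit[over] / (noisy_mag[over] + 1e-8)
    return G
```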


2013 ◽  
Author(s):  
Toshie Matsui ◽  
Ryota Shimokura ◽  
Tadashi Nishimura ◽  
Hiroshi Hosoi ◽  
Seiji Nakagawa

2021 ◽  
Vol 25 ◽  
pp. 233121652110144
Author(s):  
Ilja Reinten ◽  
Inge De Ronde-Brons ◽  
Rolph Houben ◽  
Wouter Dreschler

Single microphone noise reduction (NR) in hearing aids can provide a subjective benefit even when there is no objective improvement in speech intelligibility. A possible explanation lies in a reduction of listening effort. Previously, we showed that response times (a proxy for listening effort) to an auditory-only dual-task were reduced by NR in normal-hearing (NH) listeners. In this study, we investigate if the results from NH listeners extend to the hearing-impaired (HI), the target group for hearing aids. In addition, we assess the relevance of the outcome measure for studying and understanding listening effort. Twelve HI subjects were asked to sum two digits of a digit triplet in noise. We measured response times to this task, as well as subjective listening effort and speech intelligibility. Stimuli were presented at three signal-to-noise ratios (SNR; –5, 0, +5 dB) and in quiet. Stimuli were processed with ideal or nonideal NR, or unprocessed. The effect of NR on response times in HI listeners was significant only in conditions where speech intelligibility was also affected (–5 dB SNR). This is in contrast to the previous results with NH listeners. There was a significant effect of SNR on response times for HI listeners. The response time measure was reasonably correlated (R₁₄² = 0.54) to subjective listening effort and showed a sufficient test–retest reliability. This study thus presents an objective, valid, and reliable measure for evaluating an aspect of listening effort of HI listeners.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1878
Author(s):  
Yi Zhou ◽  
Haiping Wang ◽  
Yijing Chu ◽  
Hongqing Liu

The use of multiple spatially distributed microphones allows spatial filtering to be performed along with conventional temporal filtering, which better rejects interference signals and leads to an overall improvement in speech quality. In this paper, we propose a novel dual-microphone generalized sidelobe canceller (GSC) algorithm assisted by a bone-conduction (BC) sensor for speech enhancement, named the BC-assisted GSC (BCA-GSC) algorithm. The BC sensor is relatively insensitive to ambient noise compared with a conventional air-conduction (AC) microphone, so the BC speech can be analyzed to generate very accurate voice activity detection (VAD), even in high-noise environments. The proposed algorithm incorporates the VAD information obtained from the BC speech into the adaptive blocking matrix (ABM) and the adaptive noise canceller (ANC) of the GSC. By using the VAD to control the ABM, and combining the VAD with the signal-to-interference ratio (SIR) to control the ANC, the proposed method suppresses interference and significantly improves the overall performance of the GSC. Experiments verify that the proposed system not only improves speech quality remarkably but also boosts speech intelligibility.
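
A highly simplified Python sketch of gating the adaptive stage of a GSC with a bone-conduction-based VAD. The fixed beamformer, blocking matrix, and NLMS step size here are illustrative assumptions; the actual BCA-GSC also adapts the blocking matrix and applies an SIR test before ANC adaptation.

```python
import numpy as np

def bc_vad(bc_frame, threshold=1e-4):
    """Energy-based voice activity decision from the bone-conduction frame.
    The BC sensor is largely immune to airborne noise, so simple energy
    thresholding is a reasonable stand-in for the VAD described above."""
    return np.mean(bc_frame ** 2) > threshold

def gsc_enhance(mic1, mic2, bc, frame_len=256, mu=0.1, L=32):
    """Simplified dual-mic GSC with a BC-controlled adaptive noise canceller."""
    w = np.zeros(L)                             # ANC filter weights
    out = np.zeros_like(mic1)
    for start in range(0, len(mic1) - frame_len, frame_len):
        sl = slice(start, start + frame_len)
        fbf = 0.5 * (mic1[sl] + mic2[sl])       # fixed beamformer output
        blocked = mic1[sl] - mic2[sl]           # noise reference from blocking matrix
        speech_active = bc_vad(bc[sl])
        for n in range(L, frame_len):
            x = blocked[n - L:n][::-1]
            e = fbf[n] - np.dot(w, x)           # beamformer output minus noise estimate
            if not speech_active:               # adapt ANC only in noise-only frames
                w += mu * e * x / (np.dot(x, x) + 1e-8)
            out[start + n] = e
    return out
```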


2021 ◽  
Vol 25 ◽  
pp. 233121652097802
Author(s):  
Emmanuel Ponsot ◽  
Léo Varnet ◽  
Nicolas Wallaert ◽  
Elza Daoud ◽  
Shihab A. Shamma ◽  
...  

Spectrotemporal modulations (STM) are essential features of speech signals that make them intelligible. While their encoding has been widely investigated in neurophysiology, we still lack a full understanding of how STMs are processed at the behavioral level and how cochlear hearing loss impacts this processing. Here, we introduce a novel methodological framework based on psychophysical reverse correlation deployed in the modulation space to characterize the mechanisms underlying STM detection in noise. We derive perceptual filters for young normal-hearing and older hearing-impaired individuals performing a detection task of an elementary target STM (a given product of temporal and spectral modulations) embedded in other masking STMs. Analyzed with computational tools, our data show that both groups rely on a comparable linear (band-pass)–nonlinear processing cascade, which can be well accounted for by a temporal modulation filter bank model combined with cross-correlation against the target representation. Our results also suggest that the modulation mistuning observed for the hearing-impaired group results primarily from broader cochlear filters. Yet, we find idiosyncratic behaviors that cannot be captured by cochlear tuning alone, highlighting the need to consider variability originating from additional mechanisms. Overall, this integrated experimental-computational approach offers a principled way to assess suprathreshold processing distortions in each individual and could thus be used to further investigate interindividual differences in speech intelligibility.
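
A minimal sketch of the classical reverse-correlation estimate underlying such perceptual filters, assuming the masker STM energy on each trial and the listener's binary responses are available (the computational analysis in the paper is considerably richer).

```python
import numpy as np

def perceptual_filter(masker_stm, responses):
    """Reverse-correlation estimate of a perceptual filter in the
    spectrotemporal-modulation domain. masker_stm has shape
    (n_trials, n_temporal_mod, n_spectral_mod) and holds the masking STM
    energy on each trial; responses is a boolean array of 'target heard'
    decisions. The kernel is the mean masker on 'yes' trials minus the
    mean masker on 'no' trials."""
    responses = np.asarray(responses, dtype=bool)
    return masker_stm[responses].mean(axis=0) - masker_stm[~responses].mean(axis=0)
```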

