Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners

Author(s):  
Dongmei Wang ◽  
Philipos C. Loizou ◽  
John H. L. Hansen
2021 ◽  
Vol 40 (1) ◽  
pp. 849-864
Author(s):  
Nasir Saleem ◽  
Muhammad Irfan Khattak ◽  
Mu’ath Al-Hasan ◽  
Atif Jan

Speech enhancement is an important problem in many speech processing applications. Recently, supervised speech enhancement methods that use deep learning to estimate a time-frequency mask have shown remarkable performance gains. In this paper, we propose a time-frequency masking-based supervised speech enhancement method for improving the intelligibility and quality of noisy speech. We believe a large performance gain can be achieved if deep neural networks (DNNs) are pre-trained layer-wise by stacking Gaussian-Bernoulli Restricted Boltzmann Machines (GB-RBMs). The resulting DNN, called a Gaussian-Bernoulli Deep Belief Network (GB-DBN), is optimized by minimizing the error between the estimated and pre-defined masks, using a non-linear Mel-scale weighted mean square error (LMW-MSE) loss function as the training criterion. We examine the performance of the proposed pre-training scheme using different DNNs built on three time-frequency masks: the ideal amplitude mask (IAM), the ideal ratio mask (IRM), and the phase-sensitive mask (PSM). Results under a range of noisy conditions show that DNNs pre-trained with the proposed scheme provide a consistent gain in perceived speech intelligibility and quality, and that the pre-training scheme remains effective and robust with noisy training data.
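
For reference, the three training targets named above have standard definitions in the mask-estimation literature. The NumPy sketch below computes each from the clean and noisy STFTs; variable names and clipping ranges are illustrative assumptions, and the exact variants used in the paper may differ.

```python
import numpy as np

def compute_masks(S, Y, beta=0.5):
    """Standard time-frequency training targets from the clean STFT S and
    the noisy STFT Y (complex arrays of equal shape). Definitions follow
    common forms in the mask-estimation literature."""
    N = Y - S                                       # noise component (additive model)
    irm = (np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-8)) ** beta
    iam = np.abs(S) / (np.abs(Y) + 1e-8)            # ideal amplitude (spectral magnitude) mask
    psm = iam * np.cos(np.angle(S) - np.angle(Y))   # phase-sensitive mask
    # IAM can exceed 1, PSM can be negative; clipping bounds are a common choice.
    return irm, np.clip(iam, 0.0, 10.0), np.clip(psm, 0.0, 1.0)
```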


2021 ◽  
Author(s):  
Youming Wang ◽  
Jiali Han ◽  
Tianqi Zhang ◽  
Didi Qing

Abstract In real environments, speech is easily corrupted by external interference, which can destroy important features. Deep learning has become the mainstream approach to speech enhancement because of its strength in complex nonlinear mapping problems. However, existing models struggle to learn important information from previous time steps and to capture long-term event dependencies. Because Deep Neural Networks (DNNs), a typical deep model for speech signals, lack correlation within the same layer, it is difficult for them to capture long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on fused features from a deep neural network and a gated recurrent unit (GRU) network. The method exploits the strengths of both networks to reduce the number of parameters while simultaneously improving speech quality and intelligibility. First, a DNN with multiple hidden layers learns the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Second, the LPS features estimated by the DNN are fused with the noisy speech and used as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping from the fused LPS features to the LPS features of clean speech. Experimental results show that, under matched noise conditions, the proposed algorithm improves PESQ, SSNR, and STOI by 30.72%, 39.84%, and 5.53%, respectively, relative to the noisy signal; under unmatched noise conditions, PESQ and STOI improve by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information in the features to suppress noise in both matched and unmatched noise cases, and it outperforms other common speech enhancement methods.
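
A minimal PyTorch sketch of the two-stage idea described above, assuming illustrative layer sizes and names rather than the authors' configuration: a feed-forward DNN maps the noisy log-power spectrum (LPS) to a first clean-LPS estimate, which is concatenated with the noisy LPS and fed to a GRU that produces the final estimate.

```python
import torch
import torch.nn as nn

class DNNGRUEnhancer(nn.Module):
    """Illustrative two-stage enhancer: frame-wise DNN mapping followed by a
    GRU operating on fused (noisy + DNN-estimated) LPS features."""
    def __init__(self, n_freq=257, hidden=512, gru_hidden=256):
        super().__init__()
        self.dnn = nn.Sequential(                   # stage 1: frame-wise LPS mapping
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )
        self.gru = nn.GRU(2 * n_freq, gru_hidden, batch_first=True)  # stage 2: fused features
        self.out = nn.Linear(gru_hidden, n_freq)

    def forward(self, noisy_lps):                   # noisy_lps: (batch, frames, n_freq)
        dnn_est = self.dnn(noisy_lps)               # first clean-LPS estimate
        fused = torch.cat([noisy_lps, dnn_est], dim=-1)
        h, _ = self.gru(fused)
        return self.out(h)                          # final clean-LPS estimate

# usage sketch: 4 utterances of 100 frames each
model = DNNGRUEnhancer()
clean_est = model(torch.randn(4, 100, 257))
```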


2013 ◽  
Vol 133 (5) ◽  
pp. 3383-3383
Author(s):  
Toshie Matsui ◽  
Ryota Shimokura ◽  
Tadashi Nishimura ◽  
Hiroshi Hosoi ◽  
Seiji Nakagawa

2014 ◽  
Vol 912-914 ◽  
pp. 1391-1394
Author(s):  
Yu Xiang Yang ◽  
Jian Fen Ma

To improve the intelligibility of noisy speech, a novel speech enhancement algorithm using distortion control is proposed. Current speech enhancement algorithms fail to improve speech intelligibility because they aim to minimize the overall distortion of the enhanced speech, whereas different types of speech distortion contribute differently to intelligibility: distortion in excess of 6.02 dB has the most detrimental effect. During noise reduction, the type of speech distortion can be determined from the signal distortion ratio, and distortion exceeding 6.02 dB can then be controlled by tuning the gain function of the enhancement algorithm. Experimental results show that the proposed algorithm improves the intelligibility of noisy speech considerably.
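
The 6.02 dB figure corresponds to an enhanced magnitude of twice the clean magnitude (20·log10(2) ≈ 6.02 dB). Below is a minimal NumPy sketch of the gain-capping idea, assuming an estimate of the clean magnitude is available; the paper's actual signal-distortion-ratio test and gain update may differ.

```python
import numpy as np

def constrain_gain(G, noisy_mag, clean_mag_est):
    """Cap the enhancement gain so amplification distortion does not exceed
    6.02 dB, i.e. the enhanced magnitude never exceeds twice the (estimated)
    clean magnitude. Illustrative sketch only."""
    enhanced_mag = G * noisy_mag
    limit = 2.0 * clean_mag_est                 # 20*log10(2) ~= 6.02 dB
    over = enhanced_mag > limit                 # T-F units exceeding the 6.02 dB bound
    G = G.copy()
    G[over] = limit[over] / (noisy_mag[over] + 1e-8)
    return G
```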


2013 ◽  
Author(s):  
Toshie Matsui ◽  
Ryota Shimokura ◽  
Tadashi Nishimura ◽  
Hiroshi Hosoi ◽  
Seiji Nakagawa

2021 ◽  
Vol 25 ◽  
pp. 233121652110144
Author(s):  
Ilja Reinten ◽  
Inge De Ronde-Brons ◽  
Rolph Houben ◽  
Wouter Dreschler

Single microphone noise reduction (NR) in hearing aids can provide a subjective benefit even when there is no objective improvement in speech intelligibility. A possible explanation lies in a reduction of listening effort. Previously, we showed that response times (a proxy for listening effort) to an auditory-only dual-task were reduced by NR in normal-hearing (NH) listeners. In this study, we investigate if the results from NH listeners extend to the hearing-impaired (HI), the target group for hearing aids. In addition, we assess the relevance of the outcome measure for studying and understanding listening effort. Twelve HI subjects were asked to sum two digits of a digit triplet in noise. We measured response times to this task, as well as subjective listening effort and speech intelligibility. Stimuli were presented at three signal-to-noise ratios (SNR; –5, 0, +5 dB) and in quiet. Stimuli were processed with ideal or nonideal NR, or unprocessed. The effect of NR on response times in HI listeners was significant only in conditions where speech intelligibility was also affected (–5 dB SNR). This is in contrast to the previous results with NH listeners. There was a significant effect of SNR on response times for HI listeners. The response time measure was reasonably correlated (R₁₄² = 0.54) to subjective listening effort and showed a sufficient test–retest reliability. This study thus presents an objective, valid, and reliable measure for evaluating an aspect of listening effort of HI listeners.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1878
Author(s):  
Yi Zhou ◽  
Haiping Wang ◽  
Yijing Chu ◽  
Hongqing Liu

The use of multiple spatially distributed microphones allows spatial filtering to be performed along with conventional temporal filtering, which better rejects interference signals and leads to an overall improvement in speech quality. In this paper, we propose a novel dual-microphone generalized sidelobe canceller (GSC) algorithm assisted by a bone-conduction (BC) sensor for speech enhancement, named the BC-assisted GSC (BCA-GSC) algorithm. The BC sensor is relatively insensitive to ambient noise compared with a conventional air-conduction (AC) microphone, so the BC speech can be analyzed to generate very accurate voice activity detection (VAD), even in high-noise environments. The proposed algorithm incorporates the VAD information obtained from the BC speech into the adaptive blocking matrix (ABM) and the adaptive noise canceller (ANC) of the GSC. By using the VAD to control the ABM, and combining the VAD with the signal-to-interference ratio (SIR) to control the ANC, the proposed method suppresses interference and significantly improves the overall performance of the GSC. Experiments verify that the proposed system not only improves speech quality remarkably but also boosts speech intelligibility.
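
A highly simplified Python sketch of gating the adaptive stage of a GSC with a bone-conduction-based VAD. The fixed beamformer, blocking matrix, and NLMS step size here are illustrative assumptions; the actual BCA-GSC also adapts the blocking matrix and applies an SIR test before ANC adaptation.

```python
import numpy as np

def bc_vad(bc_frame, threshold=1e-4):
    """Energy-based voice activity decision from the bone-conduction frame.
    The BC sensor is largely immune to airborne noise, so simple energy
    thresholding is a reasonable stand-in for the VAD described above."""
    return np.mean(bc_frame ** 2) > threshold

def gsc_enhance(mic1, mic2, bc, frame_len=256, mu=0.1, L=32):
    """Simplified dual-mic GSC with a BC-controlled adaptive noise canceller."""
    w = np.zeros(L)                             # ANC filter weights
    out = np.zeros_like(mic1)
    for start in range(0, len(mic1) - frame_len, frame_len):
        sl = slice(start, start + frame_len)
        fbf = 0.5 * (mic1[sl] + mic2[sl])       # fixed beamformer output
        blocked = mic1[sl] - mic2[sl]           # noise reference from blocking matrix
        speech_active = bc_vad(bc[sl])
        for n in range(L, frame_len):
            x = blocked[n - L:n][::-1]
            e = fbf[n] - np.dot(w, x)           # beamformer output minus noise estimate
            if not speech_active:               # adapt ANC only in noise-only frames
                w += mu * e * x / (np.dot(x, x) + 1e-8)
            out[start + n] = e
    return out
```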


2021 ◽  
Vol 25 ◽  
pp. 233121652097802
Author(s):  
Emmanuel Ponsot ◽  
Léo Varnet ◽  
Nicolas Wallaert ◽  
Elza Daoud ◽  
Shihab A. Shamma ◽  
...  

Spectrotemporal modulations (STM) are essential features of speech signals that make them intelligible. While their encoding has been widely investigated in neurophysiology, we still lack a full understanding of how STMs are processed at the behavioral level and how cochlear hearing loss impacts this processing. Here, we introduce a novel methodological framework based on psychophysical reverse correlation deployed in the modulation space to characterize the mechanisms underlying STM detection in noise. We derive perceptual filters for young normal-hearing and older hearing-impaired individuals performing a detection task of an elementary target STM (a given product of temporal and spectral modulations) embedded in other masking STMs. Analyzed with computational tools, our data show that both groups rely on a comparable linear (band-pass)–nonlinear processing cascade, which can be well accounted for by a temporal modulation filter bank model combined with cross-correlation against the target representation. Our results also suggest that the modulation mistuning observed for the hearing-impaired group results primarily from broader cochlear filters. Yet, we find idiosyncratic behaviors that cannot be captured by cochlear tuning alone, highlighting the need to consider variability originating from additional mechanisms. Overall, this integrated experimental-computational approach offers a principled way to assess suprathreshold processing distortions in each individual and could thus be used to further investigate interindividual differences in speech intelligibility.
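
A minimal sketch of the classical reverse-correlation estimate underlying such perceptual filters, assuming the masker STM energy on each trial and the listener's binary responses are available (the computational analysis in the paper is considerably richer).

```python
import numpy as np

def perceptual_filter(masker_stm, responses):
    """Reverse-correlation estimate of a perceptual filter in the
    spectrotemporal-modulation domain. masker_stm has shape
    (n_trials, n_temporal_mod, n_spectral_mod) and holds the masking STM
    energy on each trial; responses is a boolean array of 'target heard'
    decisions. The kernel is the mean masker on 'yes' trials minus the
    mean masker on 'no' trials."""
    responses = np.asarray(responses, dtype=bool)
    return masker_stm[responses].mean(axis=0) - masker_stm[~responses].mean(axis=0)
```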

