Speech Enhancement Based on Fusion of Both Magnitude/Phase-Aware Features and Targets

Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1125
Author(s):  
Haitao Lang ◽  
Jie Yang

Recently, supervised learning methods, especially deep neural network (DNN)-based methods, have shown promising performance in single-channel speech enhancement. Generally, those approaches extract acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract acoustic features not only from the noisy speech but also from the pre-estimated speech, noise, and phase separately, and then fuse them into a new complementary feature to obtain a more discriminative acoustic representation. In addition to learning a magnitude-aware target, we also utilize the fused feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparisons with typical existing methods, a generalization-ability evaluation on unseen noise, an ablation study, and a subjective test with human listeners, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method improves the quality and intelligibility of the reconstructed speech.
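The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature choices (log-magnitude plus cos/sin phase encodings), the `1e-8` floor, and the toy STFT shapes are all assumptions for the example.

```python
import numpy as np

def log_magnitude_features(stft):
    """Log-magnitude spectral features from a complex STFT (freq x frames)."""
    return np.log(np.abs(stft) + 1e-8)  # small floor avoids log(0)

def fuse_features(noisy_stft, est_speech_stft, est_noise_stft):
    """Concatenate features extracted from the noisy speech, the pre-estimated
    speech and noise, and the phase into one complementary feature per frame."""
    feats = [
        log_magnitude_features(noisy_stft),
        log_magnitude_features(est_speech_stft),
        log_magnitude_features(est_noise_stft),
        np.cos(np.angle(noisy_stft)),   # phase-aware features encoded
        np.sin(np.angle(noisy_stft)),   # as cosine/sine pairs
    ]
    return np.concatenate(feats, axis=0)  # (5 * freq_bins) x frames

# Toy example: random "STFT" and crude speech/noise pre-estimates.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))
fused = fuse_features(noisy, 0.8 * noisy, 0.2 * noisy)
```

Each frame's fused vector would then feed the DNN that predicts both the magnitude-aware and phase-aware targets.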

2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Novlene Zoghlami ◽  
Zied Lachiri

This paper describes a new speech enhancement approach using perceptually based noise reduction. The proposed approach applies two perceptual filtering models to noisy speech signals: the gammatone and the gammachirp filter banks, with nonlinear resolution according to the equivalent rectangular bandwidth (ERB) scale. The perceptual filtering yields a number of subbands that are individually spectrally weighted and modified according to two different noise suppression rules. An accurate noise estimate is important for reducing the musical noise artifacts that appear in the processed speech after the classic subtractive process. In this context, we use continuous noise estimation algorithms. The performance of the proposed approach is evaluated on speech signals corrupted by real-world noises. Using objective tests based on the perceptual quality PESQ score and the quality ratings of signal distortion (SIG), noise distortion (BAK), and overall quality (OVRL), together with a subjective test based on the quality rating of automatic speech recognition (ASR), we demonstrate that our speech enhancement approach, using filter banks that model the human auditory system, outperforms conventional spectral modification algorithms in improving the quality and intelligibility of the enhanced speech signal.
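The per-subband spectral weighting can be illustrated with a simple sketch. Note the assumptions: uniform FFT-bin bands stand in for the gammatone/gammachirp filter banks, a basic power-subtraction gain with a spectral floor stands in for the paper's two suppression rules, and the noise estimate is given rather than continuously tracked.

```python
import numpy as np

def subband_spectral_subtraction(noisy_mag, noise_mag, n_bands=8, floor=0.1):
    """Attenuate each subband of the noisy magnitude spectrum with a gain
    derived from a noise estimate (crude stand-in for perceptual filter banks)."""
    n_bins = noisy_mag.shape[0]
    edges = np.linspace(0, n_bins, n_bands + 1, dtype=int)  # uniform band edges
    enhanced = noisy_mag.copy()
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        # Per-band a posteriori SNR from noisy and noise-estimate energies.
        snr = np.sum(noisy_mag[lo:hi] ** 2) / (np.sum(noise_mag[lo:hi] ** 2) + 1e-12)
        # Subtractive-style gain, floored to limit musical noise artifacts.
        gain = max(1.0 - 1.0 / snr, floor)
        enhanced[lo:hi] *= gain
    return enhanced

# Toy usage: flat noise estimate applied to a random magnitude spectrum.
rng = np.random.default_rng(1)
noisy_mag = np.abs(rng.standard_normal(257)) + 1.0
enhanced = subband_spectral_subtraction(noisy_mag, np.full(257, 0.5))
```

A real ERB-scale implementation would replace the uniform band edges with gammatone filter outputs, but the weight-per-subband logic is the same.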


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments, by taking advantage of bilateral filtering (BF), a technology widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When the speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted textures or background noise from the noisy speech spectrogram, but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. Besides, in order to reduce computing costs, a fast and accurate BF is adopted to reduce the algorithm complexity to O(1) per time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods on various types of noise at different signal-to-noise ratios, in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
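Bilateral filtering of a spectrogram can be sketched directly from its image-processing definition: each bin is replaced by a weighted average of its neighbours, where the weight combines spatial closeness and similarity in value, so smooth background texture is averaged away while sharp spectral edges survive. This naive O(r²)-per-bin version is for illustration only; the paper adopts a fast O(1) variant, and the kernel parameters below are arbitrary choices.

```python
import numpy as np

def bilateral_filter_spectrogram(S, radius=2, sigma_s=1.5, sigma_r=0.5):
    """Naive bilateral filter over a (log-)spectrogram viewed as an image:
    sigma_s controls spatial smoothing, sigma_r controls edge preservation."""
    H, W = S.shape
    out = np.zeros_like(S)
    for i in range(H):
        for j in range(W):
            acc, wsum = 0.0, 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        # Spatial weight x range (value-similarity) weight.
                        w = np.exp(-(di * di + dj * dj) / (2 * sigma_s ** 2)
                                   - (S[ii, jj] - S[i, j]) ** 2 / (2 * sigma_r ** 2))
                        acc += w * S[ii, jj]
                        wsum += w
            out[i, j] = acc / wsum
    return out

# Toy usage: a flat spectrogram patch corrupted by small random texture.
rng = np.random.default_rng(2)
S = np.full((16, 16), 1.0) + 0.1 * rng.standard_normal((16, 16))
S_filtered = bilateral_filter_spectrogram(S)
```

The filtered spectrogram would then feed the a posteriori SNR estimation of OMLSA in place of the raw noisy spectrogram.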


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Md. Ekramul Hamid ◽  
Md. Khademul Islam Molla ◽  
Xin Dang ◽  
Takayoshi Nakai

This paper presents a novel data-adaptive thresholding approach to single-channel speech enhancement. The noisy speech signal and fractional Gaussian noise (fGn) are combined to produce a complex signal. The fGn is generated using a noise variance roughly estimated from the noisy speech signal. Bivariate empirical mode decomposition (bEMD) is employed to decompose the complex signal into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of the IMFs represent the IMFs of the observed speech and the fGn, respectively. Each IMF is divided into short time frames for local processing. The variance of the fGn IMF calculated within a frame is used as the reference term to classify the corresponding noisy speech frame as noise-dominant or signal-dominant. Only the noise-dominant frames are soft-thresholded to reduce the noise effects. Then, all the frames as well as the IMFs of speech are combined, yielding the enhanced speech signal. The experimental results show the improved performance of the proposed algorithm compared to recently reported methods.
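The frame classification and soft-thresholding step can be sketched as below, taking the speech IMF and the fGn reference IMF as given (the bEMD decomposition itself is omitted). The comparison factor `k`, the frame length, and the use of the universal threshold are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def soft_threshold_frames(imf_speech, imf_fgn, frame_len=64, k=1.0):
    """Frame-wise data-adaptive soft thresholding: the variance of the fGn IMF
    in each frame is the reference; speech-IMF frames whose variance does not
    exceed it are treated as noise-dominant and soft-thresholded."""
    out = imf_speech.copy()
    n_frames = len(imf_speech) // frame_len
    for f in range(n_frames):
        sl = slice(f * frame_len, (f + 1) * frame_len)
        ref_var = np.var(imf_fgn[sl])
        if np.var(imf_speech[sl]) <= k * ref_var:  # noise-dominant frame
            # Universal threshold derived from the local reference variance.
            thr = np.sqrt(2.0 * ref_var * np.log(frame_len))
            out[sl] = np.sign(out[sl]) * np.maximum(np.abs(out[sl]) - thr, 0.0)
    return out

# Toy usage: a low-energy "speech IMF" against a unit-variance fGn reference.
rng = np.random.default_rng(3)
imf_fgn = rng.standard_normal(1024)
imf_noisy = 0.5 * rng.standard_normal(1024)
denoised = soft_threshold_frames(imf_noisy, imf_fgn)
```

Signal-dominant frames pass through unchanged; in the full method the processed frames of all IMFs are recombined to synthesize the enhanced speech.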


Signals ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 434-455
Author(s):  
Sujan Kumar Roy ◽  
Kuldip K. Paliwal

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias into the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in the KF gain during speech absence, and the sensitivity metric does so during speech presence, to achieve better noise reduction. The noise variance and the speech model parameters serve as a speech activity detector. The bias-reduced Kalman gain enables the KF to suppress noise significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than several benchmark methods.
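The pre-whitening step can be sketched as follows: LPC coefficients are fitted to the estimated noise, and the resulting FIR inverse filter A(z) = 1 − Σ aₖ z⁻ᵏ is applied to the noisy frame. This is an illustrative sketch only; the LPC order, the direct normal-equations solve (in place of Levinson-Durbin), and the toy AR(1) noise are all assumptions, and the KF construction itself is omitted.

```python
import numpy as np

def lpc_autocorr(x, order):
    """LPC coefficients via the autocorrelation method (direct solve of the
    normal equations, kept short for the sketch)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return a  # prediction coefficients: x[n] ~ sum_k a[k] * x[n-1-k]

def prewhiten(frame, noise_estimate, order=4):
    """Pre-whiten a noisy frame with an FIR whitening filter whose
    coefficients are computed from the estimated noise."""
    a = lpc_autocorr(noise_estimate, order)
    whitener = np.concatenate(([1.0], -a))      # A(z) = 1 - sum_k a_k z^-k
    return np.convolve(frame, whitener)[:len(frame)]

# Toy usage: strongly correlated AR(1) "noise" whitened by its own LPC model.
rng = np.random.default_rng(4)
e = rng.standard_normal(2048)
noise = np.empty(2048)
noise[0] = e[0]
for n in range(1, 2048):
    noise[n] = 0.9 * noise[n - 1] + e[n]
whitened = prewhiten(noise, noise)
```

After pre-whitening, the speech LPC parameters are estimated with reduced noise-induced bias before the KF is assembled.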

