Single Channel Speech Enhancement Using Adaptive Soft-Thresholding with Bivariate EMD

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Md. Ekramul Hamid ◽  
Md. Khademul Islam Molla ◽  
Xin Dang ◽  
Takayoshi Nakai

This paper presents a novel data-adaptive thresholding approach to single channel speech enhancement. The noisy speech signal and fractional Gaussian noise (fGn) are combined to produce a complex signal. The fGn is generated using the noise variance roughly estimated from the noisy speech signal. Bivariate empirical mode decomposition (bEMD) is employed to decompose the complex signal into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of the IMFs represent the IMFs of the observed speech and the fGn, respectively. Each IMF is divided into short time frames for local processing. The variance of the fGn IMF calculated within a frame is used as the reference to classify the corresponding noisy speech frame as noise dominant or signal dominant. Only the noise dominant frames are soft-thresholded to reduce the noise effects. Then all the frames as well as the IMFs of speech are combined, yielding the enhanced speech signal. The experimental results show the improved performance of the proposed algorithm compared to recently reported methods.
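The core per-IMF step, classifying each frame by comparing its variance against the matching fGn frame and soft-thresholding only the noise-dominant ones, can be sketched as follows. This is a minimal numpy sketch; the frame length, the variance test, and the universal-threshold form are illustrative assumptions, not the paper's exact settings, and the bEMD decomposition itself is assumed to have been done elsewhere.

```python
import numpy as np

def soft_threshold(x, t):
    # Shrink coefficients toward zero by t; values below t become zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def denoise_imf(speech_imf, fgn_imf, frame_len=256):
    """Frame-wise adaptive thresholding of one speech/fGn IMF pair.

    A frame is treated as noise-dominant when the variance of the
    noisy-speech frame does not exceed that of the matching fGn frame;
    only those frames are soft-thresholded.
    """
    out = speech_imf.copy()
    for start in range(0, len(speech_imf), frame_len):
        s = speech_imf[start:start + frame_len]
        g = fgn_imf[start:start + frame_len]
        if np.var(s) <= np.var(g):  # noise-dominant frame
            # Universal-style threshold from the fGn frame variance.
            t = np.sqrt(2.0 * np.var(g) * np.log(max(len(s), 2)))
            out[start:start + frame_len] = soft_threshold(s, t)
    return out

# Toy demo: one "speech" IMF whose second half is silence (noise only).
rng = np.random.default_rng(0)
n = 1024
tone = np.sin(2 * np.pi * 50 * np.arange(n) / 8000.0)
tone[n // 2:] = 0.0                       # speech absent here
noisy_imf = tone + 0.05 * rng.standard_normal(n)
fgn_imf = 0.05 * rng.standard_normal(n)   # reference fGn IMF
clean_est = denoise_imf(noisy_imf, fgn_imf)
```

Speech-dominant frames pass through untouched, so the thresholding only ever attenuates samples, never amplifies them.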

2011 ◽  
Vol 2011 ◽  
pp. 1-21 ◽  
Author(s):  
Md. Khademul Islam Molla ◽  
Poly Rani Ghosh ◽  
Keikichi Hirose

This paper presents a data-adaptive approach for the analysis of climate variability using bivariate empirical mode decomposition (BEMD). The time series of three climate factors, namely daily evaporation and the maximum and minimum temperatures, are considered in the variability analysis. All climate data are collected from a specific area of Bihar in India. Fractional Gaussian noise (fGn) is used here as the reference signal. The climate signal and fGn (of the same length) are combined to produce a bivariate (complex) signal, which is decomposed using BEMD into a finite number of sub-band signals called intrinsic mode functions (IMFs); the climate signal and the fGn are thus decomposed jointly. The instantaneous frequencies and Fourier spectra of the IMFs are examined to illustrate the properties of BEMD. The lowest frequency oscillation of the climate signal represents the annual cycle (AC), an important factor in analyzing climate change and variability. The energies of the fGn IMFs are used to define a data-adaptive threshold for separating the AC: the IMFs of the climate signal whose energies exceed this threshold are summed to obtain the AC. The interannual distance of the climate signal is also illustrated for a better understanding of climate change and variability.
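The energy-threshold step, keeping only the climate-signal IMFs whose energy exceeds an fGn-derived bound and summing them to recover the annual cycle, can be sketched as below. The simple multiplicative margin used for the threshold is an illustrative assumption; the paper derives its threshold from the fGn IMF energies in its own way.

```python
import numpy as np

def separate_annual_cycle(signal_imfs, fgn_imfs, margin=2.0):
    """Sum the climate-signal IMFs whose energy exceeds a data-adaptive
    threshold derived from the energies of the matching fGn IMFs.

    `margin` scales each fGn IMF energy to form the threshold; this
    exact threshold form is a placeholder, not the paper's.
    """
    sig_energy = np.array([np.sum(m ** 2) for m in signal_imfs])
    fgn_energy = np.array([np.sum(m ** 2) for m in fgn_imfs])
    keep = sig_energy > margin * fgn_energy
    ac = np.zeros_like(signal_imfs[0])
    for m, k in zip(signal_imfs, keep):
        if k:
            ac += m
    return ac, keep

# Toy demo: a slow "annual" oscillation plus two noise-like IMFs.
t = np.arange(3 * 365, dtype=float)
annual = 5.0 * np.sin(2 * np.pi * t / 365.0)
rng = np.random.default_rng(1)
imfs = [0.3 * rng.standard_normal(t.size),
        0.3 * rng.standard_normal(t.size),
        annual]
fgn = [0.3 * rng.standard_normal(t.size) for _ in range(3)]
ac, kept = separate_annual_cycle(imfs, fgn)
```

Only the high-energy annual component clears the threshold; the noise-like IMFs, whose energies match the fGn reference, are discarded.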


2021 ◽  
Author(s):  
Suparerk Janjarasjitt

Abstract Anticipating preterm birth is a crucial task that can reduce the rate of preterm birth and its complications. Electrohysterogram (EHG), or uterine electromyogram (EMG), data have been shown to provide information useful for anticipating preterm birth. Four distinct time-domain features commonly applied in EMG signal processing, i.e., mean absolute value, average amplitude change, difference absolute standard deviation value, and log detector, are investigated in this study. A single channel of EHG data is decomposed into its constituent components, i.e., intrinsic mode functions (IMFs), using empirical mode decomposition (EMD) before the time-domain features are extracted. The time-domain features of the IMFs of EHG data associated with preterm and term births are used for preterm-term birth classification with a support vector machine (SVM) with a radial basis function kernel. The classifications are validated using 10-fold cross-validation. The computational results show that excellent preterm-term birth classification can be achieved from a single channel of EHG data, and further suggest that the best overall performance is obtained when thirteen (out of sixteen) EMD-based time-domain features are used. The best accuracy, sensitivity, specificity, and F1-score achieved are 0.9382, 0.9130, 0.9634, and 0.9366, respectively.
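The four time-domain features named above have standard definitions in the EMG literature and are straightforward to compute per IMF; a minimal numpy sketch:

```python
import numpy as np

def mav(x):
    # Mean absolute value.
    return np.mean(np.abs(x))

def aac(x):
    # Average amplitude change: mean absolute first difference.
    return np.mean(np.abs(np.diff(x)))

def dasdv(x):
    # Difference absolute standard deviation value:
    # root-mean-square of the first difference.
    return np.sqrt(np.mean(np.diff(x) ** 2))

def log_detector(x, eps=1e-12):
    # Log detector: exp of the mean log-magnitude
    # (the geometric mean of |x|).
    return np.exp(np.mean(np.log(np.abs(x) + eps)))

# Tiny worked example on a 4-sample signal.
x = np.array([1.0, -2.0, 3.0, -4.0])
features = [mav(x), aac(x), dasdv(x), log_detector(x)]
```

In the study's pipeline each EHG channel would first be split into IMFs by EMD and these four features computed per IMF, giving the feature vector fed to the SVM.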


2013 ◽  
Vol 22 (2) ◽  
pp. 81-93
Author(s):  
Jaya Kumar Ashwini ◽  
Ramaswamy Kumaraswamy

Abstract This article presents an overview of single-channel dereverberation methods suitable for distant speech recognition (DSR) applications. The dereverberation methods are classified mainly by the domain in which the speech signal captured by a distant microphone is enhanced. Many single-channel speech enhancement methods focus on either denoising or dereverberating the distorted speech signal; very few consider both noise and reverberation effects. Such methods are discussed under a multistage approach in this article. The article concludes with the hypothesis that methods that do not require an a priori reverberation impulse response are preferable under varying environmental conditions for DSR applications such as intelligent home and office environments, humanoid robots, and automobiles.


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1125
Author(s):  
Haitao Lang ◽  
Jie Yang

Recently, supervised learning methods, especially deep neural network (DNN)-based methods, have shown promising performance in single-channel speech enhancement. Generally, those approaches extract the acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract the acoustic features not only from the noisy speech but also from the pre-estimated speech, noise and phase separately, and then fuse them into a new complementary feature in order to obtain a more discriminative acoustic representation. In addition, on the basis of learning a magnitude-aware target, we also utilize the fusion feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparisons with typical existing methods, generalization ability evaluation on unseen noise, an ablation study, and subjective tests by human listeners, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method improves the quality and intelligibility of the reconstructed speech.
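The fusion idea, concatenating per-frame features drawn from the noisy spectrogram and from the pre-estimated speech, noise and phase, can be sketched as below. The specific feature set (log-power spectra plus a cos/sin phase encoding) is an illustrative assumption; the paper defines its own acoustic features and pre-estimators.

```python
import numpy as np

def log_power_features(mag, eps=1e-8):
    # Log-power spectral features from a magnitude spectrogram.
    return np.log(mag ** 2 + eps)

def fuse_features(noisy_mag, est_speech_mag, est_noise_mag, est_phase):
    """Concatenate per-frame features from the noisy spectrogram and the
    pre-estimated speech, noise and phase into one fused feature vector.
    """
    parts = [log_power_features(noisy_mag),
             log_power_features(est_speech_mag),
             log_power_features(est_noise_mag),
             np.cos(est_phase),            # phase encoded as cos/sin
             np.sin(est_phase)]
    return np.concatenate(parts, axis=-1)  # shape: (frames, 5 * bins)

# Toy shapes: 10 frames, 129 frequency bins.
frames, bins_ = 10, 129
rng = np.random.default_rng(2)
noisy = np.abs(rng.standard_normal((frames, bins_)))
speech = np.abs(rng.standard_normal((frames, bins_)))
noise = np.abs(rng.standard_normal((frames, bins_)))
phase = rng.uniform(-np.pi, np.pi, (frames, bins_))
fused = fuse_features(noisy, speech, noise, phase)
```

The fused matrix would then be the input to the DNN that learns the magnitude- and phase-aware targets.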


Author(s):  
Poovarasan Selvaraj ◽  
E. Chandra

In Speech Enhancement (SE) techniques, the major challenge is to suppress non-stationary noises, including white noise, in real-time application scenarios. Many techniques have been developed for enhancing vocal signals; however, they were not very effective at suppressing non-stationary noises and had high time and resource consumption. As a result, a Sliding Window Empirical Mode Decomposition and Hurst (SWEMDH)-based SE method was developed, in which the speech signal is decomposed into Intrinsic Mode Functions (IMFs) over a sliding window and the noise factor in each IMF is determined from the Hurst exponent; the least corrupted IMFs are then used to restore the vocal signal. However, that technique was not suitable for white noise scenarios. Therefore, in this paper, a Variant of Variational Mode Decomposition (VVMD) combined with the SWEMDH technique is proposed to reduce the complexity in real-time applications. The key objective of the proposed SWEMDH-VVMD technique is to select the IMFs based on the Hurst exponent and then apply the VVMD technique to suppress both low- and high-frequency noise factors in the vocal signals. First, the noisy vocal signal is decomposed into many IMFs using the SWEMDH technique. Then, the Hurst exponent is computed to identify the IMFs with low-frequency noise factors, and Narrow-Band Components (NBC) are computed to identify the IMFs with high-frequency noise factors. Moreover, VVMD is applied to the sum of all chosen IMFs to remove both low- and high-frequency noise factors. Thus, the speech signal quality is improved under non-stationary noises, including additive white Gaussian noise. Finally, the experimental outcomes demonstrate significant speech signal improvement under both non-stationary and white noise conditions.
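The Hurst-exponent screening of IMFs rests on a standard estimator; one common quick form fits the scaling law std(x[t+lag] - x[t]) ~ lag**H on a log-log plot. The sketch below is that generic estimator, not the paper's exact procedure, and the lag range is an illustrative choice.

```python
import numpy as np

def hurst_exponent(x, max_lag=20):
    """Estimate the Hurst exponent from the scaling of the standard
    deviation of lagged differences: std(x[t+lag] - x[t]) ~ lag**H.
    Treats x as an fBm-like path; H near 0.5 indicates an
    uncorrelated (noise-like) component.
    """
    lags = np.arange(2, max_lag)
    tau = [np.std(x[lag:] - x[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope

# Toy demo: a Brownian path (cumulative sum of white noise) has H ~ 0.5.
rng = np.random.default_rng(3)
bm = np.cumsum(rng.standard_normal(4096))
h = hurst_exponent(bm)
```

In a SWEMDH-style pipeline, each IMF's estimated H would decide whether it is treated as a noise factor or passed on for reconstruction.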


Author(s):  
Shifeng Ou ◽  
Peng Song ◽  
Ying Gao

The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra while neglecting the phase. Considering that incorporating phase information into a speech processing system can significantly improve speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach for a priori SNR estimation. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both the magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD) and short-time objective intelligibility (STOI) measures are presented to demonstrate the superiority of the proposed approach over several competitive methods under different noise conditions and input SNR levels.
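For context, the magnitude-only DD recursion that the paper's phase-sensitive variant extends is the classic Ephraim-Malah form, xi(l) = alpha * |S_hat(l-1)|^2 / lambda + (1 - alpha) * max(gamma(l) - 1, 0). A minimal sketch of that baseline (the Wiener gain and gain floor used for the amplitude estimate are common but illustrative choices):

```python
import numpy as np

def dd_a_priori_snr(noisy_mag, noise_psd, alpha=0.98, gain_floor=0.1):
    """Classic decision-directed a priori SNR estimate, per frequency
    bin over frames, using a Wiener gain for the amplitude estimate.
    This is the magnitude-only baseline, not the paper's
    phase-sensitive extension.
    """
    n_frames, n_bins = noisy_mag.shape
    xi = np.zeros((n_frames, n_bins))
    prev_s2 = np.zeros(n_bins)                  # |S_hat(l-1)|^2
    for l in range(n_frames):
        gamma = noisy_mag[l] ** 2 / noise_psd   # a posteriori SNR
        xi[l] = (alpha * prev_s2 / noise_psd
                 + (1 - alpha) * np.maximum(gamma - 1.0, 0.0))
        gain = np.maximum(xi[l] / (1.0 + xi[l]), gain_floor)
        prev_s2 = (gain * noisy_mag[l]) ** 2
    return xi

# Toy demo: stationary input, 4 bins, noise PSD of 1, |Y| = 2.
noisy_mag = np.full((50, 4), 2.0)
noise_psd = np.ones(4)
xi = dd_a_priori_snr(noisy_mag, noise_psd)
```

With alpha close to 1 the estimate starts small and ramps up smoothly toward its steady state, which is exactly the smoothing behavior that makes DD estimates robust to musical noise.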


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed, which can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments, by taking advantage of bilateral filtering (BF), a technology widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When a speech spectrogram is treated as an image, BSF is capable not only of sharpening details and removing unwanted textures or background noise, but also of preserving edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. Besides, in order to reduce computing costs, a fast and accurate BF is adopted to reduce the algorithm complexity to O(1) per time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods using various types of noise at different signal-to-noise ratios in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
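The edge-preserving smoothing that BSF applies to a spectrogram is the standard bilateral filter: each bin is replaced by a mean of its neighbors weighted jointly by spatial distance and intensity difference. The brute-force sketch below illustrates the principle; the paper uses a fast O(1)-per-bin approximation instead, and the kernel parameters here are illustrative.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.5):
    """Brute-force bilateral filter for a 2-D array (e.g. a spectrogram):
    each output value is a weighted mean of its neighborhood, with
    weights falling off in both distance and intensity difference.
    """
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    # Precompute the spatial Gaussian kernel once.
    ax = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(ax, ax)
    spatial = np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma_s ** 2))
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            range_w = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            weights = spatial * range_w
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out

# Toy demo: a noisy step edge; flats are smoothed, the edge survives.
rng = np.random.default_rng(4)
step = np.concatenate([np.zeros((8, 8)), np.ones((8, 8))], axis=1)
noisy = step + 0.1 * rng.standard_normal(step.shape)
smoothed = bilateral_filter(noisy)
```

The range term is what keeps spectral edges (speech onsets, harmonic tracks) intact while the spatial term averages away texture-like noise, which is exactly the behavior exploited before the OMLSA a posteriori SNR estimate.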


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Erhan Deger ◽  
Md. Khademul Islam Molla ◽  
Keikichi Hirose ◽  
Nobuaki Minematsu ◽  
Md. Kamrul Hasan

This paper presents a two-stage soft thresholding algorithm based on the discrete cosine transform (DCT) and empirical mode decomposition (EMD). In the first stage, the noisy speech is decomposed into eight frequency bands and a specific noise variance is calculated for each one. Based on this variance, each band is denoised using soft thresholding in the DCT domain. The remaining noise is eliminated in the second stage through a time domain soft thresholding strategy adapted to the intrinsic mode functions (IMFs) derived by applying EMD to the signal obtained from the first stage. Significantly better SNR improvement and perceptual speech quality results for different noise types demonstrate the superiority of the proposed algorithm over recently reported techniques.
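The first stage, band-wise soft thresholding in the DCT domain, can be sketched as follows. The per-band noise estimate here (median absolute coefficient divided by 0.6745, a common robust proxy) and the equal band split are illustrative assumptions; the paper computes its own per-band noise variances.

```python
import numpy as np
from scipy.fft import dct, idct

def soft(x, t):
    # Soft thresholding: shrink toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dct_band_threshold(frame, n_bands=8, k=1.0):
    """Split a frame's DCT coefficients into equal frequency bands,
    estimate a noise level per band, and soft-threshold each band.
    """
    c = dct(frame, norm='ortho')
    edges = np.linspace(0, len(c), n_bands + 1, dtype=int)
    for b in range(n_bands):
        band = c[edges[b]:edges[b + 1]]
        # Robust sigma estimate from the median absolute coefficient.
        sigma = np.median(np.abs(band)) / 0.6745
        c[edges[b]:edges[b + 1]] = soft(band, k * sigma)
    return idct(c, norm='ortho')

# Toy demo: a sinusoid in white noise.
rng = np.random.default_rng(5)
clean = np.sin(2 * np.pi * 5 * np.arange(512) / 512.0)
noisy = clean + 0.2 * rng.standard_normal(512)
denoised = dct_band_threshold(noisy)
```

Because the tone's energy concentrates in a few large DCT coefficients, the threshold removes most of the spread-out noise energy while barely biasing the signal; the second EMD stage would then clean up what remains in the time domain.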


Author(s):  
Xianyun Wang ◽  
Changchun Bao

Abstract According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper the speech and noise are considered the left-channel and right-channel signals of the BCC framework, respectively. The speech signal is then estimated from the noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. In this paper, exact inter-channel cues and pre-enhanced inter-channel cues are used for speech restoration. The exact inter-channel cues are extracted from the clean speech and noise, and the pre-enhanced inter-channel cues are extracted from the pre-enhanced speech and estimated noise. After that, they are paired one by one to form a codebook. Once the pre-enhanced cues are extracted from the noisy speech, the exact cues are estimated via a mapping between the pre-enhanced cues and the prior codebook. Next, the estimated exact cues are used to obtain a time-frequency (T-F) mask for enhancing the noisy speech based on BCC decoding. In addition, in order to further improve the accuracy of the T-F mask based on the inter-channel cues, a deep neural network (DNN)-based method is proposed to learn the mapping between input features of the noisy speech and the T-F masks. Experimental results show that the codebook-driven method achieves better performance than conventional methods, and the DNN-based method performs better than the codebook-driven method.
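The link from the ICLD cue to a T-F mask can be illustrated as below: with speech and noise as the two BCC "channels", the ICLD in dB encodes the per-bin speech-to-noise power ratio, from which a Wiener-like mask follows directly. The Wiener form of the mask is an illustrative assumption; the paper's decoding uses its own mask construction from both ICLD and ICC.

```python
import numpy as np

def icld(speech_pow, noise_pow, eps=1e-12):
    # Inter-channel level difference in dB between the speech
    # and noise "channels" of the BCC framework.
    return 10.0 * np.log10((speech_pow + eps) / (noise_pow + eps))

def mask_from_icld(icld_db):
    """Wiener-like T-F mask recovered from the ICLD alone:
    with r = speech/noise power ratio, mask = r / (1 + r).
    """
    r = 10.0 ** (icld_db / 10.0)
    return r / (1.0 + r)

# Toy demo: bins where speech dominates get a mask near 1,
# noise-dominated bins a mask near 0.
speech_pow = np.array([10.0, 1.0, 0.1])
noise_pow = np.array([0.1, 1.0, 10.0])
m = mask_from_icld(icld(speech_pow, noise_pow))
```

The codebook-driven and DNN-based methods in the paper differ in how they estimate the exact cues from noisy observations; once estimated, applying the resulting mask to the noisy spectrogram is this simple per-bin operation.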

