A Unified Speech Enhancement System Based on Neural Beamforming With Parabolic Reflector

2020 ◽  
Vol 10 (7) ◽  
pp. 2218
Author(s):  
Tao Zhang ◽  
Yanzhang Geng ◽  
Jianhong Sun ◽  
Chen Jiao ◽  
Biyun Ding

This paper presents a unified speech enhancement system that removes both background noise and interfering speech in severe noise environments by jointly exploiting a parabolic reflector model and a neural beamformer. First, the amplification property of the paraboloid, which significantly improves the signal-to-noise ratio (SNR) of the desired signal, is discussed, and an appropriate paraboloid channel is analyzed and designed using the boundary element method. In parallel, a time-frequency masking approach and a mask-based beamforming approach are discussed and incorporated into the enhancement system. Notably, the signals provided by the paraboloid and by the beamformer are complementary. Finally, these signals are fed into a learning-based fusion framework to further improve system performance in low-SNR environments. Experiments demonstrate that the system is effective and robust under five noise conditions (speech interfered with factory, pink, destroyer-engine, Volvo, and babble noise) and at different noise levels. Compared with the original noisy speech, the average objective-metric improvements are about ΔSTOI = 0.28, ΔPESQ = 1.31, and ΔfwSegSNR = 11.9.
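The mask-based beamforming step can be illustrated with a minimal sketch: a time-frequency mask weights the spatial covariance estimates of speech and noise, and an MVDR beamformer is derived per frequency bin. This is a generic mask-informed MVDR, not the paper's exact system; the steering-vector estimate via the principal eigenvector and the diagonal loading are common assumptions.

```python
import numpy as np

def mask_based_mvdr(stft, mask):
    """Mask-informed MVDR beamformer.

    stft: complex array (channels, freq, frames)
    mask: real array (freq, frames), near 1 where target speech dominates
    Returns the enhanced single-channel STFT (freq, frames).
    """
    C, F, T = stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = stft[:, f, :]                       # (C, T)
        w_s = mask[f]                           # speech weights per frame
        w_n = 1.0 - mask[f]                     # noise weights per frame
        # Mask-weighted spatial covariance matrices
        R_s = (X * w_s) @ X.conj().T / max(w_s.sum(), 1e-8)
        R_n = (X * w_n) @ X.conj().T / max(w_n.sum(), 1e-8)
        R_n += 1e-6 * np.eye(C)                 # diagonal loading
        # Steering vector: principal eigenvector of the speech covariance
        d = np.linalg.eigh(R_s)[1][:, -1]
        w = np.linalg.solve(R_n, d)             # R_n^{-1} d
        w /= (d.conj() @ w)                     # MVDR normalization
        out[f] = w.conj() @ X
    return out
```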

Author(s):  
Judith Justin ◽  
Vanithamani R.

In this chapter, a speech enhancement technique based on a neuro-fuzzy classifier is implemented. Noisy speech sentences from the NOIZEUS and AURORA databases are used in the study. Feature extraction is performed through modifications of amplitude magnitude spectrograms. A four-class neuro-fuzzy classifier splits the time-frequency units of the noisy speech into noise-only, signal-only, more-noise-less-signal, and more-signal-less-noise parts, and appropriate weights are applied in the enhancement phase. The enhanced speech is evaluated using objective measures, and the performance of the Neuro-Fuzzy 4 (NF4) classifier is analyzed and compared with conventional techniques for various noises at different noise levels. The numerical values of the measures obtained are better than those of the alternatives, and an overall comparison shows that NF4 outperforms the other techniques in speech enhancement.
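The enhancement phase can be sketched as class-dependent gains applied to each time-frequency unit. The chapter does not state the actual weight values, so the gains below are purely illustrative assumptions.

```python
import numpy as np

# Hypothetical gains for the four classes; the chapter's actual weights
# are not given here, so these values are illustrative only.
CLASS_GAINS = {
    0: 0.0,   # noise only
    1: 1.0,   # signal only
    2: 0.3,   # more noise, less signal
    3: 0.8,   # more signal, less noise
}

def apply_class_gains(noisy_stft, labels):
    """Weight each time-frequency unit by the gain of its predicted class.

    noisy_stft: complex (freq, frames); labels: int (freq, frames) in {0..3}.
    """
    gains = np.vectorize(CLASS_GAINS.get)(labels)
    return noisy_stft * gains
```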


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Ji Li ◽  
Huiqiang Zhang ◽  
Jianping Ou ◽  
Wei Wang

In the increasingly complex electromagnetic environment of modern battlefields, quickly and accurately identifying radar signals is a hot topic in electronic countermeasures. In this paper, USRP N210, USRP-LW N210, and other universal software radio peripherals are used to simulate the transmission and reception of radar signals, producing a total of eight radar signal types: Barker, Frank, chaotic, P1, P2, P3, P4, and OFDM. Time-frequency images (TFIs) of the signals are obtained through the Choi–Williams distribution (CWD). Based on the characteristics of radar-signal TFIs, a global feature balance extraction (GFBE) module is designed. A new IIF-Net convolutional neural network with fewer parameters and lower computational cost is then proposed. The signal-to-noise ratio (SNR) ranges from −10 to 6 dB in the experiments, which show that the recognition rate of IIF-Net reaches 99.74% when the SNR is above −2 dB and remains 92.36% at −10 dB. Compared with other methods, IIF-Net achieves a higher recognition rate and better robustness at low SNR.
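The TFI front end can be sketched as follows. A plain STFT magnitude is used here as a stand-in for the Choi–Williams distribution (whose discrete kernel is omitted for brevity); the point is the normalization of the time-frequency array into an image suitable for a CNN.

```python
import numpy as np

def tfi(x, nperseg=64):
    """STFT-magnitude time-frequency image, scaled to [0, 1].

    A plain STFT stands in here for the Choi-Williams distribution (CWD)
    used in the paper; the image normalization is the same idea.
    """
    hop = nperseg // 2
    win = np.hanning(nperseg)
    frames = np.array([x[i:i + nperseg] * win
                       for i in range(0, len(x) - nperseg + 1, hop)])
    img = np.abs(np.fft.rfft(frames, axis=1)).T      # (freq, time)
    return img / img.max()                           # normalize for the CNN
```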


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3103 ◽  
Author(s):  
Xuebao Wang ◽  
Gaoming Huang ◽  
Zhiwen Zhou ◽  
Wei Tian ◽  
Jialun Yao ◽  
...  

To cope with the complex electromagnetic environment and varied signal styles, a novel method based on the energy cumulant of the short-time Fourier transform and a reinforced deep belief network is proposed to achieve a higher correct recognition rate for radar-emitter intra-pulse signals at low signal-to-noise ratio (SNR). The energy cumulant is obtained by accumulating, for each frequency bin, the sample values across the different time frames; before this, the short-time Fourier transform time-frequency distribution is processed by base-noise reduction. The reinforced deep belief network is trained on the resulting feature vectors to perform radar-emitter recognition and classification. Simulation results show that the proposed method is feasible and robust in radar-emitter recognition even at low SNR.
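The energy-cumulant feature can be sketched in a few lines: subtract a per-frequency base-noise floor from the spectrogram, then accumulate each frequency bin over time. The paper does not specify the floor estimator, so the leading-frames average below is an assumption.

```python
import numpy as np

def energy_cumulant(spec, noise_frames=10):
    """Energy cumulant of an STFT magnitude spectrogram.

    spec: (freq, frames). Base-noise reduction uses the mean of the first
    `noise_frames` frames as the noise floor (an assumed estimator), then
    the squared residual energy is accumulated over time per frequency bin.
    """
    floor = spec[:, :noise_frames].mean(axis=1, keepdims=True)
    denoised = np.clip(spec - floor, 0.0, None)
    return (denoised ** 2).sum(axis=1)              # (freq,) feature vector
```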


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1419 ◽  
Author(s):  
Zhiyuan Ma ◽  
Zhi Huang ◽  
Anni Lin ◽  
Guangming Huang

Emitter signal waveform recognition and classification are essential survival techniques in electronic warfare systems. Emitters use various power-management techniques and complex intra-pulse modulations, which can make the signal look like noise to an intercept receiver, so waveform recognition at low signal-to-noise ratio (SNR) has gained increased attention. In this study, we propose an autocorrelation feature image construction technique (ACFICT) combined with a convolutional neural network (CNN) to preserve the unique feature of each signal, and we design a structure optimization of the CNN input layer, called the hybrid model, that enhances the signal-autocorrelation image, in contrast to classifying a single image with a CNN. We demonstrate the performance of ACFICT by comparing feature images generated by different signal pre-processing algorithms, with signal recognition rate, image stability degree, and image restoration degree as evaluation indicators. Six types of signals are simulated by combining ACFICT with three hybrid-model variants; compared with the literature, the results show that the proposed method not only has high universality but also adapts better to waveform recognition in low-SNR environments. When the SNR is −6 dB, the overall recognition rate reaches 88%.
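The autocorrelation-image idea can be sketched as framing the signal and stacking per-frame normalized autocorrelations into a 2-D feature image. This is a minimal version; the paper's ACFICT construction and hybrid input layer are more elaborate.

```python
import numpy as np

def acf_image(x, frame_len=128, hop=64):
    """Stack per-frame normalized autocorrelations into a feature image.

    Each row is the non-negative-lag autocorrelation of one frame,
    normalized by its lag-0 energy. Returns (frames, lags).
    """
    rows = []
    for i in range(0, len(x) - frame_len + 1, hop):
        f = x[i:i + frame_len]
        r = np.correlate(f, f, mode='full')[frame_len - 1:]  # lags >= 0
        rows.append(r / (r[0] + 1e-12))                      # normalize
    return np.array(rows)
```

For a periodic waveform the rows show a secondary peak at the waveform period, which is the structure the CNN can exploit.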


2011 ◽  
Vol 18 (3-4) ◽  
pp. 293-311
Author(s):  
Maarten P.M. Luykx ◽  
Martijn L.S. Vercammen

There is a certain tendency in theatre design to make halls quite large. From the perspective of natural speech intelligibility and speech strength this is disadvantageous, because an actor's voice has a certain, limited loudness, and the signal-to-noise ratio at the listener may consequently become too low. From the influence of the signal-to-noise ratio on speech intelligibility, it is deduced that the strength should satisfy G ≥ 6 dB and that room volumes have to be limited to 4000–4500 m³ in order to maintain sufficient loudness for natural speech. Sound-level measurements during performances with natural speech in a theatre were carried out to determine the background noise levels in the hall due to the audience and to investigate the signal-to-noise ratio of the actor's voice at the audience. The background levels are determined mainly by installation noise, not by the audience.


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It significantly improves the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a technology widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When the speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted texture or background noise but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is then estimated after applying BSF to the noisy speech. To reduce the computing cost, a fast and accurate BF is adopted, lowering the complexity to O(1) per time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods on various noise types and signal-to-noise ratios, using objective metrics such as segmental SNR improvement and perceptual evaluation of speech quality (PESQ). The results confirm the validity of the improved BSF-based OMLSA algorithm.
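A bilateral filter over a spectrogram combines a spatial (domain) Gaussian with a range Gaussian on intensity differences, so smooth regions are averaged while spectral edges survive. The brute-force version below is for clarity only; the paper adopts a fast O(1)-per-bin approximation, and the kernel widths here are arbitrary.

```python
import numpy as np

def bilateral_filter_spec(S, radius=2, sigma_s=1.5, sigma_r=0.5):
    """Brute-force bilateral filter over a (log-)spectrogram S (freq, frames)."""
    F, T = S.shape
    pad = np.pad(S, radius, mode='edge')
    out = np.empty_like(S)
    # Spatial (domain) Gaussian weights over the neighbourhood
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing='ij')
    w_s = np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2))
    for i in range(F):
        for j in range(T):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range Gaussian: penalize intensity differences (edge preserving)
            w_r = np.exp(-((patch - S[i, j]) ** 2) / (2 * sigma_r**2))
            w = w_s * w_r
            out[i, j] = (w * patch).sum() / w.sum()
    return out
```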


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

This paper proposes a speech enhancement method using the multi-scale, multi-threshold auditory perception wavelet transform, suitable for low-SNR (signal-to-noise ratio) environments. The method achieves noise reduction by threshold processing, based on the human ear's auditory masking effect, of the auditory perception wavelet transform parameters of the speech signal. To prevent high-frequency loss during noise suppression, a voicing decision is first made on the speech signal; the unvoiced and voiced segments are then processed with different thresholds and different judgments. Objective and subjective tests on the enhanced speech show that, compared with other spectral-subtraction methods, our method keeps the unvoiced components intact while suppressing both residual and background noise, so the enhanced speech has better clarity and intelligibility.
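Per-band wavelet thresholding can be illustrated with a one-level Haar transform and separate soft thresholds for the approximation and detail bands. This is a deliberately minimal stand-in for the paper's multi-scale auditory-perception wavelet with masking-derived thresholds; the threshold values are arbitrary.

```python
import numpy as np

def haar_soft_denoise(x, thresholds=(0.0, 0.3)):
    """One-level Haar transform with per-band soft thresholds.

    x: 1-D signal of even length; thresholds = (approx, detail).
    """
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)              # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)              # detail band
    ta, td = thresholds
    a = np.sign(a) * np.maximum(np.abs(a) - ta, 0.0)  # soft-threshold
    d = np.sign(d) * np.maximum(np.abs(d) - td, 0.0)
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)                    # inverse Haar
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```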


2022 ◽  
Vol 26 ◽  
pp. 233121652110686
Author(s):  
Tim Green ◽  
Gaston Hilkhuysen ◽  
Mark Huckvale ◽  
Stuart Rosen ◽  
Mike Brookes ◽  
...  

A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.
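The SNR benefit of combining microphones can be illustrated with a simplified delay-and-sum average (not the study's binaural or bilateral beamformers): a target identical at each microphone is preserved, while independent noise averages down by about 10·log10(M) dB for M microphones.

```python
import numpy as np

def snr_db(target, noise):
    """SNR in dB from target and noise waveforms."""
    return 10 * np.log10((target**2).mean() / (noise**2).mean())

rng = np.random.default_rng(0)
n_mics, n = 4, 200_000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))     # target, identical at each mic
noise = rng.standard_normal((n_mics, n))        # independent noise per mic
x = s + noise                                   # time-aligned mic signals
y = x.mean(axis=0)                              # delay-and-sum output
gain = snr_db(s, y - s) - snr_db(s, noise[0])
# independent noise over 4 mics -> roughly 10*log10(4) ~= 6 dB improvement
```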


Author(s):  
Lutao Liu ◽  
Xinyu Li

Recently, due to the wide application of low probability of intercept (LPI) radar, many recognition approaches for LPI radar signal modulations have been proposed. However, in the increasingly complex electromagnetic environment, most existing methods perform poorly at identifying modulation types at low signal-to-noise ratio (SNR). This paper proposes an automatic recognition method for LPI radar signal modulations. First, time-domain signals are converted to time-frequency images (TFIs) by the smooth pseudo-Wigner–Ville distribution. These TFIs are then fed into a designed triplet convolutional neural network (TCNN) to obtain high-dimensional feature vectors. In essence, TCNN is a CNN whose parameters are optimized with a triplet loss during training: the triplet loss ensures that the distance between samples of different classes is greater than that between samples with the same label, improving the discriminability of TCNN. Finally, a fully connected neural network serves as the classifier for the different modulation types. Simulations show that the overall recognition success rate reaches 94% at −10 dB, which demonstrates that the proposed method has a strong discriminating capability for LPI radar signal modulations, even at low SNR.
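The triplet loss described above can be stated in a few lines: for an anchor embedding, its distance to a same-class (positive) sample should be smaller than its distance to a different-class (negative) sample by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: hinge on Euclidean distances so that same-class pairs
    end up at least `margin` closer than different-class pairs."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

The loss is zero once the negative is pushed beyond the positive by the margin, so training effort concentrates on hard triplets.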

