A Unified Speech Enhancement System Based on Neural Beamforming With Parabolic Reflector

2020 ◽  
Vol 10 (7) ◽  
pp. 2218
Author(s):  
Tao Zhang ◽  
Yanzhang Geng ◽  
Jianhong Sun ◽  
Chen Jiao ◽  
Biyun Ding

This paper presents a unified speech enhancement system that removes both background noise and interfering speech in severe noise environments by jointly exploiting a parabolic reflector model and a neural beamformer. First, the amplification property of the paraboloid, which significantly improves the signal-to-noise ratio (SNR) of the desired signal, is discussed, and an appropriate paraboloid channel is analyzed and designed using the boundary element method. In parallel, a time-frequency masking approach and a mask-based beamforming approach are discussed and incorporated into the enhancement system. Notably, the signals provided by the paraboloid and by the beamformer are complementary. Finally, these signals are fed into a learning-based fusion framework to further improve system performance in low-SNR environments. Experiments demonstrate that the system is effective and robust under five noise conditions (speech interfered with factory, pink, destroyer-engine, Volvo, and babble noise) and at different noise levels. Compared with the original noisy speech, the average objective-metric improvements are about ΔSTOI = 0.28, ΔPESQ = 1.31, and ΔfwSegSNR = 11.9.
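The mask-based beamforming step can be illustrated with a minimal sketch: a time-frequency mask weights the spatial covariance estimates of speech and noise, and an MVDR beamformer is derived per frequency bin. This is a generic mask-informed MVDR, not the paper's exact system; the steering-vector estimate via the principal eigenvector and the diagonal loading are common assumptions.

```python
import numpy as np

def mask_based_mvdr(stft, mask):
    """Mask-informed MVDR beamformer.

    stft: complex array (channels, freq, frames)
    mask: real array (freq, frames), near 1 where target speech dominates
    Returns the enhanced single-channel STFT (freq, frames).
    """
    C, F, T = stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = stft[:, f, :]                       # (C, T)
        w_s = mask[f]                           # speech weights per frame
        w_n = 1.0 - mask[f]                     # noise weights per frame
        # Mask-weighted spatial covariance matrices
        R_s = (X * w_s) @ X.conj().T / max(w_s.sum(), 1e-8)
        R_n = (X * w_n) @ X.conj().T / max(w_n.sum(), 1e-8)
        R_n += 1e-6 * np.eye(C)                 # diagonal loading
        # Steering vector: principal eigenvector of the speech covariance
        d = np.linalg.eigh(R_s)[1][:, -1]
        w = np.linalg.solve(R_n, d)             # R_n^{-1} d
        w /= (d.conj() @ w)                     # MVDR normalization
        out[f] = w.conj() @ X
    return out
```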

Author(s):  
Judith Justin ◽  
Vanithamani R.

In this chapter, a speech enhancement technique based on a neuro-fuzzy classifier is implemented. Noisy speech sentences from the NOIZEUS and AURORA databases are used in the study. Feature extraction is performed through modifications of amplitude magnitude spectrograms. A four-class neuro-fuzzy classifier splits the time-frequency units of the noisy speech into noise-only, signal-only, more-noise-less-signal, and more-signal-less-noise parts, and appropriate weights are applied in the enhancement phase. The enhanced speech is evaluated using objective measures, and the performance of the Neuro-Fuzzy 4 (NF4) classifier is analyzed and compared with conventional techniques for various noises at different noise levels. The numerical values of the measures obtained are better than those of the alternatives, and an overall comparison shows that NF4 outperforms the other techniques in speech enhancement.
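The enhancement phase can be sketched as class-dependent gains applied to each time-frequency unit. The chapter does not state the actual weight values, so the gains below are purely illustrative assumptions.

```python
import numpy as np

# Hypothetical gains for the four classes; the chapter's actual weights
# are not given here, so these values are illustrative only.
CLASS_GAINS = {
    0: 0.0,   # noise only
    1: 1.0,   # signal only
    2: 0.3,   # more noise, less signal
    3: 0.8,   # more signal, less noise
}

def apply_class_gains(noisy_stft, labels):
    """Weight each time-frequency unit by the gain of its predicted class.

    noisy_stft: complex (freq, frames); labels: int (freq, frames) in {0..3}.
    """
    gains = np.vectorize(CLASS_GAINS.get)(labels)
    return noisy_stft * gains
```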


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Ji Li ◽  
Huiqiang Zhang ◽  
Jianping Ou ◽  
Wei Wang

In the increasingly complex electromagnetic environment of modern battlefields, quickly and accurately identifying radar signals is a hot topic in electronic countermeasures. In this paper, USRP N210, USRP-LW N210, and other universal software radio peripherals are used to simulate the transmission and reception of radar signals, producing a total of eight radar signal types: Barker, Frank, chaotic, P1, P2, P3, P4, and OFDM. Time-frequency images (TFIs) of the signals are obtained through the Choi–Williams distribution (CWD). Based on the characteristics of radar-signal TFIs, a global feature balance extraction (GFBE) module is designed. A new IIF-Net convolutional neural network with fewer parameters and lower computational cost is then proposed. The signal-to-noise ratio (SNR) ranges from −10 to 6 dB in the experiments, which show that the recognition rate of IIF-Net reaches 99.74% when the SNR is above −2 dB and remains 92.36% at −10 dB. Compared with other methods, IIF-Net achieves a higher recognition rate and better robustness at low SNR.
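The TFI front end can be sketched as follows. A plain STFT magnitude is used here as a stand-in for the Choi–Williams distribution (whose discrete kernel is omitted for brevity); the point is the normalization of the time-frequency array into an image suitable for a CNN.

```python
import numpy as np

def tfi(x, nperseg=64):
    """STFT-magnitude time-frequency image, scaled to [0, 1].

    A plain STFT stands in here for the Choi-Williams distribution (CWD)
    used in the paper; the image normalization is the same idea.
    """
    hop = nperseg // 2
    win = np.hanning(nperseg)
    frames = np.array([x[i:i + nperseg] * win
                       for i in range(0, len(x) - nperseg + 1, hop)])
    img = np.abs(np.fft.rfft(frames, axis=1)).T      # (freq, time)
    return img / img.max()                           # normalize for the CNN
```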


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3103 ◽  
Author(s):  
Xuebao Wang ◽  
Gaoming Huang ◽  
Zhiwen Zhou ◽  
Wei Tian ◽  
Jialun Yao ◽  
...  

To cope with the complex electromagnetic environment and varied signal styles, a novel method based on the energy cumulant of the short-time Fourier transform and a reinforced deep belief network is proposed to achieve a higher correct recognition rate for radar-emitter intra-pulse signals at low signal-to-noise ratio (SNR). The energy cumulant is obtained by accumulating, for each frequency bin, the sample values across the different time frames; before this, the short-time Fourier transform time-frequency distribution is processed by base-noise reduction. The reinforced deep belief network is trained on the resulting feature vectors to perform radar-emitter recognition and classification. Simulation results show that the proposed method is feasible and robust in radar-emitter recognition even at low SNR.
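The energy-cumulant feature can be sketched in a few lines: subtract a per-frequency base-noise floor from the spectrogram, then accumulate each frequency bin over time. The paper does not specify the floor estimator, so the leading-frames average below is an assumption.

```python
import numpy as np

def energy_cumulant(spec, noise_frames=10):
    """Energy cumulant of an STFT magnitude spectrogram.

    spec: (freq, frames). Base-noise reduction uses the mean of the first
    `noise_frames` frames as the noise floor (an assumed estimator), then
    the squared residual energy is accumulated over time per frequency bin.
    """
    floor = spec[:, :noise_frames].mean(axis=1, keepdims=True)
    denoised = np.clip(spec - floor, 0.0, None)
    return (denoised ** 2).sum(axis=1)              # (freq,) feature vector
```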


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1419 ◽  
Author(s):  
Zhiyuan Ma ◽  
Zhi Huang ◽  
Anni Lin ◽  
Guangming Huang

Emitter signal waveform recognition and classification are essential survival techniques in electronic warfare systems. Emitters use various power-management techniques and complex intra-pulse modulations, which can make the signal look like noise to an intercept receiver, so waveform recognition at low signal-to-noise ratio (SNR) has gained increased attention. In this study, we propose an autocorrelation feature image construction technique (ACFICT) combined with a convolutional neural network (CNN) to preserve the unique feature of each signal, and we design a structure optimization of the CNN input layer, called the hybrid model, that enhances the signal-autocorrelation image, in contrast to classifying a single image with a CNN. We demonstrate the performance of ACFICT by comparing feature images generated by different signal pre-processing algorithms, with signal recognition rate, image stability degree, and image restoration degree as evaluation indicators. Six types of signals are simulated by combining ACFICT with three hybrid-model variants; compared with the literature, the results show that the proposed method not only has high universality but also adapts better to waveform recognition in low-SNR environments. When the SNR is −6 dB, the overall recognition rate reaches 88%.
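The autocorrelation-image idea can be sketched as framing the signal and stacking per-frame normalized autocorrelations into a 2-D feature image. This is a minimal version; the paper's ACFICT construction and hybrid input layer are more elaborate.

```python
import numpy as np

def acf_image(x, frame_len=128, hop=64):
    """Stack per-frame normalized autocorrelations into a feature image.

    Each row is the non-negative-lag autocorrelation of one frame,
    normalized by its lag-0 energy. Returns (frames, lags).
    """
    rows = []
    for i in range(0, len(x) - frame_len + 1, hop):
        f = x[i:i + frame_len]
        r = np.correlate(f, f, mode='full')[frame_len - 1:]  # lags >= 0
        rows.append(r / (r[0] + 1e-12))                      # normalize
    return np.array(rows)
```

For a periodic waveform the rows show a secondary peak at the waveform period, which is the structure the CNN can exploit.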


2011 ◽  
Vol 18 (3-4) ◽  
pp. 293-311
Author(s):  
Maarten P.M. Luykx ◽  
Martijn L.S. Vercammen

There is a certain tendency in theatre design to make halls quite large. From the perspective of natural speech intelligibility and speech strength this is disadvantageous, because an actor's voice has a certain, limited loudness, and the signal-to-noise ratio at the listener may consequently become too low. From the influence of the signal-to-noise ratio on speech intelligibility, it is deduced that the strength should satisfy G ≥ 6 dB and that room volumes have to be limited to 4000–4500 m³ in order to maintain sufficient loudness for natural speech. Sound-level measurements during performances with natural speech in a theatre were carried out to determine the background noise levels in the hall due to the audience and to investigate the signal-to-noise ratio of the actor's voice at the audience. The background levels are determined mainly by installation noise, not by the audience.


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It significantly improves the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a technology widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When the speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted texture or background noise but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is then estimated after applying BSF to the noisy speech. To reduce the computing cost, a fast and accurate BF is adopted, lowering the complexity to O(1) per time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods on various noise types and signal-to-noise ratios, using objective metrics such as segmental SNR improvement and perceptual evaluation of speech quality (PESQ). The results confirm the validity of the improved BSF-based OMLSA algorithm.
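A bilateral filter over a spectrogram combines a spatial (domain) Gaussian with a range Gaussian on intensity differences, so smooth regions are averaged while spectral edges survive. The brute-force version below is for clarity only; the paper adopts a fast O(1)-per-bin approximation, and the kernel widths here are arbitrary.

```python
import numpy as np

def bilateral_filter_spec(S, radius=2, sigma_s=1.5, sigma_r=0.5):
    """Brute-force bilateral filter over a (log-)spectrogram S (freq, frames)."""
    F, T = S.shape
    pad = np.pad(S, radius, mode='edge')
    out = np.empty_like(S)
    # Spatial (domain) Gaussian weights over the neighbourhood
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing='ij')
    w_s = np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2))
    for i in range(F):
        for j in range(T):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range Gaussian: penalize intensity differences (edge preserving)
            w_r = np.exp(-((patch - S[i, j]) ** 2) / (2 * sigma_r**2))
            w = w_s * w_r
            out[i, j] = (w * patch).sum() / w.sum()
    return out
```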


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

This paper proposes a speech enhancement method using the multi-scale, multi-threshold auditory perception wavelet transform, suitable for low-SNR (signal-to-noise ratio) environments. The method achieves noise reduction by threshold processing, based on the human ear's auditory masking effect, of the auditory perception wavelet transform parameters of the speech signal. To prevent high-frequency loss during noise suppression, a voicing decision is first made on the speech signal; the unvoiced and voiced segments are then processed with different thresholds and different judgments. Objective and subjective tests on the enhanced speech show that, compared with other spectral-subtraction methods, our method keeps the unvoiced components intact while suppressing both residual and background noise, so the enhanced speech has better clarity and intelligibility.
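Per-band wavelet thresholding can be illustrated with a one-level Haar transform and separate soft thresholds for the approximation and detail bands. This is a deliberately minimal stand-in for the paper's multi-scale auditory-perception wavelet with masking-derived thresholds; the threshold values are arbitrary.

```python
import numpy as np

def haar_soft_denoise(x, thresholds=(0.0, 0.3)):
    """One-level Haar transform with per-band soft thresholds.

    x: 1-D signal of even length; thresholds = (approx, detail).
    """
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)              # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)              # detail band
    ta, td = thresholds
    a = np.sign(a) * np.maximum(np.abs(a) - ta, 0.0)  # soft-threshold
    d = np.sign(d) * np.maximum(np.abs(d) - td, 0.0)
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)                    # inverse Haar
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```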


2022 ◽  
Vol 26 ◽  
pp. 233121652110686
Author(s):  
Tim Green ◽  
Gaston Hilkhuysen ◽  
Mark Huckvale ◽  
Stuart Rosen ◽  
Mike Brookes ◽  
...  

A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.
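The SNR benefit of combining microphones can be illustrated with a simplified delay-and-sum average (not the study's binaural or bilateral beamformers): a target identical at each microphone is preserved, while independent noise averages down by about 10·log10(M) dB for M microphones.

```python
import numpy as np

def snr_db(target, noise):
    """SNR in dB from target and noise waveforms."""
    return 10 * np.log10((target**2).mean() / (noise**2).mean())

rng = np.random.default_rng(0)
n_mics, n = 4, 200_000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))     # target, identical at each mic
noise = rng.standard_normal((n_mics, n))        # independent noise per mic
x = s + noise                                   # time-aligned mic signals
y = x.mean(axis=0)                              # delay-and-sum output
gain = snr_db(s, y - s) - snr_db(s, noise[0])
# independent noise over 4 mics -> roughly 10*log10(4) ~= 6 dB improvement
```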


Author(s):  
Lutao Liu ◽  
Xinyu Li

Recently, due to the wide application of low probability of intercept (LPI) radar, many recognition approaches for LPI radar signal modulations have been proposed. However, in the increasingly complex electromagnetic environment, most existing methods perform poorly at identifying modulation types at low signal-to-noise ratio (SNR). This paper proposes an automatic recognition method for LPI radar signal modulations. First, time-domain signals are converted to time-frequency images (TFIs) by the smooth pseudo-Wigner–Ville distribution. These TFIs are then fed into a designed triplet convolutional neural network (TCNN) to obtain high-dimensional feature vectors. In essence, TCNN is a CNN whose parameters are optimized with a triplet loss during training: the triplet loss ensures that the distance between samples of different classes is greater than that between samples with the same label, improving the discriminability of TCNN. Finally, a fully connected neural network serves as the classifier for the different modulation types. Simulations show that the overall recognition success rate reaches 94% at −10 dB, which demonstrates that the proposed method has a strong discriminating capability for LPI radar signal modulations, even at low SNR.
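The triplet loss described above can be stated in a few lines: for an anchor embedding, its distance to a same-class (positive) sample should be smaller than its distance to a different-class (negative) sample by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: hinge on Euclidean distances so that same-class pairs
    end up at least `margin` closer than different-class pairs."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

The loss is zero once the negative is pushed beyond the positive by the margin, so training effort concentrates on hard triplets.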

