Temporal Auditory Coding Features for Causal Speech Enhancement

Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1698
Author(s):  
Iordanis Thoidis ◽  
Lazaros Vrysis ◽  
Dimitrios Markou ◽  
George Papanikolaou

Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and in the development of emerging systems and applications, such as mobile telephony and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have seen great success, mainly operating on log-power spectra. Although these approaches obviate the need for exhaustive feature extraction and selection, it remains unclear whether they target the sound characteristics most important to speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement that fuses temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the proposed system achieves considerable gains for normal-hearing and hearing-impaired listeners in terms of objective intelligibility and quality metrics. The proposed auditory-motivated feature set achieved better objective intelligibility results than conventional log-magnitude spectrogram features, while mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework provides satisfactory reconstruction accuracy of speech signals.
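The envelope/fine-structure split underlying such vocoder-like processing can be sketched with the Hilbert transform: the analytic signal's magnitude gives the temporal envelope and the cosine of its phase gives the temporal fine structure. This is a minimal single-band illustration, not the authors' exact multi-band pipeline; the function names and signal parameters are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT (the standard Hilbert-transform construction)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def envelope_tfs(band):
    """Split a band-limited signal into temporal envelope and fine structure."""
    z = analytic_signal(band)
    env = np.abs(z)               # slow amplitude modulations (envelope)
    tfs = np.cos(np.angle(z))     # unit-amplitude carrier (fine structure)
    return env, tfs

# Example: an AM tone -- the envelope should recover the modulator,
# and env * tfs reconstructs the original band exactly.
fs = 16000
t = np.arange(fs) / fs
mod = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # 4 Hz modulation
x = mod * np.sin(2 * np.pi * 1000 * t)        # 1 kHz carrier
env, tfs = envelope_tfs(x)
```

In a full vocoder-style analysis/synthesis framework, this decomposition would be applied per auditory filterbank channel, and the enhanced envelopes recombined with the fine structure for resynthesis.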

1987 ◽  
Vol 96 (1_suppl) ◽  
pp. 74-76
Author(s):  
J. R. Walliker ◽  
A. J. Fourcin

We have developed a family of single-channel signal-processing aids for the profoundly and totally deaf. Common to them all are the analysis of speech into the components most important to the deaf lipreader; the synthesis of stimuli which make the best use of the patient's sensory abilities; and facilities to ensure accurate matching of the aid to the patient. The totally deaf are electrically stimulated by electrodes on the promontory or on the round window of the cochlea using charge-balanced controlled current square waves automatically adjusted to be at a comfortable level. Many potential candidates for electrocochlear stimulation have significant low frequency residual hearing, but do not find conventional hearing aids to be useful. We have found that they can often make very effective use of the voice fundamental frequency presented as an acoustic sinusoid. Our approach to these patients avoids the need for implant surgery but preserves that option should total loss of hearing occur in the future. Both electrocochlear and acoustic methods of signal presentation are implemented with similar hardware. The speech signal from a microphone or other source is analyzed by a voice fundamental frequency extractor and a voiceless sound detector. Their outputs are processed by a single chip microcomputer that synthesizes the output waveform. In both devices the aid is tailored to the patient using a desktop computer that stores amplitude-frequency characteristics and frequency mapping tables into a read-only memory.
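The voice fundamental frequency extraction described above can be sketched with a simple autocorrelation pitch estimator whose output drives an acoustic sinusoid, as in the acoustic presentation mode. This is a minimal illustration under assumed parameters (frame length, F0 search range), not the hardware extractor used in the aid.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=70.0, fmax=400.0):
    """Estimate voice fundamental frequency by autocorrelation peak picking."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for the F0 search
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag

fs = 8000
t = np.arange(int(0.04 * fs)) / fs               # 40 ms analysis frame
voiced = np.sign(np.sin(2 * np.pi * 120 * t))    # crude 120 Hz pulse-like source
f0 = estimate_f0(voiced, fs)

# Re-synthesize the extracted F0 as an acoustic sinusoid for presentation
sinusoid = 0.1 * np.sin(2 * np.pi * f0 * t)
```

The stored amplitude-frequency characteristics and frequency mapping tables mentioned in the abstract would then shape this sinusoid to the individual patient's residual hearing.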


2012 ◽  
Vol 546-547 ◽  
pp. 675-679
Author(s):  
Gao Huan Xu ◽  
Jun Xiang Ye

Because machine parts differ in structure, the audio signals emitted by machine vibrations have different frequency content. For early defects, the audio signal can be analyzed effectively with the wavelet packet transform. After wavelet packet decomposition and reconstruction, the noise in the audio signal is reduced. Then, by decomposing the signal into high- and low-frequency bands, we can construct energy-based features. Experiments show that the extracted features capture the signal structure well.
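The energy features described above can be sketched as follows: a full wavelet-packet tree splits both the low- and high-frequency branches at every level, and the energy in each terminal subband forms the feature vector. This is a minimal numpy illustration using the Haar wavelet; the wavelet choice, depth, and signal are assumptions, not the paper's exact configuration.

```python
import numpy as np

def haar_step(x):
    """One level of orthonormal Haar analysis: (approximation, detail) at half rate."""
    x = x[:len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass branch
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass branch
    return a, d

def wavelet_packet_energies(x, levels=3):
    """Full wavelet-packet tree (both branches split at every level);
    returns the normalized energy of each terminal subband."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    e = np.array([np.sum(n ** 2) for n in nodes])
    return e / e.sum()                      # energy feature vector

# Example: a low-frequency tone concentrates its energy in the lowest subband
fs = 1024
t = np.arange(fs) / fs
feat = wavelet_packet_energies(np.sin(2 * np.pi * 8 * t), levels=3)
```

With 3 levels, the 8-element feature vector partitions signal energy across 8 subbands; defect signatures at different frequencies would shift energy between elements.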


2021 ◽  
Author(s):  
Maryam Hosseini ◽  
Luca Celotti ◽  
Eric Plourde

Single-channel speech enhancement algorithms have seen great improvements over the past few years. Despite these improvements, they still lack the efficiency of the auditory system in extracting attended auditory information in the presence of competing speakers. Recently, it has been shown that the attended auditory information can be decoded from the brain activity of the listener. In this paper, we propose two novel deep learning methods, referred to as the Brain Enhanced Speech Denoiser (BESD) and the U-shaped Brain Enhanced Speech Denoiser (U-BESD), that take advantage of this fact to denoise a multi-talker speech mixture. We use Feature-wise Linear Modulation (FiLM) between the brain activity and the sound mixture to better extract the features of the attended speaker and perform speech enhancement. Using electroencephalography (EEG) signals recorded from the listener, we show that U-BESD outperforms a current autoencoder approach in enhancing a speech mixture, as well as a speech separation approach that uses brain activity. Moreover, we show that both BESD and U-BESD successfully extract the attended speaker without any prior information about this speaker. This makes both algorithms strong candidates for realistic applications where no prior information about the attended speaker is available, such as hearing aids, cellphones, or noise-cancelling headphones. All procedures were performed in accordance with the Declaration of Helsinki and were approved by the Ethics Committees of the School of Psychology and the Health Sciences Faculty at Trinity College Dublin.
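The FiLM conditioning mentioned above can be sketched as a feature-wise affine transform: a scale (gamma) and shift (beta) per feature channel are predicted from the conditioning signal (here, an EEG feature vector) and applied to the audio features. This is a minimal numpy sketch; the dimensions, the linear projections, and the variable names are illustrative assumptions, not the BESD/U-BESD architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(audio_feats, eeg_feats, W_gamma, W_beta):
    """Feature-wise Linear Modulation: scale and shift each audio feature
    channel with parameters predicted from the EEG conditioning vector."""
    gamma = eeg_feats @ W_gamma            # (channels,) per-channel scale
    beta = eeg_feats @ W_beta              # (channels,) per-channel shift
    return gamma[None, :] * audio_feats + beta[None, :]

T, C, E = 100, 32, 64                      # time frames, channels, EEG dims
audio = rng.standard_normal((T, C))        # stand-in for audio feature maps
eeg = rng.standard_normal(E)               # stand-in for decoded EEG features
Wg = rng.standard_normal((E, C)) * 0.1     # illustrative learned projections
Wb = rng.standard_normal((E, C)) * 0.1
out = film(audio, eeg, Wg, Wb)
```

In a trained network the projections would be learned end-to-end, letting the listener's brain activity steer which mixture features the denoiser emphasizes.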


2021 ◽  
Vol 150 (3) ◽  
pp. 1663-1673
Author(s):  
Nikhil Shankar ◽  
Gautam Shreedhar Bhat ◽  
Issa M. S. Panahi ◽  
Stephanie Tittle ◽  
Linda M. Thibodeau


Author(s):  
G. Y. Fan ◽  
J. M. Cowley

It is well known that structure information on the specimen is not always faithfully transferred through the electron microscope. First, the spatial frequency spectrum is modulated by the transfer function (TF) at the focal plane. Second, the spectrum suffers a high-frequency cut-off imposed by the aperture (or, effectively, by damping terms such as chromatic aberration). While these effects do not fundamentally alter the imaging of crystal periodicity as long as the low-order Bragg spots lie inside the aperture (although the contrast may be reversed), they may completely change the appearance of images of amorphous materials. Because the spectrum of amorphous materials is continuous, modulating it emphasizes some components while weakening others. In particular, the cut-off of high-frequency components, which contribute to the amorphous image just as strongly as low-frequency components, can have a fundamental effect. This can be illustrated through computer simulation. Imaging of a white-noise object with an electron microscope free of TF limitations gives Fig. 1a, which is obtained by Fourier transformation of a constant amplitude combined with random phases generated by computer.
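The white-noise object construction and the aperture cut-off described above can be sketched as follows: the object is the inverse Fourier transform of a constant-amplitude spectrum with random phases, and the aperture is a circular mask that zeroes all spatial frequencies beyond a radius. The grid size and cut-off radius are illustrative assumptions, not the simulation parameters of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256

# White-noise object: constant Fourier amplitude, random phases (cf. Fig. 1a)
phases = rng.uniform(0, 2 * np.pi, (N, N))
spectrum = np.exp(1j * phases)             # |F| = 1 everywhere
obj = np.fft.ifft2(spectrum).real

# Aperture cut-off: keep only spatial frequencies inside a circular radius
fy = np.fft.fftfreq(N)[:, None]
fx = np.fft.fftfreq(N)[None, :]
aperture = (fx ** 2 + fy ** 2) < 0.2 ** 2  # illustrative cut-off radius
filtered = np.fft.ifft2(spectrum * aperture).real
```

Comparing `obj` and `filtered` shows how removing the high-frequency components, which carry as much weight as the low ones in a continuous spectrum, changes the appearance of the simulated amorphous image.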


1971 ◽  
Vol 36 (4) ◽  
pp. 527-537 ◽  
Author(s):  
Norman P. Erber

Two types of special hearing aid have been developed recently to improve the reception of speech by profoundly deaf children. In a different way, each special system provides greater low-frequency acoustic stimulation to deaf ears than does a conventional hearing aid. One of the devices extends the low-frequency limit of amplification; the other shifts high-frequency energy to a lower frequency range. In general, previous evaluations of these special hearing aids have obtained inconsistent or inconclusive results. This paper reviews most of the published research on the use of special hearing aids by deaf children, summarizes several unpublished studies, and suggests a set of guidelines for future evaluations of special and conventional amplification systems.

