Speech recognition apparatus having multiple audio inputs to cancel background noise from input speech

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. FM systems significantly improve the signal-to-noise ratio at the child's ear through the use of one of three types of FM receivers: mounted speakers, desktop speakers, or direct-audio input (DAI). The purpose of this article is to examine current empirical research in the area of FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, the benefits of binaural FM-system input, the importance of DAI receiver-gain settings, and the effects of speech-processor programming on speech recognition. This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.

Design Of A Voice Controlled Home Automation System Using Deep Learning Convolutional Neural Network (DL-CNN)

Telekontran : Jurnal Ilmiah Telekomunikasi, Kendali dan Elektronika Terapan; DOI 10.34010/telekontran.v8i1.3078; 2020; Vol 8 (1); pp. 57-73

Author(s): Lery Sakti Ramba

Keyword(s): Deep Learning, Speech Recognition, Background Noise, Electronic Devices, Recognition System, Background Intensity, Automation System, Home Automation, Speech Recognition System, Home Automation System

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting with competent parties, designing and testing the system, and analyzing the test results. The voice recognition system was designed using a Deep Learning Convolutional Neural Network (DL-CNN), and the CNN model was then trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to it. The system achieved a 100% success rate in room conditions with a background noise intensity of 24 dB (silent), 67.67% at 42 dB, and only 51.67% at 52 dB (noisy). The success rate of the system is therefore strongly influenced by the intensity of background noise in the room, and the system is best suited to rooms with low-intensity background noise.

Deep bidirectional neural networks for robust speech recognition under heavy background noise

Materials Today Proceedings; DOI 10.1016/j.matpr.2021.02.640; 2021

Author(s): Jeevan Reddy Koya, S.P. Venu Madhava Rao

Keyword(s): Neural Networks, Speech Recognition, Background Noise, Robust Speech Recognition

Variational model composition for robust speech recognition with time-varying background noise

DOI 10.21437/interspeech.2009-368; 2009

Author(s): Wooil Kim, John H. L. Hansen

Keyword(s): Speech Recognition, Background Noise, Variational Model, Robust Speech Recognition, Time Varying, Model Composition

Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients

Developments in Natural Intelligence Research and Knowledge Engineering; DOI 10.4018/978-1-4666-1743-8.ch009; 2012; pp. 126-134

Author(s): Poonam Bansal, Amita Dev, Shail Jain

Keyword(s): Speech Recognition, Speech Signal, Background Noise, Extraction Method, Recognition Performance, Spectral Estimation, Higher Order, Feature Extraction Method, Power Spectral, Cepstral Coefficients

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as the MFCC features for clean speech and better than the MFCC features for noisy speech.
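
The processing chain described in this abstract (autocorrelation, discarding the low-order lags, windowing, magnitude spectrum, Mel filter bank, log, discrete cosine transform) can be sketched roughly as follows. This is a minimal illustration in plain Python and not the authors' DRHOASS implementation: the function names, the Hamming window, the lag cutoff `min_lag`, the 8 kHz sample rate, and the filter and coefficient counts are all assumptions chosen for the example, and any differentiation or normalization step implied by the DRHOASS name is omitted because the abstract does not describe it.

```python
import cmath
import math


def autocorrelation(frame):
    """Biased autocorrelation r[k] for lags k = 0 .. len(frame) - 1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) / n
            for k in range(n)]


def magnitude_spectrum(seq, n_bins):
    """Magnitude of a naive DFT evaluated at n_bins frequencies from 0 to pi."""
    spectrum = []
    for b in range(n_bins):
        omega = math.pi * b / (n_bins - 1)
        value = sum(x * cmath.exp(-1j * omega * t) for t, x in enumerate(seq))
        spectrum.append(abs(value))
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank_energies(spectrum, sample_rate, n_filters):
    """Apply triangular filters spaced evenly on the mel scale."""
    n_bins = len(spectrum)
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # Filter edge frequencies expressed as fractional bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for b, s in enumerate(spectrum):
            if lo < b <= mid:
                total += s * (b - lo) / (mid - lo)
            elif mid < b < hi:
                total += s * (hi - b) / (hi - mid)
        energies.append(total)
    return energies


def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]


def higher_order_cepstrum(frame, min_lag, sample_rate=8000,
                          n_bins=65, n_filters=12, n_ceps=8):
    """Cepstral features from autocorrelation lags >= min_lag (assumed cutoff)."""
    r = autocorrelation(frame)
    kept = r[min_lag:]  # discard the noise-sensitive low-order lags
    m = len(kept)
    # Hamming window over the retained lags (window choice is an assumption).
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (m - 1)))
                for i, x in enumerate(kept)]
    spectrum = magnitude_spectrum(windowed, n_bins)
    energies = mel_filterbank_energies(spectrum, sample_rate, n_filters)
    # Floor the energies so the log is defined for empty filters.
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    return dct_ii(log_energies, n_ceps)
```

For a single frame, such as 200 samples of a 440 Hz tone at 8 kHz, `higher_order_cepstrum(frame, min_lag=16)` returns a list of eight coefficients. The naive DFT keeps the sketch dependency-free; a practical front end would use an FFT and tune the lag cutoff on real data.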

Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling

International Journal of Speech Technology; DOI 10.1007/s10772-020-09671-5; 2020; Vol 23 (1); pp. 149-167

Author(s): G. Thimmaraja Yadava, H. S. Jayanna

Keyword(s): Speech Recognition, Background Noise, Recognition System, Speech Recognition System, Noise Elimination, Acoustic Modelling

Variational noise model composition through model perturbation for robust speech recognition with time-varying background noise

Speech Communication; DOI 10.1016/j.specom.2010.12.001; 2011; Vol 53 (4); pp. 451-464; Cited By ~ 3

Author(s): Wooil Kim, John H.L. Hansen

Keyword(s): Speech Recognition, Background Noise, Noise Model, Robust Speech Recognition, Time Varying, Model Composition

Further Validation of the Speech Transmission Index (STI)

Journal of Speech Language and Hearing Research; DOI 10.1044/jshr.3003.403; 1987; Vol 30 (3); pp. 403-410; Cited By ~ 6

Author(s): Larry E. Humes, Stephen Boney, Faith Loven

Keyword(s): Speech Recognition, Background Noise, Recognition Task, Sound Field, Normal Hearing, Noise Levels, Transmission Index, Speech Transmission, Binaural Listening, Prospective Analyses

The present article further evaluates the accuracy of speech-recognition predictions made according to two forms of the Speech Transmission Index (STI) for normal-hearing listeners. The first portion of this article describes the application of the modified Speech Transmission Index (mSTI) to an extensive set of speech-recognition data. Performance of normal-hearing listeners on a nonsense-syllable recognition task in 216 conditions involving different speech levels, background noise levels, reverberation times, and filter passbands was found to be monotonically related to the mSTI. The second portion of this article describes a retrospective and prospective analysis of an extended sound-field version of the STI, referred to here as STI x . This extended STI considers many of the variables relevant to sound-field speech recognition, some of which are not incorporated in the mSTI. These variables include: (a) reverberation time; (b) speech level; (c) noise level; (d) talker-to-listener distance; (e) directivity of the speech source; and (f) directivity of the listener (e.g., monaural vs. binaural listening). For both the retrospective and prospective analyses, speech recognition was found to vary monotonically with STI x .

PCA-based Variational Model Composition Method for Robust Speech Recognition with Time-Varying Background Noise

The Journal of the Korean Institute of Information and Communication Engineering; DOI 10.6109/jkiice.2013.17.12.2793; 2013; Vol 17 (12); pp. 2793-2799

Author(s): Wooil Kim

Keyword(s): Speech Recognition, Background Noise, Variational Model, Time Varying, Model Composition, Composition Method

Masking of Speech in Young and Elderly Listeners With Hearing Loss

Journal of Speech Language and Hearing Research; DOI 10.1044/jshr.3703.655; 1994; Vol 37 (3); pp. 655-661; Cited By ~ 40

Author(s): Pamela E. Souza, Christopher W. Turner

Keyword(s): Hearing Loss, Speech Recognition, Sensorineural Hearing Loss, Background Noise, Normal Hearing, Sensorineural Hearing, Masking Noise, Speech Spectrum, Specific Factors, High Pass

This study examined the contributions of various properties of background noise to the speech recognition difficulties experienced by young and elderly listeners with hearing loss. Three groups of subjects participated: young listeners with normal hearing, young listeners with sensorineural hearing loss, and elderly listeners with sensorineural hearing loss. Sensitivity thresholds up to 4000 Hz of the young and elderly groups of listeners with hearing loss were closely matched, and a high-pass masking noise was added to minimize the contributions of high-frequency (above 4000 Hz) thresholds, which were not closely matched. Speech recognition scores for monosyllables were obtained in the high-pass noise alone and in three noise backgrounds. The latter consisted of high-pass noise plus one of three maskers: speech-spectrum noise, speech-spectrum noise temporally modulated by the envelope of multi-talker babble, and multi-talker babble. For all conditions, the groups with hearing impairment consistently scored lower than the group with normal hearing. Although there was a trend toward poorer speech-recognition scores as the masker condition more closely resembled the speech babble, the effect of masker condition was not statistically significant. There was no interaction between group and condition, implying that listeners with normal hearing and listeners with hearing loss are affected similarly by the type of background noise when the long-term spectrum of the masker is held constant. A significant effect of age was not observed. In addition, masked thresholds for pure tones in the presence of the speech-spectrum masker were not different for the young and elderly listeners with hearing loss. These results suggest that, for both steady-state and modulated background noises, difficulties in speech recognition for monosyllables are due primarily, and perhaps exclusively, to the presence of sensorineural hearing loss itself, and not to age-specific factors.
