Speech recognition apparatus having multiple audio inputs to cancel background noise from input speech

2007, Vol. 122 (3), p. 1321
Author(s): Paul A. P. Kaufholz

2008, Vol. 18 (1), pp. 19-24
Author(s): Erin C. Schafer

Children who use cochlear implants have significant difficulty hearing speech in background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research on FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, the benefits of binaural FM-system input, the importance of direct-audio input (DAI) receiver-gain settings, and the effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through three types of FM receivers: mounted speakers, desktop speakers, or DAI. This discussion will help audiologists make evidence-based recommendations for children using cochlear implants and FM systems.


Author(s): Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled by voice commands. The research was conducted by reviewing related work, consulting competent parties, designing the system, testing it, and analyzing the test results. The voice recognition system was designed using a deep learning convolutional neural network (DL-CNN), and the CNN model was then trained to recognize several kinds of voice commands. The result is a speech recognition system that can control several electronic devices connected to it. The system achieved a 100% success rate in a room with a background noise intensity of 24 dB (quiet), 67.67% with 42 dB of background noise, and only 51.67% with 52 dB of background noise (noisy). The success rate is therefore strongly influenced by the intensity of background noise in the room, so for optimal results the system is better suited to rooms with low background noise.
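The abstract does not describe the DL-CNN architecture, so the following is only a rough illustration of how a CNN command classifier produces a decision at inference time: a 1-D convolution slides learned kernels over a time-by-feature matrix (for example, per-frame spectral features), a ReLU and global max-pool summarize each feature map, and a dense layer with softmax picks a command. All function names, weights, and command labels here are hypothetical, training is not shown, and practical systems usually use a deep-learning framework and 2-D convolutions over spectrograms.

```python
import math


def conv1d_relu(frames, kernels, biases):
    """Valid 1-D convolution over time followed by ReLU.

    frames:  T x F feature matrix (T frames, F features per frame)
    kernels: K kernels, each W x F
    returns: K feature maps, each of length T - W + 1
    """
    maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        row = []
        for start in range(len(frames) - width + 1):
            s = bias
            for dt in range(width):
                s += sum(k * x for k, x in zip(kernel[dt], frames[start + dt]))
            row.append(max(0.0, s))  # ReLU
        maps.append(row)
    return maps


def classify_command(frames, kernels, biases, dense_w, dense_b, labels):
    """Hypothetical inference path: conv -> global max-pool -> dense -> softmax."""
    pooled = [max(row) for row in conv1d_relu(frames, kernels, biases)]
    logits = [b + sum(w * p for w, p in zip(ws, pooled)) for ws, b in zip(dense_w, dense_b)]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

For example, with two hand-set kernels that respond to feature 0 and feature 1 respectively, an identity dense layer, and hypothetical labels `["lamp on", "lamp off"]`, an input whose energy sits in feature 0 is classified as `"lamp on"`. In a trained model these weights would come from training on recorded commands, which is where background noise in the recordings affects accuracy.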


Author(s): Poonam Bansal, Amita Dev, Shail Jain

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Because background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used as an estimate of the power spectrum of the speech signal. This power spectral estimate is then processed by the Mel filter bank, a log operation, and the discrete cosine transform to obtain the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features for clean speech and outperform them for noisy speech.
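The pipeline described above (autocorrelation, discard low-order lags, window, magnitude spectrum, Mel filter bank, log, DCT) can be sketched for a single frame as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the lag cutoff `min_lag`, the one-sided Hamming window, the FFT size, and the filter and cepstrum counts are all hypothetical choices, and the "differentiated" and "relative" steps implied by the DRHOASS name are not detailed in the abstract and are omitted here.

```python
import math


def autocorrelation(frame):
    """Biased one-sided autocorrelation r[k] for lags k = 0 .. N-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) / n for k in range(n)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(seq, n_fft):
    """|DFT| of seq zero-padded to n_fft points, bins 0 .. n_fft // 2 (naive O(N^2) DFT)."""
    if len(seq) > n_fft:
        raise ValueError("n_fft must be at least len(seq)")
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n_fft) for i, v in enumerate(seq))
        im = -sum(v * math.sin(2 * math.pi * k * i / n_fft) for i, v in enumerate(seq))
        spec.append(math.hypot(re, im))
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_energies(spec, sample_rate, n_fft, n_filters):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * j / (n_filters + 1)) for j in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for b, mag in enumerate(spec):
            f = b * sample_rate / n_fft
            weight = min((f - lo) / (mid - lo), (hi - f) / (hi - mid))
            total += max(0.0, weight) * mag
        energies.append(total)
    return energies


def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    m = len(values)
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / m) for j, v in enumerate(values))
            for q in range(n_out)]


def higher_order_autocorr_cepstra(frame, sample_rate, min_lag=16, n_fft=512,
                                  n_filters=20, n_ceps=13):
    """Cepstra from the higher-order autocorrelation lags of one frame."""
    r = autocorrelation(frame)
    high = r[min_lag:]  # drop low-order lags, which additive noise corrupts most
    windowed = [v * w for v, w in zip(high, hamming(len(high)))]
    spec = magnitude_spectrum(windowed, n_fft)  # magnitude used as power-spectrum estimate
    log_e = [math.log(max(e, 1e-10)) for e in mel_energies(spec, sample_rate, n_fft, n_filters)]
    return dct_ii(log_e, n_ceps)
```

With an 8 kHz sample rate, the hypothetical `min_lag=16` discards the first 2 ms of lags. The design choice to compare against is standard MFCC extraction, which takes the spectrum of the windowed signal itself; here the spectrum is taken of the truncated autocorrelation sequence instead, so energy concentrated at lag 0 (where white noise mostly lands) never reaches the features.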


1987, Vol. 30 (3), pp. 403-410
Author(s): Larry E. Humes, Stephen Boney, Faith Loven

The present article further evaluates the accuracy of speech-recognition predictions made with two forms of the Speech Transmission Index (STI) for normal-hearing listeners. The first portion of this article describes the application of the modified Speech Transmission Index (mSTI) to an extensive set of speech-recognition data. Performance of normal-hearing listeners on a nonsense-syllable recognition task in 216 conditions involving different speech levels, background noise levels, reverberation times, and filter passbands was found to be monotonically related to the mSTI. The second portion of this article describes a retrospective and prospective analysis of an extended sound-field version of the STI, referred to here as STIx. This extended STI considers many of the variables relevant to sound-field speech recognition, some of which are not incorporated in the mSTI. These variables include: (a) reverberation time; (b) speech level; (c) noise level; (d) talker-to-listener distance; (e) directivity of the speech source; and (f) directivity of the listener (e.g., monaural vs. binaural listening). For both the retrospective and prospective analyses, speech recognition was found to vary monotonically with STIx.
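For readers unfamiliar with how an STI value is computed from reverberation time and noise level, here is a minimal sketch of the classic theoretical STI (Houtgast and Steeneken), not the mSTI or STIx evaluated in the article, which add further variables such as speech level, talker-to-listener distance, and directivity. It uses the standard modulation transfer function for exponential reverberation and stationary noise, converts each modulation index to an apparent SNR clipped to ±15 dB, and averages. Equal octave-band weights are an assumption made here for simplicity; the published STI uses specific band weights.

```python
import math

# Classic STI modulation frequencies in Hz (14 one-third-octave steps, 0.63-12.5 Hz).
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, rt60, snr_db):
    """Theoretical modulation transfer for exponential reverberation plus steady noise.

    13.8 is approximately ln(10**6), linking the 60 dB decay time to the decay constant.
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """Map a modulation index to an apparent SNR clipped to +/-15 dB, then to [0, 1]."""
    if m >= 1.0:
        snr_app = 15.0
    elif m <= 0.0:
        snr_app = -15.0
    else:
        snr_app = max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))
    return (snr_app + 15.0) / 30.0


def sti(rt60, band_snrs_db, weights=None):
    """Average transmission index over modulation frequencies, combined across bands."""
    if weights is None:
        # Assumption: equal weights across the supplied octave bands.
        weights = [1.0 / len(band_snrs_db)] * len(band_snrs_db)
    total = 0.0
    for w, snr in zip(weights, band_snrs_db):
        tis = [transmission_index(mtf(f, rt60, snr)) for f in MOD_FREQS]
        total += w * sum(tis) / len(tis)
    return total
```

For example, with no reverberation and a 0 dB SNR in each of seven octave bands, every modulation index is 0.5, the apparent SNR is 0 dB, and the STI is 0.5; increasing the reverberation time or lowering the SNR lowers the index monotonically, which is the property the article relates to recognition scores.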


1994, Vol. 37 (3), pp. 655-661
Author(s): Pamela E. Souza, Christopher W. Turner

This study examined the contributions of various properties of background noise to the speech recognition difficulties experienced by young and elderly listeners with hearing loss. Three groups of subjects participated: young listeners with normal hearing, young listeners with sensorineural hearing loss, and elderly listeners with sensorineural hearing loss. The young and elderly groups with hearing loss had closely matched sensitivity thresholds up to 4000 Hz, and a high-pass masking noise was added to minimize the contribution of high-frequency thresholds (above 4000 Hz), which were not closely matched. Speech recognition scores for monosyllables were obtained in the high-pass noise alone and in three noise backgrounds. These consisted of the high-pass noise plus one of three maskers: speech-spectrum noise, speech-spectrum noise temporally modulated by the envelope of multi-talker babble, and multi-talker babble. In all conditions, the groups with hearing impairment consistently scored lower than the group with normal hearing. Although speech-recognition scores tended to be poorer as the masker more closely resembled speech babble, the effect of masker condition was not statistically significant. There was no interaction between group and condition, implying that listeners with normal hearing and listeners with hearing loss are affected similarly by the type of background noise when the long-term spectrum of the masker is held constant. No significant effect of age was observed. In addition, masked thresholds for pure tones in the presence of the speech-spectrum masker did not differ between the young and elderly listeners with hearing loss. These results suggest that, for both steady-state and modulated background noises, difficulties in recognizing monosyllables are due primarily, and perhaps exclusively, to the sensorineural hearing loss itself, and not to age-specific factors.
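One of the maskers above is steady speech-spectrum noise temporally modulated by the envelope of multi-talker babble. The abstract does not say how the envelope was extracted, so the following is only a generic sketch of that kind of stimulus construction: full-wave rectify the babble, smooth it with a one-pole low-pass filter (the 30 Hz cutoff is a hypothetical choice), multiply the steady noise by that envelope, and rescale so the overall RMS level matches the unmodulated noise. The input `noise` is assumed to be already speech-shaped; spectral shaping is not shown.

```python
import math


def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Temporal envelope: full-wave rectification followed by a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, y = [], 0.0
    for x in signal:
        y += alpha * (abs(x) - y)  # smooth the rectified signal
        env.append(y)
    return env


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def envelope_modulated_noise(noise, babble, sample_rate, cutoff_hz=30.0):
    """Impose the babble envelope on steady noise, then restore the noise's RMS level."""
    env = envelope(babble, sample_rate, cutoff_hz)
    shaped = [n * e for n, e in zip(noise, env)]
    level = rms(shaped)
    if level == 0.0:
        raise ValueError("babble envelope is silent")
    gain = rms(noise) / level
    return [v * gain for v in shaped]
```

Because the envelope varies slowly relative to the noise, multiplying by it changes the long-term spectrum only slightly, which is consistent with the study's aim of holding the masker's long-term spectrum roughly constant while varying its temporal structure.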

