The principle of inverse effectiveness in audiovisual speech perception

2019 ◽  
Author(s):  
Luuk P.H. van de Rijt ◽  
Anja Roye ◽  
Emmanuel A.M. Mylanus ◽  
A. John van Opstal ◽  
Marc M. van Wanrooij

Abstract We assessed how synchronous speech listening and lipreading affect speech recognition in acoustic noise. In simple audiovisual perceptual tasks, inverse effectiveness is often observed: the weaker the unimodal stimuli, or the poorer their signal-to-noise ratio, the stronger the audiovisual benefit. So far, however, inverse effectiveness has not been demonstrated for complex audiovisual speech stimuli. Here we assess whether this multisensory integration effect can also be observed for the recognizability of spoken words. To that end, we presented audiovisual sentences to 18 native-Dutch normal-hearing participants, who had to identify the spoken words from a finite list. Speech-recognition performance was determined for auditory-only, visual-only (lipreading), and audiovisual conditions. To modulate acoustic task difficulty, we systematically varied the auditory signal-to-noise ratio. In line with the commonly observed multisensory enhancement of speech recognition, audiovisual words were more easily recognized than auditory-only words (recognition thresholds of −15 dB and −12 dB, respectively). We show that the difficulty of recognizing a particular word, either acoustically or visually, determines the occurrence of inverse effectiveness in audiovisual word integration: words that are better heard, or better recognized through lipreading, benefit less from bimodal presentation. Audiovisual performance at the lowest acoustic signal-to-noise ratios (45%) fell below the visual-only recognition rate (60%), reflecting an actual deterioration of lipreading in the presence of excessive acoustic noise. This suggests that the brain may adopt a strategy in which attention has to be divided between listening and lipreading.
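The inverse-effectiveness relation described in this abstract can be sketched numerically. All recognition probabilities below are invented for illustration, and `av_gain` is simply the audiovisual benefit over the best unimodal score; it is not the paper's analysis:

```python
# Hypothetical per-word recognition probabilities (invented for illustration).
# Inverse effectiveness: the audiovisual (AV) gain over the best unimodal
# score tends to be largest for words that are hardest to hear or lipread.
auditory    = [0.20, 0.45, 0.70, 0.90]   # auditory-only proportion correct
visual      = [0.30, 0.40, 0.55, 0.60]   # visual-only (lipreading)
audiovisual = [0.55, 0.70, 0.85, 0.95]

def av_gain(a, v, av):
    """Multisensory benefit relative to the best unimodal condition."""
    return av - max(a, v)

gains = [av_gain(a, v, av) for a, v, av in zip(auditory, visual, audiovisual)]
best_unimodal = [max(a, v) for a, v in zip(auditory, visual)]

# Under inverse effectiveness, the gain shrinks as unimodal performance rises.
assert all(g1 >= g2 for g1, g2 in zip(gains, gains[1:]))
```

Here the easiest word (90% auditory-only) gains only 5 percentage points from bimodal presentation, while the hardest gains 25.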

2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

Abstract This paper builds on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers, changing the input-layer activation function from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to improve their effective utilization, an improved Denoising Recurrent Neural Network (DRNN) model with high computation speed and good convergence is constructed to address the problem of low speaker recognition rates in noisy environments. Using this model, random semantic speech signals from the speech library, sampled at 16 kHz with a duration of 5 seconds, were studied. The experimental signal-to-noise ratios were −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model was used to denoise the Mel-Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional and improved models on the speech recognition rate was analyzed. The research shows that the improved model effectively removes noise from the feature parameters and improves the speech recognition rate, with the benefit most pronounced at low signal-to-noise ratios. At a signal-to-noise ratio of 0 dB, the speaker recognition rate is increased by 40% over the traditional speech model, reaching 85%. As the signal-to-noise ratio increases, the recognition rate rises further, reaching 93% at a signal-to-noise ratio of 15 dB.
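The activation and padding changes described above can be sketched roughly (this is not the authors' implementation; all array values are toy data). A Leaky ReLU keeps a small slope for negative inputs where a Sigmoid saturates, and zero-padding the edge frames gives the first and last frames full context:

```python
import numpy as np

def sigmoid(x):
    """Traditional saturating activation: gradients vanish for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha=0.01):
    # A small negative slope keeps gradients alive for x < 0,
    # mitigating the vanishing-gradient problem of saturating sigmoids.
    return np.where(x > 0, x, alpha * x)

# Zero-padding the first and last frames, as described in the abstract,
# lets edge frames contribute complete context windows.
frames = np.arange(1, 6, dtype=float)             # 5 toy feature frames
padded = np.pad(frames, (1, 1), mode="constant")  # [0, 1, 2, 3, 4, 5, 0]

h = leaky_relu(np.array([-2.0, 0.5]))  # negative input passes a small value
```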


2020 ◽  
Vol 15 (2) ◽  
pp. 189-222
Author(s):  
Anne Pycha

Abstract Two experiments investigated how people perceived and remembered fragments of spoken words that either corresponded to correct lexical entries (as in the complex word drink-er) or did not (as in the simple word glitt-er). Experiment 1 was a noise-rating task that probed perception. Participants heard stimuli such as drinker in which one fragment (here, the root drink) was overlaid with noise at a controlled signal-to-noise ratio, and rated the loudness of the noise. Results showed that participants rated noise on pseudo-roots (e.g., glitt in glitter) as louder than noise on true roots (drink in drinker), indicating that they perceived the pseudo-roots with less clarity. Experiment 2 was an eye-fixation task that probed memory. Participants heard a word such as drink-er while associating each fragment with a visual shape. At test, they saw the shapes again and were asked to look at the shape associated with a particular fragment, such as drink. Results showed that fixations to shapes associated with pseudo-affixes (-er in glitter) were less accurate than fixations to shapes associated with true affixes (-er in drinker), suggesting that participants remembered the pseudo-affixes more poorly. These findings provide evidence that the presence of correct lexical entries for roots and affixes modulates people’s judgments about the speech that they hear.


2019 ◽  
Vol 28 (1) ◽  
pp. 101-113 ◽  
Author(s):  
Jenna M. Browning ◽  
Emily Buss ◽  
Mary Flaherty ◽  
Tim Vallier ◽  
Lori J. Leibold

Purpose The purpose of this study was to evaluate speech-in-noise and speech-in-speech recognition associated with activation of a fully adaptive directional hearing aid algorithm in children with mild to severe bilateral sensory/neural hearing loss. Method Fourteen children (5–14 years old) who are hard of hearing participated in this study. Participants wore laboratory hearing aids. Open-set word recognition thresholds were measured adaptively for 2 hearing aid settings: (a) omnidirectional (OMNI) and (b) fully adaptive directionality. Each hearing aid setting was evaluated in 3 listening conditions. Fourteen children with normal hearing served as age-matched controls. Results Children who are hard of hearing required a more advantageous signal-to-noise ratio than children with normal hearing to achieve comparable performance in all 3 conditions. For children who are hard of hearing, the average improvement in signal-to-noise ratio when comparing fully adaptive directionality to OMNI was 4.0 dB in noise, regardless of target location. Children performed similarly with fully adaptive directionality and OMNI settings in the presence of the speech maskers. Conclusions Compared to OMNI, fully adaptive directionality improved speech recognition in steady noise for children who are hard of hearing, even when they were not facing the target source. This algorithm did not affect speech recognition when the background noise was speech. Although the use of hearing aids with fully adaptive directionality is not proposed as a substitute for remote microphone systems, it appears to offer several advantages over fixed directionality, because it does not depend on children facing the target talker and provides access to multiple talkers within the environment. Additional experiments are required to further evaluate children's performance under a variety of spatial configurations in the presence of both noise and speech maskers.
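Adaptive threshold measurement, as used above, can be sketched generically. The following is a hypothetical 1-down/1-up staircase with a toy deterministic listener, not the study's exact algorithm or step sizes; such a track converges near the 50%-correct point of the psychometric function:

```python
def staircase_snr(respond, start_snr=10.0, step=2.0, trials=20):
    """Hypothetical 1-down/1-up adaptive track: decrease SNR after a
    correct response, increase it after an error. The track oscillates
    around the listener's 50%-correct SNR."""
    snr = start_snr
    track = []
    for _ in range(trials):
        correct = respond(snr)   # True if the word was repeated correctly
        track.append(snr)
        snr = snr - step if correct else snr + step
    return track

# Toy listener: always correct above 0 dB SNR, always wrong at or below it.
track = staircase_snr(lambda snr: snr > 0.0)
# Simplified threshold estimate: average the tail of the track.
threshold = sum(track[-8:]) / 8
```

With this toy listener, the track descends from 10 dB and then alternates between 0 and 2 dB, so the tail average lands at 1 dB, between the step levels bracketing the true 0 dB boundary.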


2017 ◽  
Vol 28 (05) ◽  
pp. 404-414 ◽  
Author(s):  
Dorothy Neave-DiToro ◽  
Adrienne Rubinstein ◽  
Arlene C. Neuman

Background: Limited attention has been given to the effects of classroom acoustics at the college level. Many studies have reported that nonnative speakers of English are more likely to be affected by poor room acoustics than native speakers. An important question is how classroom acoustics affect the speech perception of nonnative college students. Purpose: The combined effect of noise and reverberation on the speech recognition performance of college students who differ in age of English acquisition was evaluated under conditions simulating classrooms with reverberation times (RTs) close to the ANSI-recommended RTs. Research Design: A mixed design was used in this study. Study Sample: Thirty-six native and nonnative English-speaking college students with normal hearing, ages 18–28 yr, participated. Intervention: Two groups of nine native participants (native monolingual [NM] and native bilingual) and two groups of nine nonnative participants (nonnative early and nonnative late) were evaluated in noise under three reverberant conditions (0.3, 0.6, and 0.8 sec). Data Collection and Analysis: A virtual test paradigm was used, which represented a signal reaching a student at the back of a classroom. Speech recognition in noise was measured using the Bamford–Kowal–Bench Speech-in-Noise (BKB-SIN) test, and the signal-to-noise ratio required for correct repetition of 50% of the key words in the stimulus sentences (SNR-50) was obtained for each group in each reverberant condition. A mixed-design analysis of variance was used to determine statistical significance as a function of listener group and RT. Results: SNR-50 was significantly higher for nonnative listeners than for native listeners, and a more favorable SNR-50 was needed as RT increased. The most dramatic effect on SNR-50 was found in the group with later acquisition of English, whereas the impact of early introduction of a second language was subtler. 
At the ANSI standard’s maximum recommended RT (0.6 sec), all groups except the NM group exhibited a mild signal-to-noise ratio (SNR) loss. At the 0.8 sec RT, all groups exhibited a mild SNR loss. Conclusion: Acoustics in the classroom are an important consideration for nonnative speakers who are proficient in English and enrolled in college. To address the need for a clearer speech signal by nonnative students (and for all students), universities should follow ANSI recommendations, as well as minimize background noise in occupied classrooms. Behavioral/instructional strategies should be considered to address factors that cannot be compensated for through acoustic design.


2011 ◽  
Vol 22 (06) ◽  
pp. 375-386 ◽  
Author(s):  
Stella L. Ng ◽  
Christine N. Meston ◽  
Susan D. Scollie ◽  
Richard C. Seewald

Background: There is a need for objective pediatric hearing aid outcome measurement and thus a need for the evaluation of outcome measures. We explored a commercially available pediatric sentence-in-noise measure adapted for use as an aided outcome measure. Purpose: The purposes of the current study were (1) to administer an adapted BKB-SIN (Bamford-Kowal-Bench Speech-in-Noise test) to adults and children who have normal hearing and children who use hearing aids and (2) to evaluate the utility of this adapted BKB-SIN as an aided, within-subjects outcome measure for amplification strategies. Research Design: We used a mixed within and between groups design to evaluate speech recognition in noise for the three groups of participants. The children who use hearing aids were tested under the omnidirectional, directional, and digital noise reduction (DNR) conditions. Results from each group were compared to each other, and we compared results of each aided condition for the children who use hearing aids to evaluate the test utility as an aided outcome measure. Study Sample: The study sample consisted of 14 adults with normal hearing (aged 22–28 yr) and 15 children with normal hearing (aged 6–18 yr), recruited through word of mouth, and 14 children who use hearing aids (aged 9–16 yr) recruited from local audiology clinics. Data Collection and Analysis: List pairs of the BKB-SIN test were presented at 50 dB HL as follows: four list pairs to each participant with normal hearing, four list pairs in the omnidirectional condition, and two list pairs in the directional and DNR conditions. Children who use hearing aids were fitted bilaterally with laboratory devices and completed the BKB-SIN test aided. Data were plotted as mean percent of key words correct at each signal-to-noise ratio (SNR). Further, we conducted an analysis of variance for group differences and within-groups for the three aided conditions. 
Results: Adult participants outperformed children with normal hearing, who outperformed the children who use hearing aids. SNR-50 (signal-to-noise ratio at which listener can obtain a speech recognition score of 50% correct) scores demonstrated reliability of the adapted test implementation. The BKB-SIN test measured significant differences in performance for omnidirectional versus directional microphone conditions but not between omnidirectional and DNR conditions. Conclusions: We conclude that the adapted implementation of the BKB-SIN test can be administered reliably and feasibly. Further study is warranted to develop norms for the adapted implementation as well as to determine if an adapted implementation can be sensitive to age effects. Until such norms are developed, clinicians should refrain from comparing results from the adapted test to the test manual norms and should instead use the adapted implementation as a within-subject measure.
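The SNR-50 metric defined above can be illustrated with a small sketch. The data and the interpolation rule below are invented for illustration; this is not the BKB-SIN test's actual scoring formula:

```python
# Hypothetical percent-correct scores at each presented SNR (invented data).
snrs    = [-6, -3, 0, 3, 6]      # dB SNR of each sentence list step
percent = [10, 30, 55, 80, 95]   # % key words correct at that SNR

def snr_50(snrs, percent, target=50.0):
    """Linear interpolation between the two SNRs bracketing 50% correct."""
    points = list(zip(snrs, percent))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1:
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("50% point not bracketed by the measured SNRs")

threshold = snr_50(snrs, percent)  # dB SNR at which 50% of key words are correct
```

With these toy scores, 50% correct falls between the −3 dB (30%) and 0 dB (55%) points, giving an SNR-50 of −0.6 dB; a lower (more negative) SNR-50 indicates better speech-in-noise performance.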


Author(s):  
Yu A Kropotov ◽  
A A Belov ◽  
A A Kolpakov ◽  
A Yu Proskuryakov

The paper investigates the effect of the signal-to-noise ratio on syllable intelligibility under intense external acoustic interference during voice-message exchange in public-address telecommunication systems. The article considers the effect of the signal-to-external-acoustic-noise ratio on syllable intelligibility, and examines the integral articulation index, the dependence of the formant perception coefficient on the relative formant intensity level, and the dependence of the formant parameter on the geometric mean frequency of the i-th band of the speech-signal spectrum. From studies of the integral articulation index as a function of the signal-to-noise ratio, a function relating syllable intelligibility to the signal-to-noise ratio was obtained; with it, one can determine the output signal-to-noise ratio required in the audio-exchange telecommunication system to obtain a given syllable intelligibility. The signal-to-noise ratio required in the audio-exchange telecommunication system to obtain a syllable intelligibility of at least 93%, ensuring full perception of the transmitted speech information, was also determined experimentally.
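The inversion described above, from a target syllable intelligibility back to the required SNR, can be sketched with a hypothetical logistic mapping. The paper's actual articulation-index function is not reproduced here; the midpoint and slope below are invented, and only the inversion step is illustrated:

```python
import math

def intelligibility(snr_db, midpoint=0.0, slope=0.3):
    """Hypothetical logistic mapping from output SNR (dB) to syllable
    intelligibility on a 0..1 scale (stand-in for the paper's
    articulation-index-derived function)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))

def required_snr(target, midpoint=0.0, slope=0.3):
    """Invert the logistic to find the SNR giving a target intelligibility."""
    return midpoint - math.log(1.0 / target - 1.0) / slope

# SNR needed for at least 93% syllable intelligibility under these
# assumed parameters.
snr_for_93 = required_snr(0.93)
```

The round trip `intelligibility(required_snr(t)) == t` holds by construction, which is the property that lets a measured intelligibility curve be read backwards to a required output SNR.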


2015 ◽  
Vol 26 (01) ◽  
pp. 051-058 ◽  
Author(s):  
Elizabeth R. Kolberg ◽  
Sterling W. Sheffield ◽  
Timothy J. Davis ◽  
Linsey W. Sunderhaus ◽  
René H. Gifford

Background: Despite improvements in cochlear implants (CIs), CI recipients continue to experience significant communicative difficulty in background noise. Many potential solutions have been proposed to help increase the signal-to-noise ratio in noisy environments, including signal processing and external accessories. To date, however, research on the effect of microphone location on speech recognition in noise has focused primarily on hearing aid users. Purpose: The purpose of this study was (1) to measure physical output for the T-Mic as compared with the integrated behind-the-ear (BTE) processor mic for various source azimuths, and (2) to investigate the effect of CI processor mic location on speech recognition in semi-diffuse noise with speech originating from various source azimuths, as encountered in everyday communicative environments. Research Design: A repeated-measures, within-participant design was used to compare performance across listening conditions. Study Sample: A total of 11 adults with Advanced Bionics CIs were recruited for this study. Data Collection and Analysis: Physical acoustic output was measured on a Knowles Electronics Manikin for Acoustic Research (KEMAR) for the T-Mic and BTE mic, with broadband noise presented at 0° and 90° (directed toward the implant processor). In addition to the physical acoustic measurements, we also assessed recognition of sentences constructed by researchers at Texas Instruments, the Massachusetts Institute of Technology, and the Stanford Research Institute (TIMIT sentences) at 60 dBA for speech source azimuths of 0°, 90°, and 270°. Sentences were presented in semi-diffuse restaurant noise originating from the R-SPACE 8-loudspeaker array. Signal-to-noise ratio was determined individually to achieve approximately 50% correct in the unilateral implanted listening condition with speech at 0°. Performance was compared across the T-Mic, the 50/50 (mixed T-Mic/BTE mic) setting, and the integrated BTE processor mic. 
Results: The integrated BTE mic provided approximately 5 dB attenuation from 1500–4500 Hz for signals presented at 0° as compared with 90° (directed toward the processor). The T-Mic output was essentially equivalent for sources originating from 0 and 90°. Mic location also significantly affected sentence recognition as a function of source azimuth, with the T-Mic yielding the highest performance for speech originating from 0°. Conclusions: These results have clinical implications for (1) future implant processor design with respect to mic location, (2) mic settings for implant recipients, and (3) execution of advanced speech testing in the clinic.
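The ~5 dB attenuation reported above is a level difference between mic outputs. As a generic sketch with synthetic signals (not the KEMAR measurements), the attenuation in dB between two recordings of the same source follows from their RMS levels:

```python
import math

def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def attenuation_db(reference, attenuated):
    """Level difference in dB between two recordings of the same source."""
    return 20.0 * math.log10(rms(reference) / rms(attenuated))

# Synthetic example: 0.1 s of a 1 kHz tone at 16 kHz sampling; the second
# signal is the first scaled to ~56% amplitude, i.e. roughly 5 dB down,
# since 20*log10(1/0.5623) is approximately 5 dB.
ref = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(1600)]
att = [0.5623 * s for s in ref]
level_diff = attenuation_db(ref, att)
```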

