Auditory Peripheral Processing of Degraded Speech

2003 ◽ Oded Ghitza

2013 ◽ Vol 44 (01) ◽ T Golombek, G Hartwigsen, J Obleser

2002 ◽ Vol 87 (4) ◽ pp. 1723-1737 ◽ Srikantan S. Nagarajan, Steven W. Cheung, Purvis Bedenbaugh, Ralph E. Beitel, Christoph E. Schreiner, et al.

Cortical sensitivity in representations of behaviorally relevant complex input signals was examined in recordings from primary auditory cortical (AI) neurons in adult, barbiturate-anesthetized common marmoset monkeys (Callithrix jacchus). We studied the robustness of distributed responses to natural and degraded forms of twitter calls, social contact vocalizations comprising several quasi-periodic phrases of frequency modulation (FM) and amplitude modulation (AM). We recorded neuronal responses to a monkey's own twitter call (MOC), degraded forms of its twitter call, and sinusoidally amplitude-modulated (SAM) tones with modulation rates similar to those of twitter calls. In spectral envelope degradation, calls with narrowband channels of varying bandwidths had the same temporal envelope as a natural call, but the carrier phase was randomized within each narrowband channel. In temporal envelope degradation, the temporal envelope within narrowband channels was filtered while the carrier frequencies and phases remained unchanged. In a third form of degradation, noise was added to the natural calls. Spatiotemporal discharge patterns in AI, both within and across frequency bands, encoded spectrotemporal acoustic features of the call, although the encoded response was an abstract version of the call. The average temporal response pattern in AI, however, was significantly correlated with the average temporal envelope for each phrase of a call. Response entrainment to MOC was significantly correlated with entrainment to SAM stimuli at comparable modulation frequencies. Sensitivity of the response patterns to MOC was substantially greater for temporal envelope than for spectral envelope degradations. The distributed responses in AI were robust to additive continuous noise at signal-to-noise ratios ≥10 dB. Neurophysiological data reflecting response sensitivity in AI to these forms of degradation closely parallel human psychophysical results on the intelligibility of degraded speech in quiet and noisy conditions.
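As a rough illustration only (not taken from the study), a sinusoidally amplitude-modulated (SAM) tone of the kind used as a control stimulus follows the standard form s(t) = (1 + m·sin(2πfm·t))·sin(2πfc·t). The carrier frequency, modulation rate, and sample rate below are illustrative placeholders, not the paper's actual stimulus parameters:

```python
import numpy as np

def sam_tone(fc, fm, m=1.0, dur=1.0, fs=16000):
    """Generate a SAM tone: a carrier at fc Hz whose amplitude is
    modulated sinusoidally at fm Hz with modulation depth m."""
    t = np.arange(int(dur * fs)) / fs          # time axis in seconds
    envelope = 1.0 + m * np.sin(2 * np.pi * fm * t)
    carrier = np.sin(2 * np.pi * fc * t)
    return envelope * carrier

# Hypothetical example: a 7 kHz carrier modulated at 8 Hz,
# i.e., a modulation rate in the range of twitter-call phrase rates.
tone = sam_tone(fc=7000.0, fm=8.0, m=1.0, dur=0.5)
```

With modulation depth m = 1, the waveform swings between 0 and 2 at the envelope extremes, which is why peak amplitude is bounded by 2.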


2004 ◽ Vol 59 (5) ◽ pp. P210-P219 ◽ P. A. Allen, M. D. Murphy, M. Kaufman, K. E. Groth, A. Begovic

2012 ◽ Vol 35 (1) ◽ pp. 61-74 ◽ Mareike Clos, Robert Langner, Martin Meyer, Mathias S. Oechslin, Karl Zilles, et al.

2014 ◽ Vol 40 (1) ◽ pp. 186-199 ◽ Ediz Sohoglu, Jonathan E. Peelle, Robert P. Carlyon, Matthew H. Davis

2012 ◽ Dushyant Sharma, Gaston Hilkhuysen, Patrick A. Naylor, Nikolay D. Gaubitch, Mark Huckvale, et al.

2002 ◽ Vol 45 (4) ◽ pp. 689-699 ◽ Donald G. Jamieson, Vijay Parsa, Moneca C. Price, James Till

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPE-LTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better-quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.


2018 ◽ Vol 40 (1) ◽ pp. 93-109 ◽ Yi Zheng, Arthur G. Samuel

It has been documented that lipreading facilitates the understanding of difficult speech, such as noisy speech and time-compressed speech. However, relatively little work has addressed the role of visual information in perceiving accented speech, another type of difficult speech. In this study, we specifically focus on accented word recognition. One hundred forty-two native English speakers made lexical decision judgments on English words or nonwords produced by speakers with Mandarin Chinese accents. The stimuli were presented either as videos of a relatively distant speaker or as videos in which we zoomed in on the speaker's head. Consistent with studies of degraded speech, listeners were more accurate at recognizing accented words when they saw lip movements from the closer apparent distance. The effect of apparent distance tended to be larger under nonoptimal conditions: when stimuli were nonwords rather than words, and when stimuli were produced by a speaker with a relatively strong accent. However, we did not find any influence of listeners' prior experience with Chinese-accented speech, suggesting that cross-talker generalization is limited. The current study provides practical suggestions for effective communication between native and nonnative speakers: visual information is useful, and it is more useful in some circumstances than others.


2021 ◽ Vol 12 ◽ Kendra Gimhani Kandana Arachchige, Wivine Blekic, Isabelle Simoes Loureiro, Laurent Lefebvre

Numerous studies have explored the benefit of iconic gestures in speech comprehension. However, only a few studies have investigated how visual attention is allocated to these gestures in the context of clear versus degraded speech, and how information is extracted from them to enhance comprehension. This study aimed to explore the effect of iconic gestures on comprehension and whether fixating the gesture is required for information extraction. Four types of gestures (i.e., semantically and syntactically incongruent iconic gestures, meaningless configurations, and congruent iconic gestures) were presented in a sentence context in three different listening conditions (i.e., clear, partly degraded, or fully degraded speech). Using eye-tracking technology, participants' gaze was recorded while they watched video clips, after which they were invited to answer simple comprehension questions. Results first showed that the different types of gestures attracted attention differently, and that the more speech was degraded, the less attention participants paid to gestures. Furthermore, semantically incongruent gestures appeared to particularly impair comprehension even though they were not fixated, while congruent gestures appeared to improve comprehension despite also not being fixated. These results suggest that covert attention is sufficient to convey information that will be processed by the listener.

