A theoretical model of cochlear processing improves spectrally degraded speech perception

2006 ◽  
Vol 119 (5) ◽  
pp. 3238-3238
Author(s):  
Evan C. Smith ◽  
Lori L. Holt


Author(s):  
Nele Salveste

Categorical perception, or the hypothesis of how we perceive linguistic units. The acoustic signal of everyday speech is highly variable, yet this variability seldom disrupts normal speech communication. This motivates the hypothesis that speech perception has developed a special mechanism for extracting phonemes from a highly variable speech signal. This mechanism extracts phonemes so efficiently and quickly that we are usually unaware of it. We might call this mechanism "categorical perception of speech", but since perceptual processes are only indirectly accessible to investigation, the term refers instead to a theoretical model, or an experimental method, for investigating our perceptual ability to distinguish phonemes in the speech signal (Schouten et al. 2003). This paper discusses categorical perception as an experimental method, relating its theoretical assumptions to the design and conclusions of perception experiments conducted both in Estonian and in other languages.
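The identification-versus-discrimination logic of the categorical perception paradigm can be illustrated with a small sketch. Nothing below comes from the article itself: the logistic identification function, its boundary and slope parameters, and the use of a Haskins-style prediction (discrimination accuracy derived solely from labeling probabilities) are illustrative assumptions.

```python
import math

def identification(step, boundary=5.0, slope=1.5):
    """Probability of a category-A label at a continuum step
    (logistic identification function; parameters are illustrative)."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(p1, p2):
    """Haskins-style prediction: discrimination accuracy follows only
    from how differently the two stimuli are labeled."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

probs = {s: identification(s) for s in range(1, 10)}

# Within-category pair (steps 1 vs. 2): labels barely differ, so
# predicted discrimination stays near chance (0.5).
within = predicted_abx(probs[1], probs[2])

# Cross-boundary pair (steps 4 vs. 6): labels flip across the category
# boundary, so predicted discrimination rises well above chance.
across = predicted_abx(probs[4], probs[6])
```

Comparing observed discrimination against this labeling-based prediction is the core test of the paradigm: categorical perception is inferred when discrimination peaks only where identification changes.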


2019 ◽  
Vol 62 (9) ◽  
pp. 3290-3301
Author(s):  
Jingjing Guan ◽  
Chang Liu

Purpose: Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether second formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH). Method: Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI. Results: The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age or their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners. Conclusions: Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.
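The abstract does not describe the enhancement algorithm itself, so the following is only a hedged sketch of the general idea: applying a fixed gain to an assumed F2 region in the frequency domain. The band edges (1000–3000 Hz), the 6 dB gain, and the two-tone test signal are all invented for illustration and may differ from the study's actual procedure.

```python
import numpy as np

def enhance_f2(signal, fs, band=(1000.0, 3000.0), gain_db=6.0):
    """Boost an assumed F2 region by a fixed gain (illustrative only;
    the study's actual enhancement procedure may differ)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    spectrum[in_band] *= 10.0 ** (gain_db / 20.0)  # dB -> amplitude ratio
    return np.fft.irfft(spectrum, n=len(signal))

# 500 Hz + 2 kHz tones; only the 2 kHz component sits in the assumed F2 band.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
y = enhance_f2(x, fs)
```

After processing, the in-band component is amplified by about a factor of 2 in amplitude while the out-of-band component is untouched, which is the intended selective-boost behavior.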


Author(s):  
Jiaqiang Zhu ◽  
Xiaoxiang Chen ◽  
Fei Chen ◽  
Seth Wiener

Purpose: Individuals with congenital amusia exhibit degraded speech perception. This study examined whether adult Chinese Mandarin listeners with amusia were still able to extract the statistical regularities of Mandarin speech sounds, despite their degraded speech perception. Method: Using the gating paradigm with monosyllabic syllable–tone words, we tested 19 Mandarin-speaking amusics and 19 musically intact controls. Listeners heard increasingly longer fragments of the acoustic signal across eight duration-blocked gates. The stimuli varied in syllable token frequency and syllable–tone co-occurrence probability. The correct syllable–tone word, correct syllable-only, correct tone-only, and correct syllable–incorrect tone responses were compared respectively between the two groups using mixed-effects models. Results: Amusics were less accurate than controls in terms of the correct word, correct syllable-only, and correct tone-only responses. Amusics, however, showed consistent patterns of top-down processing, as indicated by more accurate responses to high-frequency syllables, high-probability tones, and tone errors all in manners similar to those of the control listeners. Conclusions: Amusics are able to learn syllable and tone statistical regularities from the language input. This extends previous work by showing that amusics can track phonological segment and pitch cues despite their degraded speech perception. The observed speech deficits in amusics are therefore not due to an abnormal statistical learning mechanism. These results support rehabilitation programs aimed at improving amusics' sensitivity to pitch.
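The token-frequency and co-occurrence statistics that the stimuli manipulated can be illustrated with a toy count-based sketch. The corpus, counts, and helper function below are invented for illustration; they are not the study's Mandarin corpus materials.

```python
from collections import Counter

# Toy corpus of (syllable, tone) word tokens; the study's stimuli were
# built from real Mandarin corpus counts, which are not reproduced here.
corpus = [("ma", 1), ("ma", 3), ("ma", 3), ("ma", 3),
          ("shi", 4), ("shi", 4), ("yi", 1)]

# Syllable token frequency: how often each syllable occurs at all.
syllable_freq = Counter(syllable for syllable, _ in corpus)

# Joint counts of each syllable-tone pairing.
pair_freq = Counter(corpus)

def tone_given_syllable(syllable, tone):
    """Syllable-tone co-occurrence probability P(tone | syllable)."""
    return pair_freq[(syllable, tone)] / syllable_freq[syllable]
```

Tracking such conditional probabilities is the statistical-learning ability the study probes: listeners who have internalized them should, for example, guess a high-probability tone for a syllable fragment before the tone itself is fully audible.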


2020 ◽  
Author(s):  
Cris Lanting ◽  
Ad Snik ◽  
Joop Leijendeckers ◽  
Arjan Bosman ◽  
Ronald Pennings

The relation between speech recognition and hereditary hearing loss is not straightforward. Impaired cochlear processing of sound might be determined by underlying genetic defects. Data obtained in nine groups of patients, each with a specific type of genetic hearing loss, were evaluated. For each group, the affected cochlear structure, or site of lesion, was determined based on previously published animal studies. Retrospectively obtained speech recognition scores in noise were related to several aspects of supra-threshold cochlear processing, as assessed by psychophysical measurements. The differences in speech perception in noise between these patient groups could be explained by these psychophysical factors and, in part, by the hypothesized affected cochlear structure, suggesting that speech recognition in noise is associated with genetics-related malfunctioning of the cochlea.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Ediz Sohoglu ◽  
Matthew H Davis

Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g., spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
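The reported interaction can be caricatured with a toy linear prediction-error sketch. This is emphatically not the authors' model: the one-hot word patterns, the prediction strength, and the two quality levels are arbitrary illustrative choices made here only to show why prediction error, unlike a pure sensory representation, moves in opposite directions for matching and mismatching priors as signal quality rises.

```python
import numpy as np

def prediction_error(sensory, prediction, signal_quality):
    """Toy linear sketch: sensory evidence is scaled by signal quality
    before the top-down prediction is subtracted."""
    return signal_quality * sensory - prediction

heard = np.array([1.0, 0.0, 0.0])              # pattern of the word actually heard
matching_prior = np.array([0.9, 0.0, 0.0])     # prior text predicted this word
mismatching_prior = np.array([0.0, 0.9, 0.0])  # prior text predicted another word

# Raising signal quality SHRINKS prediction error when the prior matches...
pe_match_low = np.abs(prediction_error(heard, matching_prior, 0.4)).sum()
pe_match_high = np.abs(prediction_error(heard, matching_prior, 0.8)).sum()

# ...but GROWS it when the prior mismatches: the interaction pattern.
pe_mismatch_low = np.abs(prediction_error(heard, mismatching_prior, 0.4)).sum()
pe_mismatch_high = np.abs(prediction_error(heard, mismatching_prior, 0.8)).sum()
```

A representation coding the sensory signal itself would grow with quality in both conditions, so the crossover is what makes the MEG interaction diagnostic of prediction error.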


2019 ◽  
Author(s):  
Matthew H. Davis ◽  
Ediz Sohoglu

Spoken language is one of the most important sounds that humans hear, yet it is also one of the most difficult for non-human listeners or machines to identify. In this chapter we explore different neuro-computational implementations of Bayesian Inference for speech perception. We propose, in line with Predictive Coding (PC) principles, that Bayesian Inference is based on neural computations of the difference between heard and expected speech segments (Prediction Error). We review three functions of these Prediction Error representations: (1) combining prior knowledge and degraded speech for optimal word identification; (2) supporting rapid learning processes so that perception remains optimal despite perceptual degradation or variation; (3) ensuring that listeners detect instances of lexical novelty (previously unfamiliar words) so as to learn new words over the life span. Evidence from MEG and multivariate fMRI studies suggests computations of Prediction Error in the Superior Temporal Gyrus (STG) during these three processes.
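The chapter's first function of Prediction Error, combining prior knowledge with degraded input for optimal word identification, reduces to Bayes' rule over candidate words. The toy lexicon, prior, and likelihood values below are invented for illustration and stand in for contextual expectations and degraded acoustic evidence respectively.

```python
import numpy as np

# Toy lexicon; the prior encodes expectations from previously read text.
words = ["clay", "play", "gray"]
prior = np.array([0.1, 0.8, 0.1])

# Degraded acoustics: the likelihood only weakly favors the first word.
likelihood = np.array([0.40, 0.35, 0.25])

# Bayes' rule: posterior is proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()

# On acoustics alone the best guess is "clay"; with the contextual
# prior folded in, the identified word flips to "play".
best_word = words[int(np.argmax(posterior))]
```

The more degraded the input (the flatter the likelihood), the more the posterior is dominated by the prior, which is why prior knowledge matters most precisely when speech is hardest to hear.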


2005 ◽  
Vol 382 (3) ◽  
pp. 254-258 ◽  
Author(s):  
Tetsuaki Kawase ◽  
Keiichiro Yamaguchi ◽  
Takenori Ogawa ◽  
Ken-ichi Suzuki ◽  
Maki Suzuki ◽  
...  


Author(s):  
Derek M. Houston ◽  
Chi-hsin Chen ◽  
Claire Monroy ◽  
Irina Castellanos

It is generally assumed that deaf and hard-of-hearing children’s difficulties in learning novel words stem entirely from impaired speech perception. Degraded speech perception makes words more confusable, and correctly recognizing words clearly plays an important role in word learning. However, recent findings suggest that early auditory experience may affect other factors involved in linking the sound patterns of words to their referents. This chapter reviews those findings and discusses possible factors that may be affected by early auditory experience and, in turn, also affect the ability to learn word-referent associations. These factors include forming representations for the sound patterns of words, encoding phonological information into memory, sensory integration, and quality of language input. Overall, we learn that in order to understand and to help mitigate the difficulties deaf and hard-of-hearing children face in learning spoken words after cochlear implantation, we must look well beyond speech perception.

