The contribution of temporal modulations to speech intelligibility: Some thoughts about the speech envelope

Ken W. Grant

doi:10.1121/1.4920290

Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope

Journal of the Association for Research in Otolaryngology ◽

10.1007/s10162-018-0654-z ◽

2018 ◽

Vol 19 (2) ◽

pp. 181-191 ◽

Cited By ~ 57

Author(s):

Jonas Vanthornhout ◽

Lien Decruy ◽

Jan Wouters ◽

Jonathan Z. Simon ◽

Tom Francart

Keyword(s):

Speech Intelligibility ◽

Neural Entrainment ◽

Speech Envelope

Speech intelligibility predicted from neural entrainment of the speech envelope

10.1101/246660 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jonas Vanthornhout ◽

Lien Decruy ◽

Jan Wouters ◽

Jonathan Z. Simon ◽

Tom Francart

Keyword(s):

Language Processing ◽

Speech Signal ◽

Speech Intelligibility ◽

Neural Processing ◽

Behavioral Measures ◽

Speech Stimuli ◽

Neural Entrainment ◽

Auditory Prostheses ◽

Speech Envelope ◽

Processing Motivation

Abstract: Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation and memory. Electrophysiological measures of hearing often give insight into the neural processing of sound. However, most methods use non-speech stimuli, making it hard to relate the results to behavioral measures of speech intelligibility. The use of natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift that makes it possible to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way towards an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will allow the development of closed-loop auditory prostheses that automatically adapt to individual users.
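
The decode-and-correlate idea in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a ridge-regularized linear decoder over a few time lags, and the synthetic "EEG", the channel gains, the 3-sample delay, and the names `lagged`, `train_decoder` and `reconstruction_score` are all hypothetical choices made for illustration. The added noise only stands in for weaker neural tracking of the envelope.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-shifted copies of the EEG: column block k holds eeg[t + k]."""
    n_samples, n_channels = eeg.shape
    out = np.zeros((n_samples, n_channels * n_lags))
    for k in range(n_lags):
        out[: n_samples - k, k * n_channels:(k + 1) * n_channels] = eeg[k:]
    return out

def train_decoder(eeg, envelope, n_lags=8, ridge=1.0):
    """Ridge regression from lagged EEG to the stimulus envelope (assumed decoder form)."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_score(eeg, envelope, weights, n_lags=8):
    """Pearson correlation between the reconstructed and the actual envelope."""
    reconstructed = lagged(eeg, n_lags) @ weights
    return np.corrcoef(reconstructed, envelope)[0, 1]

# Synthetic data (assumption): a smoothed-noise "envelope" and four channels that
# carry a delayed, scaled copy of it plus independent noise.
rng = np.random.default_rng(0)
n_samples = 4000
envelope = np.convolve(rng.standard_normal(n_samples), np.ones(10) / 10, mode="same")
envelope = (envelope - envelope.mean()) / envelope.std()
gains = np.array([1.0, 0.8, 0.6, 0.4])

def simulate_eeg(noise_std):
    delayed = np.roll(envelope, 3)  # the response lags the stimulus by 3 samples
    noise = noise_std * rng.standard_normal((n_samples, gains.size))
    return delayed[:, None] * gains + noise

half = n_samples // 2
scores = {}
for label, noise_std in [("low_noise", 0.5), ("high_noise", 5.0)]:
    eeg = simulate_eeg(noise_std)
    weights = train_decoder(eeg[:half], envelope[:half])  # train on the first half
    scores[label] = reconstruction_score(eeg[half:], envelope[half:], weights)  # test on the rest

print(scores)
```

Training and scoring on separate halves keeps the correlation from being inflated by overfitting, which matters when the score is later compared with behavioral results.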

The effect of stimulus choice on an EEG-based objective measure of speech intelligibility

10.1101/421727 ◽

2018 ◽

Cited By ~ 1

Author(s):

Eline Verschueren ◽

Jonas Vanthornhout ◽

Tom Francart

Keyword(s):

Speech Processing ◽

Speech Intelligibility ◽

Objective Measure ◽

Natural Speech ◽

Temporal Characteristics ◽

Brain Responses ◽

Stimulus Choice ◽

The Matrix ◽

Speech Envelope ◽

The Brain

Abstract

Objectives: Recently, an objective measure of speech intelligibility, based on brain responses derived from the electroencephalogram (EEG), has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of speech intelligibility can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications.

Design: We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of speech intelligibility by adding speech-weighted noise. Speech intelligibility was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively, by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas.

Results: For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing speech intelligibility. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing speech intelligibility for both stimuli. Remarkably, although speech intelligibility remained unchanged between the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of speech intelligibility appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50-300 ms) increased with increasing speech intelligibility for the Matrix sentences, but remained unchanged for the natural story.

Conclusions: These results show (1) the feasibility of natural speech as a stimulus for the objective measure of speech intelligibility, (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences, and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of the stimulus has to be considered based on the intended purpose of the measurement.
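
As a rough illustration of what a temporal response function is, the sketch below estimates one as a ridge-regularized forward model from a stimulus to a single response channel and reads off the latency of its negative peak. The sampling rate, the shape of the simulated response, the regularization value, and the helper names `lag_matrix` and `estimate_trf` are assumptions made here; the abstract does not specify how the TRFs were computed.

```python
import numpy as np

def lag_matrix(stimulus, n_lags):
    """Column k holds the stimulus delayed by k samples (zero-padded at the start)."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[: n - k]
    return X

def estimate_trf(stimulus, response, n_lags, ridge=1.0):
    """Forward model: response[t] is modeled as sum_k trf[k] * stimulus[t - k]."""
    X = lag_matrix(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ response)

fs = 100  # Hz, assumed sampling rate, so one lag equals 10 ms
rng = np.random.default_rng(1)
stimulus = rng.standard_normal(6000)

# Hypothetical "true" response: a positive deflection at 50 ms, a negative one at 120 ms.
true_trf = np.zeros(30)
true_trf[5] = 1.0
true_trf[12] = -1.5
response = lag_matrix(stimulus, 30) @ true_trf + rng.standard_normal(6000)

trf = estimate_trf(stimulus, response, n_lags=30)
negative_peak_latency_ms = 1000 * int(np.argmin(trf)) / fs
print(negative_peak_latency_ms)
```

Comparing such peak latencies and amplitudes across noise conditions is the kind of analysis the Results paragraph describes, although real speech envelopes are autocorrelated and usually need stronger regularization than this white-noise example.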

Two Stages of Speech Envelope Tracking in Human Auditory Cortex Modulated by Speech Intelligibility

10.1101/2021.12.11.472249 ◽

2021 ◽

Author(s):

Na Xu ◽

Baotian Zhao ◽

Lu Luo ◽

Kai Zhang ◽

Xiaoqiu Shao ◽

...

Keyword(s):

Auditory Cortex ◽

Speech Intelligibility ◽

Primary Auditory Cortex ◽

Acoustic Features ◽

Envelope Tracking ◽

Power Stage ◽

Human Auditory Cortex ◽

Speech Envelope ◽

Two Stages ◽

Vocoded Speech

The envelope is essential for speech perception. Recent studies have shown that cortical activity can track the acoustic envelope. However, whether the tracking strength reflects the extent of speech intelligibility processing remains controversial. Here, using stereo-electroencephalography (sEEG), we directly recorded the activity in human auditory cortex while subjects listened to either natural or noise-vocoded speech. These two stimuli have approximately identical envelopes, but the noise-vocoded speech is unintelligible. We found two stages of envelope tracking in auditory cortex: an early high-γ (60-140 Hz) power stage (delay ≈ 49 ms) that preferred the noise-vocoded speech, and a late θ (4-8 Hz) phase stage (delay ≈ 178 ms) that preferred the natural speech. Furthermore, the decoding performance of high-γ power was better in primary auditory cortex than in non-primary auditory cortex, consistent with its short tracking delay. We also found distinct lateralization effects: high-γ power envelope tracking dominated left auditory cortex, while θ phase showed better decoding performance in right auditory cortex. In sum, we suggest a functional dissociation between high-γ power and θ phase: the former reflects fast and automatic processing of brief acoustic features, while the latter correlates with slow build-up processing facilitated by speech intelligibility.

Exaggerated Cortical Representation of Speech in Older Listeners: Mutual Information Analysis

10.1101/2019.12.18.881334 ◽

2019 ◽

Author(s):

Peng Zan ◽

Alessandro Presacco ◽

Samira Anderson ◽

Jonathan Z. Simon

Keyword(s):

Information Theory ◽

Mutual Information ◽

Speech Intelligibility ◽

Coarse Grained ◽

Speech Comprehension ◽

Late Component ◽

Age Related ◽

Cortical Responses ◽

Non Linear ◽

Speech Envelope

Abstract: Aging is associated with an exaggerated representation of the speech envelope in auditory cortex. The relationship between this age-related exaggerated response and a listener’s ability to understand speech in noise remains an open question. Here, information-theory-based analysis methods are applied to magnetoencephalography (MEG) recordings of human listeners, investigating their cortical responses to continuous speech, using the novel non-linear measure of phase-locked mutual information between the speech stimuli and cortical responses. The cortex of older listeners shows an exaggerated level of mutual information, compared to younger listeners, for both attended and unattended speakers. The mutual information peaks at several distinct latencies: early (∼50 ms), middle (∼100 ms) and late (∼200 ms). For the late component, the neural enhancement of attended over unattended speech is affected by stimulus SNR, but the direction of this dependency is reversed by aging. Critically, in older listeners and for the same late component, greater cortical exaggeration is correlated with decreased behavioral inhibitory control. This negative correlation also carries over to speech intelligibility in noise, where greater cortical exaggeration in older listeners is correlated with worse speech intelligibility scores. Finally, an age-related lateralization difference is also seen for the ∼100 ms latency peaks, where older listeners show a bilateral response compared to younger listeners’ right-lateralization. Thus, this information-theory-based analysis provides new, and less coarse-grained, results regarding age-related change in auditory cortical speech processing, and its correlation with cognitive measures, compared to related linear measures.

New & Noteworthy: Cortical representations of natural speech are investigated using a novel non-linear approach based on mutual information. Cortical responses, phase-locked to the speech envelope, show an exaggerated level of mutual information associated with aging, appearing at several distinct latencies (∼50, ∼100 and ∼200 ms). Critically, for older listeners only, the ∼200 ms latency response components are correlated with specific behavioral measures, including behavioral inhibition and speech comprehension.

Neural tracking of the speech envelope in cochlear implant users

10.1101/359299 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ben Somers ◽

Eline Verschueren ◽

Tom Francart

Keyword(s):

Electrical Stimulation ◽

Cochlear Implant ◽

Speech Intelligibility ◽

Recording Conditions ◽

Eeg Recordings ◽

Linear Envelope ◽

Envelope Reconstruction ◽

Speech Envelope ◽

First Time ◽

The Brain

Abstract

Objective: When listening to speech, the brain tracks the speech envelope. It is possible to reconstruct this envelope from EEG recordings. However, in people who hear using a cochlear implant (CI), the artifacts caused by electrical stimulation of the auditory nerve contaminate the EEG. This causes the decoder to produce an artifact-dominated reconstruction, which does not reflect the neural signal processing. The objective of this study is to develop and validate a method for assessing the neural tracking of the speech envelope in CI users.

Approach: To obtain EEG recordings free of stimulus artifacts, the electrical stimulation is periodically interrupted. During these stimulation gaps, artifact-free EEG can be sampled and used to train a linear envelope decoder. Different recording conditions were used to characterize the artifacts and their influence on the envelope reconstruction.

Main results: The present study demonstrates for the first time that neural tracking of the speech envelope can be measured in response to ongoing electrical stimulation. The responses were validated to be truly neural and not affected by stimulus artifact.

Significance: Besides applications in audiology and neuroscience, the characterization and elimination of stimulus artifacts will enable future EEG studies involving continuous speech in CI users. Measures of neural tracking of the speech envelope reflect interesting properties of the listener’s perception of speech, such as speech intelligibility or attentional state. Successful decoding of neural envelope tracking will open new possibilities to investigate the neural mechanisms of speech perception with a CI.

Transcranial alternating current stimulation with speech envelopes modulates speech comprehension

10.1101/097576 ◽

2017 ◽

Author(s):

Anna Wilsch ◽

Toralf Neuling ◽

Jonas Obleser ◽

Christoph S. Herrmann

Keyword(s):

Auditory Cortex ◽

Speech Signal ◽

Speech Intelligibility ◽

Sentence Comprehension ◽

Average Frequency ◽

Speech Comprehension ◽

Time Lags ◽

Transcranial Alternating Current Stimulation ◽

Speech Envelope ◽

Cortical Entrainment

Abstract: Cortical entrainment of the auditory cortex to the broadband temporal envelope of a speech signal is crucial for speech comprehension. Entrainment results in phases of high and low neural excitability, which structure and decode the incoming speech signal. Entrainment to speech is strongest in the theta frequency range (4–8 Hz), the average frequency of the speech envelope. If a speech signal is degraded, entrainment to the speech envelope is weaker and speech intelligibility declines. Besides perceptually evoked cortical entrainment, transcranial alternating current stimulation (tACS) entrains neural oscillations by applying an electric signal to the brain. Accordingly, tACS-induced entrainment in auditory cortex has been shown to improve auditory perception. The aim of the current study was to modulate speech intelligibility externally by means of tACS such that the electric current corresponds to the envelope of the presented speech stream (i.e., envelope-tACS). Participants performed the Oldenburg sentence test with sentences presented in noise in combination with envelope-tACS. Critically, tACS was induced at time lags of 0 to 250 ms in 50-ms steps relative to sentence onset (auditory stimuli were simultaneous to or preceded tACS). We performed single-subject sinusoidal, linear, and quadratic fits to the sentence comprehension performance across the time lags. The sinusoidal fit described the modulation of sentence comprehension best. Importantly, the average frequency of the sinusoidal fit was 5.12 Hz, corresponding to the peaks of the amplitude spectrum of the stimulated envelopes. This finding was supported by a significant 5-Hz peak in the average power spectrum of individual performance time series. Altogether, envelope-tACS modulates intelligibility of speech in noise, presumably by enhancing and disrupting (time lag with in- or out-of-phase stimulation, respectively) cortical entrainment to the speech envelope in auditory cortex.
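
The comparison of sinusoidal, linear, and quadratic fits across the six time lags can be sketched as follows. The fitting procedure here (a frequency grid search with linear least squares for amplitude, phase and offset, scored by residual sum of squares) is an assumption chosen for simplicity, not the procedure reported by the authors, and the performance values are synthetic.

```python
import numpy as np

def fit_sinusoid(t, y, freqs):
    """Grid search over frequency; amplitude, phase and offset come from least squares."""
    best_freq, best_rss = None, np.inf
    for f in freqs:
        design = np.column_stack([np.sin(2 * np.pi * f * t),
                                  np.cos(2 * np.pi * f * t),
                                  np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if rss < best_rss:
            best_freq, best_rss = float(f), rss
    return best_freq, best_rss

def polynomial_rss(t, y, degree):
    """Residual sum of squares of a polynomial fit (degree 1: linear, 2: quadratic)."""
    coef = np.polyfit(t, y, degree)
    return float(np.sum((y - np.polyval(coef, t)) ** 2))

# Time lags of 0 to 250 ms in 50-ms steps, as in the abstract (in seconds here).
lags = np.arange(6) * 0.05

# Synthetic single-subject performance (assumption): a 5-Hz modulation around 60% correct.
rng = np.random.default_rng(2)
performance = 0.6 + 0.1 * np.sin(2 * np.pi * 5.0 * lags + 0.3)
performance = performance + 0.002 * rng.standard_normal(lags.size)

# Frequencies stay below the 10-Hz Nyquist limit implied by the 50-ms lag spacing.
best_freq, rss_sine = fit_sinusoid(lags, performance, np.arange(1.0, 9.01, 0.1))
rss_linear = polynomial_rss(lags, performance, 1)
rss_quadratic = polynomial_rss(lags, performance, 2)
print(best_freq, rss_sine, rss_linear, rss_quadratic)
```

With only six lags, the sinusoid uses four free parameters, so a fair comparison on real data would need a criterion that accounts for model complexity rather than raw residuals alone.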

The Role of Multisensory Temporal Covariation in Audiovisual Speech Recognition in Noise

10.31234/osf.io/gcxwp ◽

2019 ◽

Author(s):

Jonathan Henry Venezia ◽

Robert Sandlin ◽

Leon Wojno ◽

Anthony Duc Tran ◽

Gregory Hickok ◽

...

Keyword(s):

Speech Recognition ◽

Speech Intelligibility ◽

Visual Speech ◽

Auditory Signal ◽

Motion Trajectories ◽

Speech Cues ◽

Audiovisual Speech Recognition ◽

Speech Envelope ◽

Speech Recognition In Noise

Static and dynamic visual speech cues contribute to audiovisual (AV) speech recognition in noise. Static cues (e.g., “lipreading”) provide complementary information that enables perceivers to ascertain ambiguous acoustic-phonetic content. The role of dynamic cues is less clear, but one suggestion is that temporal covariation between facial motion trajectories and the speech envelope enables perceivers to recover a more robust representation of the time-varying acoustic signal. Modeling studies show this is computationally feasible, though it has not been confirmed experimentally. We conducted two experiments to determine whether AV speech recognition depends on the magnitude of cross-sensory temporal coherence (AVC). In Experiment 1, sentence-keyword recognition in steady-state noise (SSN) was assessed across a range of signal-to-noise ratios (SNRs) for auditory and AV speech. The auditory signal was unprocessed or filtered to remove 3-7 Hz temporal modulations. Filtering severely reduced AVC (magnitude-squared coherence of lip trajectories with cochlear-narrowband speech envelopes), but did not reduce the magnitude of the AV advantage (AV > A; ~ 4 dB). This did not depend on the presence of static cues, manipulated via facial blurring. Experiment 2 assessed AV speech recognition in SSN at a fixed SNR (-10.5 dB) for subsets of Exp. 1 stimuli with naturally high or low AVC. A small effect (~ 5% correct; high-AVC > low-AVC) was observed. A computational model of AV speech intelligibility based on AVC yielded good overall predictions of performance, but over-predicted the differential effects of AVC. These results suggest the role and/or computational characterization of AVC must be re-conceptualized.
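
Magnitude-squared coherence, the quantity used above to define AVC, can be estimated by averaging cross- and auto-spectra over overlapping segments. The sketch below is a generic Welch-style estimate on synthetic signals, not the study's processing chain: the sampling rate, segment length, the stand-in "lip" and "envelope" signals, and the helper names `band_filter` and `magnitude_squared_coherence` are all assumptions. It shows how removing 3-7 Hz modulations from one signal lowers the coherence in that band.

```python
import numpy as np

def band_filter(x, fs, low, high, keep):
    """Keep (keep=True) or remove (keep=False) the low..high Hz band via the FFT."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    in_band = (freqs >= low) & (freqs <= high)
    spectrum[~in_band if keep else in_band] = 0
    return np.fft.irfft(spectrum, n=len(x))

def magnitude_squared_coherence(x, y, fs, nperseg):
    """|Pxy|^2 / (Pxx * Pyy), averaged over half-overlapping Hann-windowed segments."""
    window = np.hanning(nperseg)
    n_bins = nperseg // 2 + 1
    pxx, pyy = np.zeros(n_bins), np.zeros(n_bins)
    pxy = np.zeros(n_bins, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, nperseg // 2):
        xs, ys = x[start:start + nperseg], y[start:start + nperseg]
        fx = np.fft.rfft(window * (xs - xs.mean()))
        fy = np.fft.rfft(window * (ys - ys.mean()))
        pxx += np.abs(fx) ** 2
        pyy += np.abs(fy) ** 2
        pxy += fx * np.conj(fy)
    return np.fft.rfftfreq(nperseg, 1 / fs), np.abs(pxy) ** 2 / (pxx * pyy)

fs, duration = 100, 60  # Hz and seconds, assumed
rng = np.random.default_rng(3)
n = fs * duration

# Shared 3-7 Hz modulation driving both stand-in signals, plus independent noise.
shared = band_filter(rng.standard_normal(n), fs, 3, 7, keep=True)
shared = shared / shared.std()
lip = shared + 0.3 * rng.standard_normal(n)
envelope = shared + 0.3 * rng.standard_normal(n)
envelope_filtered = band_filter(envelope, fs, 3, 7, keep=False)

freqs, coh_unprocessed = magnitude_squared_coherence(lip, envelope, fs, nperseg=200)
_, coh_filtered = magnitude_squared_coherence(lip, envelope_filtered, fs, nperseg=200)
band = (freqs >= 3) & (freqs <= 7)
avc_unprocessed = float(coh_unprocessed[band].mean())
avc_filtered = float(coh_filtered[band].mean())
print(avc_unprocessed, avc_filtered)
```

The estimate is biased upward when few segments are averaged, so even unrelated signals show a small nonzero coherence; longer recordings or shorter segments reduce that floor at the cost of frequency resolution.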

A Family With Autosomal-Dominant Progressive Sensorineural Hearing Loss

American Journal of Audiology ◽

10.1044/1059-0889.0501.23 ◽

1996 ◽

Vol 5 (1) ◽

pp. 23-32 ◽

Cited By ~ 3

Author(s):

Chris Halpin ◽

Barbara Herrmann ◽

Margaret Whearty

Keyword(s):

Speech Production ◽

Hearing Aids ◽

Role Models ◽

Speech Intelligibility ◽

Large Scale ◽

Speech Language Pathology ◽

The Family ◽

Patient Will ◽

Language Pathology

The family described in this article provides an unusual opportunity to relate findings from genetic, histological, electrophysiological, psychophysical, and rehabilitative investigation. Although the total number evaluated is large (49), the known, living affected population is smaller (14), and these are spread from age 20 to age 59. As a result, the findings described above are those of a large-scale case study. Clearly, more data will be available through longitudinal study of the individuals documented in the course of this investigation but, given the slow nature of the progression in this disease, such studies will be undertaken after an interval of several years. The general picture presented to the audiologist who must rehabilitate these cases is that of a progressive cochlear degeneration that affects only thresholds at first, and then rapidly diminishes speech intelligibility. The expected result is that, after normal language development, the patient may accept hearing aids well, encouraged by the support of the family. Performance and satisfaction with the hearing aids are good until the onset of the speech intelligibility loss, at which time the patient will encounter serious difficulties and may reject hearing aids as unhelpful. As the histological and electrophysiological results indicate, however, the eighth nerve remains viable, especially in the younger affected members, and success with cochlear implantation may be expected. Audiologic counseling efforts are aided by the presence of role models and support from the other affected members of the family. Speech-language pathology services were not considered important by the members of this family since their speech production developed normally and has remained very good. Self-correction of speech was supported by hearing aids and cochlear implants (Case 5’s speech production was documented in Perkell, Lane, Svirsky, & Webster, 1992). These patients received genetic counseling and, due to the high penetrance of the disease, exhibited serious concerns regarding future generations and the hope of a cure.

Comparison of In-the-Ear and Over-the-Ear Hearing Aid Fittings

Journal of Speech and Hearing Disorders ◽

10.1044/jshd.5104.362 ◽

1986 ◽

Vol 51 (4) ◽

pp. 362-369 ◽

Cited By ~ 4

Author(s):

Donna M. Risberg ◽

Robyn M. Cox

Keyword(s):

Hearing Aids ◽

High Frequency ◽

Comparative Evaluation ◽

Speech Intelligibility ◽

Hearing Aid ◽

Hearing Aid Fitting ◽

The Difference ◽

Functional Gain ◽

The Relationship

A custom in-the-ear (ITE) hearing aid fitting was compared to two over-the-ear (OTE) hearing aid fittings for each of 9 subjects with mild to moderately severe hearing losses. Speech intelligibility via the three instruments was compared using the Speech Intelligibility Rating (SIR) test. The relationship between functional gain and coupler gain was compared for the ITE and the higher rated OTE instruments. The difference in input received at the microphone locations of the two types of hearing aids was measured for 10 different subjects and compared to the functional gain data. It was concluded that (a) for persons with mild to moderately severe hearing losses, appropriately adjusted custom ITE fittings typically yield speech intelligibility that is equal to the better OTE fitting identified in a comparative evaluation; and (b) gain prescriptions for ITE hearing aids should be adjusted to account for the high-frequency emphasis associated with in-the-concha microphone placement.
