Predicting speech intelligibility based on the modulation spectrum and modulation frequency selectivity

Torsten Dau

doi:10.1121/1.5036245

Relations between frequency selectivity, temporal resolution, and speech intelligibility in sensorineural hearing‐impaired subjects

The Journal of the Acoustical Society of America, 1988, Vol 84 (S1), p. S76. doi:10.1121/1.2026472

Author(s): Kiyoshi Yonemoto, Noriko Kurauchi, Hareo Hamada, Tanetoshi Miura

Keyword(s): Temporal Resolution, Speech Intelligibility, Frequency Selectivity, Hearing Impaired, Sensorineural Hearing

On the Role of Theta-Driven Syllabic Parsing in Decoding Speech: Intelligibility of Speech with a Manipulated Modulation Spectrum

Frontiers in Psychology, 2012, Vol 3. doi:10.3389/fpsyg.2012.00238

Cited By ~ 72

Author(s): Oded Ghitza

Keyword(s): Speech Intelligibility, Modulation Spectrum

Temporal Modulations Reveal Distinct Rhythmic Properties of Speech and Music

2016. doi:10.1101/059683

Cited By ~ 1

Author(s): Nai Ding, Aniruddh D. Patel, Lin Chen, Henry Butler, Cheng Luo, ...

Keyword(s): Rock Music, Classical Music, Modulation Frequency, Sound Intensity, Frequency Range, Perceptual Analysis, Modulation Spectrum, Statistical Regularities, Musical Rhythms, Western Classical Music

Speech and music have structured rhythms, but these rhythms are rarely compared empirically. This study, based on large corpora, quantitatively characterizes and compares a major acoustic correlate of spoken and musical rhythms: the slow (0.25-32 Hz) temporal modulations in sound intensity. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics, such as English, French, and Mandarin Chinese). A different, but similarly consistent, modulation spectrum is observed for Western classical music played by 6 different instruments. Western music, including classical music played by single instruments, symphonic, jazz, and rock music, contains more energy than speech in the low modulation frequency range below 4 Hz. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2 Hz, respectively. These differences in temporal modulations alone, without any spectral details, can discriminate speech and music with high accuracy. Speech and music therefore show distinct and reliable statistical regularities in their temporal modulations that likely facilitate their perceptual analysis and its neural foundations.
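
A minimal sketch of the measurement this abstract describes, the power spectrum of slow (0.25-32 Hz) intensity modulations, is shown below in Python with NumPy. It is an illustration under stated assumptions, not the authors' pipeline: the function name `modulation_spectrum`, the 100 Hz envelope frame rate, and the use of frame-averaged squared amplitude as "intensity" are choices made here, and no cochlear filtering, normalization, or frequency binning from the study is reproduced.

```python
import numpy as np

def modulation_spectrum(x, fs, env_rate=100, f_lo=0.25, f_hi=32.0):
    """Sketch: power spectrum of the intensity envelope of x, kept between f_lo and f_hi (Hz)."""
    hop = int(fs // env_rate)                  # samples per envelope frame (assumed frame rate)
    n = len(x) // hop
    # Intensity envelope: mean squared amplitude in non-overlapping frames
    intensity = (np.asarray(x[: n * hop], dtype=float) ** 2).reshape(n, hop).mean(axis=1)
    intensity -= intensity.mean()              # drop DC so it does not dominate the low bins
    power = np.abs(np.fft.rfft(intensity)) ** 2
    freqs = np.fft.rfftfreq(n, d=hop / fs)     # actual envelope sampling rate is fs / hop
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[keep], power[keep]

# Usage: noise carriers whose intensity is modulated at 2 Hz and at 5 Hz
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(20 * fs) / fs
for fm in (2.0, 5.0):
    x = rng.standard_normal(t.size) * (1 + 0.9 * np.sin(2 * np.pi * fm * t))
    freqs, power = modulation_spectrum(x, fs)
    print(fm, freqs[np.argmax(power)])
```

The two test rates were picked to match the music and speech peaks reported above; for each, the printed peak should fall on the imposed modulation rate. Removing the mean intensity before the FFT keeps the DC term from swamping the lowest modulation bins.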

Effects of temporal smearing on temporal resolution, frequency selectivity, and speech intelligibility

The Journal of the Acoustical Society of America, 1994, Vol 96 (3), pp. 1325-1340. doi:10.1121/1.410279

Cited By ~ 7

Author(s): Zezhang Hou, Chaslav V. Pavlovic

Keyword(s): Temporal Resolution, Speech Intelligibility, Frequency Selectivity

Frequency Selectivity and Consonant Intelligibility in Sensorineural Hearing Loss

Journal of Speech Language and Hearing Research, 1985, Vol 28 (2), pp. 197-206. doi:10.1044/jshr.2802.197

Cited By ~ 14

Author(s): Jill Preminger, Terry L. Wiley

Keyword(s): Hearing Loss, Sensorineural Hearing Loss, High Frequency, Speech Intelligibility, Low Frequency, Frequency Selectivity, Sensorineural Hearing, Tuning Curves, The Subject, Subject Pair

The relations between frequency selectivity and consonant intelligibility were investigated in subjects with sensorineural hearing loss in an attempt to derive predictive indices. Three matched pairs of subjects with similar audiometric configurations (high-frequency, flat, or low-frequency hearing loss) but significantly different word-intelligibility scores were tested. Characteristics of psychophysical tuning curves (PTCs) for high- and low-frequency probes were compared with speech-intelligibility performance for high- and low-frequency consonant-vowel syllables. Frequency-specific relations between PTC characteristics and consonant-intelligibility performance were observed in the subject pairs with high-frequency and flat sensorineural hearing loss. Corresponding results for the subject pair with low-frequency sensorineural hearing loss were equivocal.
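
The abstract does not say which PTC characteristics were compared. One commonly reported measure of tuning sharpness is Q10: the tip frequency divided by the curve's bandwidth 10 dB above the tip. The sketch below computes it from masker frequencies and masker levels at threshold. The function names, the linear interpolation between measured points, and the requirement that each side of the curve rise monotonically away from the tip are assumptions made for illustration, and the data in the usage lines are hypothetical.

```python
import numpy as np

def _edge_frequency(levels_db, freqs_hz, target_db):
    """Frequency at which one side of the PTC reaches target_db (linear interpolation)."""
    # np.interp needs increasing x-values (here: levels), i.e. a monotonic PTC side
    if np.any(np.diff(levels_db) <= 0) or target_db > levels_db[-1]:
        raise ValueError("PTC side is not monotonic or never reaches tip + 10 dB")
    return float(np.interp(target_db, levels_db, freqs_hz))

def q10(masker_freqs_hz, masker_levels_db):
    """Q10 of a psychophysical tuning curve: tip frequency / bandwidth 10 dB above the tip."""
    f = np.asarray(masker_freqs_hz, dtype=float)
    lvl = np.asarray(masker_levels_db, dtype=float)
    order = np.argsort(f)
    f, lvl = f[order], lvl[order]
    tip = int(np.argmin(lvl))            # masker frequency needing the lowest level
    target = lvl[tip] + 10.0
    # Low side is reversed so that levels increase moving away from the tip
    f_low = _edge_frequency(lvl[: tip + 1][::-1], f[: tip + 1][::-1], target)
    f_high = _edge_frequency(lvl[tip:], f[tip:], target)
    return f[tip] / (f_high - f_low)

# Usage with a hypothetical PTC for a 1 kHz probe
freqs = [500, 800, 900, 1000, 1100, 1200, 1500]   # masker frequencies (Hz)
levels = [80, 60, 45, 30, 50, 70, 90]             # masker levels at threshold (dB)
print(round(q10(freqs, levels), 2))
```

In this example the 10 dB points fall at about 933 Hz and 1050 Hz, so Q10 = 1000 / 116.7, roughly 8.57; a broader (hearing-impaired) tuning curve would give a smaller value.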

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function

Speech Communication, 2021. doi:10.1016/j.specom.2021.09.004

Author(s): Thuanvan Ngo, Rieko Kubo, Masato Akagi

Keyword(s): Transfer Function, Modulation Transfer Function, Speech Intelligibility, Modulation Transfer, Modulation Spectrum

Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

The Journal of the Acoustical Society of America, 2011, Vol 130 (3), pp. 1475-1487. doi:10.1121/1.3621502

Cited By ~ 99

Author(s): Søren Jørgensen, Torsten Dau

Keyword(s): Speech Intelligibility, Modulation Frequency, Power Ratio, Signal To Noise, Selective Processing, Frequency Selective

Predicting speech intelligibility based on the envelope power signal‐to‐noise ratio after modulation‐frequency selective processing.

The Journal of the Acoustical Society of America, 2011, Vol 129 (4), p. 2384. doi:10.1121/1.3587737

Cited By ~ 1

Author(s): Torsten Dau, Søren Jørgensen

Keyword(s): Speech Intelligibility, Signal To Noise Ratio, Modulation Frequency, Signal To Noise, Selective Processing, Frequency Selective, Noise Ratio

Improved Modulation Spectrum Through Multi-Scale Modulation Frequency Decomposition

Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
2006. doi:10.1109/icassp.2005.1416059

Cited By ~ 1

Author(s): Somsak Sukittanon, Les E. Atlas, James W. Pitton, Karim Filali

Keyword(s): Modulation Frequency, Modulation Spectrum, Multi Scale, Frequency Decomposition

Auditory Models of Suprathreshold Distortion and Speech Intelligibility in Persons with Impaired Hearing

Journal of the American Academy of Audiology, 2013, Vol 24 (04), pp. 307-328. doi:10.3766/jaaa.24.4.6

Cited By ~ 18

Author(s): Joshua G.W. Bernstein, Van Summers, Elena Grassi, Ken W. Grant

Keyword(s): Speech Recognition, Auditory Processing, Speech Intelligibility, Prediction Accuracy, Dynamic Range, Recognition Performance, Frequency Selectivity, Temporal Acuity, Dynamic Range Compression, Speech Reception

Background: Hearing-impaired (HI) individuals with similar ages and audiograms often demonstrate substantial differences in speech-reception performance in noise. Traditional models of speech intelligibility focus primarily on average performance for a given audiogram, failing to account for differences between listeners with similar audiograms. Improved prediction accuracy might be achieved by simulating differences in the distortion that speech may undergo when processed through an impaired ear. Although some attempts to model particular suprathreshold distortions can explain general speech-reception deficits not accounted for by audibility limitations, little has been done to model suprathreshold distortion and predict speech-reception performance for individual HI listeners. Auditory-processing models incorporating individualized measures of auditory distortion, along with audiometric thresholds, could provide a more complete understanding of speech-reception deficits by HI individuals. A computational model capable of predicting individual differences in speech-recognition performance would be a valuable tool in the development and evaluation of hearing-aid signal-processing algorithms for enhancing speech intelligibility.

Purpose: This study investigated whether biologically inspired models simulating peripheral auditory processing for individual HI listeners produce more accurate predictions of speech-recognition performance than audiogram-based models.

Research Design: Psychophysical data on spectral and temporal acuity were incorporated into individualized auditory-processing models consisting of three stages: a peripheral stage, customized to reflect individual audiograms and spectral and temporal acuity; a cortical stage, which extracts spectral and temporal modulations relevant to speech; and an evaluation stage, which predicts speech-recognition performance by comparing the modulation content of clean and noisy speech. To investigate the impact of different aspects of peripheral processing on speech predictions, individualized details (absolute thresholds, frequency selectivity, spectrotemporal modulation [STM] sensitivity, compression) were incorporated progressively, culminating in a model simulating level-dependent spectral resolution and dynamic-range compression.

Study Sample: Psychophysical and speech-reception data from 11 HI and six normal-hearing listeners were used to develop the models.

Data Collection and Analysis: Eleven individualized HI models were constructed and validated against psychophysical measures of threshold, frequency resolution, compression, and STM sensitivity. Speech-intelligibility predictions were compared with measured performance in stationary speech-shaped noise at signal-to-noise ratios (SNRs) of −6, −3, 0, and 3 dB. Prediction accuracy for the individualized HI models was compared to the traditional audibility-based Speech Intelligibility Index (SII).

Results: Models incorporating individualized measures of STM sensitivity yielded significantly more accurate within-SNR predictions than the SII. Additional individualized characteristics (frequency selectivity, compression) improved the predictions only marginally. A nonlinear model including individualized level-dependent cochlear-filter bandwidths, dynamic-range compression, and STM sensitivity predicted performance more accurately than the SII but was no more accurate than a simpler linear model. Predictions of speech-recognition performance simultaneously across SNRs and individuals were also significantly better for some of the auditory-processing models than for the SII.

Conclusions: A computational model simulating individualized suprathreshold auditory-processing abilities produced more accurate speech-intelligibility predictions than the audibility-based SII. Most of this advantage was realized by a linear model incorporating audiometric and STM-sensitivity information. Although more consistent with known physiological aspects of auditory processing, modeling level-dependent changes in frequency selectivity and gain did not result in more accurate predictions of speech-reception performance.
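
The audibility-based baseline these models were compared against can be illustrated with a simplified SII. In this sketch, each band's audibility is its speech-to-disturbance ratio mapped linearly from -15 dB (inaudible) to +15 dB (fully audible), and the index is the importance-weighted sum over bands. It is a simplification, not the ANSI S3.5-1997 procedure. It omits the standard's level-distortion and spread-of-masking terms and treats the disturbance as the larger of the noise level and the hearing threshold. The function name and the equal band-importance weights in the example are hypothetical.

```python
import numpy as np

def simplified_sii(speech_db, noise_db, threshold_db, importance):
    """Audibility-only SII sketch: sum over bands of I_i * A_i.

    A_i = clip((speech_i - max(noise_i, threshold_i) + 15) / 30, 0, 1).
    The level-distortion and spread-of-masking terms of the full standard are omitted.
    """
    speech, noise, thr, imp = (np.asarray(a, dtype=float)
                               for a in (speech_db, noise_db, threshold_db, importance))
    if not (speech.shape == noise.shape == thr.shape == imp.shape):
        raise ValueError("all inputs need one value per band")
    if not np.isclose(imp.sum(), 1.0):
        raise ValueError("band-importance weights must sum to 1")
    disturbance = np.maximum(noise, thr)   # whichever limits audibility more in each band
    audibility = np.clip((speech - disturbance + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(imp * audibility))

# Usage: speech-shaped noise gives roughly the same SNR in every band
bands = 6
importance = np.full(bands, 1 / bands)   # hypothetical equal band importance
speech = np.full(bands, 65.0)            # hypothetical band speech levels (dB)
thresholds = np.full(bands, 20.0)        # hypothetical hearing thresholds (dB)
for snr in (-6, -3, 0, 3):
    noise = speech - snr
    print(snr, round(simplified_sii(speech, noise, thresholds, importance), 2))
```

With these inputs, the four SNRs used in the study map to indices of 0.3, 0.4, 0.5, and 0.6. Two listeners with the same thresholds always receive the same index, which is the limitation the individualized models were meant to address.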
