Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering

2014 · Vol 136 (5) · pp. EL343-EL349
Author(s): Sriram Ganapathy, Mohamed Omar

2020
Author(s): Emmanuel Ponsot, Léo Varnet, Nicolas Wallaert, Elza Daoud, Shihab A. Shamma, ...

Abstract: Spectrotemporal modulations (STMs) offer a unified framework to probe suprathreshold auditory processing. Here, we introduce a novel methodological framework based on psychophysical reverse correlation, deployed in the modulation space, to characterize how STMs are detected by the auditory system and how cochlear hearing loss impacts this processing. Our results show that young normal-hearing (NH) and older hearing-impaired (HI) individuals rely on a comparable non-linear processing architecture involving non-directional band-pass modulation filtering. We demonstrate that a temporal-modulation filter-bank model can capture the strategy of the NH group and that a broader tuning of cochlear filters is sufficient to explain the HI group's overall shift toward temporal modulations. Yet, idiosyncratic behaviors exposed within each group highlight the contribution of, and the need to consider, additional mechanisms. This integrated experimental-computational approach offers a principled way to assess the suprathreshold auditory processing distortions of each individual.
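The core idea of psychophysical reverse correlation described in this abstract can be sketched numerically: present noise fields in a stimulus space, record yes/no detection responses, and estimate the observer's perceptual weighting as the mean difference between noise fields on "yes" versus "no" trials (a classification image). The sketch below simulates this in a hypothetical rate-scale modulation grid; the grid dimensions, template shape, and noise levels are illustrative assumptions, not the paper's actual stimuli or fitted filters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical modulation-space grid: temporal rates (Hz) x spectral scales (cyc/oct)
n_rate, n_scale = 16, 8
n_trials = 20000

# Assumed internal template of a simulated observer: band-pass in rate,
# low-pass in scale (illustrative only, not a fitted model from the study).
rate = np.linspace(1, 32, n_rate)[:, None]
scale = np.linspace(0.25, 4, n_scale)[None, :]
template = np.exp(-((rate - 8) / 6) ** 2) * np.exp(-scale / 2)
template /= np.linalg.norm(template)

def reverse_correlate(template, n_trials, noise_sd=1.0, criterion=0.0):
    """Estimate a perceptual filter by classification-image reverse correlation:
    average the noise fields that elicited 'yes' minus those that elicited 'no'."""
    noises = rng.normal(0.0, noise_sd, size=(n_trials, n_rate, n_scale))
    # Simulated decision: inner product of each noise field with the template,
    # plus internal (decision) noise, compared against a criterion.
    evidence = (noises * template).sum(axis=(1, 2)) + rng.normal(0, 0.5, n_trials)
    yes = evidence > criterion
    return noises[yes].mean(axis=0) - noises[~yes].mean(axis=0)

kernel = reverse_correlate(template, n_trials)
# With enough trials, the recovered classification image is proportional
# to the underlying template, so their correlation should be high.
r = np.corrcoef(kernel.ravel(), template.ravel())[0, 1]
print(round(r, 2))
```

The same logic carries over to real experiments: the "evidence" step is replaced by a listener's responses, and the recovered kernel reveals which modulation regions drive detection.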


Author(s): Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep-learning-based front-ends using such techniques require aligned clean and noisy speech pairs, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective alone is sub-optimal and insufficient for fully training the front-end, which leaves room for improvement. In this paper, we propose a novel approach that incorporates flow-based density estimation to train a robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms conventional techniques in which the front-end is trained only with the ASR objective.
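The principle behind flow-based density estimation in this setting can be sketched without parallel data: fit an invertible flow to clean speech features by exact maximum likelihood, then score how "clean-like" any features are via the flow's log-density, which can serve as a front-end training signal. The toy below uses a single affine flow on 2-D Gaussian "features"; the real system would presumably use deep invertible networks on spectral features, so everything here (data, flow, learning rates) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for clean speech features: a 2-D Gaussian cloud
clean = rng.normal([1.0, -2.0], [0.5, 2.0], size=(5000, 2))

class AffineFlow:
    """Minimal invertible flow z = (x - mu) / sigma with exact log-density
    via change of variables: log p(x) = log N(z; 0, I) - sum(log sigma)."""
    def __init__(self, dim):
        self.mu = np.zeros(dim)
        self.log_sigma = np.zeros(dim)

    def log_prob(self, x):
        z = (x - self.mu) * np.exp(-self.log_sigma)
        log_base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=1)
        return log_base - self.log_sigma.sum()

    def fit(self, x, lr=0.1, steps=200):
        # Gradient ascent on the exact log-likelihood of the affine flow
        for _ in range(steps):
            z = (x - self.mu) * np.exp(-self.log_sigma)
            self.mu += lr * (z * np.exp(-self.log_sigma)).mean(axis=0)
            self.log_sigma += lr * ((z ** 2).mean(axis=0) - 1.0)

flow = AffineFlow(2)
flow.fit(clean)  # trained on clean data only: no noisy/clean pairing needed

# Density-based scoring: features degraded by noise should be less likely
# under the clean-speech density, giving a higher negative log-likelihood.
noisy = clean + rng.normal(0, 3.0, size=clean.shape)
nll_clean = -flow.log_prob(clean).mean()
nll_noisy = -flow.log_prob(noisy).mean()
print(nll_clean < nll_noisy)
```

In the full approach, a front-end would be trained to map noisy features toward low NLL under such a clean-speech flow, replacing the need for simulated parallel pairs.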


1990 · Vol 137 (1) · pp. 57
Author(s): M. Steyaert, Z. Chang
