Fast Adaptation of Speech and Speaker Characteristics for Enhanced Speech Recognition in Adverse Intelligent Environments

Sparse coding over redundant dictionaries for fast adaptation of speech recognition system

Computer Speech & Language ◽

10.1016/j.csl.2016.10.004 ◽

2017 ◽

Vol 43 ◽

pp. 1-17 ◽

Cited By ~ 8

Author(s):

S. Shahnawazuddin ◽

Rohit Sinha

Keyword(s):

Speech Recognition ◽

Sparse Coding ◽

Recognition System ◽

Speech Recognition System ◽

Fast Adaptation ◽

Redundant Dictionaries

Learning Fast Adaptation on Cross-Accented Speech Recognition

10.21437/interspeech.2020-45 ◽

2020 ◽

Author(s):

Genta Indra Winata ◽

Samuel Cahyawijaya ◽

Zihan Liu ◽

Zhaojiang Lin ◽

Andrea Madotto ◽

...

Keyword(s):

Speech Recognition ◽

Accented Speech ◽

Fast Adaptation

Coordination of Speech Recognition Devices in Intelligent Environments with Multiple Responsive Devices

Proceedings ◽

10.3390/proceedings2019031054 ◽

2019 ◽

Vol 31 (1) ◽

pp. 54

Author(s):

Benítez-Guijarro ◽

Callejas ◽

Noguera ◽

Benghazi

Keyword(s):

Speech Recognition ◽

Speech Processing ◽

Intelligent Environments ◽

Intelligent Environment ◽

Voice Signal ◽

Reliable Source ◽

Acoustic Quality ◽

Multiple Devices ◽

Single Input ◽

The Voice

Devices with oral interfaces are enabling new interaction scenarios in ambient intelligence settings. Using several such devices in the same environment makes it possible to compare the inputs gathered from each of them and to recognize and process user speech more accurately. However, combining multiple devices presents coordination challenges: when different speech processing units process the same voice signal, they may produce conflicting outputs, and the system must decide which source is the most reliable. This paper presents an approach for ranking several sources of spoken input in multi-device environments so that preference is given to the input with the highest estimated quality. The voice signals received by the devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced. After this assessment, each input is assigned a unique score that allows the audio sources to be ranked, so that the best one can be selected for processing by the system. To validate this approach, we performed an evaluation using a corpus of 4608 audio recordings captured in a two-room intelligent environment with 24 microphones. The experimental results show that the ranking approach can successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than a single input in both clear and noisy settings.
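The ranking idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how acoustic quality and hypothesis reliability are measured or combined, so the field names, the normalization to [0, 1], and the weighted-sum score below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpokenInput:
    """One device's view of the same utterance (all fields hypothetical)."""
    device_id: str
    acoustic_quality: float  # assumed normalized to [0, 1]
    asr_reliability: float   # assumed normalized to [0, 1]
    hypothesis: str


def score(inp: SpokenInput, quality_weight: float = 0.5) -> float:
    # Assumption: a simple weighted sum. The paper's actual scoring
    # function is not given in the abstract.
    return (quality_weight * inp.acoustic_quality
            + (1.0 - quality_weight) * inp.asr_reliability)


def rank_inputs(inputs, quality_weight: float = 0.5):
    # Highest score first; the first element is the input to process.
    return sorted(inputs, key=lambda i: score(i, quality_weight), reverse=True)


inputs = [
    SpokenInput("mic-kitchen", 0.9, 0.8, "turn on the light"),
    SpokenInput("mic-hall", 0.4, 0.95, "turn on the light"),
    SpokenInput("mic-sofa", 0.6, 0.5, "turn on the flight"),
]
ranked = rank_inputs(inputs)
best = ranked[0]
print(best.device_id, best.hypothesis)
```

Changing `quality_weight` shifts the preference between signal quality and recognizer reliability; with `quality_weight=0.0` the ranking depends on the recognizer's reliability estimate alone.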

Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2014.2346313 ◽

2014 ◽

Vol 22 (12) ◽

pp. 1713-1725 ◽

Cited By ~ 66

Author(s):

Shaofei Xue ◽

Ossama Abdel-Hamid ◽

Hui Jiang ◽

Lirong Dai ◽

Qingfeng Liu

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Fast Adaptation

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

10.21437/interspeech.2021-1075 ◽

2021 ◽

Author(s):

Xun Gong ◽

Yizhou Lu ◽

Zhikai Zhou ◽

Yanmin Qian

Keyword(s):

Speech Recognition ◽

End To End ◽

Fast Adaptation

Measuring Mandarin Speech Recognition Thresholds Using the Method of Adaptive Tracking

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-h-18-0162 ◽

2019 ◽

Vol 62 (6) ◽

pp. 2009-2017

Author(s):

Yuxia Wang ◽

Zhaoyu Lu ◽

Xiaohu Yang ◽

Chang Liu

Keyword(s):

Speech Recognition ◽

Adaptive Tracking ◽

Mandarin Speech Recognition

Selecting the Optimal FM System for Children With Cochlear Implants

Perspectives on Hearing and Hearing Disorders in Childhood ◽

10.1044/hhdc18.1.19 ◽

2008 ◽

Vol 18 (1) ◽

pp. 19-24

Author(s):

Erin C. Schafer

Keyword(s):

Speech Recognition ◽

Cochlear Implants ◽

Empirical Research ◽

Background Noise ◽

Signal To Noise Ratio ◽

Evidence Based ◽

Signal To Noise ◽

Speech Processor ◽

System Input ◽

Optimal Type

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. FM systems significantly improve the signal-to-noise ratio at the child's ear through the use of three types of FM receivers: mounted speakers, desktop speakers, or direct-audio input (DAI). The purpose of this article is to examine current empirical research in the area of FM systems and cochlear implants. Discussion topics include selecting the optimal type of FM receiver, the benefits of binaural FM-system input, the importance of DAI receiver-gain settings, and the effects of speech-processor programming on speech recognition. This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.
