Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Author(s):  
John H. Hansen
1981 ◽  
Author(s):  
Steven F. Boll ◽  
James Kajiya ◽  
James Youngberg ◽  
Tracy L. Petersen ◽  
H. Ravindra

2017 ◽  
Vol 60 (9) ◽  
pp. 2394-2405 ◽  
Author(s):  
Lionel Fontan ◽  
Isabelle Ferrané ◽  
Jérôme Farinas ◽  
Julien Pinquier ◽  
Julien Tardieu ◽  
...  

Purpose The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and 1 comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performance. Results Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performance of an ASR-based system. Future work should determine whether the ASR system is similarly successful in predicting speech processing in noise and by older people with ARHL.
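The correlation the abstract reports (coefficients up to .99) is a standard Pearson correlation between matched human and ASR score series. A minimal sketch, using invented illustrative scores (not values from the study):

```python
import numpy as np

# Hypothetical intelligibility scores (% correct) for listeners with
# simulated ARHL at increasing severity levels, paired with the ASR
# system's scores on the same materials. Values are illustrative only.
human_scores = np.array([92.0, 85.0, 71.0, 54.0, 33.0, 18.0])
asr_scores = np.array([90.0, 82.0, 66.0, 48.0, 27.0, 12.0])

# Pearson correlation coefficient between the two score series.
r = np.corrcoef(human_scores, asr_scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A high r on such paired scores is what would justify using the ASR system as a proxy for human listeners during hearing-aid fine-tuning.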


2018 ◽  
Vol 98 ◽  
pp. 1-16 ◽  
Author(s):  
Mahdi Khademian ◽  
Mohammad Mehdi Homayounpour

2005 ◽  
Vol 114 (11) ◽  
pp. 886-893 ◽  
Author(s):  
Li Xu ◽  
Teresa A. Zwolan ◽  
Catherine S. Thompson ◽  
Bryan E. Pfingst

Objectives: The present study was performed to evaluate the efficacy and clinical feasibility of using monopolar stimulation with the Clarion Simultaneous Analog Stimulation (SAS) strategy in patients with cochlear implants. Methods: Speech recognition by 10 Clarion cochlear implant users was evaluated by means of 4 different speech processing strategy/electrode configuration combinations; i.e., SAS and Continuous Interleaved Sampling (CIS) strategies were each used with monopolar (MP) and bipolar (BP) electrode configurations. The test measures included consonants, vowels, consonant-nucleus-consonant words, and Hearing in Noise Test sentences with a +10 dB signal-to-noise ratio. Additionally, subjective judgments of sound quality were obtained for each strategy/configuration combination. Results: All subjects but 1 demonstrated open-set speech recognition with the SAS/MP combination. The group mean Hearing in Noise Test sentence score for the SAS/MP combination was 31.6% (range, 0% to 92%) correct, as compared to 25.0%, 46.7%, and 37.8% correct for the CIS/BP, CIS/MP, and SAS/BP combinations, respectively. Intersubject variability was high, and there were no significant differences in mean speech recognition scores or mean preference ratings among the 4 strategy/configuration combinations tested. Individually, the best speech recognition performance was with the subject's everyday strategy/configuration combination in 72% of the applicable cases. If the everyday strategy was excluded from the analysis, the subjects performed best with the SAS/MP combination in 37.5% of the remaining cases. Conclusions: The SAS processing strategy with an MP electrode configuration gave reasonable speech recognition in most subjects, even though subjects had minimal previous experience with this strategy/configuration combination. The SAS/MP combination might be particularly appropriate for patients for whom a full dynamic range of electrical hearing could not be achieved with a BP configuration.


2021 ◽  
Vol 7 ◽  
pp. e650 ◽  
Author(s):  
Mohammad Ali Humayun ◽  
Hayati Yassin ◽  
Pg Emeroylariffion Abas

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, and deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as an additional constraint for the representation-learning auto-encoder. The representation learnt by the proposed model has been evaluated on a supervised classification task for limited-vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in classification accuracy, despite using a limited number of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis on the Kadazan dataset also confirms the superiority of the proposed representation under limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
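The abstract describes an auto-encoder trained with an auxiliary constraint: alongside reconstruction, the model regresses each sample's geometric position within the global structure of the unlabelled set. A minimal sketch of such a combined objective, assuming PCA coordinates stand in for the geometric position and `alpha` is a weighting hyperparameter (both are illustrative choices, not the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabelled feature set: 100 "speech" feature vectors of dimension 20.
X = rng.normal(size=(100, 20))

# Stand-in geometric positions: each sample's 2-D PCA coordinates within
# the global structure of the unlabelled set.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
positions = Xc @ Vt[:2].T  # shape (100, 2)

def combined_loss(x, x_hat, pos, pos_hat, alpha=0.5):
    """Auto-encoder reconstruction loss plus the auxiliary
    position-regression constraint, weighted by alpha."""
    recon = np.mean((x - x_hat) ** 2)        # reconstruction error
    pos_err = np.mean((pos - pos_hat) ** 2)  # position-regression error
    return recon + alpha * pos_err

# Sanity check: with perfect reconstruction and a perfect position head,
# both terms vanish and the loss is zero.
loss = combined_loss(X, X, positions, positions)
print(loss)  # 0.0
```

During training, both heads would share the encoder, so gradients from the position term shape the latent representation even when no class labels are available.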


Author(s):  
Tim Arnold ◽  
Helen J. A. Fuller

Automatic speech recognition (ASR) systems and speech interfaces are becoming increasingly prevalent, including their expanding use to support work in health care. Computer-based speech processing has been extensively studied and developed over decades, and speech processing tools have been fine-tuned through the work of speech and language researchers. Researchers have described, and continue to describe, speech processing errors in medicine. The discussion provided in this paper proposes an ergonomic framework for speech recognition to expand and further develop this view of speech processing in supporting clinical work. With this end in mind, we hope to build on previous work and emphasize the need for increased human factors involvement in this area, while also facilitating the discussion of speech recognition in contexts that have been explored in the human factors domain. Human factors expertise can contribute by proactively describing and designing these critical, interconnected socio-technical systems with error tolerance in mind.

