Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

Author(s):  
John H. Hansen
1981 ◽  
Author(s):  
Steven F. Boll ◽  
James Kajiya ◽  
James Youngberg ◽  
Tracy L. Petersen ◽  
H. Ravindra

2017 ◽  
Vol 60 (9) ◽  
pp. 2394-2405 ◽  
Author(s):  
Lionel Fontan ◽  
Isabelle Ferrané ◽  
Jérôme Farinas ◽  
Julien Pinquier ◽  
Julien Tardieu ◽  
...  

Purpose The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and 1 comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performance. Results Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performance of an ASR-based system. Future work should determine whether the ASR system is similarly successful in predicting speech processing in noise and by older people with ARHL.
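The correlation the abstract reports (coefficients up to .99) is a standard Pearson correlation between matched human and ASR score series. A minimal sketch, using invented illustrative scores (not values from the study):

```python
import numpy as np

# Hypothetical intelligibility scores (% correct) for listeners with
# simulated ARHL at increasing severity levels, paired with the ASR
# system's scores on the same materials. Values are illustrative only.
human_scores = np.array([92.0, 85.0, 71.0, 54.0, 33.0, 18.0])
asr_scores = np.array([90.0, 82.0, 66.0, 48.0, 27.0, 12.0])

# Pearson correlation coefficient between the two score series.
r = np.corrcoef(human_scores, asr_scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A high r on such paired scores is what would justify using the ASR system as a proxy for human listeners during hearing-aid fine-tuning.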


2018 ◽  
Vol 98 ◽  
pp. 1-16 ◽  
Author(s):  
Mahdi Khademian ◽  
Mohammad Mehdi Homayounpour

2005 ◽  
Vol 114 (11) ◽  
pp. 886-893 ◽  
Author(s):  
Li Xu ◽  
Teresa A. Zwolan ◽  
Catherine S. Thompson ◽  
Bryan E. Pfingst

Objectives: The present study was performed to evaluate the efficacy and clinical feasibility of using monopolar stimulation with the Clarion Simultaneous Analog Stimulation (SAS) strategy in patients with cochlear implants. Methods: Speech recognition by 10 Clarion cochlear implant users was evaluated by means of 4 different speech processing strategy/electrode configuration combinations; i.e., SAS and Continuous Interleaved Sampling (CIS) strategies were each used with monopolar (MP) and bipolar (BP) electrode configurations. The test measures included consonants, vowels, consonant-nucleus-consonant words, and Hearing in Noise Test sentences with a +10 dB signal-to-noise ratio. Additionally, subjective judgments of sound quality were obtained for each strategy/configuration combination. Results: All subjects but 1 demonstrated open-set speech recognition with the SAS/MP combination. The group mean Hearing in Noise Test sentence score for the SAS/MP combination was 31.6% (range, 0% to 92%) correct, as compared to 25.0%, 46.7%, and 37.8% correct for the CIS/BP, CIS/MP, and SAS/BP combinations, respectively. Intersubject variability was high, and there were no significant differences in mean speech recognition scores or mean preference ratings among the 4 strategy/configuration combinations tested. Individually, the best speech recognition performance was with the subject's everyday strategy/configuration combination in 72% of the applicable cases. If the everyday strategy was excluded from the analysis, the subjects performed best with the SAS/MP combination in 37.5% of the remaining cases. Conclusions: The SAS processing strategy with an MP electrode configuration gave reasonable speech recognition in most subjects, even though subjects had minimal previous experience with this strategy/configuration combination. The SAS/MP combination might be particularly appropriate for patients for whom a full dynamic range of electrical hearing could not be achieved with a BP configuration.


2021 ◽  
Vol 7 ◽  
pp. e650 ◽  
Author(s):  
Mohammad Ali Humayun ◽  
Hayati Yassin ◽  
Pg Emeroylariffion Abas

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, and deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as an additional constraint for the representation-learning auto-encoder. The representation learnt by the proposed model has been evaluated on a supervised classification task for limited-vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in classification accuracy, despite using a limited number of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis on the Kadazan dataset also confirms the superiority of the proposed representation under limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
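The abstract describes an auto-encoder trained with an auxiliary constraint: alongside reconstruction, the model regresses each sample's geometric position within the global structure of the unlabelled set. A minimal sketch of such a combined objective, assuming PCA coordinates stand in for the geometric position and `alpha` is a weighting hyperparameter (both are illustrative choices, not the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabelled feature set: 100 "speech" feature vectors of dimension 20.
X = rng.normal(size=(100, 20))

# Stand-in geometric positions: each sample's 2-D PCA coordinates within
# the global structure of the unlabelled set.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
positions = Xc @ Vt[:2].T  # shape (100, 2)

def combined_loss(x, x_hat, pos, pos_hat, alpha=0.5):
    """Auto-encoder reconstruction loss plus the auxiliary
    position-regression constraint, weighted by alpha."""
    recon = np.mean((x - x_hat) ** 2)        # reconstruction error
    pos_err = np.mean((pos - pos_hat) ** 2)  # position-regression error
    return recon + alpha * pos_err

# Sanity check: with perfect reconstruction and a perfect position head,
# both terms vanish and the loss is zero.
loss = combined_loss(X, X, positions, positions)
print(loss)  # 0.0
```

During training, both heads would share the encoder, so gradients from the position term shape the latent representation even when no class labels are available.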


Author(s):  
Tim Arnold ◽  
Helen J. A. Fuller

Automatic speech recognition (ASR) systems and speech interfaces are becoming increasingly prevalent, including their expanding use to support work in health care. Computer-based speech processing has been extensively studied and developed over decades, and speech processing tools have been fine-tuned through the work of speech and language researchers. Researchers have described, and continue to describe, speech processing errors in medicine. The discussion provided in this paper proposes an ergonomic framework for speech recognition to expand and further develop this view of speech processing in supporting clinical work. With this end in mind, we hope to build on previous work and emphasize the need for increased human factors involvement in this area, while also facilitating the discussion of speech recognition in contexts that have been explored in the human factors domain. Human factors expertise can contribute by proactively describing and designing these critical, interconnected socio-technical systems with error tolerance in mind.

