The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance

Ming Tu; Alan Wisler; Visar Berisha; Julie M. Liss

doi:10.1121/1.4967208

Evaluation of an Automatic Speech Recognition Platform for Dysarthric Speech

Folia Phoniatrica et Logopaedica ◽

10.1159/000511042 ◽

2020 ◽

pp. 1-10

Author(s):

Irene Calvo ◽

Peppino Tropea ◽

Mauro Viganò ◽

Maria Scialla ◽

Agnieszka B. Cavalcante ◽

...

Keyword(s):

Speech Recognition ◽

Technology Acceptance ◽

Automatic Speech Recognition ◽

Recognition Accuracy ◽

Ease Of Use ◽

Clinical Environment ◽

Dysarthric Speech ◽

Acceptance Model ◽

The Relationship ◽

Accuracy Rates

Introduction: Commercially available automatic speech recognition (ASR) software is difficult to use when dysarthria accompanies a physical disability. To overcome this issue, a mobile and personal speech assistant (mPASS) platform was developed using speaker-dependent ASR software. Objective: The aim of this study was to evaluate the performance of the proposed platform and to compare mPASS recognition accuracy with that of commercial speaker-independent ASR software. Secondary aims were to investigate the relationship between severity of dysarthria and accuracy and to explore the perceptions that people with dysarthria have of the proposed platform. Methods: Fifteen individuals with dysarthric speech and 20 individuals with nondysarthric speech recorded 24 words and 5 sentences in a clinical environment. Differences in recognition accuracy between the two systems were evaluated. In addition, mPASS usability was assessed with a technology acceptance model (TAM) questionnaire. Results: In both groups, mean accuracy rates were significantly higher with mPASS than with the commercial ASR for words and for sentences. mPASS reached good levels of usefulness and ease of use according to the TAM questionnaire. Conclusions: Practical application of this technology is realistic: the mPASS platform is accurate, and it could be easily used by individuals with dysarthria.
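
The abstract above compares recognition accuracy for isolated words and for sentences but does not state how accuracy was scored. As a minimal sketch under that caveat, the Python below shows two common ways such scores could be computed: exact-match accuracy over a word list, and word error rate (word-level edit distance divided by reference length) for a sentence. The function names and the sample prompts are hypothetical and are not taken from the study.

```python
from typing import Sequence


def word_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Fraction of isolated-word prompts recognized exactly (case-insensitive)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one prompt is required")
    hits = sum(
        ref.strip().lower() == hyp.strip().lower()
        for ref, hyp in zip(references, hypotheses)
    )
    return hits / len(references)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,                           # deletion
                    cur[j - 1] + 1,                        # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = cur
    return prev[-1] / len(ref)
```

Scoring the same recordings with both recognizers and comparing the per-speaker values would give the kind of paired comparison the abstract describes; the statistical test used in the study is not specified here.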


Dysarthric speech perception: Comparison of training effects on human listeners versus automatic speech recognition tools

The Journal of the Acoustical Society of America ◽

10.1121/1.5101567 ◽

2019 ◽

Vol 145 (3) ◽

pp. 1795-1795

Author(s):

Michael F. Lally ◽

Heejin Kim ◽

Lori A. Moon

Keyword(s):

Speech Recognition ◽

Speech Perception ◽

Automatic Speech Recognition ◽

Training Effects ◽

Dysarthric Speech


A HYBRID METHOD FOR AUTOMATIC SPEECH RECOGNITION PERFORMANCE IMPROVEMENT IN REAL WORLD NOISY ENVIRONMENT

Journal of Computer Science ◽

10.3844/jcssp.2013.94.104 ◽

2013 ◽

Vol 9 (1) ◽

pp. 94-104 ◽

Cited By ~ 1

Author(s):

Shrawankar

Keyword(s):

Speech Recognition ◽

Performance Improvement ◽

Automatic Speech Recognition ◽

Hybrid Method ◽

Real World ◽

Recognition Performance ◽

Noisy Environment


Vocabulary Size Influences Spontaneous Speech in Native Language Users: Validating the Use of Automatic Speech Recognition in Individual Differences Research

Language and Speech ◽

10.1177/0023830920911079 ◽

2020 ◽

pp. 002383092091107 ◽

Cited By ~ 1

Author(s):

Suzanne R. Jongman ◽

Yung Han Khoe ◽

Florian Hintz

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Language Production ◽

Spontaneous Speech ◽

Receptive Vocabulary ◽

Sentence Production ◽

Production Data ◽

Vocabulary Size ◽

Speech Transcription ◽

The Relationship

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition (ASR) for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.


Speech Feature Evaluation for Bangla Automatic Speech Recognition

Technical Challenges and Design Issues in Bangla Language Processing ◽

10.4018/978-1-4666-3970-6.ch009 ◽

2013 ◽

pp. 169-208 ◽

Cited By ~ 1

Author(s):

Mohammed Rokibul Alam Kotwal ◽

Foyzul Hassan ◽

Mohammad Nurul Huda

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Performance ◽

Dynamic Parameters ◽

Mel Frequency Cepstral Coefficients ◽

Phoneme Recognition ◽

Hybrid Features ◽

Feature Evaluation ◽

Speech Feature ◽

Speech Features

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time delay artificial neural networks of different architectures. Moreover, canonicalization of speech features is performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors designed three classifiers for male, female, and GI speakers, and extracted the output probabilities from these classifiers to find the maximum. Maximizing the output probabilities for each speech file provides higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system, but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers with fewer mixture components.
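
The "dynamic parameters" mentioned above are commonly computed as regression-based delta (velocity) and delta-delta (acceleration) coefficients over a short window of frames. The chapter's exact formulation is not given in the abstract, so the Python below is only a sketch of the widely used regression formula, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge frames repeated. The function names and the default window size are assumptions made for illustration.

```python
from typing import List, Sequence


def delta(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients over a +/- `window` frame context."""
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result: List[List[float]] = []
    for t in range(num_frames):
        row: List[float] = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Indices are clamped, so the first and last frames are repeated
                # at the edges of the utterance.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result


def add_dynamic_features(
    frames: Sequence[Sequence[float]], window: int = 2
) -> List[List[float]]:
    """Append velocity and acceleration coefficients to each static frame."""
    velocity = delta(frames, window)
    acceleration = delta(velocity, window)  # delta of the delta
    return [
        list(static) + vel + acc
        for static, vel, acc in zip(frames, velocity, acceleration)
    ]
```

Applied to a matrix of static features (one row per frame), `add_dynamic_features` triples the feature dimension, so 13 static MFCCs would become a 39-dimensional vector per frame.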


Improvement In Automatic Speech Recognition Performance In Noisy Environments Using Time-Domain Blind Source Separation

2007 IEEE 15th Signal Processing and Communications Applications ◽

10.1109/siu.2007.4298592 ◽

2007 ◽

Author(s):

Cemil Demir ◽

F. Kerem Harmanci

Keyword(s):

Speech Recognition ◽

Blind Source Separation ◽

Automatic Speech Recognition ◽

Time Domain ◽

Recognition Performance ◽

Source Separation ◽

Noisy Environments


Automatic speech recognition and a review of its functioning with dysarthric speech

Augmentative and Alternative Communication ◽

10.1080/07434610012331278904 ◽

2000 ◽

Vol 16 (1) ◽

pp. 48-60 ◽

Cited By ~ 33

Author(s):

Kristin Rosen ◽

Sasha Yampolsky

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Dysarthric Speech


Reverberation, Masking, Filtering, and Level Effects on Speech Recognition Performance

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3104.681 ◽

1988 ◽

Vol 31 (4) ◽

pp. 681-695 ◽

Cited By ~ 13

Author(s):

Faith C. Loven ◽

M. Jane Collins

Keyword(s):

Speech Recognition ◽

Speech Signal ◽

Nonsense Syllable ◽

Recognition Performance ◽

Interactive Effects ◽

Spectral Changes ◽

Signal Parameters ◽

Recognition Ability ◽

The Relationship ◽

Everyday Communication

The purpose of this investigation was to describe the interactive effects of four signal modifications typically encountered in everyday communication settings. These modifications included reverberation, masking, filtering, and fluctuation in speech intensity. The relationship between recognition performance and spectral changes to the speech signal due to the presence of these signal alterations was also studied. The interactive effects of these modifications were evaluated by obtaining indices of nonsense syllable recognition ability from normally hearing listeners for systematically varied combinations of the four signal parameters. The results of this study were in agreement with previous studies concerned with the effect of these variables in isolation on speech recognition ability. When present in combination, the direction of each variable's effect on recognition performance is maintained; however, the magnitude of the effect increases. The results of this investigation are reasonably accounted for by a spectral theory of speech recognition.
