The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance

2016 ◽  
Vol 140 (5) ◽  
pp. EL416-EL422 ◽  
Author(s):  
Ming Tu ◽  
Alan Wisler ◽  
Visar Berisha ◽  
Julie M. Liss
2020 ◽  
pp. 1-10
Author(s):  
Irene Calvo ◽  
Peppino Tropea ◽  
Mauro Viganò ◽  
Maria Scialla ◽  
Agnieszka B. Cavalcante ◽  
...  

Introduction: Commercially available automatic speech recognition (ASR) software is difficult to use when dysarthria accompanies a physical disability. To overcome this issue, a mobile and personal speech assistant (mPASS) platform was developed using speaker-dependent ASR software. Objective: The aim of this study was to evaluate the performance of the proposed platform and to compare mPASS recognition accuracy with that of commercial speaker-independent ASR software. Secondary aims were to investigate the relationship between severity of dysarthria and recognition accuracy, and to explore how people with dysarthria perceive the proposed platform. Methods: Fifteen individuals with dysarthric speech and 20 individuals with nondysarthric speech recorded 24 words and 5 sentences in a clinical environment. Differences in recognition accuracy between the two systems were evaluated. In addition, mPASS usability was assessed with a technology acceptance model (TAM) questionnaire. Results: In both groups, mean accuracy rates were significantly higher with mPASS than with the commercial ASR, for both words and sentences. mPASS reached good levels of usefulness and ease of use according to the TAM questionnaire. Conclusions: Practical application of this technology is realistic: the mPASS platform is accurate, and it could be easily used by individuals with dysarthria.


2020 ◽  
pp. 002383092091107 ◽  
Author(s):  
Suzanne R. Jongman ◽  
Yung Han Khoe ◽  
Florian Hintz

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition (ASR) for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.


Author(s):  
Mohammed Rokibul Alam Kotwal ◽  
Foyzul Hassan ◽  
Mohammad Nurul Huda

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time-delay artificial neural networks of different architectures. Canonicalization of speech features is also performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors designed three classifiers, for male, female, and GI speakers, and extracted the output probabilities from these classifiers in order to find the maximum. Maximizing the output probabilities for each speech file yields higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers by requiring fewer mixture components.
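
The dynamic parameters mentioned above can be illustrated with a minimal sketch. The abstract does not say how the chapter computes velocity and acceleration coefficients, so the code below assumes the widely used regression formula, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge frames repeated as padding. The function names `delta` and `add_dynamic_features` and the `window` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def delta(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients over time.

    Assumption: the standard regression formula with a +/- `window` frame
    context; frames beyond either end are replaced by the nearest edge frame.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n_frames = len(frames)
    if n_frames == 0:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame_at(t: int) -> Sequence[float]:
        # Clamp the index so edge frames are repeated as padding.
        return frames[min(max(t, 0), n_frames - 1)]

    result = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, window + 1)
            )
            row.append(num / denom)
        result.append(row)
    return result


def add_dynamic_features(
    frames: Sequence[Sequence[float]], window: int = 2
) -> List[List[float]]:
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    velocity = delta(frames, window)
    acceleration = delta(velocity, window)
    return [list(s) + v + a for s, v, a in zip(frames, velocity, acceleration)]
```

Applied to a sequence of static feature vectors (for example, one MFCC vector per frame), `add_dynamic_features` returns vectors three times as long: static, velocity, and acceleration parts concatenated in that order.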


1988 ◽  
Vol 31 (4) ◽  
pp. 681-695 ◽  
Author(s):  
Faith C. Loven ◽  
M. Jane Collins

The purpose of this investigation was to describe the interactive effects of four signal modifications typically encountered in everyday communication settings. These modifications included reverberation, masking, filtering, and fluctuation in speech intensity. The relationship between recognition performance and spectral changes to the speech signal due to the presence of these signal alterations was also studied. The interactive effects of these modifications were evaluated by obtaining indices of nonsense syllable recognition ability from normally hearing listeners for systematically varied combinations of the four signal parameters. The results of this study were in agreement with previous studies concerned with the effect of these variables in isolation on speech recognition ability. When present in combination, the direction of each variable's effect on recognition performance is maintained; however, the magnitude of the effect increases. The results of this investigation are reasonably accounted for by a spectral theory of speech recognition.

