Real-time integration of dynamic context information for improving automatic speech recognition

Author(s):  
Youssef Oualil ◽  
Marc Schulder ◽  
Hartmut Helmke ◽  
Anna Schmidt ◽  
Dietrich Klakow
Author(s):  
Pablo Revuelta ◽  
Javier Jiménez ◽  
José M. Sánchez ◽  
Belén Ruiz

This chapter introduces the potential of Automatic Speech Recognition (ASR) technology for meeting the challenge of inclusive education. ASR technology combined with Information and Communication Technology (ICT) enhances the learning of disabled people both in and outside the classroom. In the classroom, deaf and hearing-impaired students can benefit from a real-time transcription of what the teacher is saying. A real-time transcription also facilitates note taking for students with visual or physical disabilities. Outside the classroom, transcriptions and other media files (audio, slides, video, etc.) are powerful educational resources for all students, disabled or able-bodied. Some of the most relevant projects and systems around the world are described and compared in this chapter to provide updated information about ASR technology performance and its application to enhancing the learning of disabled students.


Author(s):  
Ranjodh Singh ◽  
Hemant Yadav ◽  
Mohit Sharma ◽  
Sandeep Gosain ◽  
Rajiv Ratn Shah

2021 ◽  
Author(s):  
Yu Wang ◽  
Chee Siang Leow ◽  
Akio Kobayashi ◽  
Takehito Utsuro ◽  
Hiromitsu Nishizaki

Author(s):  
Carly B. Fox ◽  
Megan Israelsen-Augenstein ◽  
Sharad Jones ◽  
Sandra Laing Gillam

Purpose: This study examined the accuracy and potential clinical utility of two expedited transcription methods for narrative language samples elicited from school-age children (7;5–11;10 [years;months]) with developmental language disorder. Transcription methods included real-time transcription produced by speech-language pathologists (SLPs) and trained transcribers (TTs) as well as Google Cloud Speech automatic speech recognition.

Method: The accuracy of each transcription method was evaluated against a gold-standard reference corpus. Clinical utility was examined by determining the reliability of scores calculated from the transcripts produced by each method on several language sample analysis (LSA) measures. Participants included seven certified SLPs and seven TTs. Each participant was asked to produce a set of six transcripts in real time, out of a total 42 language samples. The same 42 samples were transcribed using Google Cloud Speech. Transcription accuracy was evaluated through word error rate. Reliability of LSA scores was determined using correlation analysis.

Results: Results indicated that Google Cloud Speech was significantly more accurate than real-time transcription in transcribing narrative samples and was not impacted by speech rate of the narrator. In contrast, SLP and TT transcription accuracy decreased as a function of increasing speech rate. LSA metrics generated from Google Cloud Speech transcripts were also more reliably calculated.

Conclusions: Automatic speech recognition showed greater accuracy and clinical utility as an expedited transcription method than real-time transcription. Though there is room for improvement in the accuracy of speech recognition for the purpose of clinical transcription, it produced highly reliable scores on several commonly used LSA metrics.

Supplemental Material: https://doi.org/10.23641/asha.15167355
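The abstract above scores transcription accuracy with word error rate (WER) but does not say how it was computed. As a rough illustration only, WER is commonly defined as the word-level edit distance (substitutions, deletions, insertions) between a hypothesis transcript and a reference transcript, divided by the number of reference words. The sketch below assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are whitespace-separated and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences, not data from the study.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.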

