Speech recognition with a seamlessly updated language model for real-time closed-captioning

N-gram Language Model Based on Multi-Word Expressions in Web Documents for Speech Recognition and Closed-Captioning

2012 International Conference on Asian Language Processing, 2012. DOI: 10.1109/ialp.2012.55. Cited By ~ 4.

Author(s): Shinya Takahashi, Tsuyoshi Morimoto

Keyword(s): Speech Recognition, Language Model, Web Documents, Closed Captioning, Model Based, N Gram

Real-time Controlling Dynamics Sensing in Air Traffic System

Sensors, 2019, Vol 19 (3), pp. 679. DOI: 10.3390/s19030679.

Author(s): Yi Lin, Xianlong Tan, Bo Yang, Kai Yang, Jianwei Zhang, ...

Keyword(s): Neural Network, Speech Recognition, Real Time, Traffic Control, High Performance, Language Model, Air Traffic, Civil Aviation, Correction Procedure, Traffic System

To obtain real-time controlling dynamics in an air traffic system, a framework is proposed to introduce and process air traffic control (ATC) speech from radiotelephony communication. A pipeline based on automatic speech recognition (ASR) and controlling instruction understanding (CIU) is designed to convert the ATC speech into ATC-related elements, i.e., controlling intent and parameters. A correction procedure is also proposed to improve the reliability of the information obtained by the framework. In the ASR model, an acoustic model (AM), a pronunciation model (PM), and phoneme- and word-based language models (LMs) are proposed to unify multilingual ASR into one model. Based on their tasks, the AM and PM are defined as speech recognition and machine translation problems, respectively. Two-dimensional convolution and average-pooling layers are designed to address the special challenges of ASR in ATC. An encoder–decoder neural network is proposed to translate phoneme labels into word labels, which completes the ASR task. In the CIU model, a recurrent neural network-based joint model is proposed to detect the controlling intent and label the controlling parameters; the two tasks are solved in one network so that each improves the other's performance based on ATC communication rules. The ATC speech is thus converted into ATC-related elements by the proposed ASR and CIU models. To further improve the accuracy of the sensing framework, the correction procedure revises minor mistakes in ASR decoding results based on flight information, such as the flight plan and ADS-B data. The proposed models are trained on real operating data and applied to a civil aviation airport in China to evaluate their performance. Experimental results show that the proposed framework can obtain real-time controlling dynamics with high performance, at a word error rate of only 4%. The decoding efficiency also meets the requirement of real-time applications, with an average real-time factor of 0.147. With the proposed framework and the obtained traffic dynamics, current ATC applications can be accomplished with higher accuracy. In addition, the proposed ASR pipeline is highly reusable, which allows it to be applied to other controlling scenes and languages with minor changes.
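
The two figures quoted in the abstract above, word error rate and real-time factor, are not defined in this listing. The sketch below assumes the conventional definitions: word error rate as word-level edit distance divided by the number of reference words, and real-time factor as decoding time divided by audio duration. The function names and the sample transcripts are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decoding time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of eight.
    ref = "climb and maintain flight level three five zero"
    hyp = "climb and maintain flight level three nine zero"
    print(word_error_rate(ref, hyp))
    # Hypothetical timing: 1.47 s of decoding for 10 s of audio.
    print(real_time_factor(1.47, 10.0))
```

Under these definitions, a real-time factor below 1.0 means the decoder finishes faster than the audio plays.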

Language Model Comparison for Ukrainian Real-Time Speech Recognition System

Speech and Computer - Lecture Notes in Computer Science, 2013, pp. 211-218. DOI: 10.1007/978-3-319-01931-4_28.

Author(s): Mykola Sazhok, Valentyna Robeiko

Keyword(s): Speech Recognition, Real Time, Model Comparison, Language Model, Recognition System, Speech Recognition System

Improved Factorization of a Connectionist Language Model for Single-Pass Real-Time Speech Recognition

Lecture Notes in Computer Science - Foundations of Intelligent Systems, 2014, pp. 355-364. DOI: 10.1007/978-3-319-08326-1_36.

Author(s): Łukasz Brocki, Danijel Koržinek, Krzysztof Marasek

Keyword(s): Speech Recognition, Real Time, Language Model, Single Pass

Real-Time Closed-Captioning System Using Speech Recognition of Direct Program Sound and Re-Spoken Utterances

The Journal of The Institute of Image Information and Television Engineers, 2009, Vol 63 (3), pp. 331-338. DOI: 10.3169/itej.63.331. Cited By ~ 1.

Author(s): Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato, Toru Imai, ...

Keyword(s): Speech Recognition, Real Time, Closed Captioning

A Framework for Fast Incremental Interpretation during Speech Decoding

Computational Linguistics, 2009, Vol 35 (3), pp. 313-343. DOI: 10.1162/coli.08-011-r2-07-021. Cited By ~ 8.

Author(s): William Schuler, Stephen Wu, Lane Schwartz

Keyword(s): Speech Recognition, Real Time, Semantic Information, Probability Model, Language Model, Search Space, World Model, Incremental Interpretation, Large Domains

This article describes a framework for incorporating referential semantic information from a world model or ontology directly into a probabilistic language model of the sort commonly used in speech recognition, where it can be probabilistically weighted together with phonological and syntactic factors as an integral part of the decoding process. Introducing world model referents into the decoding search greatly increases the search space, but by using a single integrated phonological, syntactic, and referential semantic language model, the decoder is able to incrementally prune this search based on probabilities associated with these combined contexts. The result is a single unified referential semantic probability model which brings several kinds of context to bear in speech decoding, and performs accurate recognition in real time on large domains in the absence of example in-domain training sentences.
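
The incremental pruning mentioned in the abstract above can be illustrated with a generic beam-pruning sketch. This is not the article's model: it assumes, purely for illustration, that each hypothesis extension comes with separate phonological, syntactic, and referential probabilities that are multiplied together (summed in log space), and that hypotheses falling more than a fixed log-probability margin below the best one are discarded. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (word sequence so far, accumulated log-probability)


def combined_log_prob(factor_probs: Dict[str, float]) -> float:
    """Sum the log-probabilities of the factors; every probability must be > 0."""
    return sum(math.log(p) for p in factor_probs.values())


def extend(hyp: Hypothesis, word: str, factor_probs: Dict[str, float]) -> Hypothesis:
    """Append one word and add its combined log-probability to the running score."""
    prefix, score = hyp
    return (f"{prefix} {word}".strip(), score + combined_log_prob(factor_probs))


def prune(hyps: List[Hypothesis], beam_width: float) -> List[Hypothesis]:
    """Keep only hypotheses within beam_width (in log-probability) of the best one."""
    if not hyps:
        return []
    best = max(score for _, score in hyps)
    return [h for h in hyps if h[1] >= best - beam_width]


if __name__ == "__main__":
    start: Hypothesis = ("", 0.0)
    # Hypothetical factor probabilities for two competing first words.
    candidates = [
        extend(start, "open", {"phonological": 0.6, "syntactic": 0.5, "referential": 0.8}),
        extend(start, "oven", {"phonological": 0.5, "syntactic": 0.1, "referential": 0.01}),
    ]
    # With a beam of 3.0 nats, the much less likely candidate is dropped.
    print(prune(candidates, beam_width=3.0))
```

Applying `prune` after every extension step keeps the search space bounded even though each added knowledge source multiplies the number of candidate hypotheses.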

Real-time one-pass decoding with recurrent neural network language model for speech recognition

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014. DOI: 10.1109/icassp.2014.6854829. Cited By ~ 13.

Author(s): Takaaki Hori, Yotaro Kubo, Atsushi Nakamura

Keyword(s): Neural Network, Speech Recognition, Real Time, Recurrent Neural Network, Language Model, Network Language

Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition

2017. DOI: 10.21437/interspeech.2017-1790.

Author(s): Weiwu Zhu

Keyword(s): Speech Recognition, Language Model, Knowledge Graph, Search Query, Statistical Language Model

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

2021 IEEE Spoken Language Technology Workshop (SLT), 2021. DOI: 10.1109/slt48900.2021.9383515.

Author(s): Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, ...

Keyword(s): Speech Recognition, Language Model, Model Estimation, End To End

Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition

Applied Sciences, 2021, Vol 11 (6), pp. 2866. DOI: 10.3390/app11062866.

Author(s): Damheo Lee, Donghyun Kim, Seung Yun, Sanghun Kim

Keyword(s): Speech Recognition, Language Model, Reduction Rate, Code Switching, Training Data, Target Domain, Phonetic Variation, Language Model Adaptation, Imbalanced Training Data, Lm Adaptation

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations in English pronunciation by Korean speakers must be considered, so we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and then applied language model (LM) adaptation to counter the modeling bias toward Korean caused by the imbalanced training data. In the experiments, the training data were AI Hub (1033 h) for Korean and Librispeech (960 h) for English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method seems to be very effective in CS speech recognition.
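
The abstract above reports improvements as an error reduction rate (ERR) without stating the formula. A common reading, assumed here and not confirmed by this listing, is the relative reduction of the error rate with respect to the baseline. The function name and the sample error rates below are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error removed: (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates in percent: 20.0 for the baseline, 15.0 after adaptation.
    reduction = relative_error_reduction(20.0, 15.0)
    print(f"{reduction:.1%}")
```

Under this assumption the same absolute gain yields a larger ERR when the baseline error is already low, so ERR values are only comparable against a shared baseline.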
