Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition

2021 · Vol 11 (6) · pp. 2866
Author(s): Damheo Lee, Donghyun Kim, Seung Yun, Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variation of English words as pronounced by Korean speakers should be considered, so we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counter the modeling bias toward Korean caused by the imbalanced training data. In our experiments, the training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method thus appears to be very effective for CS speech recognition.
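The sentence-selection step for LM adaptation could be approximated as in the minimal sketch below, which assumes a TF-IDF representation and cosine similarity to a target-domain centroid; the abstract does not state the exact semantic similarity measure used, so the function name, cutoff, and representation are illustrative only.

```python
# Minimal sketch: pick candidate code-switching sentences that are semantically
# close to the target domain, for use as LM adaptation text. TF-IDF + cosine
# similarity are assumptions; the paper's actual similarity measure may differ.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_similar_sentences(candidates, target_domain_sents, top_n=1000):
    """Rank candidate CS sentences by similarity to the target domain and
    return the top_n for language model adaptation."""
    vectorizer = TfidfVectorizer()
    # Fit on both pools so candidates and target sentences share one vocabulary.
    vectorizer.fit(candidates + target_domain_sents)
    cand_vecs = vectorizer.transform(candidates)
    target_vecs = vectorizer.transform(target_domain_sents)
    # Represent the target domain by the centroid of its sentence vectors.
    centroid = np.asarray(target_vecs.mean(axis=0))
    scores = cosine_similarity(cand_vecs, centroid).ravel()
    ranked = np.argsort(scores)[::-1][:top_n]
    return [candidates[i] for i in ranked]

# Usage (toy data):
# adapted_text = select_similar_sentences(cs_corpus, target_sentences, top_n=500)
```

The selected sentences would then be interpolated into, or used to retrain, the recognizer's LM, which is what counteracts the bias toward Korean in the imbalanced training data.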

Author(s): Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, ...

2015
Author(s): Joris Pelemans, Tom Vanallemeersch, Kris Demuynck, Hugo Van hamme, Patrick Wambacq

2019 · Vol 25 (5) · pp. 561-583
Author(s): T. Jauhiainen, K. Lindén, H. Jauhiainen

Abstract: This article describes an unsupervised language model (LM) adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, now called HeLI 2.0, which we describe in detail. The resulting system is evaluated using the datasets from the German dialect identification and Indo-Aryan language identification shared tasks of the VarDial workshops 2017 and 2018. The new approach with LM adaptation provides considerably higher F1-scores than the basic HeLI or HeLI 2.0 methods or the other systems that participated in the shared tasks. The results indicate that unsupervised LM adaptation should be considered an option in all language identification tasks, especially those where encountering out-of-domain data is likely.
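The unsupervised LM adaptation described above can be pictured as a confidence-based self-training loop: identify the test documents, fold the most confidently identified ones back into the per-language models, and repeat. The sketch below is a simplification, assuming plain character n-gram counts in place of HeLI's actual scoring; all names, thresholds, and round counts are illustrative, not the published method.

```python
# Minimal sketch of unsupervised LM adaptation for language identification.
# The scoring here is a stand-in for HeLI's character n-gram models.
from collections import Counter, defaultdict
import math

def char_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NgramLangID:
    def __init__(self, n=3):
        self.n = n
        self.models = {}               # language -> Counter of n-gram counts
        self.totals = defaultdict(int)

    def train(self, labeled_texts):
        for text, lang in labeled_texts:
            counts = self.models.setdefault(lang, Counter())
            counts.update(char_ngrams(text, self.n))
            self.totals[lang] = sum(counts.values())

    def score(self, text, lang):
        # Average negative log-probability per n-gram with add-one smoothing
        # (lower is better); a simplification of HeLI's scoring.
        counts, total = self.models[lang], self.totals[lang]
        grams = char_ngrams(text, self.n) or [text]
        vocab = len(counts) + 1
        return sum(-math.log((counts[g] + 1) / (total + vocab)) for g in grams) / len(grams)

    def identify(self, text):
        scored = sorted((self.score(text, lang), lang) for lang in self.models)
        best = scored[0]
        second = scored[1] if len(scored) > 1 else best
        return best[1], second[0] - best[0]   # predicted language, confidence margin

def adapt(model, unlabeled_texts, rounds=3, keep_ratio=0.5):
    """Iteratively add the most confidently identified texts to the language
    models, then re-identify the remainder with the adapted models."""
    remaining = list(unlabeled_texts)
    for _ in range(rounds):
        preds = [(t,) + model.identify(t) for t in remaining]
        preds.sort(key=lambda x: -x[2])        # most confident first
        cut = int(len(preds) * keep_ratio)
        model.train([(t, lang) for t, lang, _ in preds[:cut]])
        remaining = [t for t, _, _ in preds[cut:]]
    return model
```

Because the adaptation uses only the unlabeled test data itself, this is exactly the setting where out-of-domain material would otherwise hurt a fixed, purely supervised model.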


2020 · Vol 10 (18) · pp. 6155
Author(s): Byung Ok Kang, Hyeong Bae Jeon, Jeon Gue Park

We propose two approaches to speech recognition for task domains with sparse matched training data. The first is an active learning method that selects training data for the target domain from a general domain that already has a large amount of labeled speech, using attribute-disentangled latent variables. For this active learning process, we designed an integrated system consisting of a variational autoencoder, whose encoder infers latent variables with disentangled attributes from the input speech, and a classifier that selects training data whose attributes match the target domain. The second approach combines data augmentation, which generates matched target-domain speech data, with transfer learning based on teacher/student learning. To evaluate the proposed methods, we experimented with various task domains that have sparse matched training data. The experimental results show that the proposed method has qualitative characteristics suitable for the desired purpose, that it outperforms random selection, and that it is comparable to using an equal amount of additional target-domain data.
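The data-selection step of the active learning approach might look like the sketch below. It assumes a pretrained attribute-disentangled encoder and a domain-attribute classifier are available as plain callables; the toy stand-ins at the bottom exist only so the example runs and are not the authors' models.

```python
# Minimal sketch: keep general-domain utterances whose inferred attributes
# match the target domain, as extra training data for that domain.
# encoder/domain_classifier are assumed pretrained; the ones below are toys.
import numpy as np

def select_matching_utterances(utterances, encoder, domain_classifier,
                               target_domain, threshold=0.8):
    selected = []
    for utt in utterances:
        z = encoder(utt)                        # disentangled latent attributes
        probs = domain_classifier(z)            # probability per domain label
        if probs.get(target_domain, 0.0) >= threshold:
            selected.append(utt)
    return selected

# Toy stand-ins so the sketch runs end to end (purely illustrative):
rng = np.random.default_rng(0)

def toy_encoder(utt):
    return rng.normal(size=16)                  # pretend VAE latent vector

def toy_classifier(z):
    p = 1.0 / (1.0 + np.exp(-z.mean()))         # pretend target-domain score
    return {"broadcast_news": p, "other": 1.0 - p}

pool = [f"utt_{i:04d}" for i in range(100)]
matched = select_matching_utterances(pool, toy_encoder, toy_classifier,
                                     target_domain="broadcast_news", threshold=0.5)
print(f"{len(matched)} of {len(pool)} utterances selected for the target domain")
```

The selected subset would then be pooled with whatever in-domain data exists, which is the comparison behind the random-selection and equal-amount baselines mentioned above.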

