Speech data clustering based on phoneme error trend for unsupervised acoustic model adaptation

Author(s): Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Author(s): Conghui Tan, Di Jiang, Jinhua Peng, Xueyang Wu, Qian Xu, ...

Due to rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train acoustic models on the complete data as before. In this paper, we propose a novel Divide-and-Merge paradigm to address these privacy and scale constraints. In the Divide phase, multiple acoustic models are trained on different subsets of the complete speech data; in the Merge phase, two novel algorithms generate a single high-quality acoustic model from those subset-trained models. We first propose the Genetic Merge Algorithm (GMA), which is highly specialized for optimizing acoustic models but suffers from low efficiency. We then propose the SGD-Based Optimizational Merge Algorithm (SOMA), which effectively alleviates the efficiency bottleneck of GMA while maintaining superior performance. Extensive experiments on public data show that the proposed methods significantly outperform the state-of-the-art.
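The abstract does not describe SOMA's internals, but one plausible reading of an "SGD-based merge" is learning convex combination weights over the subset-trained models so that the merged parameters minimize a held-out loss. The sketch below is a hypothetical illustration of that idea, not the paper's algorithm: models are represented as flat parameter arrays, mixture weights are softmax-parameterized, and gradients are estimated by finite differences for self-containedness.

```python
import numpy as np

def soma_style_merge(models, loss_fn, lr=0.5, steps=300):
    """Hypothetical sketch (not the paper's implementation): learn convex
    combination weights over subset-trained model parameters by gradient
    descent on a held-out loss. `models` is a list of same-shaped arrays."""
    k = len(models)
    logits = np.zeros(k)  # softmax-parameterized mixture weights
    for _ in range(steps):
        w = np.exp(logits) / np.exp(logits).sum()
        merged = sum(wi * m for wi, m in zip(w, models))
        base = loss_fn(merged)
        grad = np.zeros(k)
        eps = 1e-4  # finite-difference gradient estimate, for illustration
        for i in range(k):
            pert = logits.copy()
            pert[i] += eps
            wp = np.exp(pert) / np.exp(pert).sum()
            grad[i] = (loss_fn(sum(wj * m for wj, m in zip(wp, models))) - base) / eps
        logits -= lr * grad
    w = np.exp(logits) / np.exp(logits).sum()
    return sum(wi * m for wi, m in zip(w, models)), w

# Toy usage: two "models" (scalar parameters) and a quadratic proxy loss.
subset_models = [np.array([0.0]), np.array([1.0])]
target = np.array([0.25])
merged, weights = soma_style_merge(
    subset_models, lambda m: float(np.sum((m - target) ** 2)))
```

In a real system the loss would be a recognition objective on held-out speech, and the parameters would be the acoustic model's weights rather than toy scalars.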


Author(s): Miichi Yamada, Akira Baba, Shinichi Yoshizawa, Yuuichiro Mera, Akinobu Lee, ...

ReCALL, 2004, Vol 16 (1), pp. 173-188
Author(s): Yasushi Tsubota, Masatake Dantsuji, Tatsuya Kawahara

We have developed an English pronunciation learning system which estimates the intelligibility of Japanese learners' speech and ranks their errors from the viewpoint of improving their intelligibility to native speakers. Error diagnosis is particularly important in self-study, since students tend to spend time on aspects of pronunciation that do not noticeably affect intelligibility. As a preliminary experiment, the speech of seven Japanese students was scored from 1 (hardly intelligible) to 5 (perfectly intelligible) by a linguistic expert. We also computed their error rates for each skill. We found that each intelligibility level is characterized by its distribution of error rates. Thus, we modeled each intelligibility level in accordance with its error rate distribution. Error priority was calculated by comparing students' error rate distributions with that of the corresponding model for each intelligibility level. As non-native speech is acoustically broader than the speech of native speakers, we developed an acoustic model to perform automatic error detection using speech data obtained from Japanese students. For supra-segmental error detection, we categorized errors frequently made by Japanese students and developed a separate acoustic model for that type of error detection. Pronunciation learning using this system involves two phases. In the first phase, students experience virtual conversation through video clips. They receive an error profile based on pronunciation errors detected during the conversation. Using the profile, students are able to grasp characteristic tendencies in their pronunciation errors which lower their intelligibility. In the second phase, students practise correcting their individual errors using words and short phrases. They then receive information regarding the errors detected during this round of practice and instructions for correcting the errors. We have begun using this system in a CALL class at Kyoto University. We have evaluated system performance through the use of questionnaires and analysis of speech data logged in the server, and present our findings in this paper.
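The error-prioritization step described above can be sketched in a simplified form: compare a student's per-skill error rates against the typical error-rate profile of the next intelligibility level, and rank skills by how far the student exceeds that profile. This is a minimal illustration of the comparison idea, not the paper's actual scoring formula; the skill names and all rates below are invented examples.

```python
def rank_error_priorities(student_rates, next_level_rates):
    """Rank pronunciation skills by the gap between the student's error
    rate and the (hypothetical) profile of the next intelligibility level.
    Larger positive gaps mean higher priority for practice."""
    gaps = {skill: rate - next_level_rates.get(skill, 0.0)
            for skill, rate in student_rates.items()}
    return sorted(gaps, key=gaps.get, reverse=True)

# Invented example: a student's error rates vs. a level-3 profile.
student = {"r_l_contrast": 0.60, "th_fricative": 0.30, "vowel_length": 0.10}
level_3_profile = {"r_l_contrast": 0.20, "th_fricative": 0.25, "vowel_length": 0.15}
priorities = rank_error_priorities(student, level_3_profile)
```

Here the r/l contrast tops the list because it is the skill where the student most exceeds the next level's typical error rate, matching the system's goal of directing practice toward errors that actually affect intelligibility.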

