Acoustic Model Training, using Kaldi, for Automatic Whispery Speech Recognition

Automatic Speech Recognition (ASR) requires huge amounts of real user speech data to reach state-of-the-art performance. However, speech data conveys sensitive speaker attributes, such as identity, that can be inferred and exploited for malicious purposes. There is therefore interest in collecting speech data that has been anonymized by a voice conversion method. In this paper, we evaluate one such voice conversion method on Latvian speech data and investigate whether privacy-transformed data can be used to improve ASR acoustic models. The results show that voice conversion is effective against state-of-the-art speaker verification models on Latvian speech and that privacy-transformed data is effective in ASR training.

Robust Acoustic Model Training Against Phoneme Variations for Large Vocabulary Continuous Speech Recognition

Signal and Image Processing ◽ 10.2316/p.2011.759-070 ◽ 2011

Author(s): Gil Ho Lee ◽ Nam Soo Kim

Keyword(s): Speech Recognition ◽ Acoustic Model ◽ Continuous Speech ◽ Continuous Speech Recognition ◽ Large Vocabulary ◽ Model Training

Semi-Supervised Speech Recognition Acoustic Model Training Using Policy Gradient

Applied Sciences ◽ 10.3390/app10103542 ◽ 2020 ◽ Vol 10 (10) ◽ pp. 3542 ◽ Cited By ~ 1

Author(s): Hoon Chung ◽ Sung Joo Lee ◽ Hyeong Bae Jeon ◽ Jeon Gue Park

Keyword(s): Speech Recognition ◽ Learning Algorithm ◽ Training Methods ◽ Acoustic Model ◽ Confidence Measure ◽ External Knowledge ◽ Teacher Student ◽ Policy Gradient ◽ Gradient Based ◽ Model Training

In this paper, we propose a policy gradient-based method for semi-supervised training of speech recognition acoustic models. In practice, self-training and teacher/student learning are among the most widely used semi-supervised training methods due to their scalability and effectiveness. These methods generate pseudo labels for unlabeled samples using a pre-trained model and select reliable samples using a confidence measure. However, this approach has two drawbacks. The generated pseudo labels can be biased depending on which pre-trained model is used, and the training process can be complicated because the confidence-based selection is usually carried out in post-processing using external knowledge. To address these issues, we propose a policy gradient-based approach. Policy gradient is a reinforcement learning algorithm that searches for an optimal behavior strategy with which an agent maximizes its reward. The policy gradient-based approach provides a framework for exploring unlabeled data as well as exploiting labeled data, and it also provides a way to incorporate external knowledge in the same training cycle. The proposed approach was evaluated on an in-house non-native Korean recognition domain. The experimental results show that the method is effective in semi-supervised acoustic model training.
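
To make the policy gradient idea in the abstract above concrete, the following is a minimal sketch of the generic REINFORCE update on a toy multi-armed bandit. It is not the authors' acoustic model training procedure: the function name `reinforce_bandit`, the reward probabilities, the learning rate, and the running-mean baseline are all assumptions chosen for illustration. The agent samples an action from a softmax policy, observes a reward, and moves its parameters along the gradient of the log-probability of that action, scaled by how much the reward exceeded the baseline.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs, steps=5000, lr=0.1, seed=0):
    """Toy REINFORCE on a Bernoulli bandit (illustrative assumption only).

    reward_probs[k] is the hypothetical probability that action k pays reward 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(reward_probs)  # policy parameters, one per action
    baseline = 0.0                      # running mean of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(logits)
        # Explore: sample an action from the current stochastic policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Subtracting a baseline reduces the variance of the gradient estimate.
        advantage = reward - baseline

        # For a softmax policy: d log pi(action) / d logit_k = 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi

        # Update the running mean of rewards seen so far.
        baseline += (reward - baseline) / t

    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical bandit in which the second action is rewarded more often.
    print(reinforce_bandit([0.2, 0.8]))
```

In a semi-supervised setting such as the one the abstract describes, the sampled action would presumably correspond to a choice made on unlabeled data and the reward to some measure of its reliability, but that mapping is an assumption here and is not specified in the abstract.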
