Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training

The STC ASR System for the VOiCES from a Distance Challenge 2019

10.21437/interspeech.2019-1574 ◽

2019 ◽

Author(s):

Ivan Medennikov ◽

Yuri Khokhlov ◽

Aleksei Romanenko ◽

Ivan Sorokin ◽

Anton Mitrofanov ◽

...

Keyword(s):

Asr System

Download Full-text

On the Impact of Gabor Phase for Spectro-Temporal Feature Extraction in Building an ASR System

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) ◽

10.1109/iemcon51383.2020.9284872 ◽

2020 ◽

Author(s):

Anirban Dutta ◽

Gudmalwar Prabhakar ◽

Ch V Rama Rao

Keyword(s):

Feature Extraction ◽

The Impact ◽

Asr System ◽

Temporal Feature

Download Full-text

Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition

Sensors ◽

10.3390/s21093063 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3063

Author(s):

Aleksandr Laptev ◽

Andrei Andrusenko ◽

Ivan Podluzhny ◽

Anton Mitrofanov ◽

Ivan Medennikov ◽

...

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Rapid Development ◽

Computational Cost ◽

Vocabulary Size ◽

Word Error Rate ◽

Low Resource ◽

Steady Improvement ◽

End To End ◽

Asr System

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.

Download Full-text

Deep neural network for water/fat separation: Supervised training, unsupervised training, and no training

Magnetic Resonance in Medicine ◽

10.1002/mrm.28663 ◽

2020 ◽

Vol 85 (4) ◽

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Supervised Training ◽

Unsupervised Training

Download Full-text

Pedestrian detection algorithm in traffic scene based on weakly supervised hierarchical deep model

International Journal of Advanced Robotic Systems ◽

10.1177/1729881417692311 ◽

2016 ◽

Vol 14 (1) ◽

pp. 172988141769231 ◽

Cited By ~ 2

Author(s):

Yingfeng Cai ◽

Youguo He ◽

Hai Wang ◽

Xiaoqiang Sun ◽

Long Chen ◽

...

Keyword(s):

Deep Learning ◽

Pedestrian Detection ◽

Recognition Rate ◽

Detection Algorithm ◽

Training Methods ◽

Two Dimensional ◽

Data Set ◽

Deep Model ◽

Unsupervised Training ◽

Weakly Supervised

The emergence and development of deep learning theory in machine learning field provide new method for visual-based pedestrian recognition technology. To achieve better performance in this application, an improved weakly supervised hierarchical deep learning pedestrian recognition algorithm with two-dimensional deep belief networks is proposed. The improvements are made by taking into consideration the weaknesses of structure and training methods of existing classifiers. First, traditional one-dimensional deep belief network is expanded to two-dimensional that allows image matrix to be loaded directly to preserve more information of a sample space. Then, a determination regularization term with small weight is added to the traditional unsupervised training objective function. By this modification, original unsupervised training is transformed to weakly supervised training. Subsequently, that gives the extracted features discrimination ability. Multiple sets of comparative experiments show that the performance of the proposed algorithm is better than other deep learning algorithms in recognition rate and outperforms most of the existing state-of-the-art methods in non-occlusion pedestrian data set while performs fair in weakly and heavily occlusion data set.

Download Full-text