Articulatory feature based multilingual MLPs for low-resource speech recognition

Optimization of Formant Feature Based Speech Recognition

Informatica ◽

10.15388/informatica.2010.294 ◽

2010 ◽

Vol 21 (3) ◽

pp. 361-374 ◽

Cited By ~ 2

Author(s):

Antanas Lipeika

Keyword(s):

Speech Recognition ◽

Feature Based

Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages

10.21437/sltu.2018-6 ◽

2018 ◽

Cited By ~ 1

Author(s):

Arun Baby ◽

Karthik Pandia D S ◽

Hema A Murthy

Keyword(s):

Signal Processing ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Indian Languages ◽

Low Resource

TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages

10.21437/interspeech.2018-2117 ◽

2018 ◽

Cited By ~ 7

Author(s):

Noor Fathima ◽

Tanvina Patel ◽

Mahima C ◽

Anuroop Iyengar

Keyword(s):

Speech Recognition ◽

Recognition System ◽

Speech Recognition System ◽

Indian Languages ◽

Low Resource ◽

Multilingual Speech Recognition

Active Learning Methods for Low Resource End-to-End Speech Recognition

10.21437/interspeech.2019-2316 ◽

2019 ◽

Cited By ~ 2

Author(s):

Karan Malhotra ◽

Shubham Bansal ◽

Sriram Ganapathy

Keyword(s):

Speech Recognition ◽

Active Learning ◽

Learning Methods ◽

Low Resource ◽

End To End

A General Procedure for Improving Language Models in Low-Resource Speech Recognition

2019 International Conference on Asian Language Processing (IALP) ◽

10.1109/ialp48816.2019.9037726 ◽

2019 ◽

Author(s):

Qian Liu ◽

Wei-Qiang Zhang ◽

Jia Liu ◽

Yao Liu

Keyword(s):

Speech Recognition ◽

General Procedure ◽

Language Models ◽

Low Resource

Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition

Sensors ◽

10.3390/s21093063 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3063

Author(s):

Aleksandr Laptev ◽

Andrei Andrusenko ◽

Ivan Podluzhny ◽

Anton Mitrofanov ◽

Ivan Medennikov ◽

...

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Rapid Development ◽

Computational Cost ◽

Vocabulary Size ◽

Word Error Rate ◽

Low Resource ◽

Steady Improvement ◽

End To End ◽

Asr System

With the rapid development of speech assistants, adapting server-oriented automatic speech recognition (ASR) solutions to run directly on the device has become crucial. For on-device speech recognition, researchers and industry prefer end-to-end ASR systems because they can be made resource-efficient while maintaining higher quality than hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which mainly means handling out-of-vocabulary (OOV) words, is another challenge associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, represented by the Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method tokenizes utterances non-deterministically to extend the tokens' contexts and to regularize their distribution, which helps the model recognize unseen words. It also reduces the need to search for an optimal subword vocabulary size. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the use of BPE-dropout, our monolingual Turkish Conformer achieved a competitive result of 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
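The non-deterministic tokenization described in this abstract can be illustrated with a minimal sketch of BPE-dropout on a single word. This is a simplified illustration, not the authors' implementation: the function name `bpe_dropout_tokenize`, the toy merge table, and the dropout value are assumptions made for the example. Each candidate merge is skipped with probability `dropout`, so the same word can be split into different subword sequences on different passes.

```python
import random


def bpe_dropout_tokenize(word, merges, dropout=0.1, rng=None):
    """Segment `word` with BPE, skipping each candidate merge with probability `dropout`.

    `merges` is an ordered list of symbol pairs (highest priority first).
    This is a hypothetical helper written for illustration only.
    """
    rng = rng or random.Random()
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Candidate merges: adjacent pairs found in the merge table
        # that survive the random dropout at this step.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# Toy merge table (an assumption for the example, not taken from the paper).
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
for _ in range(3):
    print(bpe_dropout_tokenize("lower", merges, dropout=0.3))
```

With `dropout=0.0` the sketch reduces to ordinary deterministic BPE, and with `dropout=1.0` it falls back to a character-level segmentation. Intermediate values yield a mix of segmentations for the same word.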

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition

IEEE Signal Processing Letters ◽

10.1109/lsp.2021.3071668 ◽

2021 ◽

Vol 28 ◽

pp. 788-792

Author(s):

Cheng Yi ◽

Shiyu Zhou ◽

Bo Xu

Keyword(s):

Speech Recognition ◽

Low Resource

Cross-lingual and ensemble MLPs strategies for low-resource speech recognition

10.21437/interspeech.2012-11 ◽

2012 ◽

Author(s):

Yanmin Qian ◽

Jia Liu

Keyword(s):

Speech Recognition ◽

Low Resource ◽

Cross Lingual

Feature-based pronunciation modeling for speech recognition

Proceedings of HLT-NAACL 2004: Short Papers (HLT-NAACL '04) ◽

10.3115/1613984.1614005 ◽

2004 ◽

Cited By ~ 13

Author(s):

Karen Livescu ◽

James Glass

Keyword(s):

Speech Recognition ◽

Pronunciation Modeling ◽

Feature Based

Prosodic Feature-Based Discriminatively Trained Low Resource Speech Recognition System

Sustainability ◽

10.3390/su14020614 ◽

2022 ◽

Vol 14 (2) ◽

pp. 614

Author(s):

Taniya Hasija ◽

Virender Kadyan ◽

Kalpna Guleria ◽

Abdullah Alharbi ◽

Hashem Alyami ◽

...

Keyword(s):

Speech Recognition ◽

Mutual Information ◽

Data Augmentation ◽

Recognition System ◽

Speech Recognition System ◽

Prosodic Features ◽

Prosodic Feature ◽

Feature Based ◽

Maximum Mutual Information ◽

Children's Speech

Speech recognition has been an active field of research over the last few decades because it facilitates better human–computer interaction. Native-language automatic speech recognition (ASR) systems are still underdeveloped. Punjabi ASR systems are in their infancy: most research has been conducted on adult speech, and less work has been done on Punjabi children's ASR systems. This research aimed to build a prosodic feature-based automatic children's speech recognition system using discriminative modeling techniques. The corpus of Punjabi children's speech poses various runtime challenges, such as acoustic variation across speakers of different ages. To overcome such issues, out-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. Prosodic features were extracted from the Punjabi children's speech corpus, and selected prosodic features were then combined with Mel Frequency Cepstral Coefficient (MFCC) features before being submitted to an ASR framework. The system modeling process investigated several approaches: Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-based Maximum Mutual Information (fMMI). After the corpus was extended through out-domain data augmentation, prosodic features were also extracted from the extended corpus, and experiments were conducted on both individual and integrated prosodic-based acoustic features. The fMMI technique exhibited a 20% to 25% relative improvement in word error rate compared with the MMI and bMMI techniques. It was further enhanced using the augmented dataset and hybrid front-end features (MFCC + POV + F0 + voice quality), with a relative improvement of 13% compared with the earlier baseline system.
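The relative improvements in word error rate quoted in this abstract follow the usual definition: the drop in WER divided by the baseline WER. The sketch below shows how WER and a relative reduction could be computed. The function names and the example values are illustrative assumptions, not figures or code from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline` (both error rates)."""
    return (baseline - improved) / baseline


# Hypothetical values chosen only to illustrate the arithmetic.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(relative_reduction(40.0, 30.0))
```

Under this definition, a system that lowers WER from 40% to 30% reports a 25% relative improvement, even though the absolute difference is 10 percentage points.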
