Signal conditioned minimum error rate training for continuous speech recognition

Here a development of an Acoustic and Language Model is presented. Low Word Error Rate is an early good sign of a good Language and Acoustic Model. Although there are still parameters other than Words Error Rate, our work focused on building Bahasa Indonesia with approximately 2000 common words and achieved the minimum threshold of 25% Word Error Rate. There were several experiments consist of different cases, training data, and testing data with Word Error Rate and Testing Ratio as the main comparison. The language and acoustic model were built using Sphinx4 from Carnegie Mellon University using Hidden Markov Model for the acoustic model and ARPA Model for the language model. The models configurations, which are Beam Width and Force Alignment, directly correlates with Word Error Rate. The configurations were set to 1e-80 for Beam Width and 1e-60 for Force Alignment to prevent underfitting or overfitting of the acoustic model. The goals of this research are to build continuous speech recognition in Bahasa Indonesia which has low Word Error Rate and to determine the optimum numbers of training and testing data which minimize the Word Error Rate.

Download Full-text

Continuous Speech Recognition of Kazakh Language

ITM Web of Conferences ◽

10.1051/itmconf/20192401012 ◽

2019 ◽

Vol 24 ◽

pp. 01012 ◽

Cited By ~ 2

Author(s):

Оrken Mamyrbayev ◽

Mussa Turdalyuly ◽

Nurbapa Mekebayev ◽

Kuralay Mukhsina ◽

Alimukhan Keylan ◽

...

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Error Rate ◽

Speech Signal ◽

Deep Neural Networks ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Reliable System ◽

Word Error Rate ◽

Language Studies

This article describes the methods of creating a system of recognizing the continuous speech of Kazakh language. Studies on recognition of Kazakh speech in comparison with other languages began relatively recently, that is after obtaining independence of the country, and belongs to low resource languages. A large amount of data is required to create a reliable system and evaluate it accurately. A database has been created for the Kazakh language, consisting of a speech signal and corresponding transcriptions. The continuous speech has been composed of 200 speakers of different genders and ages, and the pronunciation vocabulary of the selected language. Traditional models and deep neural networks have been used to train the system. As a result, a word error rate (WER) of 30.01% has been obtained.

Download Full-text

A MINIMUM ERROR RATE PATTERN RECOGNITION APPROACH TO SPEECH RECOGNITION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001494000024 ◽

1994 ◽

Vol 08 (01) ◽

pp. 5-31 ◽

Cited By ~ 25

Author(s):

W. CHOU ◽

C.-H. LEE ◽

B.-H. JUANG ◽

F.K. SOONG

Keyword(s):

Pattern Recognition ◽

Speech Recognition ◽

Error Rate ◽

Training Algorithms ◽

Generation Process ◽

Viterbi Decoding ◽

Continuous Speech Recognition ◽

Minimum Error ◽

Pattern Recognition Approach ◽

Speech Recognizer

In this paper, a minimum error rate pattern recognition approach to speech recognition is studied with particular emphasis on the speech recognizer designs based on hidden Markov models (HMMs) and Viterbi decoding. This approach differs from the traditional maximum likelihood based approach in that the objective of the recognition error rate minimization is established through a specially designed loss function, and is not based on the assumptions made about the speech generation process. Various theoretical and practical issues concerning this minimum error rate pattern recognition approach in speech recognition are investigated. The formulation and the algorithmic structures of several minimum error rate training algorithms for an HMM-based speech recognizer are discussed. The tree-trellis based N-best decoding method and a robust speech recognition scheme based on the combined string models are described. This approach can be applied to large vocabulary, continuous speech recognition tasks and to speech recognizers using word or subword based speech recognition units. Various experimental results have shown that significant error rate reduction can be achieved through the proposed approach.

Download Full-text

Large-Vocabulary Continuous Speech Recognition of Lhasa Tibetan

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.519-520.802 ◽

2014 ◽

Vol 519-520 ◽

pp. 802-806 ◽

Cited By ~ 2

Author(s):

Guan Yu Li ◽

Hong Zhi Yu

Keyword(s):

Speech Recognition ◽

Hidden Markov Models ◽

Error Rate ◽

Markov Models ◽

Hidden Markov ◽

Basic Unit ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Word Error Rate ◽

Large Vocabulary

The framework of auto speech recognition of Lhasa dialect was established in this paper. Phoneme was chosen as the basic unit for modeling. Then, phonemes set of Lhasa dialect and their Latin transliteration were designed. There were 5568 frequently used monosyllables in the vocabulary. Hidden Markov Models of triphones were established and trained by use of HTK. Word error rate (WER) was 21.81% under the optimal situation.

Download Full-text

Review on Acoustic Modeling for Continuous Speech Recognition

i-manager s Journal on Digital Signal Processing ◽

10.26634/jdp.2.4.3145 ◽

2014 ◽

Vol 2 (4) ◽

pp. 30-33

Author(s):

R. Mohan ◽

◽

M. Kalamani ◽

Keyword(s):

Speech Recognition ◽

Acoustic Modeling ◽

Continuous Speech ◽

Continuous Speech Recognition

Download Full-text

The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API

International Journal of Networked and Distributed Computing ◽

10.2991/ijndc.k.201218.005 ◽

2021 ◽

Author(s):

Hyun Jae Yoo ◽

Sungwoong Seo ◽

Sun Woo Im ◽

Gwang Yong Gim

Keyword(s):

Performance Evaluation ◽

Speech Recognition ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Open Api ◽

Phonological Rules

Download Full-text

Recent results from the ARM continuous speech recognition project

10.3115/116580.116727 ◽

1990 ◽

Author(s):

Martin Russell ◽

Keith Ponting

Keyword(s):

Speech Recognition ◽

Continuous Speech ◽

Continuous Speech Recognition

Download Full-text

Improved acoustic modeling for continuous speech recognition

10.3115/116580.116686 ◽

1990 ◽

Cited By ~ 12

Author(s):

C.-H. Lee ◽

E. Giachin ◽

L. R. Rabiner ◽

R. Pieraccini ◽

A. E. Rosenberg

Keyword(s):

Speech Recognition ◽

Acoustic Modeling ◽

Continuous Speech ◽

Continuous Speech Recognition

Download Full-text

Development of Assamese Continuous Speech Recognition System

10.21437/sltu.2018-45 ◽

2018 ◽

Author(s):

Barsha Deka ◽

Nirmala S.R. ◽

Samudravijaya K.

Keyword(s):

Speech Recognition ◽

Recognition System ◽

Speech Recognition System ◽

Continuous Speech ◽

Continuous Speech Recognition

Download Full-text