Speech recognition for mixed speech and music by NMF using various cost functions and noise adaptive training methods

Author(s):  
Naoaki Hashimoto ◽  
Kazumasa Yamamoto ◽  
Seiichi Nakagawa

2013 ◽  
Vol 6 (1) ◽  
pp. 266-271
Author(s):  
Anurag Upadhyay ◽  
Chitranjanjit Kaur

This paper addresses the problem of speech recognition for identifying various modes of speech data. Speaker sounds are the acoustic sounds of speech. Statistical models of speech have been widely used for speech recognition with neural networks. In this paper we propose, and attempt to justify, a new model in which speech coarticulation (the effect of phonetic context on speech sounds) is modeled explicitly under a statistical framework. We study phone recognition using recurrent neural networks and SOUL neural networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL neural network clusters the large vocabulary, compressing huge speech data sets. The project also covers different Indian languages uttered by different speakers in different modes such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural database. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68% correct. The research results in this thesis are competitive with the best results reported in the literature.


2020 ◽  
Vol 79 (27-28) ◽  
pp. 19669-19715
Author(s):  
Aldonso Becerra ◽  
J. Ismael de la Rosa ◽  
Efrén González ◽  
A. David Pedroza ◽  
N. Iracemi Escalante ◽  
...  

2020 ◽  
Author(s):  
Tristan Mahr ◽  
Visar Berisha ◽  
Kan Kawabata ◽  
Julie Liss ◽  
Katherine Hustad

Aim. We compared the performance of five forced-alignment algorithms on a corpus of child speech. Method. The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated alignment algorithms in terms of accuracy (whether the interval covers the midpoint of the manual alignment) and difference in phone-onset times between the automatic and manual intervals. Results. The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all the aligners, and alignment accuracy also increased with age for fricative sounds across the aligners. Interpretation. The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors.
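The two evaluation measures described in this abstract (midpoint-coverage accuracy and phone-onset timing difference) are straightforward to compute; a minimal sketch follows, with intervals represented as (onset, offset) tuples in seconds. The interval values are made-up illustrative data, not figures from the study.

```python
def midpoint_accuracy(auto, manual):
    """True if the automatic interval covers the midpoint of the manual one."""
    mid = (manual[0] + manual[1]) / 2.0
    return auto[0] <= mid <= auto[1]

def onset_difference(auto, manual):
    """Absolute difference in phone-onset times (seconds)."""
    return abs(auto[0] - manual[0])

# Example: one phone, hand-labelled gold interval vs. forced-aligned interval
manual = (0.120, 0.200)   # manual (gold-standard) boundaries
auto = (0.115, 0.185)     # aligner output

print(midpoint_accuracy(auto, manual))            # True: midpoint 0.160 lies in [0.115, 0.185]
print(round(onset_difference(auto, manual), 3))   # 0.005
```

Aggregating the first measure over all phones gives the accuracy percentage, and summarizing the second gives the timing-difference statistics used to rank the aligners.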


2011 ◽  
Vol 271-273 ◽  
pp. 597-602
Author(s):  
Gang Yan ◽  
Hai Dong Kong ◽  
Yang Yu ◽  
Xiao Xia Zheng

A noisy speech recognition method based on an improved RBF neural network is presented, in which the parameters of the hidden layer are trained dynamically and Akaike's final prediction error (FPE) criterion is employed to simplify the network. Compared with two other training methods for RBF networks, experimental results on noisy speech samples show that this method achieves excellent performance in terms of recognition rate and recognition speed.
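For readers unfamiliar with the model class, a minimal sketch of a Gaussian RBF network forward pass is shown below. This is generic illustration, not the paper's improved training scheme: the centers, widths, and weights here are made-up values, whereas the method above would train these hidden-layer parameters dynamically and use the FPE criterion to prune hidden units.

```python
import math

def rbf_forward(x, centers, widths, weights, bias=0.0):
    """Output of a scalar-input Gaussian RBF network.

    Each hidden unit responds with exp(-(x - c)^2 / (2 * s^2));
    the output is a weighted sum of the hidden activations plus a bias.
    (Parameter values passed in are illustrative, not trained.)
    """
    hidden = [math.exp(-((x - c) ** 2) / (2.0 * s ** 2))
              for c, s in zip(centers, widths)]
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Two hidden units with symmetric centers: their responses to x = 0.5
# are equal, so with opposite weights the output cancels to 0.
y = rbf_forward(0.5, centers=[0.0, 1.0], widths=[0.5, 0.5], weights=[1.0, -1.0])
```

A network-simplification criterion such as FPE would then trade off fit against the number of hidden units, removing units whose contribution does not justify the added model complexity.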

