Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures

Jonathan Darch; Ben Milner; Saeed Vaseghi

doi:10.1121/1.2997436

The Journal of the Acoustical Society of America ◽

10.1121/1.2997436 ◽

2008 ◽

Vol 124 (6) ◽

pp. 3989-4000 ◽

Cited By ~ 6

Author(s):

Jonathan Darch ◽

Ben Milner ◽

Saeed Vaseghi

Keyword(s):

Speech Recognition ◽

Mel Frequency Cepstral Coefficients ◽

Distributed Speech Recognition ◽

Speech Features ◽

Cepstral Coefficients

Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition

Advances in Nonlinear Speech Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-25020-0_24 ◽

2011 ◽

pp. 183-189 ◽

Cited By ~ 3

Author(s):

Aitzol Ezeiza ◽

Karmele López de Ipiña ◽

Carmen Hernández ◽

Nora Barroso

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Fractal Dimensions ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

Predictive Trellis-Coded Quantization of the Cepstral Coefficients for the Distributed Speech Recognition

IEICE Transactions on Communications ◽

10.1093/ietcom/e90-b.6.1570 ◽

2007 ◽

Vol E90-B (6) ◽

pp. 1570-1572

Author(s):

S. KANG ◽

J. LEE

Keyword(s):

Speech Recognition ◽

Distributed Speech Recognition ◽

Trellis Coded ◽

Cepstral Coefficients

Local Feature or Mel Frequency Cepstral Coefficients - Which One Is Better for MLN-Based Bangla Speech Recognition?

Advances in Computing and Communications - Communications in Computer and Information Science ◽

10.1007/978-3-642-22714-1_17 ◽

2011 ◽

pp. 154-161 ◽

Cited By ~ 5

Author(s):

Foyzul Hassan ◽

Mohammed Rokibul Alam Kotwal ◽

Md. Mostafizur Rahman ◽

Mohammad Nasiruddin ◽

Md. Abdul Latif ◽

...

Keyword(s):

Speech Recognition ◽

Local Feature ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

Speech Feature Evaluation for Bangla Automatic Speech Recognition

Technical Challenges and Design Issues in Bangla Language Processing ◽

10.4018/978-1-4666-3970-6.ch009 ◽

2013 ◽

pp. 169-208 ◽

Cited By ~ 1

Author(s):

Mohammed Rokibul Alam Kotwal ◽

Foyzul Hassan ◽

Mohammad Nurul Huda

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Performance ◽

Dynamic Parameters ◽

Mel Frequency Cepstral Coefficients ◽

Phoneme Recognition ◽

Hybrid Features ◽

Feature Evaluation ◽

Speech Feature ◽

Speech Features

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time-delay artificial neural networks of different architectures. Canonicalization of speech features is also performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors design three classifiers, for male, female, and GI speakers, and extract the output probabilities from these classifiers to find the maximum. Maximizing the output probabilities for each speech file yields higher correctness and accuracy for GI speech recognition. Dynamic parameters (velocity and acceleration coefficients) are also used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers with fewer mixture components.
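
The dynamic parameters mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the velocity and acceleration coefficients are computed, so the code below assumes the commonly used regression formula over a window of ±2 frames, with edge frames replicated. The function names `delta` and `add_dynamic` and the default window size are hypothetical choices for illustration, not the authors' implementation.

```python
def delta(frames, window=2):
    """Regression-based delta features (assumed formula, not from the chapter).

    frames: list of per-frame coefficient lists, all of equal length.
    Returns a list of the same shape holding the first-order differences.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]  # replicate the last frame
                prv = frames[max(t - k, 0)][d]      # replicate the first frame
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic(frames, window=2):
    """Append velocity and acceleration coefficients to each static frame."""
    velocity = delta(frames, window)
    acceleration = delta(velocity, window)  # delta of the delta
    return [s + v + a for s, v, a in zip(frames, velocity, acceleration)]


if __name__ == "__main__":
    # Toy input: one coefficient per frame, rising linearly.
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for frame in add_dynamic(ramp):
        print(frame)
```

Under these assumptions, each frame's feature vector triples in length: the static coefficients, then their velocity, then their acceleration.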

Chip design of mel frequency cepstral coefficients for speech recognition

2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) ◽

10.1109/icassp.2000.860195 ◽

2002 ◽

Cited By ~ 1

Author(s):

Jia-Ching Wang ◽

Jhing-Fa Wang ◽

Yu-Sheng Weng

Keyword(s):

Speech Recognition ◽

Mel Frequency Cepstral Coefficients ◽

Chip Design ◽

Cepstral Coefficients

Speech Recognition Using Cross Correlation and Feature Analysis Using Mel-Frequency Cepstral Coefficients and Pitch

2020 IEEE International Conference for Innovation in Technology (INOCON) ◽

10.1109/inocon50539.2020.9298320 ◽

2020 ◽

Author(s):

Ruchi Gupte ◽

Sarah Hawa ◽

Reena Sonkusare

Keyword(s):

Speech Recognition ◽

Cross Correlation ◽

Feature Analysis ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition

IEEE Transactions on Speech and Audio Processing ◽

10.1109/89.784104 ◽

1999 ◽

Vol 7 (5) ◽

pp. 525-532 ◽

Cited By ~ 96

Author(s):

R. Vergin ◽

D. O'Shaughnessy ◽

A. Farhat

Keyword(s):

Speech Recognition ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Mel Frequency Cepstral Coefficients ◽

Large Vocabulary ◽

Speaker Independent ◽

Cepstral Coefficients

Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children

Natural Language Engineering ◽

10.1017/s135132491600005x ◽

2016 ◽

Vol 23 (3) ◽

pp. 325-350 ◽

Cited By ~ 15

Author(s):

ROMAIN SERIZEL ◽

DIEGO GIULIANI

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Error Rate ◽

Deep Neural Network ◽

Vocal Tract ◽

Rate Performance ◽

Posterior Probabilities ◽

Mel Frequency Cepstral Coefficients ◽

Heterogeneous Groups ◽

Cepstral Coefficients

This paper introduces deep neural network (DNN)–hidden Markov model (HMM)-based methods to tackle speech recognition in heterogeneous groups of speakers including children. We target three speaker groups consisting of children, adult males and adult females. Two different kinds of approaches are introduced here: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach that consists in adapting a general DNN to domain/language specific data is extended to target age/gender groups in the context of DNN–HMM. Then, VTLN is investigated by training a DNN–HMM system using either mel frequency cepstral coefficients normalised with standard VTLN or acoustic features derived from mel frequency cepstral coefficients combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. The combination of several approaches is shown to improve the baseline phone error rate performance by thirty per cent to thirty-five per cent relative and the baseline word error rate performance by about ten per cent relative.
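
The feature combination described in this abstract can be sketched roughly as follows. The abstract says only that the acoustic features are combined with the posterior probabilities of the VTLN warping factors; the sketch assumes this means per-frame concatenation and that the separate DNN emits one raw score per candidate warping factor, normalised here with a softmax. The names `softmax` and `augment_frame` and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def augment_frame(acoustic_features, warp_scores):
    """Append warping-factor posteriors to one frame's acoustic features.

    acoustic_features: MFCC-derived values for a single frame.
    warp_scores: hypothetical raw outputs of a separate network,
                 one per candidate warping factor.
    """
    return list(acoustic_features) + softmax(warp_scores)


if __name__ == "__main__":
    frame = [1.0, 2.0]          # toy acoustic features
    scores = [0.0, 0.0, 0.0]    # toy scores for three warping factors
    print(augment_frame(frame, scores))
```

Because the posteriors travel with each frame, a recogniser trained on the augmented vectors would not need a first pass to choose a warping factor, which is consistent with the single-pass decoding the abstract reports.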

Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.3465 ◽

2020 ◽

Vol 10 (2) ◽

pp. 5547-5553

Author(s):

A. A. Alasadi ◽

T. H. Aldhayni ◽

R. R. Deshmukh ◽

A. H. Alahmadi ◽

A. S. Alshebami

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Group Delay ◽

Recognition System ◽

Support Vector ◽

Speech Recognition System ◽

Mel Frequency Cepstral Coefficients ◽

Delay Function ◽

Cepstral Coefficients ◽

Arabic Speech Recognition

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF), for the development of an Automated Speech Recognition (ASR) system for Arabic. The extracted features were processed with the Support Vector Machine (SVM) algorithm. These feature extraction algorithms extract speech or voice characteristics and process the group delay function calculated directly from the voice signal. The algorithms were deployed to extract audio forms from Arabic speakers. PNCC provided the best recognition results for Arabic speech in comparison with the other methods. Simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.

How many Mel‐frequency cepstral coefficients to be utilized in speech recognition? A study with the Bengali language

The Journal of Engineering ◽

10.1049/tje2.12082 ◽

2021 ◽

Author(s):

Md. Rakibul Hasan ◽

Md. Mahbub Hasan ◽

Md Zakir Hossain

Keyword(s):

Speech Recognition ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients ◽

Bengali Language
