How many Mel‐frequency cepstral coefficients to be utilized in speech recognition? A study with the Bengali language

Abstract: This paper introduces deep neural network (DNN)–hidden Markov model (HMM)-based methods to tackle speech recognition in heterogeneous groups of speakers, including children. We target three speaker groups: children, adult males and adult females. Two kinds of approaches are introduced: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach of adapting a general DNN to domain- or language-specific data is extended to target age/gender groups in the context of DNN–HMM. Then, VTLN is investigated by training a DNN–HMM system using either mel frequency cepstral coefficients normalised with standard VTLN, or acoustic features derived from mel frequency cepstral coefficients combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. The combination of several approaches is shown to improve the baseline phone error rate by 30% to 35% relative and the baseline word error rate by about 10% relative.
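The abstract does not say which warping function its VTLN uses. As a rough illustration only, the sketch below assumes the common piecewise-linear warp, in which frequencies are scaled by a warping factor `alpha` up to a break point and then interpolated so that the band edge maps to itself. The names `vtln_warp`, `append_warp_posteriors`, `f_max` and `knee` are hypothetical, and the second helper merely shows the idea of appending warping-factor posteriors to a feature frame so that decoding needs a single pass.

```python
def vtln_warp(freq_hz, alpha, f_max=8000.0, knee=0.85):
    """Piecewise-linear VTLN warp (an assumed form, not taken from the paper).

    Below the break frequency the axis is scaled by alpha; above it a second
    linear segment is used so that f_max still maps to f_max.
    """
    # Place the break so that both it and its warped image stay below f_max.
    f_break = knee * f_max * min(1.0, 1.0 / alpha)
    if freq_hz <= f_break:
        return alpha * freq_hz
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (freq_hz - f_break)


def append_warp_posteriors(frame, posteriors):
    """Concatenate one feature frame with the warping-factor posteriors.

    Both arguments are plain lists of floats; the posteriors are assumed to
    come from a separate classifier, as the abstract describes.
    """
    return list(frame) + list(posteriors)


# Hypothetical usage: warp 1 kHz with alpha = 1.1, then augment a toy frame.
warped = vtln_warp(1000.0, 1.1)
augmented = append_warp_posteriors([0.3, -1.2, 0.8], [0.1, 0.7, 0.2])
```

In a conventional two-pass setup, a grid of candidate `alpha` values would be scored per speaker before the final decoding pass; feeding the posteriors in as extra inputs avoids that search at decoding time.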

Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.3465 ◽

2020 ◽

Vol 10 (2) ◽

pp. 5547-5553

Author(s):

A. A. Alasadi ◽

T. H. Aldhayni ◽

R. R. Deshmukh ◽

A. H. Alahmadi ◽

A. S. Alshebami

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Group Delay ◽

Recognition System ◽

Support Vector ◽

Speech Recognition System ◽

Mel Frequency Cepstral Coefficients ◽

Delay Function ◽

Cepstral Coefficients ◽

Arabic Speech Recognition

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF), for the development of an Automatic Speech Recognition (ASR) system for Arabic. The extracted features were classified with the Support Vector Machine (SVM) algorithm. These feature extraction algorithms capture speech or voice characteristics, with ModGDF processing the group delay function calculated directly from the voice signal. The algorithms were applied to speech recorded from Arabic speakers. PNCC provided the best recognition results for Arabic speech among the three methods, and simulation results showed that both PNCC and ModGDF were more accurate than MFCC.

Comparison of Feature Extraction Mel Frequency Cepstral Coefficients and Linear Predictive Coding in Automatic Speech Recognition for Indonesian

TELKOMNIKA (Telecommunication Computing Electronics and Control) ◽

10.12928/telkomnika.v15i1.3605 ◽

2017 ◽

Vol 15 (1) ◽

pp. 292 ◽

Cited By ~ 1

Author(s):

Sukmawati Nur Endah ◽

Satriyo Adhy ◽

Sutikno Sutikno

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Predictive Coding ◽

Linear Predictive Coding ◽

Mel Frequency Cepstral Coefficients ◽

Cepstral Coefficients

Environmental effects on reliability and accuracy of MFCC based voice recognition for industrial human-robot-interaction

Proceedings of the Institution of Mechanical Engineers Part B Journal of Engineering Manufacture ◽

10.1177/09544054211014492 ◽

2021 ◽

pp. 095440542110144

Author(s):

B Birch ◽

CA Griffiths ◽

A Morgan

Keyword(s):

Speech Recognition ◽

Voice Recognition ◽

Human Robot Interaction ◽

Hole Drilling ◽

Time Warping ◽

Mel Frequency Cepstral Coefficients ◽

Robot Interaction ◽

Extraction Algorithm ◽

Dynamic Time ◽

Manufacturing Environments

Collaborative robots are becoming increasingly important for advanced manufacturing processes. The purpose of this paper is to determine the capability of a novel Human-Robot interface to be used for machine hole drilling. Using the developed voice activation system, the effects of environmental factors on speech recognition accuracy are considered. The research investigates the accuracy of a Mel Frequency Cepstral Coefficients-based feature extraction algorithm which uses Dynamic Time Warping to compare an utterance to a limited, user-dependent dictionary. The developed speech recognition method allows for Human-Robot Interaction using a novel integration method between the voice recognition system and the robot. The system can be utilised in many manufacturing environments where robot motions can be coupled to voice inputs rather than to time-consuming physical interfaces. However, uptake is limited in industries where the volume of background machine noise is high.
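The matching step described above, comparing an utterance with a small dictionary by Dynamic Time Warping, can be sketched as follows. This is a minimal textbook DTW under stated assumptions, not the authors' implementation: each utterance is assumed to be already converted to a list of per-frame feature vectors (for example MFCCs), the local cost is assumed to be Euclidean distance, and `dtw_distance`, `recognise` and the toy templates are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumed)
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b
                cost[i][j - 1],      # frame of b repeated against a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognise(utterance, dictionary):
    """Return the dictionary word whose template has the lowest DTW cost."""
    return min(dictionary, key=lambda word: dtw_distance(utterance, dictionary[word]))


# Toy one-dimensional "features"; real templates would hold MFCC frames.
templates = {
    "start": [[0.0], [1.0], [2.0]],
    "stop": [[5.0], [5.0], [4.0]],
}
spoken = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower rendition of "start"
best = recognise(spoken, templates)
```

Because the warping path may repeat frames, the slower rendition still aligns with the "start" template at zero cost. A user-dependent dictionary keeps the template set small, which matters since this plain version costs time proportional to the product of the two sequence lengths per comparison.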
