Comparison of feature extraction methods for speech recognition in noise-free and in traffic noise environment

Author(s):  
Gellert Sarosi ◽  
Mihaly Mozsary ◽  
Peter Mihajlik ◽  
Tibor Fegyo
2020 ◽  
Vol 12 (1) ◽  
pp. 9
Author(s):  
Namkyoung Lee ◽  
Michael Azarian ◽  
Michael Pecht

The performance of a machine learning model depends on the quality of the features used as input to the model. Research into feature extraction methods for convolutional neural network (CNN)-based diagnostics for rotating machinery remains in a developmental stage. In general, the input to CNN-based diagnostics consists of a spectrogram without significant pre-processing. This paper introduces octave-band filtering as a feature extraction method for preprocessing a spectrogram prior to use with a CNN. The method is an adaptation of a feature extraction method originally developed for speech recognition. It differs from the filtering methods applied to speech recognition in its use of octave bands, to which a weighting optimal for machinery diagnosis has been applied. The effectiveness of octave-band filtering is demonstrated through a case study: the method not only improves the accuracy of the CNN-based diagnostics but also reduces the size of the CNN.
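
As a rough illustration of the idea of octave-band filtering applied to a spectrogram, the sketch below sums the power of the frequency bins that fall inside each octave band. It is not the authors' implementation: the function names, the band edges and the uniform default weights are all assumptions made for illustration, and the machinery-specific weighting the abstract mentions is not given in the abstract. Plain Python lists are used to keep the example dependency-free; an array library would be the usual choice in practice.

```python
def octave_band_edges(f_min, f_max):
    """Return (low, high) edges in Hz of successive octave bands covering [f_min, f_max)."""
    edges = []
    low = f_min
    while low < f_max:
        high = min(low * 2.0, f_max)  # each band spans one doubling of frequency
        edges.append((low, high))
        low *= 2.0
    return edges


def octave_band_pool(spectrogram, bin_freqs, f_min, f_max, weights=None):
    """Collapse every spectrogram frame into one energy value per octave band.

    spectrogram: list of frames, each a list of power values (one per frequency bin)
    bin_freqs:   frequency in Hz of each bin, same length as a frame
    weights:     optional per-band weights (hypothetical; defaults to 1.0 per band)
    """
    bands = octave_band_edges(f_min, f_max)
    if weights is None:
        weights = [1.0] * len(bands)
    pooled = []
    for frame in spectrogram:
        row = []
        for (low, high), weight in zip(bands, weights):
            # Sum the power of all bins whose frequency lies in [low, high).
            energy = sum(p for p, f in zip(frame, bin_freqs) if low <= f < high)
            row.append(weight * energy)
        pooled.append(row)
    return pooled


# Toy usage: one frame with 16 bins spaced 100 Hz apart (100 Hz ... 1600 Hz).
bin_freqs = [100 * i for i in range(1, 17)]
frame = [1.0] * 16
reduced = octave_band_pool([frame], bin_freqs, f_min=100, f_max=1600)
```

In this toy case the 16 linear bins collapse to four octave-band values per frame, which hints at why a CNN fed with such features can be smaller than one fed with the full spectrogram.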


2020 ◽  
Vol 2 (2) ◽  
pp. 100-108
Author(s):  
Zaurarista Dyarbirru ◽  
Syahroni Hidayat

Voice is the sound produced by living things. With the development of Automatic Speech Recognition (ASR) technology, voice can be used to make everyday tasks easier for humans. In ASR, feature extraction plays an important role in the recognition process. The feature extraction methods most commonly applied to ASR are MFCC and wavelets, each of which has advantages and disadvantages. This study therefore combines wavelet feature extraction with MFCC to make the most of the advantages of both. The proposed method is called Wavelet-MFCC. The voice recognition method does not use recommendations. System performance is measured with the Word Recognition Rate (WRR), validated using K-fold cross-validation with 5 folds. The research dataset consists of voice recordings of the digits 0-9 in English. The results show that the digit speech recognition system achieves its highest average recognition rate, 63%, for digit 4 using the Daubechies DB3 wavelet with the dyadic wavelet transform. Comparing the wavelet decomposition methods used, the dyadic wavelet transform performs better than the wavelet packet transform.
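
The evaluation protocol described above (WRR validated with 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the study's code: WRR is assumed here to be the percentage of isolated words recognized correctly, the folds are contiguous and unshuffled, and `train_fn` and `predict_fn` are hypothetical callables standing in for whatever recognizer is being evaluated.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def word_recognition_rate(predicted, reference):
    """Assumed WRR definition: percentage of words recognized correctly."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def cross_validated_wrr(samples, labels, train_fn, predict_fn, k=5):
    """Average WRR over k folds; each fold is held out once for testing."""
    rates = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)  # hypothetical training callable
        predictions = [predict_fn(model, samples[i]) for i in test_idx]
        references = [labels[i] for i in test_idx]
        rates.append(word_recognition_rate(predictions, references))
    return sum(rates) / len(rates)


# Toy usage with a stand-in "recognizer" that simply echoes its input.
digits = [str(d) for d in range(10)]
average_wrr = cross_validated_wrr(
    digits, digits,
    train_fn=lambda x, y: None,
    predict_fn=lambda model, sample: sample,
    k=5,
)
```

Shuffling or stratifying by digit before splitting would usually be preferable for a real dataset; the contiguous split is used only to keep the sketch short.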


Author(s):  
Seong Jun Song ◽  
Hyun Joon Shim ◽  
Chul Ho Park ◽  
Seong Hee Lee ◽  
Sang Won Yoon
