The Wavelet-MFCC and Correlation Method in Digit Speech Recognition (Metode Wavelet-MFCC dan Korelasi dalam Pengenalan Suara Digit)

2020 ◽  
Vol 2 (2) ◽  
pp. 100-108
Author(s):  
Zaurarista Dyarbirru ◽  
Syahroni Hidayat

Voice is sound produced by living beings. With the development of Automatic Speech Recognition (ASR) technology, voice can be used to make tasks easier for humans. In the ASR extraction process, features play an important role in recognition. The feature extraction methods commonly applied to ASR are MFCC and wavelets, each with its own advantages and disadvantages. Therefore, this study combines the wavelet and MFCC feature extraction methods to maximize their advantages; the proposed method is called Wavelet-MFCC. System performance is measured with the Word Recognition Rate (WRR), validated with 5-fold cross-validation. The dataset consists of voice recordings of the digits 0-9 in English. The results show that the digit speech recognition system gives its highest average value, 63% for digit 4, using the Daubechies DB3 wavelet and the dyadic wavelet transform. Comparing the wavelet decomposition methods used, the dyadic wavelet transform performs better than the wavelet packet transform.
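The dyadic-wavelet pre-processing step the abstract describes can be sketched as follows. This is a minimal illustration only: a one-level Haar decomposition splits a speech frame into approximation (low-pass) and detail (high-pass) coefficients, which in the Wavelet-MFCC idea would then feed the MFCC stage. The Daubechies DB3 wavelet used in the paper needs longer filters; Haar is shown purely for clarity.

```python
import math

def haar_dwt(frame):
    """One level of the dyadic Haar wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input frame (frame length assumed even).
    """
    s = math.sqrt(2)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame) - 1, 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame) - 1, 2)]
    return approx, detail

frame = [1.0, 2.0, 3.0, 4.0]
a, d = haar_dwt(frame)
# `a` carries the smoothed sub-band, `d` the high-frequency residue
```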

Author(s):  
Gurpreet Kaur ◽  
Mohit Srivastava ◽  
Amod Kumar

In command and control applications, the feature extraction process is very important for good accuracy and short learning time. To address these metrics, we propose an automated combined speaker and speech recognition technique. In this paper, five isolated words are recorded by four speakers, two male and two female. We use the Mel Frequency Cepstral Coefficient (MFCC) feature extraction method with a Genetic Algorithm to optimize the extracted features and generate an appropriate feature set. In the first phase, feature extraction using MFCC is executed; in the second, the features are optimized using the Genetic Algorithm; and in the third and final phase, training is conducted using a Deep Neural Network. Finally, the proposed model is evaluated and validated in a real environment. To check the efficiency of the proposed work, we calculate parameters such as accuracy, precision, recall, sensitivity, and specificity.
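The genetic-algorithm feature-optimization phase can be sketched as below. This is a hedged stand-in, not the paper's implementation: each chromosome is a binary mask over an MFCC feature vector, and the fitness function here is a toy (overlap with a hypothetical "informative" feature set, minus a size penalty), where the real system would score a mask by the recognition accuracy of the downstream DNN.

```python
import random

random.seed(0)
N_FEATURES = 13                  # e.g. 13 MFCC coefficients per frame
INFORMATIVE = {0, 1, 2, 4, 7}    # hypothetical "useful" feature indices

def fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    # reward overlap with the informative set, penalise mask size
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen)

def evolve(pop_size=20, generations=40):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)
            child = p1[:cut] + p2[cut:]          # one-point crossover
            i = random.randrange(N_FEATURES)     # point mutation
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```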


2009 ◽  
Vol 50 ◽  
pp. 212-216
Author(s):  
Antanas Leonas Lipeika

Investigation of Formant Features in Speech Recognition
The use of formant features in speech recognition is investigated in this paper. It was established that formant features can be used for speech recognition, but recognition accuracy depends strongly on the formant feature extraction method. The best recognition results were obtained when singular prediction polynomials were used for formant feature extraction. These polynomials can be calculated from the parameters of linear prediction models of even or odd order, and they may be symmetric or antisymmetric. It is also important to investigate how recognition results depend not only on the choice of singular prediction polynomials but also on other parameters of the recognition system: the analysis frame length, the number of formants used in recognition, and the frequency scale used to represent formant features. The experiments established that the best recognition results were obtained using two or three formants calculated from symmetric singular prediction polynomials. They also showed that the second formant is the most informative; the contributions of the first, third, and fourth formants are roughly equal, but higher formants are less resistant to white noise. The best results were obtained using a frame length of 500 samples and representing formant trajectories on the Mel frequency scale.
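Forming the symmetric and antisymmetric singular prediction polynomials from linear-prediction coefficients can be sketched as follows, assuming the standard line-spectral construction P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z); formant features would then come from the roots of these polynomials on the unit circle (root finding is omitted here).

```python
def singular_polynomials(lpc):
    """lpc = [1, a1, ..., ap] -> (P, Q) coefficient lists.

    P is palindromic (symmetric), Q anti-palindromic (antisymmetric).
    """
    ext = list(lpc) + [0.0]      # append a_{p+1} = 0
    n = len(ext)                 # p + 2 coefficients in P and Q
    P = [ext[k] + ext[n - 1 - k] for k in range(n)]
    Q = [ext[k] - ext[n - 1 - k] for k in range(n)]
    return P, Q

# toy second-order LPC model A(z) = 1 - 1.2 z^-1 + 0.5 z^-2
P, Q = singular_polynomials([1.0, -1.2, 0.5])
```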


Over the past few years, face recognition technology has played an important role in the development of biometric identifiers with low time consumption and computational overhead. Many researchers have put effort into developing face recognition algorithms involving three distinct steps: detection, unique faceprint creation, and verification. Traditional Local Binary Pattern (LBP)-based face recognition systems slow down recognition, have high computational complexity, and do not capture the directional information of the image. To overcome these limitations, a novel face recognition system is proposed that exploits the Directional Binary Code (DBC) feature extraction method; features extracted with DBC are generally smoother than those from other feature extraction methods. Images with blur, pose changes, and illumination variation are applied and stored in the database; for blur creation, Average, Gaussian, and Motion filters are used. Using the DBC method, the face is detected and its features extracted. The same algorithm is applied to input images, and a multi-class SVM classifier compares them against the database images and returns the matches. Finally, simulation results show the recognition speed and computational complexity of the implementation.
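The directional-comparison idea behind DBC can be illustrated with a simplified sketch: for each interior pixel, one bit per direction (0°, 45°, 90°, 135°) records whether the neighbour in that direction is darker, giving a 4-bit code per pixel. The DBC of the paper is defined on first-order derivatives at a given distance; this toy version keeps only the directional-encoding idea.

```python
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]   # 0, 45, 90, 135 degrees

def directional_codes(img):
    """Map each interior pixel (y, x) of a grayscale grid to a 4-bit code."""
    h, w = len(img), len(img[0])
    codes = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for bit, (dy, dx) in enumerate(DIRS):
                if img[y + dy][x + dx] < img[y][x]:   # neighbour darker?
                    code |= 1 << bit
            codes[(y, x)] = code
    return codes

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
codes = directional_codes(img)
```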


2020 ◽  
Vol 12 (1) ◽  
pp. 9
Author(s):  
Namkyoung Lee ◽  
Michael Azarian ◽  
Michael Pecht

The performance of a machine learning model depends on the quality of the features used as its input. Research into feature extraction methods for convolutional neural network (CNN)-based diagnostics for rotating machinery remains at a developmental stage; in general, the input to CNN-based diagnostics is a spectrogram without significant pre-processing. This paper introduces octave-band filtering as a feature extraction method for preprocessing a spectrogram before it is passed to a CNN. The method adapts a feature extraction technique originally developed for speech recognition, and differs from the filtering applied in speech recognition in its use of octave bands, with weighting optimized for machinery diagnosis. A case study demonstrates the effectiveness of octave-band filtering: the method not only improves the accuracy of the CNN-based diagnostics but also reduces the size of the CNN.
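The octave-band filtering step can be sketched as below. This is a minimal illustration under assumed parameters (a 62.5 Hz lowest band edge, 8 bands, uniform averaging within a band): FFT bins of one spectrogram frame are pooled into bands whose edges double in frequency, shrinking the CNN input. The per-band weights the paper tunes for machinery diagnosis are not reproduced here.

```python
def octave_band_energies(spectrum, freqs, f_low=62.5, n_bands=8):
    """Pool FFT-bin magnitudes into octave bands (upper edge = 2x lower)."""
    bands = []
    lo = f_low
    for _ in range(n_bands):
        hi = lo * 2.0
        bins = [s for s, f in zip(spectrum, freqs) if lo <= f < hi]
        bands.append(sum(bins) / len(bins) if bins else 0.0)
        lo = hi
    return bands

# toy frame: flat unit spectrum over 0-8 kHz in 31.25 Hz bins
freqs = [i * 31.25 for i in range(256)]
spectrum = [1.0] * 256
bands = octave_band_energies(spectrum, freqs)
```

Note how a 256-bin frame collapses to 8 band energies, which is exactly what lets the downstream CNN shrink.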


2019 ◽  
Vol 9 (2) ◽  
pp. 4066-4070 ◽  
Author(s):  
A. Mnassri ◽  
M. Bennasr ◽  
C. Adnane

The development of real-time automatic speech recognition (ASR) systems better adapted to environmental variability, such as noisy surroundings, speaker variation, and accents, has become a high priority. Robustness is required, and it can be achieved at the feature extraction stage, which avoids the need for other pre-processing steps. In this paper, a new robust feature extraction method for a real-time ASR system is presented: a combination of Mel-frequency cepstral coefficients (MFCC) and the discrete wavelet transform (DWT). This hybrid system conserves more of the extracted speech features, which tend to be invariant to noise. The main idea is to extract MFCC features and denoise the obtained coefficients in the wavelet domain using a median filter (MF). The proposed system has been implemented on a Raspberry Pi 3, a suitable platform for real-time requirements. The experiments showed a high recognition rate (100%) in a clean environment and satisfying results (ranging from 80% to 100%) in noisy environments at different signal-to-noise ratios (SNRs).
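The median-filter-in-wavelet-domain idea can be sketched as follows, under simplifying assumptions: a one-level Haar DWT stands in for the paper's wavelet transform, a 3-tap median filter is applied to the noise-dominated detail coefficients, and the transform is inverted. The full MFCC extraction and the Raspberry Pi deployment are not reproduced.

```python
import math
from statistics import median

def haar(v):
    """One-level Haar DWT of an even-length vector -> (approx, detail)."""
    s = math.sqrt(2)
    return ([(v[i] + v[i + 1]) / s for i in range(0, len(v), 2)],
            [(v[i] - v[i + 1]) / s for i in range(0, len(v), 2)])

def ihaar(approx, detail):
    """Inverse of haar()."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out

def median3(v):
    """3-tap median filter with edge replication."""
    pad = [v[0]] + list(v) + [v[-1]]
    return [median(pad[i:i + 3]) for i in range(len(v))]

# toy MFCC vector with a noise spike at index 3
mfcc = [12.0, 11.5, -3.0, 40.0, -2.8, -3.1, 5.0, 5.2]
a, d = haar(mfcc)
denoised = ihaar(a, median3(d))   # spike is attenuated, rest preserved
```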


2021 ◽  
Vol 16 (1) ◽  
pp. 1-15
Author(s):  
Gyoung S. Na ◽  
Hyunju Chang

Feature extraction has been widely studied to find informative latent features and reduce the dimensionality of data. In particular, due to the difficulty of obtaining labeled data, unsupervised feature extraction has received much attention in data mining. However, widely used unsupervised feature extraction methods require side information about the data or rigid assumptions on the latent feature space. Furthermore, most feature extraction methods require a predefined dimensionality of the latent feature space, which must be manually tuned as a hyperparameter. In this article, we propose a new unsupervised feature extraction method called the Unsupervised Subspace Extractor (USE), which requires neither side information nor rigid assumptions on the data. Furthermore, USE can find a subspace generated by a nonlinear combination of the input features and automatically determine the optimal dimensionality of the subspace for the given nonlinear combination. The feature extraction process of USE is mathematically well justified, and we empirically demonstrate its effectiveness on several benchmark datasets.


2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters, which are obtained through a feature extraction process. This research uses handwritten Balinese characters. Here, the feature extraction process generates semantic and directional features of each handwritten character. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters: the features of test images are compared with reference features. With K=3 and 10 references per character, the recognition system achieves a success rate of 97.53%.
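The K-Nearest Neighbor recognition step with K=3 can be sketched as below. The semantic and direction features themselves are replaced by toy 2-D vectors and hypothetical labels; only the voting classifier matches what the abstract describes.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; majority vote of k nearest."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# toy reference features for two hypothetical character classes
train = [((0.10, 0.20), "ka"), ((0.15, 0.22), "ka"),
         ((0.90, 0.80), "ga"), ((0.88, 0.85), "ga"),
         ((0.50, 0.50), "ka")]
label = knn_predict(train, (0.12, 0.21))
```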


Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method; existing OCR systems use various methods because of the diversity of scripts. One major contribution of this work is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, the paper assumes the documents have been successfully segmented into characters, and features are extracted from these isolated Myanmar characters using structural analysis of the Myanmar script. Experiments were carried out using a Support Vector Machine (SVM) classifier, and the results are compared with a previously proposed feature extraction method.
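The paper's coding-based structural features are not reproduced here; as a hedged stand-in, this sketch computes a common structural descriptor for an isolated character bitmap, the per-zone ink density over a 2x2 zoning grid. Vectors of this kind are what would be fed to the SVM classifier.

```python
def zoning_features(bitmap, zones=2):
    """Per-zone fraction of on-pixels for a binary character bitmap."""
    h, w = len(bitmap), len(bitmap[0])
    zh, zw = h // zones, w // zones
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            ink = sum(bitmap[y][x]
                      for y in range(zy * zh, (zy + 1) * zh)
                      for x in range(zx * zw, (zx + 1) * zw))
            feats.append(ink / (zh * zw))
    return feats

# toy 4x4 glyph: ink in the top-left corner and bottom-right edge
glyph = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 1]]
f = zoning_features(glyph)
```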

