Frequency weighting in the feature extraction process: Effects of parameter choice on generalized perceptual linear prediction coefficients

2005 ◽  
Vol 118 (3) ◽  
pp. 2019-2019
Author(s):  
Patrick J. Clemins ◽  
Michael T. Johnson
2018 ◽  
Vol 29 (1) ◽  
pp. 327-344 ◽  
Author(s):  
Mohit Dua ◽  
Rajesh Kumar Aggarwal ◽  
Mantosh Biswas

Abstract The classical approach to build an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR system degrades in real-time environments. The proposed work discusses the implementation of discriminatively trained Hindi ASR system using noise robust integrated features and refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficient (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is preformed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.


Author(s):  
Gurpreet Kaur ◽  
Mohit Srivastava ◽  
Amod Kumar

Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3481 ◽  
Author(s):  
Frederico Soares Cabral ◽  
Hidekazu Fukai ◽  
Satoshi Tamura

The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Assuming recordings will be archived by using various types of vehicle suspension system and speeds in practice, hence, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Despite usually not being paid much attention, the results of the classification are dependent on the feature extraction step. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and their parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated the mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction coefficients (PLP) as a feature extraction step to improve the accuracy for paved and unpaved road classification. Although both MFCC and PLP have been developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve the commonly used statistical method.


2019 ◽  
Vol 8 (02) ◽  
pp. 24469-24472
Author(s):  
Thiruven Gatanadhan R

Automatic audio classification is very useful in audio indexing; content based audio retrieval and online audio distribution. This paper deals with the Speech/Music classification problem, starting from a set of features extracted directly from audio data. Automatic audio classification is very useful in audio indexing; content based audio retrieval and online audio distribution. The accuracy of the classification relies on the strength of the features and classification scheme. In this work Perceptual Linear Prediction (PLP) features are extracted from the input signal. After feature extraction, classification is carried out, using Support Vector Model (SVM) model. The proposed feature extraction and classification models results in better accuracy in speech/music classification.


2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique to recognize feature or pattern of Balinese character. Feature of Balinese character is generated through feature extraction process. This research using handwritten Balinese character. Feature extraction is a process to obtain the feature of character. In this research, feature extraction process generated semantic and direction feature of handwritten Balinese character. Recognition is using K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese character. The feature of Balinese character images tester are compared with reference features. Result of the recognition system with K=3 and reference=10 is achieved a success rate of 97,53%.


2020 ◽  
Vol 20 (S12) ◽  
Author(s):  
Juan C. Mier ◽  
Yejin Kim ◽  
Xiaoqian Jiang ◽  
Guo-Qiang Zhang ◽  
Samden Lhatoo

Abstract Background Sudden Unexpected Death in Epilepsy (SUDEP) has increased in awareness considerably over the last two decades and is acknowledged as a serious problem in epilepsy. However, the scientific community remains unclear on the reason or possible bio markers that can discern potentially fatal seizures from other non-fatal seizures. The duration of postictal generalized EEG suppression (PGES) is a promising candidate to aid in identifying SUDEP risk. The length of time a patient experiences PGES after a seizure may be used to infer the risk a patient may have of SUDEP later in life. However, the problem becomes identifying the duration, or marking the end, of PGES (Tomson et al. in Lancet Neurol 7(11):1021–1031, 2008; Nashef in Epilepsia 38:6–8, 1997). Methods This work addresses the problem of marking the end to PGES in EEG data, extracted from patients during a clinically supervised seizure. This work proposes a sensitivity analysis on EEG window size/delay, feature extraction and classifiers along with associated hyperparameters. The resulting sensitivity analysis includes the Gradient Boosted Decision Trees and Random Forest classifiers trained on 10 extracted features rooted in fundamental EEG behavior using an EEG specific feature extraction process (pyEEG) and 5 different window sizes or delays (Bao et al. in Comput Intell Neurosci 2011:1687–5265, 2011). Results The machine learning architecture described above scored a maximum AUC score of 76.02% with the Random Forest classifier trained on all extracted features. The highest performing features included SVD Entropy, Petrosan Fractal Dimension and Power Spectral Intensity. Conclusion The methods described are effective in automatically marking the end to PGES. Future work should include integration of these methods into the clinical setting and using the results to be able to predict a patient’s SUDEP risk.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Chun-Cheng Lin ◽  
Chun-Min Yang

This study developed an automatic heartbeat classification system for identifying normal beats, supraventricular ectopic beats, and ventricular ectopic beats based on normalized RR intervals and morphological features. The proposed heartbeat classification system consists of signal preprocessing, feature extraction, and linear discriminant classification. First, the signal preprocessing removed the high-frequency noise and baseline drift of the original ECG signal. Then the feature extraction derived the normalized RR intervals and two types of morphological features using wavelet analysis and linear prediction modeling. Finally, the linear discriminant classifier combined the extracted features to classify heartbeats. A total of 99,827 heartbeats obtained from the MIT-BIH Arrhythmia Database were divided into three datasets for the training and testing of the optimized heartbeat classification system. The study results demonstrate that the use of the normalized RR interval features greatly improves the positive predictive accuracy of identifying the normal heartbeats and the sensitivity for identifying the supraventricular ectopic heartbeats in comparison with the use of the nonnormalized RR interval features. In addition, the combination of the wavelet and linear prediction morphological features has higher global performance than only using the wavelet features or the linear prediction features.


Author(s):  
Made Sudarma ◽  
I Gede Harsemadi

Each of music which has been created, has its own mood which is emitted, therefore, there has been many researches in Music Information Retrieval (MIR) field that has been done for recognition of mood to music.  This research produced software to classify music to the mood by using K-Nearest Neighbor and ID3 algorithm.  In this research accuracy performance comparison and measurement of average classification time is carried out which is obtained based on the value produced from music feature extraction process.  For music feature extraction process it uses 9 types of spectral analysis, consists of 400 practicing data and 400 testing data.  The system produced outcome as classification label of mood type those are contentment, exuberance, depression and anxious.  Classification by using algorithm of KNN is good enough that is 86.55% at k value = 3 and average processing time is 0.01021.  Whereas by using ID3 it results accuracy of 59.33% and average of processing time is 0.05091 second.


2016 ◽  
Vol 7 (1) ◽  
pp. 58-68 ◽  
Author(s):  
Imen Trabelsi ◽  
Med Salim Bouhlel

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of speech emotion recognition system is to automatically classify speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel Frequency cepstrum coefficients (MFCC), Linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination between Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler Divergence Kernel. In this study, the effect of feature type and its dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features a recognition rate of 84% has been achieved which is close to the performance of humans on this database.


Sign in / Sign up

Export Citation Format

Share Document