Frequency weighting in the feature extraction process: Effects of parameter choice on generalized perceptual linear prediction coefficients

Patrick J. Clemins; Michael T. Johnson

doi:10.1121/1.4785744

Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling

Journal of Intelligent Systems ◽

10.1515/jisys-2017-0618 ◽

2018 ◽

Vol 29 (1) ◽

pp. 327-344 ◽

Cited By ~ 3

Author(s):

Mohit Dua ◽

Rajesh Kumar Aggarwal ◽

Mantosh Biswas

Keyword(s):

Feature Extraction ◽

Linear Prediction ◽

Extraction Methods ◽

Discriminative Training ◽

Mel Frequency Cepstral Coefficients ◽

Maximum Mutual Information ◽

Perceptual Linear Prediction ◽

Noise Robust ◽

Minimum Phone Error ◽

ASR System

The classical approach to building an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) have been the conventional feature extraction techniques for many years, and the hidden Markov model (HMM) has been the most obvious choice for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR systems degrades in real-time environments. The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficients (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using a genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of the acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.
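As a rough illustration of the integrated feature vectors mentioned in this abstract, the sketch below joins two per-frame feature streams frame by frame. It assumes that "sequentially combines" means simple concatenation of the coefficients for each frame; the abstract does not confirm this. The function name `integrate_features` and the toy values are hypothetical.

```python
from typing import List, Sequence


def integrate_features(
    stream_a: Sequence[Sequence[float]],
    stream_b: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate two per-frame feature streams, frame by frame.

    Assumption: both streams were computed over the same frames, so they
    contain the same number of frames.
    """
    if len(stream_a) != len(stream_b):
        raise ValueError("both streams must have the same number of frames")
    return [list(fa) + list(fb) for fa, fb in zip(stream_a, stream_b)]


# Hypothetical toy data: 2 frames, 3 "MFCC" and 2 "GFCC" values per frame.
mfcc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
gfcc = [[0.1, 0.2], [0.3, 0.4]]
mf_gfcc = integrate_features(mfcc, gfcc)
```

Each output frame holds the coefficients of the first stream followed by those of the second, so its dimension is the sum of the two input dimensions.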


Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Journal of Telecommunications and Information Technology ◽

10.26636/jtit.2018.119617 ◽

2018 ◽

Vol 2 ◽

pp. 23-31 ◽

Cited By ~ 1

Author(s):

Gurpreet Kaur ◽

Mohit Srivastava ◽

Amod Kumar

Keyword(s):

Genetic Algorithm ◽

Feature Extraction ◽

Speech Recognition ◽

Speaker Recognition ◽

Linear Prediction ◽

Rate Sensitivity ◽

Second Phase ◽

Linear Predictive Coding ◽

Mel Frequency Cepstral Coefficients ◽

Perceptual Linear Prediction

The speech and speaker recognition field has grown rapidly as many artificial intelligence algorithms have been applied to it. Speech conveys messages via the language being spoken, as well as emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase, training is conducted using DNN. Evaluation and validation of the proposed work model is done in a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. This paper also presents an evaluation of such feature extraction methods as linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
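The evaluation parameters named in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below applies those textbook definitions to a binary case; the class `Confusion`, the function `evaluate`, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def evaluate(c: Confusion) -> Dict[str, float]:
    """Standard metrics; assumes every denominator is non-zero."""
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "precision": c.tp / (c.tp + c.fp),
        "recall": recall,
        "sensitivity": recall,  # sensitivity is another name for recall
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, for illustration only.
scores = evaluate(Confusion(tp=40, fp=10, tn=45, fn=5))
```

Note that recall rate and sensitivity are the same quantity, so a report listing both carries four independent figures rather than five.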


Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors

Sensors ◽

10.3390/s19163481 ◽

2019 ◽

Vol 19 (16) ◽

pp. 3481 ◽

Cited By ~ 1

Author(s):

Frederico Soares Cabral ◽

Hidekazu Fukai ◽

Satoshi Tamura

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Condition Monitoring ◽

Linear Prediction ◽

Machine Learning Techniques ◽

Feature Extraction Method ◽

Extraction Step ◽

Road Condition ◽

Road Condition Monitoring ◽

Perceptual Linear Prediction

The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Because recordings will in practice be archived from vehicles with various suspension systems and at various speeds, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the feature extraction step strongly affects the classification results. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients as a feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve on the commonly used statistical method.


Speech/music classification using PLP and SVM

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs.v8i02.4277 ◽

2019 ◽

Vol 8 (02) ◽

pp. 24469-24472

Author(s):

Thiruven Gatanadhan R

Keyword(s):

Feature Extraction ◽

Linear Prediction ◽

Classification Problem ◽

Vector Model ◽

Support Vector ◽

Audio Classification ◽

Music Classification ◽

Audio Retrieval ◽

Audio Indexing ◽

Perceptual Linear Prediction

Automatic audio classification is very useful in audio indexing, content-based audio retrieval and online audio distribution. This paper deals with the speech/music classification problem, starting from a set of features extracted directly from audio data. The accuracy of the classification relies on the strength of the features and of the classification scheme. In this work, Perceptual Linear Prediction (PLP) features are extracted from the input signal. After feature extraction, classification is carried out using a Support Vector Machine (SVM) model. The proposed feature extraction and classification models result in better accuracy in speech/music classification.


Handwritten Balinesse Character Recognition using K-Nearest Neighbor

10.31227/osf.io/z6m8u ◽

2018 ◽

Author(s):

I Wayan Agus Surya Darma

Keyword(s):

Feature Extraction ◽

Success Rate ◽

Character Recognition ◽

Nearest Neighbor ◽

Recognition System ◽

Extraction Process ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Character Feature

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters. The features of a Balinese character are generated through a feature extraction process. This research uses handwritten Balinese characters. In this research, the feature extraction process generates semantic and direction features of handwritten Balinese characters. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters. The features of the test character images are compared with reference features. With K=3 and reference=10, the recognition system achieves a success rate of 97.53%.
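The comparison of test features against reference features with K=3 can be sketched as a plain K-Nearest Neighbor vote. This is a minimal sketch assuming Euclidean distance and majority voting; the abstract does not state which distance the authors used, and the function `knn_predict`, the two-dimensional feature vectors, and the labels are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Reference = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(query: Sequence[float], references: List[Reference], k: int = 3) -> str:
    """Return the majority label among the k references closest to query."""
    nearest = sorted(references, key=lambda ref: dist(query, ref[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D reference features for two character classes.
references = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
label_near_a = knn_predict((0.2, 0.1), references, k=3)
label_near_b = knn_predict((5.5, 5.2), references, k=3)
```

With an odd K such as 3 and two classes, the vote cannot tie; with more classes a tie-breaking rule would be needed, which this sketch leaves to the ordering of `Counter.most_common`.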


Categorisation of EEG suppression using enhanced feature extraction for SUDEP risk assessment

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01309-5 ◽

2020 ◽

Vol 20 (S12) ◽

Author(s):

Juan C. Mier ◽

Yejin Kim ◽

Xiaoqian Jiang ◽

Guo-Qiang Zhang ◽

Samden Lhatoo

Keyword(s):

Sensitivity Analysis ◽

Feature Extraction ◽

Random Forest ◽

Window Size ◽

Extraction Process ◽

Unexpected Death ◽

Power Spectral ◽

EEG Data ◽

Boosted Decision Trees ◽

Future Work

Background: Awareness of Sudden Unexpected Death in Epilepsy (SUDEP) has increased considerably over the last two decades, and it is acknowledged as a serious problem in epilepsy. However, the scientific community remains unclear on the reason or the possible biomarkers that can distinguish potentially fatal seizures from non-fatal seizures. The duration of postictal generalized EEG suppression (PGES) is a promising candidate to aid in identifying SUDEP risk. The length of time a patient experiences PGES after a seizure may be used to infer the patient's risk of SUDEP later in life. However, the problem becomes identifying the duration, or marking the end, of PGES (Tomson et al. in Lancet Neurol 7(11):1021–1031, 2008; Nashef in Epilepsia 38:6–8, 1997). Methods: This work addresses the problem of marking the end of PGES in EEG data recorded from patients during a clinically supervised seizure. This work proposes a sensitivity analysis on EEG window size/delay, feature extraction and classifiers, along with associated hyperparameters. The resulting sensitivity analysis includes the Gradient Boosted Decision Trees and Random Forest classifiers trained on 10 extracted features rooted in fundamental EEG behavior, using an EEG-specific feature extraction process (pyEEG) and 5 different window sizes or delays (Bao et al. in Comput Intell Neurosci 2011:1687–5265, 2011). Results: The machine learning architecture described above achieved a maximum AUC of 76.02% with the Random Forest classifier trained on all extracted features. The highest-performing features included SVD Entropy, Petrosian Fractal Dimension and Power Spectral Intensity. Conclusion: The methods described are effective in automatically marking the end of PGES. Future work should include integrating these methods into the clinical setting and using the results to predict a patient's SUDEP risk.


Heartbeat Classification Using Normalized RR Intervals and Morphological Features

Mathematical Problems in Engineering ◽

10.1155/2014/712474 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 25

Author(s):

Chun-Cheng Lin ◽

Chun-Min Yang

Keyword(s):

Feature Extraction ◽

Classification System ◽

Linear Prediction ◽

Morphological Features ◽

RR Intervals ◽

Heartbeat Classification ◽

Linear Discriminant ◽

RR Interval ◽

Signal Preprocessing ◽

Ectopic Beats

This study developed an automatic heartbeat classification system for identifying normal beats, supraventricular ectopic beats, and ventricular ectopic beats based on normalized RR intervals and morphological features. The proposed heartbeat classification system consists of signal preprocessing, feature extraction, and linear discriminant classification. First, the signal preprocessing removed the high-frequency noise and baseline drift of the original ECG signal. Then the feature extraction derived the normalized RR intervals and two types of morphological features using wavelet analysis and linear prediction modeling. Finally, the linear discriminant classifier combined the extracted features to classify heartbeats. A total of 99,827 heartbeats obtained from the MIT-BIH Arrhythmia Database were divided into three datasets for the training and testing of the optimized heartbeat classification system. The study results demonstrate that the use of the normalized RR interval features greatly improves the positive predictive accuracy of identifying the normal heartbeats and the sensitivity for identifying the supraventricular ectopic heartbeats in comparison with the use of the nonnormalized RR interval features. In addition, the combination of the wavelet and linear prediction morphological features has higher global performance than only using the wavelet features or the linear prediction features.
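To make the idea of a normalized RR interval concrete, the sketch below derives RR intervals from R-peak times and scales them by their mean. The choice of the record-wide mean as the normalizer is an assumption made for illustration; the abstract does not specify how the authors normalize. The function names and the peak times are hypothetical.

```python
from typing import List, Sequence


def rr_intervals(r_peaks_ms: Sequence[int]) -> List[int]:
    """Differences between consecutive R-peak times, in milliseconds."""
    return [b - a for a, b in zip(r_peaks_ms, r_peaks_ms[1:])]


def normalize_rr(rr_ms: Sequence[int]) -> List[float]:
    """Divide each RR interval by the mean RR interval (assumed normalizer)."""
    if not rr_ms:
        raise ValueError("at least one RR interval is required")
    mean_rr = sum(rr_ms) / len(rr_ms)
    return [rr / mean_rr for rr in rr_ms]


# Hypothetical R-peak times: the fourth beat arrives early, the fifth late.
r_peaks_ms = [0, 800, 1600, 2000, 3200]
rr_ms = rr_intervals(r_peaks_ms)
rr_norm = normalize_rr(rr_ms)
```

A normalized value near 1.0 marks a beat at the subject's typical rhythm regardless of the absolute heart rate, which is one plausible reason such a feature could transfer better between patients than the raw interval.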


Design and Analysis System of KNN and ID3 Algorithm for Music Classification based on Mood Feature Extraction

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v7i1.pp486-495 ◽

2017 ◽

Vol 7 (1) ◽

pp. 486

Author(s):

Made Sudarma ◽

I Gede Harsemadi

Keyword(s):

Feature Extraction ◽

Processing Time ◽

Nearest Neighbor ◽

Extraction Process ◽

Performance Comparison ◽

K Nearest Neighbor ◽

K Value ◽

ID3 Algorithm ◽

Analysis System ◽

Music Information

Every piece of music conveys its own mood, and much research in the Music Information Retrieval (MIR) field has addressed the recognition of mood in music. This research produced software that classifies music by mood using the K-Nearest Neighbor (KNN) and ID3 algorithms. The study compares accuracy and measures average classification time, based on the values produced by the music feature extraction process. The feature extraction process uses 9 types of spectral analysis, with 400 training samples and 400 testing samples. The system outputs a mood label: contentment, exuberance, depression or anxious. Classification using KNN performs well, reaching 86.55% accuracy at k = 3 with an average processing time of 0.01021, whereas ID3 yields an accuracy of 59.33% with an average processing time of 0.05091 second.


Perceptual Linear Prediction Feature as an Indicator of Dysphonia

Lecture Notes in Electrical Engineering - Advances in Control Instrumentation Systems ◽

10.1007/978-981-15-4676-1_5 ◽

2020 ◽

pp. 51-64 ◽

Cited By ~ 1

Author(s):

Jennifer C. Saldanha ◽

Malini Suvarna

Keyword(s):

Linear Prediction ◽

Perceptual Linear Prediction


Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

International Journal of Synthetic Emotions ◽

10.4018/ijse.2016010105 ◽

2016 ◽

Vol 7 (1) ◽

pp. 58-68 ◽

Cited By ~ 4

Author(s):

Imen Trabelsi ◽

Med Salim Bouhlel

Keyword(s):

Emotion Recognition ◽

Linear Prediction ◽

Recognition Rate ◽

Gaussian Mixture ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional States ◽

Wide Range ◽

Leibler Divergence ◽

Perceptual Linear Prediction

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel frequency cepstrum coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler Divergence Kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Using the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.
