perceptual linear prediction
Recently Published Documents


Adverse drug effects caused by prescription errors are a major cause of death worldwide each year. Many such errors involve caregivers administering the wrong drug or dosage to patients because of indecipherable handwriting, drug interactions, confusing drug names, and similar problems. Adopting voice-based prescription could eliminate some of these errors because it allows prescription information to be captured and heard through voice response rather than read from the physician’s handwriting. Our project will generate an electronic prescription using a speech-to-text converter based on Perceptual Linear Prediction (PLP) and capture the data from the keywords spoken by the doctor. Patients will not need to carry paper prescriptions when revisiting doctors, and a patient will be able to share their past medical records with a new doctor. The project also provides a facility to sign the prescription and send it directly to the patient’s phone and email address. The system enables patients to manage the privacy of their personal health records. The project targets doctors and clinics that still use paper-based handwritten prescriptions.


2020 ◽  
Vol 12 (5) ◽  
pp. 1-8
Author(s):  
Nahyan Al Mahmud ◽  
Shahfida Amjad Munni

This work compares the performance of various acoustic feature extraction methods in a Bangla speech recognition system based on a Long Short-Term Memory (LSTM) neural network. The acoustic features are a series of vectors that represent the speech signal. They can be classified as either words or sub-word units such as phonemes. Linear predictive coding (LPC) is used first as the acoustic vector extraction technique, chosen for its widespread popularity. Other extraction techniques, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are then used as well. These two methods closely resemble the human auditory system. The LSTM neural network is trained on these feature vectors. The resulting models of different phonemes are then compared using two statistical tools, the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of the acoustic features.
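The two statistical tools named in this abstract can be sketched as follows. This is a minimal illustration that assumes each phoneme's feature vectors are summarized by a multivariate Gaussian (mean and covariance); the abstract does not say how the authors applied the distances, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of vector x from a Gaussian with the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov * y = diff instead of inverting cov explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)))

def bhattacharyya(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    mean1, mean2 = np.asarray(mean1, dtype=float), np.asarray(mean2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = 0.5 * (cov1 + cov2)  # averaged covariance
    diff = mean1 - mean2
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Log-determinants keep the covariance term numerically stable.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(term_mean + term_cov)

if __name__ == "__main__":
    eye = np.eye(2)
    print(mahalanobis([2.0, 0.0], [0.0, 0.0], eye))
    print(bhattacharyya([0.0, 0.0], eye, [2.0, 0.0], eye))
```

A larger distance between two phoneme models suggests that the feature type separates those phonemes more clearly.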


Audio content understanding is an active research problem in the area of speech analytics. This paper introduces a novel approach to content-based news audio classification using Multiple Instance Learning (MIL). Content-based analysis provides useful information for audio classification as well as segmentation. A key step in this direction is to propose a classifier that can predict the category of the input audio sample. Two types of features are used for audio content detection: Perceptual Linear Prediction (PLP) coefficients and Mel-Frequency Cepstral Coefficients (MFCC). Two MIL techniques, mi-Graph and mi-SVM, are used for classification. The results obtained with these methods are evaluated using different performance metrics. The experimental results show that MIL demonstrates excellent audio classification capability.


2020 ◽  
Vol 17 (1) ◽  
pp. 303-307
Author(s):  
S. Lalitha ◽  
Deepa Gupta

Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction Coefficients (PLPCs) are widely used nonlinear vocal parameters in the majority of speaker identification, speaker recognition, and speech recognition techniques, as well as in the field of emotion recognition. Since the 1980s, significant effort has gone into improving these features. Considerations such as the use of appropriate frequency estimation approaches, the design of appropriate filter banks, and the selection of preferred features play a vital part in the strength of models employing these features. This article presents an overview of MFCC and PLPC features for different speech applications. Aspects such as accuracy metrics, background environment, type of data, and feature size are examined and summarized with the corresponding key references. In addition, the advantages and shortcomings of these features are discussed. This background work will hopefully contribute a first step toward enhancing MFCC and PLPC with respect to novelty, higher accuracy, and lower complexity.
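The nonlinear frequency warping that underlies both feature families can be made concrete with a short sketch. It assumes one common mel formula and the Bark approximation usually associated with PLP; other variants of both scales exist, and the function names are illustrative only.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Mel scale (a common variant): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def hz_to_bark(freq_hz):
    """Bark scale approximation often used for PLP: z = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(freq_hz, dtype=float) / 600.0)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        print(f, float(hz_to_mel(f)), float(hz_to_bark(f)))
```

Both scales are close to linear at low frequencies and compress high frequencies, which is the sense in which these features are called nonlinear.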


2020 ◽  
pp. 283-293
Author(s):  
Imen Trabelsi ◽  
Med Salim Bouhlel

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances, using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.
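The idea of a divergence-based kernel can be illustrated with a simplified sketch. The abstract does not give the exact kernel form, so this example makes two assumptions: each utterance is modeled by a single diagonal-covariance Gaussian rather than a full GMM, and the kernel exponentiates the symmetric Kullback-Leibler divergence. The function names and the `scale` parameter are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances (variances as vectors)."""
    mean_p, var_p = np.asarray(mean_p, dtype=float), np.asarray(var_p, dtype=float)
    mean_q, var_q = np.asarray(mean_q, dtype=float), np.asarray(var_q, dtype=float)
    return float(0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0
    ))

def symmetric_kl_kernel(mean_p, var_p, mean_q, var_q, scale=1.0):
    """Similarity exp(-scale * (KL(p||q) + KL(q||p))); an assumed kernel form."""
    divergence = (kl_diag_gauss(mean_p, var_p, mean_q, var_q)
                  + kl_diag_gauss(mean_q, var_q, mean_p, var_p))
    return float(np.exp(-scale * divergence))

if __name__ == "__main__":
    print(symmetric_kl_kernel([0.0], [1.0], [0.0], [1.0]))  # identical models
    print(symmetric_kl_kernel([0.0], [1.0], [1.0], [1.0]))  # means one unit apart
```

A Gram matrix built from such pairwise similarities could be passed to an SVM that accepts precomputed kernels, although a kernel of this form is not guaranteed to be positive semidefinite.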


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3481 ◽  
Author(s):  
Frederico Soares Cabral ◽  
Hidekazu Fukai ◽  
Satoshi Tamura

The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Because recordings will in practice be archived using various types of vehicle suspension systems and speeds, we use the multiple sensors found in smartphones together with state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the feature extraction step strongly affects the classification results. We therefore have to choose carefully not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients as the feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the field of human speech recognition, we found that modified MFCC and PLP can improve on the commonly used statistical method.
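A generic MFCC pipeline applied to a one-dimensional acceleration signal might look like the sketch below. It shows the standard steps only (framing, windowing, power spectrum, mel filter bank, logarithm, DCT). The modifications the authors made are not described in the abstract, and the sampling rate, frame length, hop size, and filter counts here are assumed values chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        low, center, high = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - low) / (center - low)
        falling = (high - freqs) / (high - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=12, n_ceps=8):
    """Return an array of shape (n_frames, n_ceps) of cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (signal.size - frame_len) // hop
    index = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[index] * np.hamming(frame_len)       # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    energies = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(energies + 1e-10)               # avoid log(0)
    # DCT-II basis decorrelates the log filter-bank energies.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energies @ dct_basis.T

if __name__ == "__main__":
    rate = 100.0                                          # assumed accelerometer rate in Hz
    t = np.arange(1000) / rate
    accel = np.sin(2 * np.pi * 5.0 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
    print(mfcc(accel, rate).shape)
```

At accelerometer sampling rates the mel scale is nearly linear over the whole band, which is one reason the speech-oriented defaults may need adjusting for this kind of data.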


2019 ◽  
Vol 8 (02) ◽  
pp. 24469-24472
Author(s):  
Thiruvengatanadhan R

Automatic audio classification is very useful in audio indexing, content-based audio retrieval, and online audio distribution. This paper deals with the speech/music classification problem, starting from a set of features extracted directly from audio data. The accuracy of the classification relies on the strength of the features and the classification scheme. In this work, Perceptual Linear Prediction (PLP) features are extracted from the input signal. After feature extraction, classification is carried out using a Support Vector Machine (SVM) model. The proposed feature extraction and classification models result in better accuracy in speech/music classification.


2018 ◽  
Vol 29 (1) ◽  
pp. 959-976
Author(s):  
Mohit Dua ◽  
Rajesh Kumar Aggarwal ◽  
Mantosh Biswas

An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system is tested with each of them. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.
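For readers unfamiliar with particle swarm optimization, a minimal sketch of the basic algorithm follows. It covers plain PSO only, not the crossover or quadratic-crossover variants, and it minimizes a toy objective because the abstract does not state what quantity the authors optimize. The function name, parameter names, and coefficient values are assumptions.

```python
import numpy as np

def pso(objective, dim, n_particles=30, n_iters=100, bounds=(-5.0, 5.0),
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective over a box using basic particle swarm optimization."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                  # per-particle best positions
    best_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(best_val))
    global_pos, global_val = best_pos[g].copy(), float(best_val[g])
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity blends momentum, pull toward own best, and pull toward swarm best.
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (global_pos - pos))
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = int(np.argmin(best_val))
        if best_val[g] < global_val:
            global_pos, global_val = best_pos[g].copy(), float(best_val[g])
    return global_pos, global_val

if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))               # toy objective with minimum at 0
    position, value = pso(sphere, dim=3)
    print(position, value)
```

In a feature-optimization setting the objective would instead score a candidate configuration, for example by recognition error on held-out data.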

