Environment Recognition Using Selected MPEG-7 Audio Features and Mel-Frequency Cepstral Coefficients

Author(s):  
Ghulam Muhammad ◽  
Yousef A. Alotaibi ◽  
Mansour Alsulaiman ◽  
Mohammad Nurul Huda


2011 ◽  
Vol 62 (4) ◽  
pp. 199-205 ◽  
Author(s):  
Ghulam Muhammad ◽  
Khalid Alghathbar

Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, a common forensics scenario. In this paper, we perform several experiments focusing on the problem of environment recognition from audio for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and a background environment. We propose a full set of MPEG-7 audio features combined with mel-frequency cepstral coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound even in the presence of a large amount of foreground human speech.
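Since MFCCs recur throughout the abstracts on this page, a minimal numpy-only sketch of the standard extraction pipeline (framing, windowed FFT, mel filterbank, log, DCT) may be helpful. The frame length, hop size, filter count, and number of coefficients below are common defaults, not the settings used in any of these papers:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel-filterbank energies (small floor avoids log(0)).
    feats = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log energies; keep the lowest coefficients.
    n = feats.shape[1]
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return feats @ basis.T

sig = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
coeffs = mfcc(sig)
print(coeffs.shape)  # (98, 13): one 13-dimensional vector per 10 ms frame
```

In practice a library implementation would add pre-emphasis, liftering, and delta coefficients; the sketch keeps only the core stages shared by all the papers above.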


2019 ◽  
Vol 9 (14) ◽  
pp. 2833 ◽  
Author(s):  
Hwan Ing Hee ◽  
BT Balamurali ◽  
Arivazhagan Karunakaran ◽  
Dorien Herremans ◽  
Onn Hoe Teoh ◽  
...  

(1) Background: Cough is a major presentation in childhood asthma. Here, we aim to develop a machine-learning-based cough sound classifier for asthmatic and healthy children. (2) Methods: Children under 16 years old were randomly recruited at a children's hospital from February 2017 to April 2018 and divided into two cohorts: healthy children and children with acute asthma presenting with cough. Children with other concurrent respiratory conditions were excluded from the asthmatic cohort. Demographic data, duration of cough, and respiratory history were obtained. Children were instructed to produce voluntary cough sounds. These clinically labeled cough sounds were randomly divided into training and testing sets. Audio features such as mel-frequency cepstral coefficients and constant-Q cepstral coefficients were extracted. Using the training set, a classification model was developed with a Gaussian Mixture Model-Universal Background Model (GMM-UBM). Its predictive performance was evaluated on the test set against the physicians' labels. (3) Results: Asthmatic coughs from 89 children (1192 cough sounds in total) and healthy coughs from 89 children (1140 cough sounds in total) were analyzed. The sensitivity and specificity of the audio-based classification model were 82.81% and 84.76%, respectively, when differentiating coughs of asthmatic children from coughs of healthy children. (4) Conclusion: Audio-based classification using machine learning is a potentially useful technique for assisting in the differentiation of asthmatic cough sounds from healthy voluntary cough sounds in children.
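The GMM-UBM decision used by such a classifier amounts to a log-likelihood ratio between a class model and the universal background model. The diagonal-covariance GMM below and its toy parameters are illustrative assumptions, not the models trained in the study:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    # Average per-frame log-likelihood under a diagonal-covariance GMM.
    # X: (T, D) frames; weights: (K,); means, variances: (K, D).
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp += np.log(weights)                                  # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                      # log-sum-exp
    per_frame = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return per_frame.mean()

def classify(X, class_gmm, ubm):
    # GMM-UBM decision: a positive log-likelihood ratio favors the class model.
    return gmm_loglik(X, *class_gmm) - gmm_loglik(X, *ubm)

rng = np.random.default_rng(1)
D, K = 13, 4
ubm = (np.full(K, 1 / K), rng.standard_normal((K, D)), np.ones((K, D)))
# Toy "asthma" model: UBM means shifted, as MAP adaptation would do.
asthma = (ubm[0], ubm[1] + 0.5, ubm[2])
X = ubm[1][0] + 0.5 + 0.1 * rng.standard_normal((50, D))  # frames near asthma model
print(classify(X, asthma, ubm) > 0)  # True: frames score higher under the class model
```

In the real system the class model would be MAP-adapted from a UBM trained on many speakers, and the ratio would be thresholded to trade off sensitivity against specificity.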


Author(s):  
Teddy Surya Gunawan ◽  
Mira Kartiwi

<p>Of the many audio features available, this paper focuses on a comparison of the two most popular, i.e. line spectral frequencies (LSF) and mel-frequency cepstral coefficients (MFCC). We trained a feedforward neural network with varying numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and the combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimal parameters, i.e. sampling frequency, frame size, model order, and neural network structure. The per-frame recognition rate was converted to a per-file recognition rate using majority voting. On average, the recognition rates for LSF, MFCC, and the combination of both were 96%, 92%, and 96%, respectively. LSF is therefore the most suitable feature for language identification with a feedforward neural network classifier.</p>
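The frame-to-file conversion by majority voting described above can be sketched in a few lines; the language labels here are illustrative:

```python
from collections import Counter

def file_label(frame_predictions):
    # Majority vote: the language predicted for the most frames wins.
    # Ties break toward the earliest-seen label (Counter.most_common order).
    return Counter(frame_predictions).most_common(1)[0][0]

frames = ["Malay", "English", "Malay", "Malay", "Korean", "Malay"]
print(file_label(frames))  # Malay
```

This is why per-file accuracy exceeds per-frame accuracy: occasional misclassified frames are outvoted as long as more than half the frames are labeled correctly.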


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To obtain a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling Gaussian Mixture Models with a Universal Background Model (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features for identifying female speakers. Furthermore, feature warping gives better performance than CMVN. </span></p>
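Of the two normalization methods compared, CMVN is the simpler: each cepstral coefficient track is shifted to zero mean and scaled to unit variance over the utterance, which cancels stationary linear-channel effects. A minimal per-utterance sketch (the epsilon guard and array shapes are assumptions, not the authors' exact configuration):

```python
import numpy as np

def cmvn(features, eps=1e-8):
    # Cepstral mean-variance normalization per utterance:
    # features is (frames, coefficients); each coefficient track is
    # normalized to zero mean and unit variance across the utterance.
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

X = np.random.default_rng(0).standard_normal((100, 16)) * 3.0 + 5.0
Y = cmvn(X)
print(np.allclose(Y.mean(axis=0), 0.0, atol=1e-7))  # True
print(np.allclose(Y.std(axis=0), 1.0, atol=1e-3))   # True
```

Feature warping goes further by mapping each coefficient's short-term distribution onto a standard normal, which is likely why it outperforms plain mean-variance scaling here.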


TecnoLógicas ◽  
2008 ◽  
pp. 113
Author(s):  
Julián D. Echeverry-Correa ◽  
Mauricio Morales-Pérez

This work presents a methodology for characterizing the speech signal, applied to the recognition of emotional states. Four primary emotions (joy, anger, surprise, and sadness) and a neutral state are studied. A time-domain analysis and an acoustic analysis using MFCCs (Mel Frequency Cepstral Coefficients) were carried out. The tests confirm the effectiveness of the methodology for emotion recognition, surpassing the recognition achieved by a group of human listeners. A recognition accuracy of 94.00% is obtained on the SES (Spanish Emotional Speech) database.

