Environment Recognition Using Selected MPEG-7 Audio Features and Mel-Frequency Cepstral Coefficients

Author(s):  
Ghulam Muhammad ◽  
Yousef A. Alotaibi ◽  
Mansour Alsulaiman ◽  
Mohammad Nurul Huda


2011 ◽  
Vol 62 (4) ◽  
pp. 199-205 ◽  
Author(s):  
Ghulam Muhammad ◽  
Khalid Alghathbar

Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, a common forensics scenario. In this paper, we perform several experiments focusing on the problem of environment recognition from audio for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and a background environment. We propose a full set of MPEG-7 audio features combined with mel-frequency cepstral coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound even in the presence of a large amount of foreground human speech.
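Since MFCCs recur throughout the abstracts on this page, a minimal numpy-only sketch of the standard extraction pipeline (framing, windowed FFT, mel filterbank, log, DCT) may be helpful. The frame length, hop size, filter count, and number of coefficients below are common defaults, not the settings used in any of these papers:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel-filterbank energies (small floor avoids log(0)).
    feats = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log energies; keep the lowest coefficients.
    n = feats.shape[1]
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return feats @ basis.T

sig = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
coeffs = mfcc(sig)
print(coeffs.shape)  # (98, 13): one 13-dimensional vector per 10 ms frame
```

In practice a library implementation would add pre-emphasis, liftering, and delta coefficients; the sketch keeps only the core stages shared by all the papers above.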


2019 ◽  
Vol 9 (14) ◽  
pp. 2833 ◽  
Author(s):  
Hwan Ing Hee ◽  
BT Balamurali ◽  
Arivazhagan Karunakaran ◽  
Dorien Herremans ◽  
Onn Hoe Teoh ◽  
...  

(1) Background: Cough is a major presentation in childhood asthma. Here, we aim to develop a machine-learning-based cough sound classifier for asthmatic and healthy children. (2) Methods: Children under 16 years old were randomly recruited at a children's hospital from February 2017 to April 2018 and divided into two cohorts: healthy children and children with acute asthma presenting with cough. Children with other concurrent respiratory conditions were excluded from the asthmatic cohort. Demographic data, duration of cough, and respiratory history were obtained. Children were instructed to produce voluntary cough sounds. These clinically labeled cough sounds were randomly divided into training and testing sets. Audio features such as mel-frequency cepstral coefficients and constant-Q cepstral coefficients were extracted. Using the training set, a classification model was developed with a Gaussian Mixture Model-Universal Background Model (GMM-UBM). Its predictive performance was evaluated on the test set against the physicians' labels. (3) Results: Asthmatic coughs from 89 children (1192 cough sounds in total) and healthy coughs from 89 children (1140 cough sounds in total) were analyzed. The sensitivity and specificity of the audio-based classification model were 82.81% and 84.76%, respectively, when differentiating coughs of asthmatic children from coughs of healthy children. (4) Conclusion: Audio-based classification using machine learning is a potentially useful technique for assisting in the differentiation of asthmatic cough sounds from healthy voluntary cough sounds in children.
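The GMM-UBM decision used by such a classifier amounts to a log-likelihood ratio between a class model and the universal background model. The diagonal-covariance GMM below and its toy parameters are illustrative assumptions, not the models trained in the study:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    # Average per-frame log-likelihood under a diagonal-covariance GMM.
    # X: (T, D) frames; weights: (K,); means, variances: (K, D).
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp += np.log(weights)                                  # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                      # log-sum-exp
    per_frame = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return per_frame.mean()

def classify(X, class_gmm, ubm):
    # GMM-UBM decision: a positive log-likelihood ratio favors the class model.
    return gmm_loglik(X, *class_gmm) - gmm_loglik(X, *ubm)

rng = np.random.default_rng(1)
D, K = 13, 4
ubm = (np.full(K, 1 / K), rng.standard_normal((K, D)), np.ones((K, D)))
# Toy "asthma" model: UBM means shifted, as MAP adaptation would do.
asthma = (ubm[0], ubm[1] + 0.5, ubm[2])
X = ubm[1][0] + 0.5 + 0.1 * rng.standard_normal((50, D))  # frames near asthma model
print(classify(X, asthma, ubm) > 0)  # True: frames score higher under the class model
```

In the real system the class model would be MAP-adapted from a UBM trained on many speakers, and the ratio would be thresholded to trade off sensitivity against specificity.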


Author(s):  
Teddy Surya Gunawan ◽  
Mira Kartiwi

<p>Of the many audio features available, this paper focuses on a comparison of the two most popular, i.e. line spectral frequencies (LSF) and mel-frequency cepstral coefficients (MFCC). We trained a feedforward neural network with varying numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and the combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimal parameters, i.e. sampling frequency, frame size, model order, and neural network structure. The per-frame recognition rate was converted to a per-file recognition rate using majority voting. On average, the recognition rates for LSF, MFCC, and the combination of both were 96%, 92%, and 96%, respectively. LSF is therefore the most suitable feature for language identification with a feedforward neural network classifier.</p>
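The frame-to-file conversion by majority voting described above can be sketched in a few lines; the language labels here are illustrative:

```python
from collections import Counter

def file_label(frame_predictions):
    # Majority vote: the language predicted for the most frames wins.
    # Ties break toward the earliest-seen label (Counter.most_common order).
    return Counter(frame_predictions).most_common(1)[0][0]

frames = ["Malay", "English", "Malay", "Malay", "Korean", "Malay"]
print(file_label(frames))  # Malay
```

This is why per-file accuracy exceeds per-frame accuracy: occasional misclassified frames are outvoted as long as more than half the frames are labeled correctly.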


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To obtain a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling Gaussian Mixture Models with a Universal Background Model (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features for identifying female speakers. Furthermore, feature warping gives better performance than CMVN. </span></p>
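Of the two normalization methods compared, CMVN is the simpler: each cepstral coefficient track is shifted to zero mean and scaled to unit variance over the utterance, which cancels stationary linear-channel effects. A minimal per-utterance sketch (the epsilon guard and array shapes are assumptions, not the authors' exact configuration):

```python
import numpy as np

def cmvn(features, eps=1e-8):
    # Cepstral mean-variance normalization per utterance:
    # features is (frames, coefficients); each coefficient track is
    # normalized to zero mean and unit variance across the utterance.
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

X = np.random.default_rng(0).standard_normal((100, 16)) * 3.0 + 5.0
Y = cmvn(X)
print(np.allclose(Y.mean(axis=0), 0.0, atol=1e-7))  # True
print(np.allclose(Y.std(axis=0), 1.0, atol=1e-3))   # True
```

Feature warping goes further by mapping each coefficient's short-term distribution onto a standard normal, which is likely why it outperforms plain mean-variance scaling here.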


TecnoLógicas ◽  
2008 ◽  
pp. 113
Author(s):  
Julián D. Echeverry-Correa ◽  
Mauricio Morales-Pérez

This work presents a methodology for characterizing the speech signal, applied to the recognition of emotional states. Four primary emotions (joy, anger, surprise, and sadness) and a neutral state are studied. A time-domain analysis and an acoustic analysis using MFCCs (Mel Frequency Cepstral Coefficients) were carried out. The tests confirm the effectiveness of the methodology for emotion recognition, surpassing the recognition achieved by a group of human listeners. A recognition accuracy of 94.00% is obtained on the SES (Spanish Emotional Speech) database.

