Automatic Scene Recognition through Acoustic Classification for Behavioral Robotics

Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 483 ◽  
Author(s):  
Sumair Aziz ◽  
Muhammad Awais ◽  
Tallha Akram ◽  
Umar Khan ◽  
Musaed Alhussein ◽  
...  

Classification of complex acoustic scenes under real-time scenarios is an active domain which has lately engaged several researchers from the machine learning community. A variety of techniques have been proposed for acoustic pattern or scene classification, including natural soundscapes such as rain/thunder and urban soundscapes such as restaurants/streets. In this work, we present a framework for automatic acoustic classification for behavioral robotics. Motivated by several texture classification algorithms used in computer vision, a modified feature descriptor for sound is proposed which combines 1-D local ternary patterns (1D-LTP) with the baseline Mel-frequency cepstral coefficients (MFCC). The extracted feature vector is then classified using a multi-class support vector machine (SVM), selected as the base classifier. The proposed method is validated on two standard benchmark datasets, DCASE and RWCP, and achieves accuracies of 97.38% and 94.10%, respectively. A comparative analysis demonstrates that the proposed scheme performs exceptionally well compared to other feature descriptors.
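The 1D-LTP half of the descriptor can be sketched in a few lines of NumPy. The radius and threshold below are illustrative choices, not the paper's settings: each sample is compared against its neighbours with a tolerance ±t, yielding an "upper" and a "lower" binary pattern whose histograms are concatenated into the feature vector (the MFCC half and the SVM stage are omitted).

```python
import numpy as np

def ltp_1d(x, radius=4, t=0.05):
    """1-D local ternary pattern histogram of a signal (illustrative sketch).

    Each sample is compared with `radius` neighbours on each side:
    neighbours >= center + t set a bit in the upper pattern,
    neighbours <= center - t set a bit in the lower pattern.
    """
    n_bits = 2 * radius
    upper_hist = np.zeros(2 ** n_bits)
    lower_hist = np.zeros(2 ** n_bits)
    weights = 2 ** np.arange(n_bits)          # binary pattern -> integer code
    for i in range(radius, len(x) - radius):
        center = x[i]
        neigh = np.concatenate([x[i - radius:i], x[i + 1:i + 1 + radius]])
        upper = (neigh >= center + t).astype(int)
        lower = (neigh <= center - t).astype(int)
        upper_hist[int(upper @ weights)] += 1
        lower_hist[int(lower @ weights)] += 1
    hist = np.concatenate([upper_hist, lower_hist])
    return hist / hist.sum()                  # normalized 512-bin feature vector
```

In the paper's pipeline this histogram would be concatenated with MFCC features and passed to a multi-class SVM.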

Author(s):  
Yassine Ben Salem ◽  
Mohamed Naceur Abdelkrim

In this paper, a novel algorithm for automatic fabric defect classification was proposed, based on the combination of a texture analysis method and a support vector machine (SVM). Three texture methods were used and compared: GLCM, LBP and LPQ, each combined with the SVM classifier. The system was tested on the TILDA database, and a comparative study of the performance and running time of the three methods was carried out. The obtained results show that LBP is the best method for recognition and classification, and that the SVM is a suitable classifier for such problems. We also demonstrate that some defects are easier to classify than others.
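The LBP stage (the best-performing of the three descriptors) can be sketched in plain NumPy; the 8-neighbour, radius-1 configuration here is an assumption, and the GLCM/LPQ variants and the SVM itself are omitted.

```python
import numpy as np

def lbp_histogram(img):
    """256-bin local binary pattern histogram of a grayscale image patch.

    Each interior pixel is compared with its 8 neighbours; neighbours >= center
    set one bit of an 8-bit code, and the codes are histogrammed.
    """
    c = img[1:-1, 1:-1]                      # interior pixels (the "centers")
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code += (neigh >= c).astype(np.int32) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Histograms from defect and non-defect patches would then be fed to an SVM classifier (e.g. scikit-learn's `SVC`).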


2019 ◽  
Vol 9 (3) ◽  
pp. 66-69
Author(s):  
Róża Dzierżak

The aim of this article was to compare the influence of two data pre-processing methods, normalization and standardization, on the results of classifying spongy tissue images. Four hundred CT images of the spine (L1 vertebra) were used for the analysis, obtained from fifty healthy patients and fifty patients diagnosed with osteoporosis. Tissue samples (50×50 pixels) were subjected to texture analysis to obtain feature descriptors based on the grey-level histogram, gradient, run length matrix, co-occurrence matrix, autoregressive model and wavelet transform. The obtained features were ranked by importance (from most to least important), and the first fifty were used for further experiments. These data were normalized and standardized and then classified using five different methods: naive Bayes classifier, support vector machine, multilayer perceptron, random forest and classification via regression. The best results were obtained for standardized data classified with the multilayer perceptron, which achieved a classification accuracy of 94.25%.
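The two pre-processing steps compared in the study map directly onto scikit-learn's scalers; the synthetic matrix below merely stands in for the fifty selected texture features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(42)
# Stand-in for the study's data: 400 samples x 50 texture features
X = rng.normal(loc=50.0, scale=10.0, size=(400, 50))

X_norm = MinMaxScaler().fit_transform(X)   # normalization: each feature rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)  # standardization: zero mean, unit variance
```

Either matrix would then be passed to the classifiers under test, e.g. `sklearn.neural_network.MLPClassifier` for the multilayer perceptron that performed best here.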


Author(s):  
Ergün Yücesoy

In this study, the classification of speakers according to age and gender was discussed. Age and gender classes were first examined separately, and then, by combining these classes, a classification with a total of 7 classes was made. Speech signals represented by Mel-Frequency Cepstral Coefficients (MFCC) and delta parameters were converted into Gaussian Mixture Model (GMM) mean supervectors and classified with a Support Vector Machine (SVM). The GMM mean supervectors were formed using Maximum-a-posteriori (MAP) adaptation of a GMM-Universal Background Model (UBM); the number of components was varied from 16 to 512 to decide the optimum number. Gender classification accuracy of the system developed on the aGender dataset was measured as 99.02% for two classes and 92.58% for three classes, and age group classification accuracy was measured as 67.03% for female and 63.79% for male speakers. When age and gender classes were classified together in one step, an accuracy of 61.46% was obtained. In the study, a two-level approach was proposed for classifying age and gender classes together. In this approach, speakers were first divided into three classes (child, male and female); males and females were then classified according to their age groups, realizing a 7-class classification. This two-level approach increased the accuracy of the classification in all cases except when 32-component GMMs were used. The highest improvement, 2.45%, was achieved with 64-component GMMs, and an improvement of 0.79% with 256-component GMMs.
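The GMM mean-supervector construction can be sketched with scikit-learn's `GaussianMixture` standing in for the UBM. The relevance factor `r`, the 13-dimensional MFCC-like frames and the 16-component UBM are assumptions, and only the means are adapted, as in the classical MAP supervector recipe.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_supervector(ubm, frames, r=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a supervector."""
    post = ubm.predict_proba(frames)          # (T, K) component responsibilities
    n = post.sum(axis=0)                      # (K,) soft frame counts per component
    first = post.T @ frames                   # (K, D) first-order statistics
    e = first / np.maximum(n[:, None], 1e-8)  # posterior mean per component
    alpha = (n / (n + r))[:, None]            # data/prior interpolation weight
    means = alpha * e + (1 - alpha) * ubm.means_
    return means.ravel()                      # (K * D,) supervector for the SVM

rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 13))      # stand-in for pooled MFCC+delta frames
ubm = GaussianMixture(n_components=16, covariance_type='diag',
                      random_state=0).fit(background)
sv = map_supervector(ubm, rng.normal(size=(300, 13)))
```

One such supervector per utterance is what the SVM classifies; in the study the component count K was swept from 16 to 512.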


2022 ◽  
pp. 828-847
Author(s):  
Gaurav Aggarwal ◽  
Latika Singh

Classification of intellectually disabled children through manual assessment of speech at an early age is inconsistent, subjective, time-consuming and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: Linear Predictive Coding (LPC) based cepstral parameters, and Mel-frequency cepstral coefficients (MFCC). Four different classification models are employed: k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA) and radial basis function neural network (RBFNN). 48 speech samples from each group are taken for analysis, from subjects with a similar age and socio-economic background. The effect of different frame lengths combined with the number of filterbanks in the MFCC, and of different frame lengths combined with the model order in the LPC, is also examined for better accuracy. The experimental outcomes show that the proposed technique can help speech pathologists in estimating intellectual disability at early ages.
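The LPC half of the feature set reduces to solving the Yule-Walker equations over each frame; a minimal SciPy sketch follows, where the order and the absence of windowing/pre-emphasis are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=10):
    """LPC coefficients via the autocorrelation (Yule-Walker) method.

    Solves R a = r, where R is the Toeplitz autocorrelation matrix of lags
    0..order-1 and r holds lags 1..order.
    """
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:][:order + 1]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
```

The cepstral parameters used in the study would then be derived recursively from these coefficients; sweeping `order` (and the frame length) is the experiment the abstract describes.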


2021 ◽  
Author(s):  
Muhammad Zubair

Traditionally, the heart sound classification process is performed by first finding the elementary heart sounds of the phonocardiogram (PCG) signal. After detecting sounds S1 and S2, features such as envelograms, Mel frequency cepstral coefficients (MFCC), kurtosis, etc., of these sounds are extracted. These features are used for the classification of normal and abnormal heart sounds, which leads to an increase in computational complexity. In this paper, we have proposed a fully automated algorithm to localize heart sounds using K-means clustering. The K-means clustering model can differentiate between the primitive heart sounds S1, S2, S3 and S4 and insignificant sounds such as murmurs without requiring excessive pre-processing of data. The peaks detected from the noisy data are validated by implementing five classification models with 30-fold cross-validation. These models have been implemented on the publicly available PhysioNet/CinC Challenge 2016 database. Lastly, to classify between normal and abnormal heart sounds, the localized labelled peaks from all the datasets were fed as input to various classifiers: support vector machine (SVM), K-nearest neighbours (KNN), logistic regression, stochastic gradient descent (SGD) and multi-layer perceptron (MLP). To validate the superiority of the proposed work, we have compared our reported metrics with the latest state-of-the-art works. Simulation results show that the highest classification accuracy, 94.75%, is achieved by the SVM classifier.
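The K-means localization idea can be sketched as clustering a short-time energy envelope of the PCG into "loud" and "quiet" samples; the window length, sampling rate and two-cluster setup below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def localize_peaks(pcg, fs=1000, win_ms=20):
    """Mark candidate heart-sound regions by clustering short-time energy."""
    win = int(fs * win_ms / 1000)
    # Short-time energy envelope via a moving average of the squared signal
    energy = np.convolve(pcg ** 2, np.ones(win) / win, mode='same')
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(energy.reshape(-1, 1))
    loud = np.argmax(km.cluster_centers_.ravel())  # higher-energy cluster = heart sounds
    return km.labels_ == loud                      # boolean mask of candidate S1/S2 samples
```

The masked regions would then be labelled and handed to the downstream classifiers (SVM, KNN, etc.) as in the paper.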


2020 ◽  
Author(s):  
Rob Dunne ◽  
Tim Morris ◽  
Simon Harper

Diagnosing COVID-19 early in domestic settings is possible through smart home devices that can classify audio input of coughs and determine whether they indicate COVID-19. Research is currently sparse in this area and data is difficult to obtain. However, a few small data collection projects have enabled audio classification research into the application of different machine learning classification algorithms, including Logistic Regression (LR), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). We show here that a CNN using audio converted to Mel-frequency cepstral coefficient spectrogram images as input can achieve high accuracy: classification of validation data reached 97.5% correct classification of COVID and non-COVID labelled audio. The work provides a proof of concept that high accuracy can be achieved with a small dataset, which can have a significant impact in this area. The results are highly encouraging and provide further opportunities for research by the academic community on this important topic.
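The input stage of such a pipeline, turning cough audio into a log-mel "image" for the CNN, can be sketched in plain NumPy (in practice a library such as librosa would be used). FFT size, hop length and mel-band count here are assumptions; the DCT step that would yield MFCCs proper, and the CNN itself, are omitted.

```python
import numpy as np

def mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Log-mel spectrogram: the 2-D image a cough-classification CNN would consume."""
    # Frame the signal and take the magnitude-squared STFT
    frames = np.lib.stride_tricks.sliding_window_view(y, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # Triangular mel filterbank
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return np.log(spec @ fbank.T + 1e-10)    # (time, n_mels) image
```

Applying a DCT along the mel axis would give the MFCC image described in the paper; either image can then be fed to a small CNN.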


2012 ◽  
Vol 22 (02) ◽  
pp. 1250001 ◽  
Author(s):  
MERCEDES CABRERIZO ◽  
MELVIN AYALA ◽  
MOHAMMED GORYAWALA ◽  
PRASANNA JAYAKAR ◽  
MALEK ADJOUADI

This study evaluates the sensitivity, specificity and accuracy of associating scalp EEG with either control or epileptic patients by means of artificial neural networks (ANNs) and support vector machines (SVMs). A confluence of frequency and temporal parameters is extracted from the EEG to serve as input features to well-configured ANN and SVM networks. From these classification results, we can thus infer the occurrence of high-risk (epileptic) as well as low-risk (control) patients for potential follow-up procedures.
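The frequency-domain side of such a feature set can be illustrated with relative band powers over the classical EEG bands; the band edges, sampling rate and single-channel setup below are hypothetical choices, since the abstract does not specify the exact parameters. Vectors like these would feed the ANN/SVM classifiers.

```python
import numpy as np

# Classical EEG frequency bands (Hz) -- an assumed, conventional partition
BANDS = {'delta': (0.5, 4), 'theta': (4, 8), 'alpha': (8, 13), 'beta': (13, 30)}

def band_powers(eeg, fs=256):
    """Relative spectral power in each EEG band for one channel."""
    freqs = np.fft.rfftfreq(len(eeg), 1 / fs)
    psd = np.abs(np.fft.rfft(eeg)) ** 2
    total = psd.sum()
    return np.array([psd[(freqs >= lo) & (freqs < hi)].sum() / total
                     for lo, hi in BANDS.values()])
```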


Author(s):  
Arijit Ghosal ◽  
Suchibrota Dutta ◽  
Debanjan Banerjee

Automatic recognition of instrument types from an audio signal is a challenging and promising research topic: challenging because comparatively little work has been done at this level of granularity, and promising because of its applications in the music industry. Different broad categories of instruments, such as strings and woodwinds, have already been identified, but very few works address the sub-categorization of these categories. Mel Frequency Cepstral Coefficients (MFCC) is a frequently used acoustic feature. In this work, a hierarchical scheme is proposed to classify string instruments without using MFCC-based features. Chroma reflects the strength of notes in a Western 12-note scale, and chroma-based features are able to differentiate the broad categories of string instruments at the first level. The identity of an instrument can be traced through the sound envelope produced by a note bearing a certain pitch, so pitch-based features are used to further sub-classify string instruments at the second level. For classification, a neural network, k-NN, Naïve Bayes and Support Vector Machine have been used.
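A chroma vector of the kind used at the first level can be sketched by folding spectral energy onto the 12 pitch classes; the frame size and reference tuning (C0 ≈ 16.35 Hz) are standard assumptions, not details from the paper.

```python
import numpy as np

def chroma(y, sr=22050, n_fft=4096):
    """12-bin chroma vector: fold STFT bin energies onto pitch classes C..B."""
    spec = np.abs(np.fft.rfft(y[:n_fft] * np.hanning(n_fft))) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)
    valid = freqs > 20                       # skip DC / sub-audio bins
    # Pitch class index: 12 * log2(f / C0), folded modulo 12
    pc = np.round(12 * np.log2(freqs[valid] / 16.35)).astype(int) % 12
    c = np.bincount(pc, weights=spec[valid], minlength=12)
    return c / (c.sum() + 1e-12)
```

Stacking such vectors over time and summarizing them yields the chroma-based features that separate the broad string-instrument categories.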

