Audio signal feature extraction and classification using local discriminant bases

Author(s):  
K. Umapathy ◽  
Raveendra K. Rao ◽
S. Krishnan

2021 ◽
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hima Bindu Valiveti ◽  
Anil Kumar B. ◽  
Lakshmi Chaitanya Duggineni ◽  
Swetha Namburu ◽  
Swaraja Kuraparthi

Purpose: Road accidents, inadvertent mishaps, can be detected automatically and alerts sent instantly through the collaboration of image processing techniques and on-road video surveillance systems. However, relying exclusively on visual information leads to uncertainty, especially under adverse conditions such as night time, dark areas, and unfavourable weather (snowfall, rain, and fog) that result in faint visibility. The main goal of the proposed work is certainty of accident occurrence.

Design/methodology/approach: The authors propose a method for detecting road accidents by analyzing audio signals to identify hazardous situations such as tire skidding and car crashes. The motive of this project is to build a simple and complete audio event detection system using signal feature extraction methods to improve its detection accuracy. The experimental analysis is carried out on a publicly available real-time dataset consisting of audio samples such as car crashes and tire skidding. The temporal features of the recorded audio signal, such as Energy, Volume, and Zero Crossing Rate (ZCR); the spectral features, such as Spectral Centroid, Spectral Spread, Spectral Roll-off factor, and Spectral Flux; and the psychoacoustic features, Energy Sub-Band Ratio and Gammatonegram, are computed. The extracted features are pre-processed, then trained and tested using Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification algorithms for exact prediction of accident occurrence over various SNR ranges. The combination of Gammatonegram with temporal and spectral features proves to be superior to the existing detection techniques.

Findings: Temporal, spectral, and psychoacoustic features and the Gammatonegram of the recorded audio signal are extracted. A high-level vector is generated based on the centroid, and the extracted features are classified with the help of machine learning algorithms such as SVM, KNN, and DT. The audio samples collected have varied SNR ranges, and the accuracy of the classification algorithms is thoroughly tested.

Practical implications: Denoising the audio samples for perfect feature extraction was a tedious chore.

Originality/value: The existing literature cites extraction of temporal and spectral features followed by the application of classification algorithms. For perfect classification, the authors have chosen to construct a high-level vector from all four of the extracted temporal, spectral, psychoacoustic, and Gammatonegram features. The classification algorithms are employed on samples collected at varied SNR ranges.
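To make the described pipeline concrete, the following is a minimal sketch of temporal/spectral feature extraction followed by SVM and KNN classification, assuming librosa and scikit-learn are available; the Gammatonegram and other psychoacoustic features are omitted, and the file paths and labels are hypothetical placeholders rather than the authors' dataset.

```python
# Minimal sketch, assuming librosa and scikit-learn are installed.
# Gammatonegram/psychoacoustic features are omitted; paths and labels
# are hypothetical placeholders, not the authors' dataset.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def extract_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y).mean()            # temporal
    energy = float(np.mean(y ** 2))                               # temporal
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    spread = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    flux = float(np.mean(np.diff(np.abs(librosa.stft(y)), axis=1) ** 2))
    return np.array([zcr, energy, centroid, spread, rolloff, flux])

def train_and_score(paths, labels):
    # labels: 1 = crash / tire skid, 0 = normal traffic sound
    X = np.vstack([extract_features(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                              random_state=0)
    for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
        clf.fit(X_tr, y_tr)
        print(type(clf).__name__, clf.score(X_te, y_te))
```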


2003 ◽  
Vol 1840 (1) ◽  
pp. 186-192 ◽  
Author(s):  
Lori Mann Bruce ◽  
Navaneethakrishnan Balraj ◽  
Yunlong Zhang ◽  
Qingyong Yu

A system for automated traffic accident detection in intersections was designed. The input to the system is a 3-s segment of audio signal. The system can be operated in two modes: the two-class and multiclass modes. The output of the two-class mode is a label of “crash” or “noncrash.” In the multiclass mode of operation, the system identifies crashes as well as several types of noncrash incidents, including normal traffic and construction sounds. The system is composed of three main signal processing stages: feature extraction, feature reduction, and classification. Five methods of feature extraction were investigated and compared; these are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstral transform, and mel-frequency cepstral transform. Statistical methods are used for feature optimization and classification. Three types of classifiers are investigated and compared; these are the nearest-mean, maximum-likelihood, and nearest-neighbor methods. The results of the study show that the optimum design uses wavelet-based features in combination with the maximum-likelihood classifier. The system is computationally inexpensive relative to the other methods investigated, and the system consistently results in accident detection accuracies of 95% to 100% when the audio signal has a signal-to-noise ratio of at least 0 decibels.
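A minimal sketch of the wavelet-based feature stage and two of the compared classifiers is given below; the wavelet family, decomposition depth, and per-class Gaussian model are illustrative assumptions, not the study's tuned configuration.

```python
# Sketch of DWT subband-energy features with nearest-mean and
# maximum-likelihood classification; wavelet choice and depth are
# assumptions, not the study's exact settings.
import numpy as np
import pywt
from scipy.stats import multivariate_normal

def dwt_energy_features(segment, wavelet="db4", level=5):
    # Energy in each subband of a 3-s audio segment.
    coeffs = pywt.wavedec(segment, wavelet, level=level)
    return np.array([float(np.sum(c ** 2)) for c in coeffs])

def nearest_mean_classify(x, class_means):
    # Assign x to the class with the closest mean feature vector.
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

def max_likelihood_classify(x, class_params):
    # class_params[c] = (mean, cov) of a per-class Gaussian model.
    return max(class_params,
               key=lambda c: multivariate_normal.logpdf(x, *class_params[c]))
```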


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Jingwen Zhang

With the rapid development of information technology and communication, digital music has grown explosively. To quickly and accurately retrieve the music that users want from huge music repositories, music feature extraction and classification are considered an important part of music information retrieval and have become a research hotspot in recent years. Traditional music classification approaches use a large number of artificially designed acoustic features. Designing such features requires knowledge and in-depth understanding of the music domain, and the features for different classification tasks are often not universal or comprehensive. Existing approaches have two shortcomings: manually extracted features cannot be guaranteed to be valid and accurate, and traditional machine learning classification approaches neither perform well on multiclass problems nor scale to training on large-scale data. Therefore, this paper converts the audio signal of music into a sound spectrum as a unified representation, avoiding the problem of manual feature selection. According to the characteristics of the sound spectrum, the research combines 1D convolution, a gating mechanism, residual connections, and an attention mechanism, and proposes a music feature extraction and classification model based on a convolutional neural network, which can extract sound spectrum characteristics more relevant to the music category. Finally, this paper designs comparison and ablation experiments. The experimental results show that this approach is better than traditional manual models and machine learning-based approaches.
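As one reading of how these ingredients combine, here is a minimal PyTorch sketch of a gated, residual 1D convolution block of the kind the model stacks; channel counts and kernel size are illustrative, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of one gated, residual 1-D convolution block;
# layer sizes are illustrative, not the paper's exact architecture.
import torch
import torch.nn as nn

class GatedResBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):  # x: (batch, channels, time) from the spectrogram
        h = torch.tanh(self.conv(x)) * torch.sigmoid(self.gate(x))  # gating
        return x + h                                    # residual connection
```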


Author(s):  
Tam Chi Nguyen ◽  
Lam Dang Pham ◽  
Hieu Minh Nguyen ◽  
Bao Gia Bui ◽  
Dat Thanh Ngo ◽  
...  

2019 ◽  
Vol 107 ◽  
pp. 10-17 ◽  
Author(s):  
Naghmeh Mahmoodian ◽  
Anna Schaufler ◽  
Ali Pashazadeh ◽  
Axel Boese ◽  
Michael Friebe ◽  
...  

2007 ◽  
Vol 15 (4) ◽  
pp. 1236-1246 ◽  
Author(s):  
Karthikeyan Umapathy ◽  
Sridhar Krishnan ◽  
Raveendra K. Rao

Author(s):  
Fatima Al-Quayed ◽  
Adel Soudani ◽  
Saad Al-Ahmadi

Wireless acoustic sensor networks represent an attractive solution that can be deployed for animal detection and recognition in a monitored area. A typical configuration for this application would be to transmit the whole acquired audio signal through multi-hop communication to a remote server for recognition. However, continuous data streaming can cause a severe decline in the energy of the sensors, which consequently reduces the network lifetime and questions the viability of the application. An efficient solution to reduce the sensor's radio activity would be to perform the recognition task at the source sensor and then communicate the result to the remote server. This approach is intended to save the energy of the acoustic source sensor and to unload the network from carrying probably useless data. However, the validity of this solution depends on the energy efficiency of performing on-sensor detection of a new acoustic event and accurate recognition. In this context, this paper proposes a new scheme for on-sensor energy-efficient acoustic animal recognition based on low-complexity methods for feature extraction using the Haar wavelet transform. This scheme achieves more than 86% recognition accuracy while saving 71.59% of the sensor energy compared with transmission of the raw signal.
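The low-complexity appeal of the Haar transform is that each decomposition step needs only pairwise sums and differences. A minimal sketch, under assumed frame lengths and decomposition depth:

```python
# Minimal Haar-feature sketch: each step uses only pairwise sums and
# differences, which suits an energy-constrained sensor node. Frame
# length and number of levels are assumptions.
import numpy as np

def haar_step(x):
    x = x[: (len(x) // 2) * 2]
    approx = (x[0::2] + x[1::2]) / 2.0   # low-pass: pairwise averages
    detail = (x[0::2] - x[1::2]) / 2.0   # high-pass: pairwise differences
    return approx, detail

def haar_features(frame, levels=4):
    feats, a = [], np.asarray(frame, dtype=float)
    for _ in range(levels):
        a, d = haar_step(a)
        feats.append(float(np.sum(d ** 2)))  # detail-band energy per level
    feats.append(float(np.sum(a ** 2)))      # final approximation energy
    return np.array(feats)
```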


Author(s):  
Andrej Zgank ◽  
Damjan Vlaj

The chapter presents acoustic presence detection, which can be applied to support the smart home system with information about the presence of humans in the environment. Acoustic presence detection is based on digital signal processing and machine learning methods, with the objective of classifying the captured audio signal into the corresponding class. An analysis of different audio capturing devices for a smart home environment is carried out from the perspective of acoustic presence detection. The presence detection task consists of voice activity detection, feature extraction, and classification. An extension of acoustic presence detection with additional information about the user's characteristics is proposed. This information can be used to optimize the smart home human-computer interface with personalization and customization functionalities.
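A minimal sketch of the three-stage pipeline (voice activity detection, feature extraction, classification) might look as follows; the energy-based VAD threshold and the classifier interface are assumptions, not the chapter's exact methods.

```python
# Sketch of the VAD -> features -> classification stages; the energy
# threshold and classifier interface are assumptions.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def energy_vad(frames, threshold=1e-4):
    # Keep only frames whose short-time energy exceeds the threshold.
    return frames[np.mean(frames ** 2, axis=1) > threshold]

def detect_presence(signal, feature_fn, classifier):
    # classifier: any trained model whose predict() returns 0/1 labels.
    active = energy_vad(frame_signal(np.asarray(signal, dtype=float)))
    if len(active) == 0:
        return "absent"
    feats = np.stack([feature_fn(f) for f in active])
    return "present" if classifier.predict(feats).mean() > 0.5 else "absent"
```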


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1698
Author(s):  
Iordanis Thoidis ◽  
Lazaros Vrysis ◽  
Dimitrios Markou ◽  
George Papanikolaou

Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have seen great success, mainly operating on the log-power spectra. Although these approaches surpass the need for exhaustive feature extraction and selection, it is still unclear whether they target the important sound characteristics related to speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement by fusing temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the proposed system achieves considerable gains for normal-hearing and hearing-impaired listeners, in terms of objective intelligibility and quality metrics. The proposed auditory-motivated feature set achieved better objective intelligibility results compared to the conventional log-magnitude spectrogram features, while mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework provides satisfactory reconstruction accuracy of speech signals.
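The envelope/fine-structure split at the heart of this feature set can be sketched with the Hilbert transform; the auditory filterbank that would precede this per-band step is omitted here for brevity.

```python
# Sketch of splitting one band-limited signal into temporal envelope
# and temporal fine structure (TFS) via the Hilbert transform; the
# preceding auditory filterbank is omitted.
import numpy as np
from scipy.signal import hilbert

def envelope_tfs(band_signal):
    analytic = hilbert(band_signal)
    envelope = np.abs(analytic)                   # slow amplitude modulations
    tfs = np.cos(np.unwrap(np.angle(analytic)))   # fine-structure carrier
    return envelope, tfs
```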


2012 ◽  
Vol 546-547 ◽  
pp. 675-679
Author(s):  
Gao Huan Xu ◽  
Jun Xiang Ye

Because machine parts differ in structure, the audio signals produced by machine vibrations have different frequencies. For early defects, the audio signal can be analyzed well by the wavelet packet transform. After wavelet packet decomposition and reconstruction, the noise in the audio signal is reduced. Then, through the high- and low-frequency decomposition, the energy features can be constructed. The experiments show that the extracted features have good structure.
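A minimal sketch of such wavelet-packet subband-energy features with PyWavelets; the wavelet family and decomposition depth are assumptions, not the paper's settings.

```python
# Sketch of wavelet-packet subband-energy features with PyWavelets;
# wavelet family and depth are assumptions, not the paper's settings.
import numpy as np
import pywt

def wp_energy_features(signal, wavelet="db4", level=3):
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")    # subbands, low to high
    energies = np.array([np.sum(np.square(n.data)) for n in nodes])
    return energies / energies.sum()             # normalized subband energies
```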

