scholarly journals Environmental Sound Recognition Using Time-Frequency Intersection Patterns

2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Xuan Guo ◽  
Yoshiyuki Toyoda ◽  
Huankang Li ◽  
Jie Huang ◽  
Shuxue Ding ◽  
...  

Environmental sound recognition is an important function of robots and intelligent computer systems. In this research, we use a multistage perceptron neural network system for environmental sound recognition. The input data is a combination of time-variance pattern of instantaneous powers and frequency-variance pattern with instantaneous spectrum at the power peak, referred to as a time-frequency intersection pattern. Spectra of many environmental sounds change more slowly than those of speech or voice, so the intersectional time-frequency pattern will preserve the major features of environmental sounds but with drastically reduced data requirements. Two experiments were conducted using an original database and an open database created by the RWCP project. The recognition rate for 20 kinds of environmental sounds was 92%. The recognition rate of the new method was about 12% higher than methods using only an instantaneous spectrum. The results are also comparable with HMM-based methods, although those methods need to treat the time variance of an input vector series with more complicated computations.

2021 ◽  
Author(s):  
Wenjun Yang

This thesis explores features characterizing the temporal dynamics and the use of ensemble techniques to improve the performances of environmental sound recognition (ESR) system. Firstly, for acoustic scene classification (ASC), local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficients (MFCC) features, and the D3C ensemble classifier is adopted to optimize the system performance. The results show that the proposed method achieved a classification improvement of 8% compared to the baseline system. Secondly, a new approach for sound event detection (SED) using Nonnegative Matrix Factor 2- D Deconvolution (NMF2D) and RUSBoost techniques is presented. The idea is to capture the two dimensional joint spectral and temporal information from the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. Besides, the RUSBoost ensemble technique is utilized in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared to the baseline method.


2021 ◽  
Author(s):  
Wenjun Yang

This thesis explores features characterizing the temporal dynamics and the use of ensemble techniques to improve the performances of environmental sound recognition (ESR) system. Firstly, for acoustic scene classification (ASC), local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficients (MFCC) features, and the D3C ensemble classifier is adopted to optimize the system performance. The results show that the proposed method achieved a classification improvement of 8% compared to the baseline system. Secondly, a new approach for sound event detection (SED) using Nonnegative Matrix Factor 2- D Deconvolution (NMF2D) and RUSBoost techniques is presented. The idea is to capture the two dimensional joint spectral and temporal information from the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. Besides, the RUSBoost ensemble technique is utilized in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared to the baseline method.


2019 ◽  
Vol 13 (6) ◽  
pp. 803-809
Author(s):  
Shota Okubo ◽  
◽  
Zhihao Gong ◽  
Kento Fujita ◽  
Ken Sasaki

Environmental sound recognition (ESR) refers to the recognition of all sounds other than the human voice or musical sounds. Typical ESR methods utilize spectral information and variation within it with respect to time. However, in the case of transient sounds, spectral information is insufficient because only an average quantity of a given signal within a time period can be recognized. In this study, the waveform of sound signals and their spectrum were analyzed visually to extract temporal characteristics of the sound more directly. Based on the observations, features such as the initial rise time, duration, and smoothness of the sound signal; the distribution and smoothness of the spectrum; the clarity of the sustaining sound components; and the number and interval of collisions in chattering were proposed. Experimental feature values were obtained for eight transient environmental sounds, and the distributions of the values were evaluated. A recognition experiment was conducted on 11 transient sounds. The Mel-frequency cepstral coefficient (MFCC) was selected as reference. A support vector machine was adopted as the classification algorithm. The recognition rates obtained from the MFCC were below 50% for five of the 11 sounds, and the overall recognition rate was 69%. In contrast, the recognition rates obtained using the proposed features were above 50% for all sounds, and the overall rate was 86%.


Author(s):  
Sachin Chachada ◽  
C.-C. Jay Kuo

Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we will offer a qualitative and elucidatory survey on recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field will be given.


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2622
Author(s):  
Jurgen Vandendriessche ◽  
Nick Wouters ◽  
Bruno da Silva ◽  
Mimoun Lamrini ◽  
Mohamed Yassin Chkouri ◽  
...  

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. Nonetheless, such machine learning techniques often have to be deployed on resource and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to accelerate the inference time as compared to other embedded devices. Similarly, dedicated architectures to accelerate Artificial Intelligence (AI) such as Tensor Processing Units (TPUs) promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.


Sign in / Sign up

Export Citation Format

Share Document