Environmental Sound Recognition Using Time-Frequency Intersection Patterns

Temporal patterns and ensemble learning for environmental sound recognition

10.32920/ryerson.14653065.v1 ◽

2021 ◽

Author(s):

Wenjun Yang

Keyword(s):

Event Detection ◽

Temporal Dynamics ◽

Total Error ◽

Nonnegative Matrix ◽

Training Data ◽

Sound Recognition ◽

Environmental Sound ◽

Time Frequency ◽

Ensemble Techniques ◽

Environmental Sound Recognition

This thesis explores features that characterize temporal dynamics and the use of ensemble techniques to improve the performance of environmental sound recognition (ESR) systems. First, for acoustic scene classification (ASC), the local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to optimize system performance. The results show that the proposed method achieved a classification improvement of 8% over the baseline system. Second, a new approach to sound event detection (SED) using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost is presented. The idea is to capture joint two-dimensional spectral and temporal information from the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique is used in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared to the baseline method.
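As a rough illustration of how a local binary pattern can summarize the temporal evolution of a single feature trajectory (for example, one MFCC coefficient over successive frames), the following minimal sketch computes a one-dimensional LBP code per frame and a normalized histogram of those codes. The function names, the `radius` parameter, and the "neighbor >= center" comparison rule are assumptions made for illustration; the abstract does not specify which LBP variant the thesis uses.

```python
from collections import Counter


def lbp_1d(series, radius=2):
    """Return a 1-D local binary pattern code for every interior sample.

    Each sample is compared with `radius` neighbors on either side; a
    neighbor that is >= the center sets one bit of the code.
    (Illustrative variant, not necessarily the one used in the thesis.)
    """
    series = list(series)
    codes = []
    for t in range(radius, len(series) - radius):
        center = series[t]
        # Neighbors before and after the center, in time order.
        neighbors = series[t - radius:t] + series[t + 1:t + radius + 1]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(series, radius=2):
    """Normalized histogram of LBP codes, usable as a fixed-length feature."""
    n_codes = 1 << (2 * radius)
    counts = Counter(lbp_1d(series, radius))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_codes
    return [counts.get(code, 0) / total for code in range(n_codes)]
```

Applied to each coefficient trajectory in turn, the histograms could be concatenated into one feature vector per recording; whether the thesis does this is not stated in the abstract.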


Environmental sound recognition using time-frequency intersection patterns

2011 3rd International Conference on Awareness Science and Technology (iCAST) ◽

10.1109/icawst.2011.6163148 ◽

2011 ◽

Author(s):

Xuan Guo ◽

Yoshiyuki Toyoda ◽

Huankang Li ◽

Jie Huang ◽

Shuxue Ding ◽

...

Keyword(s):

Sound Recognition ◽

Environmental Sound ◽

Time Frequency ◽

Environmental Sound Recognition


Temporal patterns and ensemble learning for environmental sound recognition

10.32920/ryerson.14653065 ◽

2021 ◽

Author(s):

Wenjun Yang

Keyword(s):

Event Detection ◽

Temporal Dynamics ◽

Total Error ◽

Nonnegative Matrix ◽

Training Data ◽

Sound Recognition ◽

Environmental Sound ◽

Time Frequency ◽

Ensemble Techniques ◽

Environmental Sound Recognition

This thesis explores features that characterize temporal dynamics and the use of ensemble techniques to improve the performance of environmental sound recognition (ESR) systems. First, for acoustic scene classification (ASC), the local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to optimize system performance. The results show that the proposed method achieved a classification improvement of 8% over the baseline system. Second, a new approach to sound event detection (SED) using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost is presented. The idea is to capture joint two-dimensional spectral and temporal information from the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique is used in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared to the baseline method.
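The class-imbalance step mentioned above can be pictured with a small sketch of random undersampling, the "RUS" part of RUSBoost: majority classes are randomly thinned until every class has as many examples as the rarest one. This covers only the resampling idea, not the boosting loop, and the function name, the `seed` parameter, and the fully balanced target ratio are assumptions for illustration rather than details from the thesis.

```python
import random
from collections import defaultdict


def random_undersample(features, labels, seed=0):
    """Randomly drop examples so that every class matches the rarest class.

    Illustrative sketch of the undersampling step only; RUSBoost would
    repeat a step like this inside each boosting round.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    if not by_class:
        return [], []

    # Size of the rarest class sets the per-class quota.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original ordering of the surviving examples
    return [features[i] for i in kept], [labels[i] for i in kept]
```

For frame-level event detection, where "no event" frames typically far outnumber event frames, a step like this keeps the classifier from being dominated by the background class.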


Recognition of Transient Environmental Sounds Based on Temporal and Frequency Features

International Journal of Automation Technology ◽

10.20965/ijat.2019.p0803 ◽

2019 ◽

Vol 13 (6) ◽

pp. 803-809

Author(s):

Shota Okubo ◽


Zhihao Gong ◽

Kento Fujita ◽

Ken Sasaki

Keyword(s):

Recognition Rate ◽

Sound Recognition ◽

Spectral Information ◽

Support Vector ◽

Environmental Sound ◽

Time Duration ◽

Environmental Sounds ◽

Recognition Experiment ◽

Average Quantity ◽

Feature Values

Environmental sound recognition (ESR) refers to the recognition of all sounds other than the human voice or musical sounds. Typical ESR methods use spectral information and its variation over time. However, for transient sounds, spectral information is insufficient because it captures only an average quantity of the signal within a time period. In this study, the waveforms of sound signals and their spectra were analyzed visually to extract temporal characteristics of the sound more directly. Based on the observations, the following features were proposed: the initial rise time, duration, and smoothness of the sound signal; the distribution and smoothness of the spectrum; the clarity of the sustaining sound components; and the number and interval of collisions in chattering. Experimental feature values were obtained for eight transient environmental sounds, and the distributions of the values were evaluated. A recognition experiment was conducted on 11 transient sounds. The Mel-frequency cepstral coefficient (MFCC) was selected as the reference feature, and a support vector machine was adopted as the classification algorithm. The recognition rates obtained with the MFCC were below 50% for five of the 11 sounds, and the overall recognition rate was 69%. In contrast, the recognition rates obtained with the proposed features were above 50% for all sounds, and the overall rate was 86%.
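To make two of the temporal features named above more concrete, here is a minimal sketch that estimates an initial rise time and a duration directly from a waveform. The definitions used (rise time as the interval between the first samples reaching 10% and 90% of the peak magnitude, duration as the span during which the magnitude stays at or above 10% of the peak) are assumptions chosen for illustration; the abstract does not give the paper's exact definitions, and a real system would likely smooth the envelope first.

```python
def envelope_features(samples, sample_rate, low=0.1, high=0.9):
    """Estimate rise time and duration (in seconds) of a transient sound.

    Uses the rectified signal |x[n]| as a crude amplitude envelope.
    The thresholds `low` and `high` are illustrative assumptions.
    """
    envelope = [abs(s) for s in samples]
    peak = max(envelope) if envelope else 0.0
    if peak == 0.0:
        return {"rise_time": 0.0, "duration": 0.0}

    # First sample reaching the low threshold marks the onset.
    start = next(i for i, v in enumerate(envelope) if v >= low * peak)
    # First sample reaching the high threshold marks the end of the rise.
    rise_end = next(i for i, v in enumerate(envelope) if v >= high * peak)
    # Last sample still at or above the low threshold marks the offset.
    end = max(i for i, v in enumerate(envelope) if v >= low * peak)

    return {
        "rise_time": (rise_end - start) / sample_rate,
        "duration": (end - start) / sample_rate,
    }
```

Features like these could then be concatenated with the spectral descriptors and passed to a classifier such as the support vector machine mentioned in the abstract.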


Environmental Sound Recognition With Time–Frequency Audio Features

IEEE Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2009.2017438 ◽

2009 ◽

Vol 17 (6) ◽

pp. 1142-1158 ◽

Cited By ~ 303

Author(s):

Selina Chu ◽

Shrikanth Narayanan ◽

C.-C. Jay Kuo

Keyword(s):

Sound Recognition ◽

Environmental Sound ◽

Time Frequency ◽

Audio Features ◽

Environmental Sound Recognition


Environmental sound recognition: a survey

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2014.12 ◽

2014 ◽

Vol 3 ◽

Cited By ~ 35

Author(s):

Sachin Chachada ◽

C.-C. Jay Kuo

Keyword(s):

Spectral Characteristics ◽

Performance Comparison ◽

Sound Recognition ◽

Future Research ◽

Environmental Sound ◽

Environmental Sounds ◽

Recent Developments ◽

Music Signals ◽

Environmental Sound Recognition

Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years, and research on ESR has increased significantly in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. This paper offers a qualitative and explanatory survey of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field are given.


Environmental sound recognition using MP-based features

2008 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2008.4517531 ◽

2008 ◽

Cited By ~ 24

Author(s):

Selina Chu ◽

Shrikanth Narayanan ◽

C.-C. Jay Kuo

Keyword(s):

Sound Recognition ◽

Environmental Sound ◽

Environmental Sound Recognition


Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Electronics ◽

10.3390/electronics10212622 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2622

Author(s):

Jurgen Vandendriessche ◽

Nick Wouters ◽

Bruno da Silva ◽

Mimoun Lamrini ◽

Mohamed Yassin Chkouri ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Machine Learning Techniques ◽

Sound Recognition ◽

Learning Approaches ◽

Environmental Sound ◽

Embedded Devices ◽

Power Efficient ◽

Computationally Intensive ◽

Environmental Sound Recognition

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. Nonetheless, such machine learning techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared to other embedded devices. Similarly, dedicated architectures for accelerating Artificial Intelligence (AI), such as Tensor Processing Units (TPUs), promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.


Environmental Sound Recognition with Classical Machine Learning Algorithms

Smart Industry & Smart Education - Lecture Notes in Networks and Systems ◽

10.1007/978-3-319-95678-7_2 ◽

2018 ◽

pp. 14-21

Author(s):

Nikolina Jekic ◽

Andreas Pester

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Sound Recognition ◽

Environmental Sound ◽

Environmental Sound Recognition


Automatic Environmental Sound Recognition (AESR) Using Convolutional Neural Network

International Journal of Modern Education and Computer Science ◽

10.5815/ijmecs.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-54

Author(s):

Md. Rayhan Ahmed ◽


Towhidul Islam Robin ◽

Ashfaq Ali Shafin

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sound Recognition ◽

Environmental Sound ◽

Environmental Sound Recognition
