environmental sound recognition
Recently Published Documents


Total documents: 57 (five years: 6)
H-index: 11 (five years: 0)

Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2622
Author(s):  
Jurgen Vandendriessche ◽  
Nick Wouters ◽  
Bruno da Silva ◽  
Mimoun Lamrini ◽  
Mohamed Yassin Chkouri ◽  
...  

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. Automated sound recognition techniques often rely on machine learning approaches, which have grown in complexity to achieve higher accuracy. Nonetheless, such techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and well suited to computationally intensive algorithms such as CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared with other embedded devices. Similarly, dedicated architectures for accelerating Artificial Intelligence (AI), such as Tensor Processing Units (TPUs), promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows for deploying CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and show how FPGAs can be exploited to outperform TPUs.
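The abstract does not describe the tool flows in detail. One step such flows commonly perform before mapping a CNN onto an integer accelerator (the Edge TPU, for example, accepts only fully 8-bit quantized models) is weight quantization. The sketch below illustrates symmetric per-tensor int8 quantization and an integer multiply-accumulate in plain Python; the function names are hypothetical and this is not presented as the authors' flow.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127].

    Assumes a non-empty weight tensor (flattened here to a list).
    """
    max_abs = max(abs(w) for w in weights)
    # An all-zero tensor gets scale 1.0 so that the division stays defined.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


def int8_dot(qa: Sequence[int], sa: float, qb: Sequence[int], sb: float) -> float:
    """Dot product using integer multiply-accumulates, rescaled once at the end."""
    # Integer accumulator (typically 32-bit on hardware; unbounded in Python).
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * sa * sb


if __name__ == "__main__":
    activations = [0.3, -0.7, 0.2]
    kernel = [0.9, 0.1, -0.4]
    qa, sa = quantize_int8(activations)
    qk, sk = quantize_int8(kernel)
    exact = sum(a * k for a, k in zip(activations, kernel))
    print(exact, int8_dot(qa, sa, qk, sk))
```

Keeping the accumulation in integers and applying a single float rescale per output lets integer hardware execute most of the arithmetic; the rounding error per weight is at most half a quantization step (`scale / 2`).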


2021 ◽  
Vol 11 (18) ◽  
pp. 8394
Author(s):  
Lancelot Lhoest ◽  
Mimoun Lamrini ◽  
Jurgen Vandendriessche ◽  
Nick Wouters ◽  
Bruno da Silva ◽  
...  

Environmental Sound Recognition has become a relevant application for smart cities. Such an application, however, requires trained machine learning classifiers that assign audio to a limited set of categories. Although classical machine learning solutions have been proposed in the past, most recent solutions for automated and accurate sound classification follow a deep learning approach. Deep learning models tend to be large, which can be problematic because sound classifiers often have to be embedded in resource-constrained devices. In this paper, a classical machine-learning-based classifier called MosAIc and a lighter Convolutional Neural Network model for environmental sound recognition are proposed to compete directly in accuracy with the latest deep learning solutions. Both approaches are evaluated on an embedded system to identify the key parameters when placing such applications on constrained devices. The experimental results show that classical machine learning classifiers can be combined to achieve results similar to those of deep learning models, and can even outperform them in accuracy. The cost, however, is a longer classification time.
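The abstract states that classical classifiers "can be combined" but does not say how MosAIc combines them. As an assumption, the sketch below uses plain hard (plurality) voting; the member classifiers, feature layout, and labels are hypothetical toy rules for illustration only.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], features: Sequence[float]) -> str:
    """Hard (plurality) voting.

    Ties are broken in favour of the tied label cast by the earliest-listed classifier.
    """
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = [clf(features) for clf in classifiers]  # every member runs on every input
    counts = Counter(votes)
    best = max(counts.values())
    return next(v for v in votes if counts[v] == best)


# Hypothetical toy members; features are [normalized energy, zero-crossing rate].
def energy_rule(x: Sequence[float]) -> str:
    return "siren" if x[0] > 0.5 else "dog_bark"


def zcr_rule(x: Sequence[float]) -> str:
    return "siren" if x[1] > 0.3 else "dog_bark"


def constant_rule(x: Sequence[float]) -> str:
    return "car_horn"


if __name__ == "__main__":
    members = [energy_rule, zcr_rule, constant_rule]
    print(majority_vote(members, [0.7, 0.4]))
```

Because every member runs on every input, an ensemble's inference time grows with the number of members, which is consistent with (though not proof of) the longer classification time the abstract reports.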


Cognition ◽  
2021 ◽  
Vol 214 ◽  
pp. 104627
Author(s):  
James Traer ◽  
Sam V. Norman-Haignere ◽  
Josh H. McDermott

2021 ◽  
Author(s):  
Wenjun Yang

This thesis explores features that characterize temporal dynamics, and the use of ensemble techniques, to improve the performance of environmental sound recognition (ESR) systems. First, for acoustic scene classification (ASC), the local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to optimize system performance. The results show that the proposed method achieved a classification improvement of 8% over the baseline system. Second, a new approach to sound event detection (SED) using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost is presented. The idea is to capture two-dimensional joint spectral and temporal information from the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique is used in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared with the baseline method.
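The LBP step can be illustrated with a one-dimensional variant applied along time to each MFCC trajectory: every frame is compared with its neighbours, the comparisons form a binary code, and a normalized histogram of codes per coefficient becomes the feature vector. The thesis abstract does not specify the neighbourhood, bit order, or normalization, so `radius=2`, the code layout, and the function names below are assumptions; MFCC extraction and the D3C classifier are out of scope.

```python
from typing import List, Sequence


def lbp_1d(series: Sequence[float], radius: int = 2) -> List[int]:
    """1-D LBP codes: compare each frame with `radius` frames on each side.

    Bit i is set when neighbour i (ordered from the furthest past to the furthest
    future frame) is >= the centre frame. Edge frames without a full neighbourhood
    are skipped.
    """
    offsets = [k for k in range(-radius, radius + 1) if k != 0]
    codes = []
    for t in range(radius, len(series) - radius):
        code = 0
        for bit, k in enumerate(offsets):
            if series[t + k] >= series[t]:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram_features(mfcc: Sequence[Sequence[float]], radius: int = 2) -> List[float]:
    """Concatenate one normalized LBP-code histogram per MFCC coefficient.

    `mfcc` holds one trajectory (values over frames) per coefficient.
    """
    n_bins = 1 << (2 * radius)  # 2 * radius neighbours -> 2**(2 * radius) codes
    features: List[float] = []
    for trajectory in mfcc:
        codes = lbp_1d(trajectory, radius)
        hist = [0] * n_bins
        for c in codes:
            hist[c] += 1
        total = len(codes) or 1  # too-short trajectories yield an all-zero histogram
        features.extend(h / total for h in hist)
    return features


if __name__ == "__main__":
    toy_mfcc = [[0.1, 0.4, 0.3, 0.8, 0.6, 0.9], [1.0, 0.9, 0.9, 0.7, 0.8, 0.5]]
    print(lbp_histogram_features(toy_mfcc, radius=1))
```

Because each code records only whether neighbours are higher or lower, the histogram describes the shape of a coefficient's rise and fall over time while ignoring its absolute level.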


Author(s):  
Michael S. Harris ◽  
Aaron C. Moberly ◽  
Ben L. Hamel ◽  
Kara Vasil ◽  
Christina L. Runge ◽  
...  

Purpose: The aims of this study were (a) to longitudinally assess environmental sound recognition (ESR) before and after cochlear implantation in a sample of postlingually deafened adults and (b) to assess the extent to which spectro-temporal processing abilities influence ESR with cochlear implants (CIs). Method: In a longitudinal cohort study, 20 postlingually deafened adults were tested with hearing aids on the Familiar Environmental Sound Test–Identification and AzBio sentences in quiet pre-CI and 6 months post-CI. A subset of 11 participants was also tested 12 months post-CI. Pre-CI spectro-temporal processing was assessed using the Spectral-temporally Modulated Ripple Test. Results: Average ESR accuracy pre-CI (M = 63.60%) was not significantly different from ESR accuracy at 6 months (M = 65.40%) or 12 months (M = 69.09%) post-CI. In 11 participants (55%), however, ESR improved following implantation by 10.91 percentage points, on average. Pre-CI ESR correlated moderately and significantly with pre-CI and 12-month post-CI AzBio scores, with a trend toward significance for AzBio performance at 6 months. Pre-CI spectro-temporal processing was moderately associated with ESR at 6 and 12 months post-CI but not with speech recognition post-CI. Conclusions: The present findings did not demonstrate an overall significant improvement in ESR following implantation. Nevertheless, more than half of the sample showed some degree of improvement in ESR. Several environmental sounds were poorly identified both before and after implantation. Spectro-temporal processing ability prior to implantation appears to predict postimplantation ESR performance. These findings indicate the need for greater attention to ESR following cochlear implantation and for developing individualized targets for ESR rehabilitation. Supplemental Material: https://doi.org/10.23641/asha.13876745


2020 ◽  
Vol 31 (04) ◽  
pp. 292-301 ◽  
Author(s):  
Kara J. Vasil ◽  
Jessica Lewis ◽  
Terrin Tamati ◽  
Christin Ray ◽  
Aaron C. Moberly

Background: Objective speech recognition tasks are widely used to measure the performance of adult cochlear implant (CI) users; however, the relationship of these measures with patient-reported quality of life (QOL) remains unclear. A comprehensive QOL measure, the Nijmegen Cochlear Implant Questionnaire (NCIQ), has historically shown a weak association with speech recognition performance, but closer examination may indicate stronger relations between QOL and objective auditory performance, particularly when a broad range of auditory skills is examined. Purpose: The aim of the present study was to assess the NCIQ for relations to speech and environmental sound recognition measures. Identifying associations with certain QOL domains, subdomains, and subitems would provide evidence that speech and environmental sound recognition measures are relevant to QOL. A lack of relations between QOL and various auditory abilities would suggest potential areas of patient-reported difficulty that could be better measured or targeted. Research Design: A cross-sectional study was performed in adult CI users to examine relations between subjective QOL ratings on NCIQ domains, subdomains, and subitems and auditory outcome measures. Study Sample: Participants were 44 experienced adult CI users. All participants were postlingually deafened and had met candidacy requirements for traditional cochlear implantation. Data Collection and Analysis: Participants completed the NCIQ as well as several speech and environmental sound recognition tasks: monosyllabic word recognition, standard and high-variability sentence recognition, audiovisual sentence recognition, and environmental sound identification. Bivariate correlation analyses were performed to investigate relations between patient-reported NCIQ scores and the functional auditory measures. Results: The total NCIQ score was not strongly correlated with any objective auditory outcome measure. The physical domain and the advanced sound perception subdomain were related to several measures, in particular monosyllabic word recognition and AzBio sentence recognition. Fourteen of the 60 subitems on the NCIQ were correlated with at least one auditory measure. Conclusions: Several subitems demonstrated moderate-to-strong correlations with auditory measures, indicating that these auditory measures are relevant to QOL. A lack of relations with other subitems suggests a need to develop objective measures that better capture patients' hearing-related obstacles. Clinicians may use information obtained through the NCIQ to better estimate real-world performance, which may support improved counseling and recommendations for CI patients.
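The abstract reports bivariate correlation analyses between NCIQ scores and auditory measures without naming the coefficient. The sketch below assumes Pearson's r, computed from first principles; the variable names and numbers are hypothetical toy values, not data from the study. Deciding whether a given r reflects a significant relation would additionally require a significance test, which the sketch omits.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values for five hypothetical participants, not data from the study.
    questionnaire_item = [40.0, 55.0, 62.5, 70.0, 85.0]
    word_recognition = [30.0, 48.0, 52.0, 66.0, 80.0]
    print(round(pearson_r(questionnaire_item, word_recognition), 3))
```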

