Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach

Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 172
Author(s):  
Mariam Yiwere ◽  
Eun Joo Rhee

This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram.
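The log-scaled mel spectrogram input described above can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' pipeline; the sample rate, FFT size, hop length, and number of mel bands are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Short-time Fourier transform with a Hann window.
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2  # power spectrogram
    mel = spec @ mel_filterbank(n_mels, n_fft, sr).T  # apply mel filters
    return np.log(mel + 1e-10)                        # log scaling

# Example: one second of a 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
S = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(S.shape)  # → (61, 40), i.e. (time frames, mel bands)
```

The resulting two-dimensional array is what lets the convolutional layers treat the audio as an image.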

2021 ◽  
Author(s):  
Shahrzad Esmaili

This research focuses on the application of joint time-frequency (TF) analysis for watermarking and classifying different audio signals. Time-frequency analysis, which originated in the 1930s, has often been used to model the non-stationary behaviour of speech and audio signals. By taking into consideration the human auditory system, with its many non-linear effects and masking properties, we can extract efficient features from the TF domain to watermark or classify signals. This novel audio watermarking scheme is based on spread spectrum techniques and uses content-based analysis to detect the instantaneous mean frequency (IMF) of the input signal. The watermark is embedded in this perceptually significant region such that it will resist attacks. Audio watermarking offers a solution to data privacy and helps to protect the rights of artists and copyright holders. Using the IMF, we aim to keep the watermark imperceptible while maximizing its robustness. In this case, 25 bits are embedded and recovered within a 5 s sample of an audio signal. This scheme has been shown to be robust against various signal processing attacks, including filtering, MP3 compression, additive noise, and resampling, with a bit error rate in the range of 0–13%. In addition, content-based classification is performed using TF analysis to classify sounds into six music groups, including rock, classical, folk, jazz, and pop. The extracted features include entropy, centroid, centroid ratio, bandwidth, silence ratio, energy ratio, and the frequency locations of minimum and maximum energy. Using a database of 143 signals, a set of 10 time-frequency features is extracted, and a classification accuracy of around 93.0% using regular linear discriminant analysis, or 92.3% using the leave-one-out method, is achieved.
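A few of the listed classification features can be sketched directly from a magnitude spectrum and a raw signal. This is a generic illustration, not the paper's exact feature definitions; the frame size and silence threshold are assumptions.

```python
import numpy as np

def spectral_centroid(mag, freqs):
    # Amplitude-weighted mean frequency of one spectral frame.
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-10)

def spectral_bandwidth(mag, freqs):
    # Amplitude-weighted spread of energy around the centroid.
    c = spectral_centroid(mag, freqs)
    return np.sqrt(np.sum(((freqs - c) ** 2) * mag) / (np.sum(mag) + 1e-10))

def silence_ratio(signal, frame=512, threshold=0.01):
    # Fraction of frames whose RMS falls below a fixed threshold.
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return float(np.mean(rms < threshold))

# A pure tone concentrates its energy at one frequency, so the
# centroid sits at 1000 Hz and the bandwidth is small.
sr, n = 8000, 2048
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 1000 * t)
mag = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(n, 1 / sr)
print(round(spectral_centroid(mag, freqs)))  # → 1000
```

Feature vectors built from such frame-level statistics are what the discriminant analysis then separates into genre classes.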


2012 ◽  
Vol 131 (4) ◽  
pp. 3499-3499
Author(s):  
Satoshi Esaki ◽  
Takanori Nishino ◽  
Kazuya Takeda


Author(s):  
Kalamkas Zhagyparova ◽  
Ruslan Zhagypar ◽  
Amin Zollanvari ◽  
Muhammad Tahir Akhtar

2013 ◽  
Vol 21 (8) ◽  
pp. 1727-1741 ◽  
Author(s):  
Eleftheria Georganti ◽  
Tobias May ◽  
Steven van de Par ◽  
John Mourjopoulos

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping, as it can assist in the control of beehives at remote locations. Bee swarm activity can be classified from audio signals using such approaches. A deep neural network (DNN), IoT-based acoustic swarm classification method is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project, and mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and an analysis was made of the impact of the deep neural network parameters on the classification results. The best overall classification accuracy with uncompressed audio was 94.09%, while MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared with the previous hidden Markov model system.
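MFCC features are conventionally obtained by applying a type-II discrete cosine transform to log-mel band energies and keeping the first few coefficients. The sketch below illustrates only that final step, on a toy 40-band frame; the band count and number of coefficients are assumptions, not the paper's settings.

```python
import numpy as np

def dct_ii(x):
    # Type-II DCT: the step that turns log-mel energies into cepstral coefficients.
    n = len(x)
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(k, 2 * np.arange(n) + 1) / (2 * n))
    return basis @ x

def mfcc_from_log_mel(log_mel_frame, n_coeffs=13):
    # Keep only the first coefficients: they summarise the spectral
    # envelope, which is what the classifier needs.
    return dct_ii(log_mel_frame)[:n_coeffs]

# Toy log-mel frame with 40 bands.
rng = np.random.default_rng(0)
frame = rng.normal(size=40)
coeffs = mfcc_from_log_mel(frame)
print(coeffs.shape)  # → (13,)
```

One such coefficient vector per analysis frame forms the input sequence to the DNN classifier.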


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.


Author(s):  
Benjamin Yen ◽  
Yusuke Hioka

A method to locate sound sources using an audio recording system mounted on an unmanned aerial vehicle (UAV) is proposed. The method introduces extension algorithms applied on top of a baseline approach, which performs localisation by estimating the peak signal-to-noise ratio (SNR) response in the time-frequency and angular spectra using time difference of arrival information. The proposed extensions include a noise reduction algorithm and a post-processing algorithm to address the challenges of the UAV setting. The noise reduction algorithm reduces the influence of UAV rotor noise on localisation performance by scaling the SNR response using the power spectral density of the rotor noise, estimated with a denoising autoencoder. For the source tracking problem, an angular-spectral-range-restricted peak search and link post-processing algorithm is also proposed to filter out incorrect location estimates along the localisation path. Experimental results show that the proposed extensions yield improvements in locating the target sound source correctly, with a 0.0064–0.175 decrease in mean haversine distance error across various UAV operating scenarios. The proposed method also reduces unexpected location estimates, with a 0.0037–0.185 decrease in the 0.75 quartile haversine distance error.
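The haversine distance used here as the error metric is the standard great-circle distance between two latitude/longitude points; a minimal sketch follows (the Earth radius and kilometre units are assumptions for the example, not the paper's reporting units).

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
    # Great-circle distance (km) between two latitude/longitude points.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Identical points are zero apart; one degree of latitude is roughly 111 km.
print(haversine(0, 0, 0, 0))        # → 0.0
print(round(haversine(0, 0, 1, 0)))  # → 111
```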


2015 ◽  
Vol 16 (2) ◽  
pp. 255-262 ◽  
Author(s):  
Shigeyuki Kuwada ◽  
Duck O. Kim ◽  
Kelly-Jo Koch ◽  
Kristina S. Abrams ◽  
Fabio Idrobo ◽  
...  

2018 ◽  
Vol 7 (2) ◽  
pp. 1
Author(s):  
Paulo Marcelo Tasinaffo ◽  
Gildárcio Sousa Gonçalves ◽  
Adilson Marques da Cunha ◽  
Luiz Alberto Vieira Dias

This paper proposes a model-based Monte Carlo method for computationally determining the best mean squared training error of an artificial neural network with a feedforward architecture. It is applied to a particular non-linear classification problem of input/output patterns in a computational environment with abundant data. The Monte Carlo method makes it possible to verify computationally that balanced data are much better than unbalanced data for an artificial neural network learning under supervised training. The major contribution of this investigation is that the proposed model can also be tested, by analogy, on the credit card fraud detection problem, where the number of training patterns is high.
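The balanced-versus-unbalanced comparison can be illustrated with a Monte Carlo loop over repeated training draws. This sketch substitutes a tiny Gaussian classifier with empirical class priors for the paper's feedforward neural network, and uses synthetic 1-D data; everything here (class distributions, sample sizes, repetition count) is an assumption made for illustration.

```python
import math
import random
import statistics

def gaussian_bayes_error(train, test, sigma=0.5):
    # Stand-in learner (the paper uses a feedforward ANN): Gaussian
    # classes with estimated means and empirical class priors.
    means, priors = {}, {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
        priors[label] = len(xs) / len(train)
    def predict(x):
        def score(c):
            return math.log(priors[c]) - (x - means[c]) ** 2 / (2 * sigma ** 2)
        return max((0, 1), key=score)
    # Mean squared error of the 0/1 predictions on the test set.
    return statistics.mean((predict(x) - y) ** 2 for x, y in test)

def draw(n0, n1, rng):
    # Two overlapping 1-D Gaussian classes centred at 0 and 1.
    data = [(rng.gauss(0.0, 0.5), 0) for _ in range(n0)]
    data += [(rng.gauss(1.0, 0.5), 1) for _ in range(n1)]
    return data

rng = random.Random(42)
test = draw(500, 500, rng)  # balanced test set
balanced = [gaussian_bayes_error(draw(200, 200, rng), test) for _ in range(50)]
skewed = [gaussian_bayes_error(draw(380, 20, rng), test) for _ in range(50)]
# Skewed training priors push the decision boundary toward the rare class,
# so the Monte Carlo mean error is higher than with balanced training data.
print(statistics.mean(balanced) < statistics.mean(skewed))  # → True
```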

