Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3418 ◽  
Author(s):  
Juan Vera-Diaz ◽  
Daniel Pizarro ◽  
Javier Macias-Guarasa

This paper presents a novel approach to indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source from the raw audio signal, avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network on semi-synthetic data generated from close-talk speech recordings, simulating the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network on a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies, as well as on very recent proposals based on Convolutional Recurrent Neural Networks (CRNNs). In addition, our experiments show that the performance of our CNN method has no relevant dependency on the speaker's gender or on the size of the signal window used.
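As a rough illustration of the semi-synthetic training data described above, the sketch below simulates what each microphone in an array would receive from a clean close-talk recording under a simple free-field model (pure propagation delay plus spherical attenuation). The function names and the model itself are illustrative assumptions, not the authors' exact simulator.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def simulate_array(clean, fs, source_pos, mic_positions):
    """Delay and attenuate a clean signal for each microphone position."""
    outputs = []
    for mic in mic_positions:
        dist = np.linalg.norm(np.asarray(source_pos) - np.asarray(mic))
        delay = int(round(fs * dist / SPEED_OF_SOUND))  # delay in samples
        delayed = np.concatenate([np.zeros(delay), clean])[:len(clean)]
        outputs.append(delayed / max(dist, 1e-3))       # 1/r spreading loss
    return np.stack(outputs)                            # (n_mics, n_samples)

fs = 16000
clean = np.random.randn(fs)                # stand-in for a close-talk recording
mics = [(0.0, 0.0, 1.2), (0.1, 0.0, 1.2)]  # toy two-element array
x = simulate_array(clean, fs, (2.0, 1.5, 1.7), mics)
```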


Geophysics ◽  
1990 ◽  
Vol 55 (9) ◽  
pp. 1166-1182 ◽  
Author(s):  
Irshad R. Mufti

Finite-difference seismic models are commonly set up in 2-D space. Such models must be excited by a line source, which leads to amplitudes different from those in real data, commonly generated from a point source. Moreover, there is no provision for out-of-plane events. These problems can be eliminated by using 3-D finite-difference models. The fundamental strategy in designing efficient 3-D models is to minimize computational work without sacrificing accuracy. This was accomplished by using a (4,2) differencing operator, which matches the accuracy of much larger operators but requires far fewer numerical operations as well as significantly less manipulation of data in computer memory. Such a choice also simplifies the problem of evaluating the wave field near the subsurface boundaries of the model, where large operators cannot be used. We also exploited the fact that, unlike real data, synthetic data are free from ambient noise; consequently, one can retain sufficient resolution in the results by optimizing the frequency content of the source signal. Further computational efficiency was achieved by using the concept of the exploding reflector, which yields zero-offset seismic sections without the need to evaluate the wave field for individual shot locations. These considerations opened up the possibility of carrying out a complete synthetic 3-D survey on a supercomputer to investigate the seismic response of a large-scale structure located in Oklahoma. The analysis of the results, performed on a geophysical workstation, provides new insight into the role of interference and diffraction in the interpretation of seismic data.
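For readers unfamiliar with the operator nomenclature, the sketch below shows what a (4,2) scheme means in practice: fourth-order accuracy in space and second-order in time, written here for the 2-D scalar wave equation for brevity (the paper's models are 3-D). Grid spacing h, time step dt, and the velocity field c are assumptions of this illustration.

```python
import numpy as np

def step_42(u_prev, u_curr, c, dt, h):
    """One time step of the scalar wave equation with a (4,2) stencil.

    Fourth-order second derivative: (-1, 16, -30, 16, -1) / (12 h^2),
    applied along each axis (periodic boundaries via roll, for simplicity).
    """
    lap = np.zeros_like(u_curr)
    for axis in (0, 1):
        lap += (-np.roll(u_curr, 2, axis) + 16 * np.roll(u_curr, 1, axis)
                - 30 * u_curr
                + 16 * np.roll(u_curr, -1, axis) - np.roll(u_curr, -2, axis)
                ) / (12 * h**2)
    # Second-order leapfrog update in time
    return 2 * u_curr - u_prev + (c * dt)**2 * lap
```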


Author(s):  
Mehmet Niyazi Çankaya

Systematic sampling is used to obtain quantitative results from tissues and radiological images. Systematic sampling on the real line (R) is a very attractive method, widely consulted by practitioners in biomedical imaging. For systematic sampling on R, the measurement function (MF) is obtained by slicing the three-dimensional object systematically at equidistant intervals. If the parameter q of the MF is estimated to be small enough in terms of mean square error, important conclusions can be drawn for design-based stereology. This study is an extension of [17], and an exact calculation method is proposed for the constant λ(q,N) of the confidence interval in systematic sampling. The covariogram model currently used in variance approximation, proposed by [28,29], is tested on different measurement functions to assess its performance in estimating the variance of systematic samples on R. The exact value of the constant λ(q,N) is examined for the different measurement functions as well. The simulations indicate that the proposed MF should be used to check the performance of both the variance approximation and the constant λ(q,N), and the synthetic data support the results obtained on real data.
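A minimal sketch of the underlying sampling scheme may help: on the real line, the integral of a measurement function f (e.g., section area versus slice position) is estimated from equidistant samples with a uniform random start, and the variance of that estimator is what the covariogram model approximates. The function names and toy MF below are illustrative assumptions.

```python
import numpy as np

def cavalieri_estimate(f, spacing, start, n_samples):
    """Systematic-sampling estimate of the integral of f over R."""
    points = start + spacing * np.arange(n_samples)
    return spacing * np.sum(f(points))

rng = np.random.default_rng(0)
f = lambda x: np.maximum(0.0, 1 - x**2)       # toy MF supported on [-1, 1]
T = 0.1                                       # slice spacing
estimates = [cavalieri_estimate(f, T, -1 + rng.uniform(0, T), int(2 / T) + 1)
             for _ in range(1000)]
print(np.mean(estimates), np.var(estimates))  # compare to exact integral 4/3
```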


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4555 ◽ 
Author(s):  
Lee Friedman ◽  
Hal S. Stern ◽  
Larry R. Price ◽  
Oleg V. Komogortsev

It is generally accepted that relatively more permanent (i.e., more temporally persistent) traits are more valuable for biometric performance than less permanent traits. Although this finding is intuitive, no prior work identifies exactly where in the biometric analysis temporal persistence makes a difference. In this paper, we answer this question. In a recent report, we introduced the intraclass correlation coefficient (ICC) as an index of temporal persistence for such features. Here, we present a novel approach using synthetic features to study which aspects of a biometric identification study are influenced by the temporal persistence of features. We show that more temporally persistent features produce effects on the similarity-score distributions that explain why this quality is so key to biometric performance. The results identified with the synthetic data are largely reinforced by an analysis of two datasets, one based on eye movements and one based on gait. There was one difference between the synthetic and real data, related to the intercorrelation of features in the real data. Removing these intercorrelations from the real datasets with a decorrelation step produced results very similar to those obtained with synthetic features.
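As a hedged illustration of the ICC as a temporal-persistence index, the sketch below computes a one-way random-effects ICC(1) for a single feature measured over several sessions; the authors' report may use a different ICC variant, and all names here are illustrative.

```python
import numpy as np

def icc_1(x):
    """x: array of shape (subjects, sessions). Returns one-way ICC(1)."""
    s, k = x.shape
    grand = x.mean()
    ms_between = k * np.sum((x.mean(axis=1) - grand) ** 2) / (s - 1)
    ms_within = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (s * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# A persistent feature: stable per-subject level plus small session noise
rng = np.random.default_rng(1)
subject_level = rng.normal(0, 1, size=(100, 1))
feature = subject_level + rng.normal(0, 0.3, size=(100, 4))
print(icc_1(feature))  # close to 1 for temporally persistent features
```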


Geophysics ◽  
2019 ◽  
Vol 84 (5) ◽  
pp. C217-C227 ◽  
Author(s):  
Baoqing Tian ◽  
Jiangjie Zhang

High-resolution imaging has recently become more popular in exploration geophysics. Conventionally, geophysicists image the subsurface using an isotropic approximation. When anisotropy effects are taken into account, one can expect an imaging profile with higher accuracy than the isotropic approach allows. Orthorhombic anisotropy is considered an ideal approximation for realistic cases and has been used in industry for several years. Although attractive, the broad application of orthorhombic anisotropy still has many problems to solve. We have developed a novel approach to prestack time migration in the orthorhombic case, in which the traveltime and amplitude of a wave propagating in orthorhombic media are computed directly from newly introduced anisotropic velocities and anisotropy parameters. We validate our methods on synthetic data and further demonstrate them on a model dataset and real data. The results show that our methods work well for prestack time migration in orthorhombic media.
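As a greatly simplified illustration of azimuth-dependent traveltime in orthorhombic media, the sketch below uses an elliptical NMO velocity between the two symmetry-plane velocities and hyperbolic moveout; the paper's traveltime and amplitude computation is more elaborate, and all names here are assumptions.

```python
import numpy as np

def nmo_velocity(azimuth, v1, v2):
    """Elliptical azimuthal variation of NMO velocity between v1 and v2."""
    return 1.0 / np.sqrt(np.cos(azimuth) ** 2 / v1**2 +
                         np.sin(azimuth) ** 2 / v2**2)

def traveltime(t0, offset, azimuth, v1, v2):
    """Hyperbolic moveout with azimuth-dependent NMO velocity."""
    v = nmo_velocity(azimuth, v1, v2)
    return np.sqrt(t0**2 + (offset / v) ** 2)

# Zero-offset time 1 s, 1500 m offset, 45-degree azimuth
print(traveltime(1.0, 1500.0, np.pi / 4, 2800.0, 3200.0))
```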


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Huang Bai ◽  
Sheng Li ◽  
Qianru Jiang

The dictionary learning problem has been an active research topic for decades. Most existing learning methods train the dictionary to adapt to a particular class of signals. However, as the number of dictionary atoms is increased to represent the signals more sparsely, the coherence between atoms becomes higher, which, according to greedy and compressed sensing theories, works against sparse coding. In this paper, a novel approach is proposed to learn a dictionary that minimizes the sparse representation error on the training signals while taking coherence into consideration. The coherence is constrained by making the Gram matrix of the desired dictionary approximate an identity matrix of the proper dimension. The proposed model is handled mainly by an alternating minimization procedure in which a closed-form solution is derived at each step. A series of experiments on synthetic data and audio signals demonstrates the promising performance of the learnt incoherent dictionary and the superiority of the learning method over existing ones.
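One simple way to realize the Gram-matrix constraint, sketched below under stated assumptions, is to clip the off-diagonal entries of D^T D toward zero and map the result back to a dictionary; this stand-in illustrates the incoherence idea but is not the paper's closed-form update.

```python
import numpy as np

def incoherence_step(D, mu_target):
    """Clip Gram off-diagonals to [-mu_target, mu_target], rebuild D."""
    G = np.clip(D.T @ D, -mu_target, mu_target)
    np.fill_diagonal(G, 1.0)                  # keep unit-norm atoms
    w, V = np.linalg.eigh(G)                  # eigenvalues in ascending order
    d = D.shape[0]                            # signal dimension = max rank
    D_new = np.diag(np.sqrt(np.maximum(w[-d:], 0.0))) @ V[:, -d:].T
    return D_new / np.linalg.norm(D_new, axis=0, keepdims=True)

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))                 # 20-dim signals, 50 atoms
D /= np.linalg.norm(D, axis=0, keepdims=True)
D = incoherence_step(D, mu_target=0.2)
print(np.abs(D.T @ D - np.eye(50)).max())     # coherence after one step
```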


Technologies ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 94 ◽ 
Author(s):  
Daniel Canedo ◽  
Pedro Fonseca ◽  
Petia Georgieva ◽  
António J. R. Neves

Floor-cleaning robots are becoming increasingly sophisticated, and with the addition of digital cameras supported by a robust vision system they become more autonomous, both in their navigation skills and in their capability to analyze the surrounding environment. This document proposes a vision system based on the YOLOv5 framework for detecting dirty spots on the floor. The purpose of such a vision system is to save energy and resources, since the cleaning system of the robot is activated only when a dirty spot is detected and the quantity of resources varies according to the dirty area. In this context, false positives are highly undesirable; on the other hand, false negatives lead to poor cleaning performance. For this reason, a synthetic data generator found in the literature was improved and adapted for this work to tackle the lack of real data in this area. This synthetic data generator allows for large datasets with numerous samples of floors and dirty spots. A novel approach to selecting floor images for the training dataset is proposed: the floor is segmented from other objects in the image so that dirty spots are generated only on the floor and do not overlap those objects. This helps the models distinguish between dirty spots and objects in the image, which reduces the number of false positives. Furthermore, a relevant dataset from the Automation and Control Institute (ACIN) was found to be only partially labelled; consequently, this dataset was annotated from scratch, tripling the number of labelled images and correcting some poor annotations in the original labels. Finally, this document shows the process of generating the synthetic data used for training YOLOv5 models. These models were tested on a real dataset (ACIN), and the best model attained a mean average precision (mAP) of 0.874 for detecting solid dirt. These results further prove that our proposal can use synthetic data for the training step and effectively detect dirt in real data. To our knowledge, no previous works report the use of YOLOv5 models in this application.
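A minimal sketch of the segmentation-aware compositing idea: dirt sprites are pasted only where a floor mask is set, so synthetic dirt never overlaps furniture or other objects. Array shapes and names are assumptions of this illustration, not the authors' generator.

```python
import numpy as np

def paste_dirt(floor_img, floor_mask, sprite, rng):
    """Place one dirt sprite at a random location fully inside the floor mask."""
    H, W, _ = floor_img.shape
    h, w, _ = sprite.shape
    for _ in range(100):  # rejection sampling for a valid placement
        y, x = rng.integers(0, H - h), rng.integers(0, W - w)
        if floor_mask[y:y + h, x:x + w].all():
            alpha = sprite.sum(axis=2, keepdims=True) > 0   # non-black pixels
            floor_img[y:y + h, x:x + w] = np.where(
                alpha, sprite, floor_img[y:y + h, x:x + w])
            return (x, y, w, h)  # bounding box for the YOLO label
    return None              # no valid spot found on the floor

rng = np.random.default_rng(0)
floor = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=bool)
mask[200:, :] = True                              # lower half of image is floor
sprite = np.full((24, 24, 3), 90, dtype=np.uint8) # toy dirt patch
print(paste_dirt(floor, mask, sprite, rng))
```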


Author(s):  
Zhanpeng Wang ◽  
Jiaping Wang ◽  
Michael Kourakos ◽  
Nhung Hoang ◽  
Hyong Hark Lee ◽  
...  

Population genetics relies heavily on simulated data for validation, inference, and intuition. In particular, since real data is always limited, simulated data is crucial for training machine learning methods. Simulation software can accurately model evolutionary processes but requires many hand-selected input parameters. As a result, simulated data often fails to mirror the properties of real genetic data, which limits the scope of methods that rely on it. In this work, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project and show that we can accurately recapitulate the features of real data.
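A conceptual, heavily simplified sketch of the adversarial idea: a discriminator tries to tell real data from simulator output, and the simulator's parameters are nudged toward values that confuse it. The toy simulator, discriminator, and search step below are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(2.0, 1.0, size=500)        # stand-in for real data summaries

def simulate(theta, n=500):
    """Toy 'population model': one parameter controls the data mean."""
    return rng.normal(theta, 1.0, size=n)

def discriminator_accuracy(real, fake):
    """Toy threshold discriminator; near 0.5 means real and fake match."""
    t = (real.mean() + fake.mean()) / 2
    acc = 0.5 * ((fake > t).mean() + (real <= t).mean())
    return max(acc, 1 - acc)                 # best threshold orientation

theta = 5.0                                  # bad initial parameter guess
for _ in range(200):                         # random-search 'generator' update
    cand = theta + rng.normal(0.0, 0.1)
    if discriminator_accuracy(real, simulate(cand)) < \
       discriminator_accuracy(real, simulate(theta)):
        theta = cand                         # harder to discriminate: keep it
print(theta)                                 # drifts toward the true value 2.0
```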


2020 ◽  
Vol 12 (8) ◽  
pp. 1240 ◽  
Author(s):  
Xabier Blanch ◽  
Antonio Abellan ◽  
Marta Guinau

The emerging use of photogrammetric point clouds in three-dimensional (3D) monitoring processes has revealed some constraints with respect to the use of LiDAR point clouds. Oftentimes, point clouds (PCs) obtained by time-lapse photogrammetry have lower density and precision, especially when Ground Control Points (GCPs) are not available or the camera system cannot be properly calibrated. This paper presents a new workflow called Point Cloud Stacking (PCStacking) that overcomes these restrictions by making the most of the iterative solutions for both camera position estimation and internal calibration parameters obtained during bundle adjustment. The basic principle of the stacking algorithm is straightforward: it computes the median of the Z coordinates of each point across multiple photogrammetric models, giving a resulting PC with greater precision than any of the individual PCs. The different models are reconstructed from images taken simultaneously from at least five points of view, reducing the systematic errors associated with the photogrammetric reconstruction workflow. The algorithm was tested on both a synthetic point cloud and a real 3D dataset from a rock cliff. The synthetic data were created using mathematical functions that emulate photogrammetric models, while the real data were obtained with very low-cost photogrammetric systems developed specifically for this experiment. The resulting point clouds improved when the algorithm was applied in both synthetic and real experiments; e.g., the 25th and 75th error percentiles were reduced from 3.2 cm to 1.4 cm in synthetic tests and from 1.5 cm to 0.5 cm in real conditions.
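The core of the stacking step can be sketched in a few lines: given several co-registered reconstructions resampled to common points, take the per-point median of the Z coordinates. Array shapes here are an assumption of the illustration.

```python
import numpy as np

def pc_stack(z_models):
    """z_models: array (k_models, n_points) of Z values on a common XY grid.

    Returns the per-point median, the stacked Z surface."""
    return np.median(z_models, axis=0)

rng = np.random.default_rng(0)
true_z = rng.uniform(0, 10, size=1000)
models = true_z + rng.normal(0, 0.03, size=(5, 1000))  # 5 noisy reconstructions
print(np.abs(pc_stack(models) - true_z).std())         # below single-model noise
```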

