A study of robust omnidirectional audio‐visual talker localization algorithm with microphone array and omnidirectional image

In applications where the locations of human subjects are needed, such as human-computer interfaces, video conferencing, and security surveillance, localization is often performed using a single sensing modality. These single-modality techniques, such as beamforming microphone arrays and video-based localization, are often prone to errors. In this chapter, a modular multimodal localization framework is constructed by combining multiple single-modality localizers using a Bayesian network. As a case study, a joint audio-video talker localization system for video conferencing is presented. Based on the results, the proposed multimodal method outperforms, in both accuracy and robustness, single-modality methods that rely only on audio or only on video.
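The abstract does not give the structure of the Bayesian network, so the following is only a minimal sketch of the simplest case: an audio and a video azimuth measurement that are assumed conditionally independent given the true talker direction. The Gaussian likelihoods, the 5-degree grid, the noise widths, and all function names are illustrative assumptions, not taken from the chapter.

```python
import math

def gaussian_likelihood(angles_deg, measured_deg, sigma_deg):
    """Likelihood of each candidate azimuth given one noisy measurement."""
    out = []
    for a in angles_deg:
        # Wrapped angular difference in [-180, 180) degrees.
        d = (a - measured_deg + 180.0) % 360.0 - 180.0
        out.append(math.exp(-0.5 * (d / sigma_deg) ** 2))
    return out

def fuse(prior, *likelihoods):
    """Posterior over azimuth, assuming the modalities are conditionally
    independent given the true azimuth (a naive-Bayes style network)."""
    post = list(prior)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    total = sum(post)
    return [p / total for p in post]

# Hypothetical measurements: audio is coarse (sigma 20 deg), video is sharp (sigma 5 deg).
angles = list(range(0, 360, 5))
prior = [1.0 / len(angles)] * len(angles)
audio = gaussian_likelihood(angles, measured_deg=100.0, sigma_deg=20.0)
video = gaussian_likelihood(angles, measured_deg=90.0, sigma_deg=5.0)

posterior = fuse(prior, audio, video)
estimate = angles[max(range(len(angles)), key=posterior.__getitem__)]
print(estimate)
```

In this sketch the fused estimate is pulled toward the more precise modality, and a modality that fails can be dropped by omitting its likelihood, which is one way a modular framework of this kind might degrade gracefully.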

Realization of Algorithm for Wideband Sound Source Localization in Video Monitoring System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.416-417.1086 ◽

2013 ◽

Vol 416-417 ◽

pp. 1086-1091

Author(s):

Lei Li ◽

Yong Gang Su ◽

Shen Tian ◽

Yong Li ◽

Zhi Tong Li

Keyword(s):

Embedded System ◽

Sound Source ◽

Social Research ◽

Microphone Array ◽

Blind Spot ◽

Video Tracking ◽

Time Delay Estimation ◽

Video Monitoring ◽

Localization Algorithm ◽

Security Monitoring

Video security monitoring has become a focus of social research and development; however, since the camera cannot rotate automatically, traditional security monitoring has a blind spot. Because abnormal events are often accompanied by corresponding sounds (e.g., an explosion produces the sound of an explosion), an auditory function can be added to the camera to compensate for the blind spot by automatically tracking the direction of the sound source. This requires two-dimensional (2-D) localization of the sound source. At the same time, the localization algorithm should be capable of tracking all of the source signals and should run in real time, so that video tracking can be achieved by turning the camera toward the sound source promptly. This paper realizes the localization of wideband speech signals in video monitoring using modern signal processing methods: a linear microphone array, a positioning approach based on time delay estimation, a frequency domain transform, and a spectrum-search method based on energy value. Both the early simulation and the later DSP-based embedded system platform have verified the feasibility of the method.
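As a rough illustration of the time-delay-estimation idea only, the sketch below estimates the delay between two microphones by time-domain cross-correlation and converts it to a far-field bearing. It is not the paper's frequency-domain, energy-based spectrum search; the two-microphone geometry, the sampling rate, the 0.2 m spacing, the speed of sound, and all names are assumptions made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum(x[i] * y[i + lag]).
    A positive lag means y is a delayed copy of x."""
    best_lag, best_val = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                val += x[i] * y[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def delay_to_angle(lag, fs, spacing):
    """Far-field angle from broadside (degrees) for a two-microphone pair."""
    s = SPEED_OF_SOUND * lag / (fs * spacing)
    s = max(-1.0, min(1.0, s))  # clamp against rounding outside asin's domain
    return math.degrees(math.asin(s))

# Synthetic wideband test signal: white noise, with the second channel delayed by 3 samples.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(512)]
y = [0.0] * 3 + x[:-3]

lag = estimate_delay(x, y, max_lag=8)
angle = delay_to_angle(lag, fs=16000, spacing=0.2)
print(lag, round(angle, 1))
```

A single microphone pair gives one bearing; a 2-D position, as the abstract requires, would need delays from more than one pair.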

Research on Sound Source Localization Algorithm of Spatial Distributed Microphone Array Based on PHAT Model

Lecture Notes in Electrical Engineering - Communications, Signal Processing, and Systems ◽

10.1007/978-981-13-6504-1_171 ◽

2019 ◽

pp. 1443-1446

Author(s):

Yong Liu ◽

Jia qi Zhen ◽

Yan chao Li ◽

Zhi qiang Hu

Keyword(s):

Source Localization ◽

Sound Source ◽

Microphone Array ◽

Sound Source Localization ◽

Localization Algorithm

Sound Source Localization Algorithm for Low-frequency Signal's Diffraction Path Based on a Head-worn Microphone Array

Journal of Information and Computational Science ◽

10.12733/jics20107030 ◽

2015 ◽

Vol 12 (18) ◽

pp. 6709-6716

Author(s):

Yi Zhang

Keyword(s):

Source Localization ◽

Sound Source ◽

Microphone Array ◽

Low Frequency ◽

Sound Source Localization ◽

Localization Algorithm

Talker localization based on the combination of DOA estimation and statistical sound source identification with microphone array

IEEE Workshop on Statistical Signal Processing, 2003 ◽

10.1109/ssp.2003.1289547 ◽

2004 ◽

Cited By ~ 1

Author(s):

T. Nishiura ◽

S. Nakamura

Keyword(s):

Sound Source ◽

Source Identification ◽

Microphone Array ◽

Doa Estimation ◽

Sound Source Identification ◽

Talker Localization

Sound Source Localization Algorithm Based on a Helmet-Mounted Microphone Array

2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming ◽

10.1109/paap.2014.12 ◽

2014 ◽

Cited By ~ 2

Author(s):

Yi Zhang ◽

Shujie Meng

Keyword(s):

Source Localization ◽

Sound Source ◽

Microphone Array ◽

Sound Source Localization ◽

Localization Algorithm

Research of HRTF Character of Head-Mounted Microphone Array

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.743.479 ◽

2015 ◽

Vol 743 ◽

pp. 479-483

Author(s):

Yi Zhang ◽

B.B. Shen ◽

S.J. Meng

Keyword(s):

Frequency Band ◽

High Frequency ◽

Microphone Array ◽

Low Frequency ◽

Diffraction Effect ◽

Localization Algorithm ◽

Attenuation Effect ◽

Practical Applications ◽

Acoustic Localization ◽

Frequency Sound

Head-mounted microphone arrays have practical applications in robot acoustic localization systems and wearable anti-sniper positioning systems. Usually, sound source localization methods are based on linear or nonlinear unblocked microphone arrays. A head-mounted microphone array, however, is a blocked array, which requires Head Related Transfer Function (HRTF) information for precise localization. In this paper, we study the HRTF characteristics of a head-mounted microphone array for localization in the high and low frequency bands respectively, and design a localization algorithm for low frequency sound based on the head-mounted microphone array in order to analyze the threshold between high and low frequency. Experimental results show that the head-mounted microphone array causes a diffraction effect for low frequency sound and an amplitude attenuation effect for high frequency sound, and that the low frequency localization algorithm performs best when the low frequency band is limited to 1 kHz.

Sound source localization algorithm based on seven-microphone array and PHAT-GCC method

Advanced Control, Automation and Robotics ◽

10.2495/acar140721 ◽

2015 ◽

Author(s):

Y.C. Guo ◽

K.K. Song Gong ◽

N. Zhang

Keyword(s):

Source Localization ◽

Sound Source ◽

Microphone Array ◽

Sound Source Localization ◽

Localization Algorithm

A Sound Localization Algorithm Based on Helmet-Mounted Microphone Array Using the High-Frequency Intensity Difference

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.743.474 ◽

2015 ◽

Vol 743 ◽

pp. 474-478

Author(s):

Yi Zhang ◽

B.B. Shen

Keyword(s):

Sound Localization ◽

High Frequency ◽

Linear Array ◽

Microphone Array ◽

Intensity Difference ◽

Localization Algorithm ◽

Direct Path ◽

Localization Algorithms ◽

The Common ◽

Intensity Decay

Common microphone array localization algorithms are based on linear or circular arrays that are unblocked (there is no obstruction between the sound source and the microphones). In this paper, we study a sound localization algorithm based on a helmet-mounted microphone array, which is a blocked array. The algorithm takes the attenuation of high-frequency sound by the helmet into account and uses the intensity decay at the different backward microphones to localize the source. Experimental results show that the localization method for high-frequency sound signals based on the direct path improves localization accuracy compared with the common algorithm based on an unblocked microphone array.

Dynamically localizing multiple speakers based on the time-frequency domain

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00203-w ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Hodaya Hammer ◽

Shlomo E. Chazan ◽

Jacob Goldberger ◽

Sharon Gannot

Keyword(s):

Deep Neural Network ◽

Microphone Array ◽

Real Life ◽

Localization Algorithm ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Time Frequency ◽

Spatial Features ◽

Single Speaker ◽

Orthogonality Principle

In this study, we present a deep neural network-based online multi-speaker localization algorithm based on a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An elaborate experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we further show that the proposed method is also capable of separating moving speakers by applying the obtained TF masks.
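To make the per-bin idea concrete, here is a small sketch of what can be done once every TF bin carries a single DOA label. The network itself is not reproduced: the label grid below is hand-written stand-in data, and the vote counting, the function names, and the static (non-tracking) treatment are simplifying assumptions rather than the authors' procedure.

```python
from collections import Counter

def dominant_doas(doa_map, num_speakers):
    """Pick the DOA labels that dominate the most TF bins."""
    counts = Counter(d for frame in doa_map for d in frame)
    return [d for d, _ in counts.most_common(num_speakers)]

def tf_masks(doa_map, doas):
    """Binary TF mask per speaker: 1 where the bin's label matches that speaker's DOA."""
    return {d: [[1 if b == d else 0 for b in frame] for frame in doa_map]
            for d in doas}

# Hypothetical per-bin DOA labels in degrees (rows: time frames, columns: frequency bins).
doa_map = [
    [30, 30, 120],
    [30, 120, 120],
    [30, 30, 75],
]

speakers = dominant_doas(doa_map, num_speakers=2)
masks = tf_masks(doa_map, speakers)
print(speakers)
print(masks[30])
```

Multiplying each mask with the mixture spectrogram would keep only the bins attributed to that direction, which is the sense in which per-bin DOA labels can double as separation masks.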