A study of robust omnidirectional audio‐visual talker localization algorithm with microphone array and omnidirectional image

2006 ◽  
Vol 120 (5) ◽  
pp. 3040-3040
Author(s):  
Yuki Denda ◽  
Takanobu Nishiura ◽
Yoichi Yamashita
2007 ◽  
pp. 194-221 ◽  
Author(s):  
David Lo

In applications that need the locations of human subjects, such as human-computer interfaces, video conferencing, and security surveillance, localization is often performed using a single sensing modality. Such unimodal techniques, for example microphone-array beamforming and video-based localization, are often prone to errors. In this chapter, a modular multimodal localization framework is constructed by combining multiple unimodal localization methods using a Bayesian network. As a case study, a joint audio-video talker localization system for video conferencing is presented. The results show that the proposed multimodal method outperforms unimodal methods that rely only on audio or only on video, in terms of both accuracy and robustness.
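The abstract does not give the network's structure. As a minimal sketch, assuming the simplest possible Bayesian network (one hidden talker position, with audio and video observations that are conditionally independent given that position), fusion over a grid of candidate positions reduces to multiplying the likelihoods and renormalizing. The function names, the candidate grid, and the likelihood values are hypothetical; this is plain naive-Bayes fusion, not the chapter's own framework.

```python
def fuse_localizations(prior, audio_likelihood, video_likelihood):
    """Naive-Bayes fusion over a shared grid of candidate talker positions.

    Assumes (hypothetically) that audio and video observations are
    conditionally independent given the true position, so
    P(pos | a, v) is proportional to P(pos) * P(a | pos) * P(v | pos).
    """
    unnormalized = [p * a * v for p, a, v in
                    zip(prior, audio_likelihood, video_likelihood)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("every candidate position has zero probability")
    # Renormalize so the posterior sums to one.
    return [u / total for u in unnormalized]


def most_likely(posterior):
    """Index of the candidate position with the highest posterior."""
    return max(range(len(posterior)), key=posterior.__getitem__)
```

For example, with a uniform prior over four hypothetical positions, audio likelihoods [0.05, 0.45, 0.40, 0.10] (which alone favor position 1, e.g. because of a reflection) and video likelihoods [0.05, 0.05, 0.45, 0.45] (ambiguous between positions 2 and 3) fuse to approximately [0.01, 0.09, 0.72, 0.18], so position 2 is selected. In this sketch, adding another modality would only add one more likelihood factor to the product.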


2013 ◽  
Vol 416-417 ◽  
pp. 1086-1091
Author(s):  
Lei Li ◽  
Yong Gang Su ◽  
Shen Tian ◽  
Yong Li ◽  
Zhi Tong Li

Video security monitoring has become a major focus of research and development. However, because the camera cannot rotate automatically, traditional security monitoring has blind spots. Abnormal events are often accompanied by characteristic sounds (e.g., an explosion produces the sound of an explosion). To compensate for the blind spots, an auditory function can therefore be added to the camera so that it automatically tracks the direction of the sound source. This requires two-dimensional (2-D) localization of the sound source; at the same time, the localization algorithm must be able to track all source signals and run in real time, so that the camera can be turned toward the sound source promptly. This paper localizes wideband speech signals in video monitoring using modern signal-processing methods: a linear microphone array, a positioning approach based on time-delay estimation, frequency-domain transformation, and a spectrum-search method based on energy values. Both early simulations and a later DSP-based embedded system platform verified the feasibility of the method.
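The abstract names time-delay estimation, a frequency-domain transform, and an energy-based spectrum search, but not the exact algorithm. One plausible reading, sketched below, is a delay-and-sum steered-response-power (SRP) search: transform each microphone frame with a DFT, compensate the expected far-field delay for each candidate angle, sum the channels, and pick the angle with the largest energy. The speed of sound, array geometry, multi-tone test signal, and all function names are assumptions for illustration; the sketch estimates only the angle relative to a linear array's broadside, for a single source.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def dft_half(x):
    """Naive DFT of a real frame, bins 0..N/2 (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def srp_doa(frames, mic_positions, fs, angles_deg=range(-90, 91)):
    """Hypothetical delay-and-sum SRP search for a linear array.

    frames: one equal-length frame per microphone.
    mic_positions: microphone coordinates along the array axis (metres).
    Returns the candidate angle (degrees from broadside) with maximum
    steered energy.
    """
    n = len(frames[0])
    spectra = [dft_half(f) for f in frames]
    best_angle, best_energy = None, -1.0
    for ang in angles_deg:
        s = math.sin(math.radians(ang))
        energy = 0.0
        for k in range(1, n // 2):  # skip DC and Nyquist bins
            f_k = k * fs / n
            # Undo each channel's expected delay p*sin(theta)/c, then sum:
            # the channels add coherently only at the true angle.
            y = sum(spec[k] * cmath.exp(2j * math.pi * f_k * p * s / SPEED_OF_SOUND)
                    for spec, p in zip(spectra, mic_positions))
            energy += abs(y) ** 2
        if energy > best_energy:
            best_angle, best_energy = ang, energy
    return best_angle


def simulate_frames(theta_deg, mic_positions, fs, n, tone_bins=(8, 16, 24, 32)):
    """Far-field multi-tone source whose tones sit exactly on DFT bins."""
    s = math.sin(math.radians(theta_deg))
    frames = []
    for p in mic_positions:
        tau = p * s / SPEED_OF_SOUND  # arrival delay at this microphone (s)
        frames.append([sum(math.cos(2 * math.pi * (b * fs / n) * (t / fs - tau))
                           for b in tone_bins)
                       for t in range(n)])
    return frames
```

With four microphones spaced 5 cm apart, fs = 16 kHz, and 256-sample frames, the tones (500 Hz to 2 kHz) stay below the spatial-aliasing limit c / (2d) ≈ 3.4 kHz, so the energy peak is unique. A real system would replace the naive DFT with an FFT.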


2015 ◽  
Vol 743 ◽  
pp. 479-483
Author(s):  
Yi Zhang ◽  
B.B. Shen ◽  
S.J. Meng

Head-mounted microphone arrays have practical applications in robot acoustic localization systems and wearable anti-sniper positioning systems. Sound source localization methods are usually based on linear or nonlinear unblocked microphone arrays, but a head-mounted microphone array is a blocked array, so precise localization with it requires Head-Related Transfer Function (HRTF) information. In this paper, we study the HRTF characteristics of a head-mounted microphone array for localization in the high-frequency and low-frequency bands separately, and we design a localization algorithm for low-frequency sound based on the head-mounted array in order to analyze the threshold between high and low frequencies. Experimental results show that the head-mounted microphone array causes a diffraction effect for low-frequency sound and an amplitude attenuation effect for high-frequency sound, and that the low-frequency localization algorithm performs best when the low-frequency band is limited to 1 kHz.
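Since the abstract treats low-frequency sound (diffraction) and high-frequency sound (attenuation) differently, with the best low-frequency results at a 1 kHz band limit, a system built on this idea needs some way to decide which regime a frame belongs to. The sketch below is one hypothetical way to do that: split the frame's spectral energy at 1 kHz and report which band dominates. The function names and the "larger band wins" rule are assumptions, not the paper's procedure.

```python
import cmath
import math

# 1 kHz: the low-frequency band limit at which the abstract reports the
# best low-frequency localization performance.
LOW_HIGH_THRESHOLD_HZ = 1000.0


def band_energies(frame, fs, cutoff_hz=LOW_HIGH_THRESHOLD_HZ):
    """Split a real frame's spectral energy at cutoff_hz using a naive DFT.

    Returns (low_energy, high_energy) over bins 1..N/2 (DC is ignored).
    """
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if k * fs / n < cutoff_hz:
            low += abs(x_k) ** 2
        else:
            high += abs(x_k) ** 2
    return low, high


def dominant_regime(frame, fs, cutoff_hz=LOW_HIGH_THRESHOLD_HZ):
    """Hypothetical rule: 'low' (diffraction-dominated) if the band below the
    cutoff holds at least as much energy as the band above, else 'high'."""
    low, high = band_energies(frame, fs, cutoff_hz)
    return "low" if low >= high else "high"
```

A frame classified as "low" would then go to the low-frequency (diffraction-aware) algorithm and a "high" frame to an attenuation-based one; that routing is likewise an assumption of this sketch.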


2015 ◽  
Vol 743 ◽  
pp. 474-478
Author(s):  
Yi Zhang ◽  
B.B. Shen

Common microphone-array localization algorithms are based on linear or circular arrays that are unblocked (there is no obstruction between the sound source and the microphones). In this paper, we study a sound localization algorithm based on a helmet-mounted microphone array, which is a kind of blocked array. The algorithm takes into account the helmet's attenuation of high-frequency sound and uses the intensity decay at the different backward microphones to perform localization. Experimental results show that the direct-path-based localization method for high-frequency sound signals improves localization accuracy compared with common algorithms designed for unblocked microphone arrays.
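The abstract says the method exploits how much high-frequency intensity the helmet removes at the backward microphones, but does not give the estimator. As a hypothetical illustration of intensity-based localization (not the paper's algorithm), the sketch below takes the power-weighted circular mean of the microphone azimuths: microphones facing the source receive more power and pull the estimate toward themselves, while shadowed ones contribute little. It assumes microphones at known azimuths around the helmet, a single high-frequency source, and the names are invented.

```python
import math


def mean_power(frame):
    """Mean-square value of one microphone frame."""
    return sum(v * v for v in frame) / len(frame)


def intensity_doa(frames, mic_azimuths_deg):
    """Hypothetical intensity-based DOA estimate in degrees, in [-180, 180].

    Computes the power-weighted circular mean of the microphone azimuths.
    Shadowed (backward) microphones receive less high-frequency power,
    so they pull the estimate less.
    """
    powers = [mean_power(f) for f in frames]
    cx = sum(p * math.cos(math.radians(a)) for p, a in zip(powers, mic_azimuths_deg))
    cy = sum(p * math.sin(math.radians(a)) for p, a in zip(powers, mic_azimuths_deg))
    return math.degrees(math.atan2(cy, cx))
```

With eight equally spaced microphones and a toy shadowing model in which each microphone's power is proportional to 1 + cos(azimuth − source direction), the weighted mean recovers the source direction exactly; a real helmet's attenuation pattern would bias this estimate, which is where measured HRTF-style data would come in.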


Author(s):  
Hodaya Hammer ◽  
Shlomo E. Chazan ◽  
Jacob Goldberger ◽  
Sharon Gannot

In this study, we present a deep neural network-based online multi-speaker localization algorithm that uses a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is assumed to be dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. This high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and moving. An extensive experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we show that the proposed method can also separate moving speakers by applying the obtained TF masks.
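The network itself cannot be reproduced from the abstract, but the idea of per-TF-bin DOA classification can be illustrated downstream of it. The sketch below assumes the network outputs one probability vector over discrete DOA classes per TF bin, and shows one hypothetical post-processing path: sum the posteriors into a DOA "spectrum", keep the strongest local peaks as speaker directions, and assign each bin to the nearest speaker to form binary TF masks. The aggregation rule, peak picking, mask assignment, and all names are assumptions, not the authors' method.

```python
def doa_spectrum(posteriors):
    """Sum per-TF-bin DOA posteriors into one score per DOA class."""
    num_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) for c in range(num_classes)]


def locate_speakers(posteriors, num_speakers):
    """Return the DOA classes of the num_speakers strongest local peaks."""
    spectrum = doa_spectrum(posteriors)
    last = len(spectrum) - 1
    # A class is a peak if it is not below its left neighbour and strictly
    # above its right neighbour, so one speaker does not claim adjacent classes.
    peaks = [c for c in range(len(spectrum))
             if (c == 0 or spectrum[c] >= spectrum[c - 1])
             and (c == last or spectrum[c] > spectrum[c + 1])]
    peaks.sort(key=lambda c: spectrum[c], reverse=True)
    return sorted(peaks[:num_speakers])


def binary_masks(posteriors, speaker_classes):
    """Assign each TF bin to the speaker whose DOA class is nearest the
    bin's most probable class (W-disjoint orthogonality: one owner per bin)."""
    masks = {c: [] for c in speaker_classes}
    for p in posteriors:
        best = max(range(len(p)), key=p.__getitem__)
        owner = min(speaker_classes, key=lambda c: abs(c - best))
        for c in speaker_classes:
            masks[c].append(1 if c == owner else 0)
    return masks
```

Because each bin gets exactly one owner, the masks partition the TF plane, which is the property that lets per-bin DOA estimates double as separation masks.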

