Voice activity detection method based on multivalued coarse-graining Lempel-Ziv complexity

2011 ◽  
Vol 8 (3) ◽  
pp. 869-888 ◽  
Author(s):  
Huan Zhao ◽  
Gangjin Wang ◽  
Cheng Xu ◽  
Fei Yu

One of the key issues in practical speech processing is to precisely locate the endpoints of the input utterance so that it is free of nonspeech regions. Although many studies have addressed this problem, the performance of existing voice activity detection (VAD) algorithms is still far from ideal. This paper proposes a novel robust feature for VAD based on the multi-valued coarse-graining Lempel-Ziv complexity (MLZC), an improvement on the binary coarse-graining Lempel-Ziv complexity (BLZC). In addition, we use the fuzzy c-means clustering algorithm and the Bayesian information criterion to estimate the thresholds of the MLZC feature, and adopt the dual-threshold method for VAD. Experimental results on the TIMIT continuous speech database show that in low-SNR environments, the detection performance of the proposed MLZC method is superior to that of the VAD in GSM AMR, G.729, and the BLZC method.
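To make the feature concrete, here is a minimal sketch of a multi-valued coarse-graining step followed by a Lempel-Ziv (1976) phrase count. The abstract does not give the quantization rule, so the uniform binning in `coarse_grain` is an assumption, and the function names are hypothetical rather than taken from the paper. The threshold estimation (fuzzy c-means, Bayesian information criterion) and the dual-threshold decision are not shown.

```python
def coarse_grain(frame, levels):
    """Map each sample to an integer symbol in 0..levels-1.

    Assumption: uniform bins between the frame minimum and maximum.
    The paper's actual multi-valued quantizer may differ.
    """
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return [0] * len(frame)
    step = (hi - lo) / levels
    # The maximum sample would land in bin `levels`, so clamp it.
    return [min(int((x - lo) / step), levels - 1) for x in frame]


def _occurs_before(s, i, length):
    """True if s[i:i+length] also starts at some position j < i (overlap allowed)."""
    target = s[i:i + length]
    return any(s[j:j + length] == target for j in range(i))


def lz76_complexity(symbols):
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing."""
    s = list(symbols)
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from earlier text.
        while i + length <= n and _occurs_before(s, i, length):
            length += 1
        phrases += 1  # the shortest non-reproducible phrase (or the final remainder)
        i += length
    return phrases
```

A per-frame feature would then be `lz76_complexity(coarse_grain(frame, levels))`, with `levels=2` roughly playing the role of the binary case. This brute-force search is cubic in the frame length in the worst case, which is acceptable only for short frames.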

10.14311/1251 ◽  
2010 ◽  
Vol 50 (4) ◽  
Author(s):  
E. Verteletskaya ◽  
K. Sakhnov

This paper describes a study of noise-robust voice activity detection (VAD) that uses the periodicity of the signal, the full-band signal energy, and the ratio of high-band to low-band signal energy. Conventional VADs are sensitive to variable noise environments, especially at low SNR, and tend to cut off unvoiced regions of speech and to produce randomly oscillating output decisions. To overcome these problems, the proposed algorithm first identifies voiced regions of speech and then differentiates unvoiced regions from silence or background noise using the energy ratio and the total signal energy. The performance of the proposed VAD algorithm is tested on real speech signals. Comparisons confirm that the proposed VAD algorithm outperforms conventional VAD algorithms, especially in the presence of background noise.
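The two-stage decision described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: periodicity is measured here as the peak normalized autocorrelation over pitch lags of roughly 60 to 400 Hz, the band split is set at 2 kHz, and every threshold value is a hypothetical placeholder.

```python
import cmath
import math


def frame_energy(frame):
    """Full-band energy: sum of squared samples."""
    return sum(x * x for x in frame)


def periodicity(frame, min_lag, max_lag):
    """Peak normalized autocorrelation over the candidate pitch lags."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        denom = math.sqrt(frame_energy(a) * frame_energy(b))
        if denom > 0:
            best = max(best, sum(x * y for x, y in zip(a, b)) / denom)
    return best


def band_energy_ratio(frame, sample_rate, split_hz):
    """High-band over low-band energy, using a direct DFT (fine for short frames)."""
    n = len(frame)
    low = high = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power
    return high / low if low > 0 else float("inf")


def classify_frame(frame, sample_rate, per_thr=0.7, ratio_thr=1.0,
                   energy_thr=1e-3, split_hz=2000.0):
    """Label a frame 'voiced', 'unvoiced', or 'silence'.

    All thresholds are hypothetical defaults, not values from the paper.
    The frame must be longer than sample_rate // 60 samples.
    """
    min_lag, max_lag = sample_rate // 400, sample_rate // 60
    # Stage 1: strongly periodic frames are taken as voiced speech.
    if periodicity(frame, min_lag, max_lag) > per_thr:
        return "voiced"
    # Stage 2: energetic frames dominated by the high band are taken as unvoiced.
    if (frame_energy(frame) > energy_thr
            and band_energy_ratio(frame, sample_rate, split_hz) > ratio_thr):
        return "unvoiced"
    return "silence"
```

In practice the thresholds would need to adapt to the estimated noise level, and a hangover scheme would typically smooth the frame-level decisions.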


2004 ◽  
Vol 1 (16) ◽  
pp. 495-500
Author(s):  
Rajkishore Prasad ◽  
Hiroshi Saruwatari ◽  
Kiyohiro Shikano

2011 ◽  
Vol 181-182 ◽  
pp. 765-769
Author(s):  
Chang Peng Ji ◽  
Mo Gao ◽  
Jie Yang

One of the key issues in practical speech processing is to achieve voice activity detection (VAD) that is robust against background noise. Most statistical model-based approaches employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. For a class of VAD algorithms based on the Gaussian and Laplacian models, we incorporate the complex Laplacian probability density function into our analysis of statistical properties. Because the statistical characteristics of the speech signal are affected differently by noise types and levels, our approach aims to cope with time-varying environments by adaptively finding an appropriate statistical model in an online fashion. The performance of the proposed VAD approaches in stationary noise environments is evaluated with the aid of an objective measure.

