scholarly journals Combining augmented statistical noise suppression and framewise speech/non-speech classification for robust voice activity detection

Author(s):  
Yasunari Obuchi

This paper proposes a new voice activity detection (VAD) algorithm based on statistical noise suppression and framewise speech/non-speech classification. Although many VAD algorithms have been developed that are robust in noisy environments, the most successful ones are related to statistical noise suppression in some way. Accordingly, we formulate our VAD algorithm as a combination of noise suppression and subsequent framewise classification. The noise suppression part is improved by introducing the idea that any unreliable frequency component should be removed, and the decision can be made by the remaining signal. This augmentation can be realized using a few additional parameters embedded in the gain-estimation process. The framewise classification part can be either model-less or model-based. A model-less classifier has the advantage that it can be applied to any situation, even if no training data are available. In contrast, a model-based classifier (e.g., neural network-based classifier) requires training data but tends to be more accurate. The accuracy of the proposed algorithm is evaluated using the CENSREC-1-C public framework and confirmed to be superior to many existing algorithms.

2006 ◽  
Author(s):  
Ángel de la Torre ◽  
Javier Ramírez ◽  
Carmen Benítez ◽  
José C. Segura ◽  
L. García ◽  
...  

Author(s):  
Charaf Eddine Chelloug ◽  
◽  
Atef Farrouki ◽  

In speech compression systems, Voice Activity Detection (VAD) is frequently used to distinguish active voice from other noisy sounds. In this paper, a robust approach of VAD is presented to deal with non-stationary noisy environments. The proposed algorithm exploits adaptive thresholding technique to keep a desired False Acceptance (FA) rate. Iterative hypothesis tests, using signal energy, are implemented to discard or to accept the successive audio frames as active voice. According to the stationary property of the speech, we provide a smoothing method to obtain final VAD decisions. The main contribution of the proposed algorithm concerns its ability to automatically adjust the energy threshold according to the local noise estimator. We analyzed the proposed approach by presenting a comparison with the G.729-B via the NOIZEUS database. The VAD architecture is implemented on a Microcontroller-based system (MCU). Several tests have been conducted by performing real time acquisition via the Input/Output ports of the MCU-system.


Sign in / Sign up

Export Citation Format

Share Document