Hidden Markov convolutive mixture model for pitch contour analysis of speech

Author(s):  
Kota Yoshizato ◽  
Hirokazu Kameoka ◽  
Daisuke Saito ◽  
Shigeki Sagayama

2021 ◽  
Vol 11 (7) ◽  
pp. 3138
Author(s):  
Mingchi Zhang ◽  
Xuemin Chen ◽  
Wei Li

This paper proposes a deep neural network hidden Markov model (DNN-HMM) for locating pipeline leaks. A long pipeline is divided into several sections, and a leak in each section is defined as a distinct state of a hidden Markov model (HMM). The hybrid HMM, i.e., DNN-HMM, uses a multi-layer deep neural network (DNN) to exploit the non-linear data. The DNN is initialized using a deep belief network (DBN). The DBN is a pre-trained model built by stacking top-down restricted Boltzmann machines (RBMs), and it replaces the Gaussian mixture model (GMM) in computing the emission probabilities for the HMM. Two comparative studies with different numbers of states are performed using the Gaussian mixture model-hidden Markov model (GMM-HMM) and the DNN-HMM. Agreement between the detected and actual state sequences on the test data is measured by the micro F1 score. When the pipeline is divided into three sections, the micro F1 score approaches 0.94 for GMM-HMM and is close to 0.95 for DNN-HMM. When the pipeline is divided into five sections, the micro F1 score is 0.69 for GMM-HMM, while it approaches 0.96 for DNN-HMM. The results demonstrate that the DNN-HMM learns a better model of non-linear data and outperforms the GMM-HMM.
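The micro F1 score used above to compare detected and actual state sequences can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `micro_f1` and the toy sequences are assumptions made for illustration. It pools true positives, false positives, and false negatives over all states (pipeline sections) before computing F1.

```python
from collections import Counter


def micro_f1(actual, detected):
    """Micro-averaged F1 between two equal-length state sequences."""
    if len(actual) != len(detected):
        raise ValueError("sequences must have equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, d in zip(actual, detected):
        if a == d:
            tp[a] += 1   # state a correctly detected
        else:
            fp[d] += 1   # state d detected but not present
            fn[a] += 1   # state a present but missed
    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())
    denom = 2 * tp_sum + fp_sum + fn_sum
    return 2 * tp_sum / denom if denom else 0.0


# Hypothetical sequences for a pipeline divided into three sections (states 0-2).
actual_states = [0, 0, 1, 1, 2, 2]
detected_states = [0, 0, 1, 2, 2, 2]
score = micro_f1(actual_states, detected_states)
```

Because every time step carries exactly one actual and one detected state, each error counts once as a false positive and once as a false negative, so the micro F1 score here coincides with plain accuracy (5 of 6 steps in the toy example).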


Author(s):  
WU-JI YANG ◽  
JYH-CHYANG LEE ◽  
YUEH-CHIN CHANG ◽  
HSIAO-CHUAN WANG

This study proposes a method for recognizing lexical tones in Mandarin speech. The method is based on vector quantization (VQ) and hidden Markov models (HMMs). Pitch periods are extracted to derive feature vectors that represent pitch height and pitch contour slope. One HMM per tone is trained on the feature vectors of monosyllables. The HMMs are then used to recognize the tones of monosyllables and disyllables. For monosyllables, the accuracy reaches 93.75% in speaker-independent cases. For disyllables, the accuracy is 93% for first syllables and 90% for second syllables. This shows that the tone of the second syllable may be affected by the preceding syllable, and the degradation reflects tone variation in Mandarin speech.
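The one-HMM-per-tone scheme described above can be sketched as follows, assuming the pitch features have already been vector-quantized into discrete codeword indices. The two toy models, their parameters, and the names `forward_log_likelihood` and `recognize_tone` are hypothetical and are not taken from the study. Each candidate tone is scored with the scaled forward algorithm, and the tone whose HMM gives the highest likelihood is chosen.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete symbol sequence under one HMM (scaled forward algorithm)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission of obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik


def recognize_tone(obs, models):
    """Return the tone label whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda tone: forward_log_likelihood(obs, *models[tone]))


# Hypothetical two-state left-to-right models over a two-codeword VQ codebook
# (codeword 0 = low pitch, codeword 1 = high pitch); all values are illustrative only.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "rising": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "falling": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
tone = recognize_tone([0, 0, 1, 1], models)
```

In this toy setup a low-then-high codeword sequence is assigned to the "rising" model. A real system would train one model per Mandarin tone on quantized pitch height and slope features.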

