Joint-character-POC N-gram language modeling for Chinese speech recognition

Author(s):  
Bin Wang ◽  
Zhijian Ou ◽  
Jian Li ◽  
Akinori Kawamura
2012 ◽  
Vol 20 (2) ◽  
pp. 235-259 ◽  
Author(s):  
MARTHA YIFIRU TACHBELIE ◽  
SOLOMON TEFERRA ABATE ◽  
WOLFGANG MENZEL

Abstract: This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out-of-vocabulary rate has been observed as a result of using subwords or morphemes. Thus a severe problem of morphologically rich languages has been addressed. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing the quality based on the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the size of the vocabulary increases (20k or more), the morpheme-based systems suffer from acoustic confusability and do not achieve a significant improvement over a word-based system with an equivalent vocabulary size, even with the use of higher-order (quadrogram) n-gram language models.


Author(s):  
Daoyuan Li ◽  
Tegawende F. Bissyande ◽  
Sylvain Kubler ◽  
Jacques Klein ◽  
Yves Le Traon

2013 ◽  
Author(s):  
Haşim Sak ◽  
Yun-hsuan Sung ◽  
Françoise Beaufays ◽  
Cyril Allauzen

2004 ◽  
Author(s):  
Dimitra Vergyri ◽  
Katrin Kirchhoff ◽  
Kevin Duh ◽  
Andreas Stolcke

10.5772/6380 ◽  
2008 ◽  
Author(s):  
Ebru Arısoy ◽  
Mikko Kurimo ◽  
Murat Saraçlar ◽  
Teemu Hirsimäki ◽  
Janne Pylkkönen ◽  
...  
