lm smoothing
Recently Published Documents


TOTAL DOCUMENTS: 2 (FIVE YEARS: 0)

H-INDEX: 2 (FIVE YEARS: 0)

Author(s): Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn

Hierarchical Pitman-Yor Process (HPYP) priors are compelling for learning language models, outperforming point-estimate based methods. However, these models remain unpopular due to computational and statistical inference issues, such as memory and time usage, as well as poor mixing of the sampler. In this work we propose a novel framework which represents the HPYP model compactly using compressed suffix trees. We then develop an efficient approximate inference scheme in this framework that has a much lower memory footprint than the full HPYP and is fast at inference time. The experimental results illustrate that our model can be built on significantly larger datasets than previous HPYP models, while being several orders of magnitude smaller, fast for training and inference, and improving on the perplexity of the state-of-the-art Modified Kneser-Ney count-based LM smoothing by up to 15%.
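As background for the Modified Kneser-Ney baseline mentioned above, the following is a minimal sketch of plain interpolated Kneser-Ney bigram smoothing in Python. It uses a single fixed discount rather than the multiple discounts of the modified variant, and it is not the authors' HPYP model; the corpus, discount value, and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_kn_bigram(tokens, discount=0.75):
    """Collect the counts needed for interpolated Kneser-Ney bigram smoothing."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    history_counts = Counter(tokens[:-1])      # counts of history words
    followers = defaultdict(set)               # distinct words seen after each history
    preceders = defaultdict(set)               # distinct histories seen before each word
    for h, w in bigram_counts:
        followers[h].add(w)
        preceders[w].add(h)
    n_bigram_types = len(bigram_counts)
    return bigram_counts, history_counts, followers, preceders, n_bigram_types, discount

def kn_prob(w, history, model):
    """Interpolated Kneser-Ney probability P(w | history)."""
    bigram_counts, history_counts, followers, preceders, n_types, d = model
    h_count = history_counts[history]
    if h_count == 0:
        # Unseen history: fall back to the continuation probability alone.
        return len(preceders[w]) / n_types if n_types else 0.0
    discounted = max(bigram_counts[(history, w)] - d, 0.0) / h_count
    backoff_weight = d * len(followers[history]) / h_count
    continuation = len(preceders[w]) / n_types
    return discounted + backoff_weight * continuation

# Toy usage with a made-up corpus:
corpus = "the cat sat on the mat the cat ate".split()
model = train_kn_bigram(corpus)
print(kn_prob("cat", "the", model))
```

The continuation probability rewards words that follow many distinct histories, which is the key idea separating Kneser-Ney from simple absolute discounting.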


2013, Vol 23 (3), pp. 649-668
Author(s): Jerzy Sas, Andrzej Żołnierek

Abstract: The aim of the work described in this article is to elaborate and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method tries to take into account the features and specific problems encountered in practical applications of speech recognition in the Polish language: rich inflection, a loose word order, and the tendency for short words to be deleted. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical LM smoothing methods are used to create the initial model; four of the most frequently used methods of LM construction are applied here. At the second stage, the model is extended to take into account words that co-occur only indirectly in the corpus. At the next stage, LM modifications are aimed at reducing short-word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to assure highly accurate recognition of particularly important utterances. The performance of the applied methods is tested in four language domains.
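The abstract does not name the smoothing methods used at the first stage. As a generic, minimal illustration of the kind of count-based smoothing that could serve as such an initial model, here is an add-k (Lidstone) bigram sketch in Python; the function name, corpus, and k value are assumptions made for the example.

```python
from collections import Counter

def additive_bigram_prob(w, history, tokens, k=0.5):
    """Add-k (Lidstone) smoothed bigram probability P(w | history).

    Every bigram count is incremented by k, so unseen word pairs still
    receive a small non-zero probability.
    """
    vocab = set(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    history_count = Counter(tokens[:-1])[history]
    return (bigram_counts[(history, w)] + k) / (history_count + k * len(vocab))

# Toy usage with a made-up corpus:
corpus = "ala ma kota kot ma ale".split()
print(additive_bigram_prob("kota", "ma", corpus))
```

In practice the initial model would be estimated over a large corpus and a full n-gram order, but the same count-then-smooth pattern applies.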

