A Hybrid Language Model for Handwritten Chinese Sentence Recognition

Speakers of any (minoritised or majority) language sometimes make language mistakes. Bilingual speakers may use a hybrid language, mixing languages within a sentence or even within a word, especially when the languages are formally similar, as Spanish and Galician are. For minoritised languages, language errors may contribute to a negative perception of the minoritised language. The Galician public broadcaster Televisión de Galicia (TVG) has been criticised for not being a high-quality language model and for permitting the intrusion of language mistakes in its content. From an exclusively linguistic viewpoint, these errors should be corrected in subtitling. Conversely, subtitling guides and target users favour a verbatim rendition of the audio, in which oral language mistakes should not be corrected. Dialectal features, even if they are not considered errors, are non-standard language. This paper aims to answer the question of “to correct or not to correct” oral errors and dialectal features in the case of minoritised languages. It presents the most relevant data from a literature review and an analysis of subtitling guidelines and standards and of current practices at TVG. These results have yielded an original protocol for the correction or reproduction of oral errors according to speech control, target audience and broadcast genre, the effect of a mistake, and the type of language error (vocabulary vs. grammar).

A hybrid language model based on a combination of N-grams and stochastic context-free grammars

ACM Transactions on Asian Language Information Processing ◽ 10.1145/1034780.1034783 ◽ 2004 ◽ Vol 3 (2) ◽ pp. 113-127 ◽ Cited By ~ 6

Author(s): Diego Linares ◽ José-Miguel Benedí ◽ Joan-Andreu Sánchez

Keyword(s): Language Model ◽ Model Based ◽ Hybrid Language ◽ Stochastic Context Free Grammars ◽ Context Free ◽ Context Free Grammars

A Hybrid Language Model Based on Statistics and Linguistic Rules

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001405003934 ◽ 2005 ◽ Vol 19 (01) ◽ pp. 109-128 ◽ Cited By ~ 2

Author(s): Xiaolong Wang ◽ Daniel S. Yeung ◽ James N. K. Liu ◽ Robert Luk ◽ Xuan Wang

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Large Scale ◽ Language Model ◽ Language Models ◽ Long Distance ◽ Language Understanding ◽ Linguistic Rules ◽ Hybrid Language ◽ N Gram

Language modeling is a current research topic in many domains, including speech recognition, optical character recognition, handwriting recognition, machine translation and spelling correction. There are two main types of language model: the mathematical and the linguistic. The most widely used mathematical language model is the n-gram model inferred from statistics. This model has three problems: long-distance restriction, recursive nature and partial language understanding. Language models based on linguistics present many difficulties when applied to large-scale real texts. We present here a new hybrid language model that combines the advantages of the n-gram statistical language model with those of a linguistic language model that makes use of grammatical or semantic rules. Using suitable rules, this hybrid model can solve problems such as long-distance restriction, recursive nature and partial language understanding. The new language model has been effective in experiments and has been incorporated in Chinese sentence input products for Windows and Macintosh OS.
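
The abstract does not say how the statistical and rule-based components are combined, so the following is only a minimal sketch of one possible combination: an add-one smoothed bigram score minus a fixed penalty for each violated rule. The function names, the `penalty` value, the English toy sentences and the "either ... or" rule are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_logprob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of one sentence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def either_or_rule(tokens):
    """Hypothetical long-distance rule: 'either' must be followed later by 'or'."""
    if "either" in tokens:
        return "or" in tokens[tokens.index("either") + 1:]
    return True


def hybrid_score(tokens, unigrams, bigrams, rules, penalty=5.0):
    """Bigram log-probability minus a fixed penalty per violated rule (assumed scheme)."""
    violations = sum(1 for rule in rules if not rule(tokens))
    return bigram_logprob(tokens, unigrams, bigrams) - penalty * violations


# Toy corpus: the bigram model alone cannot separate the two candidates below,
# because the dependency between 'either' and 'or' spans more than two words.
corpus = [
    "you can either stay or go".split(),
    "you can stay and go".split(),
]
unigrams, bigrams = train_bigram(corpus)
for candidate in ("you can either stay or go", "you can either stay and go"):
    tokens = candidate.split()
    print(candidate, hybrid_score(tokens, unigrams, bigrams, [either_or_rule]))
```

In this toy setting the two candidates receive the same bigram score, and only the rule penalty ranks the first above the second, which illustrates the long-distance restriction the abstract attributes to n-gram models.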

Documents Ranking Based on a Hybrid Language Model for Chinese Information Retrieval

2006 IEEE International Conference on Information Acquisition ◽ 10.1109/icia.2006.306010 ◽ 2006 ◽ Cited By ~ 3

Author(s): Dequan Zheng ◽ Feng Yu ◽ Tiejun Zhao ◽ Sheng Li

Keyword(s): Information Retrieval ◽ Language Model ◽ Hybrid Language

Effects of Fundamental Frequency Contours on Sentence Recognition in Mandarin-Speaking Children With Cochlear Implants

Journal of Speech Language and Hearing Research ◽ 10.1044/2020_jslhr-20-00033 ◽ 2020 ◽ Vol 63 (11) ◽ pp. 3855-3864

Author(s): Wanting Huang ◽ Lena L. N. Wong ◽ Fei Chen ◽ Haihong Liu ◽ Wei Liang

Keyword(s): Cochlear Implants ◽ Fundamental Frequency ◽ Signal To Noise Ratio ◽ Lexical Tone ◽ Signal To Noise ◽ Sentence Recognition ◽ Test Conditions ◽ Age Appropriate ◽ F0 Contour ◽ Appropriate Sentences

Purpose: Fundamental frequency (F0) is the primary acoustic cue for lexical tone perception in tonal languages but is processed in a limited way in cochlear implant (CI) systems. The aim of this study was to evaluate the importance of F0 contours in sentence recognition in Mandarin-speaking children with CIs and to determine whether it differs from that in age-matched normal-hearing (NH) peers.

Method: Age-appropriate sentences, with F0 contours manipulated to be either natural or flattened, were randomly presented to preschool children with CIs and their age-matched peers with NH under three test conditions: in quiet, in white noise, and with competing sentences at 0 dB signal-to-noise ratio.

Results: The neutralization of F0 contours resulted in a significant reduction in sentence recognition. While this was seen only in noise conditions among NH children, it was observed throughout all test conditions among children with CIs. Moreover, the F0 contour-induced accuracy reduction ratios (i.e., the reduction in sentence recognition resulting from the neutralization of F0 contours compared to the normal F0 condition) were significantly greater in children with CIs than in NH children in all test conditions.

Conclusions: F0 contours play a major role in sentence recognition in both quiet and noise among pediatric implantees, and the contribution of the F0 contour is even more salient than that in age-matched NH children. These results also suggest that there may be differences between children with CIs and NH children in how F0 contours are processed.
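
The abstract describes the accuracy reduction ratio only in words, so the following is a minimal sketch under an assumed definition: the drop in score from the natural to the flattened F0 condition, divided by the natural-condition score. The function name and the input values are hypothetical placeholders, not data from the study.

```python
def reduction_ratio(natural_score, flattened_score):
    """Relative drop in sentence recognition when F0 contours are flattened.

    Assumed definition: (natural - flattened) / natural. The study may use
    a different formula.
    """
    if natural_score <= 0:
        raise ValueError("natural_score must be positive")
    return (natural_score - flattened_score) / natural_score


# Placeholder percent-correct scores chosen only to show the arithmetic.
print(reduction_ratio(80.0, 60.0))  # 0.25
```

Dividing by the natural-condition score makes the ratio comparable across groups whose baseline recognition differs, which is presumably why a ratio is reported in place of a raw difference.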
