A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation

Aishan Wumaier; Cuiyun Xu; Zaokere Kadeer; Wenqi Liu; Yingbo Wang; Xireaili Haierla; Maihemuti Maimaiti; ShengWei Tian; Alimu Saimaiti

doi:10.3390/info11100492

Information ◽

10.3390/info11100492 ◽

2020 ◽

Vol 11 (10) ◽

pp. 492

Author(s):

Aishan Wumaier ◽

Cuiyun Xu ◽

Zaokere Kadeer ◽

Wenqi Liu ◽

Yingbo Wang ◽

...

Keyword(s):

Neural Network ◽

Information Retrieval ◽

Machine Translation ◽

Word Segmentation ◽

Translation System ◽

Attention Model ◽

Segmentation Approach ◽

Cross Lingual ◽

Agglutinative Languages ◽

Transformer Model

The recognition and translation of organization names (ONs) is challenging due to the complex structures and high variability involved. ONs consist not only of common generic words but also names, rare words, abbreviations and business and industry jargon. ONs are a sub-class of named entity (NE) phrases, which convey key information in text. As such, the correct translation of ONs is critical for machine translation and cross-lingual information retrieval. The existing Chinese–Uyghur neural machine translation systems have performed poorly when applied to ON translation tasks. As there are no publicly available Chinese–Uyghur ON translation corpora, an ON translation corpus is developed here, which includes 191,641 ON translation pairs. A word segmentation approach involving characterization, tagged characterization, byte pair encoding (BPE) and syllabification is proposed here for ON translation tasks. A recurrent neural network (RNN) attention framework and transformer are adapted here for ON translation tasks with different sequence granularities. The experimental results indicate that the transformer model not only outperforms the RNN attention model but also benefits from the proposed word segmentation approach. In addition, a Chinese–Uyghur ON translation system is developed here to automatically generate new translation pairs. This work significantly improves Chinese–Uyghur ON translation and can be applied to improve Chinese–Uyghur machine translation and cross-lingual information retrieval. It can also easily be extended to other agglutinative languages.
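Of the segmentation granularities named in the abstract, byte pair encoding (BPE) is the easiest to illustrate in a few lines. The following is a minimal sketch of the generic BPE merge-learning procedure, not the authors' implementation. The function names (`learn_bpe`, `merge_pair`, `segment`) and the toy English word list are assumptions made for illustration. Character-level segmentation corresponds to the starting point before any merges. Tagged characterization and syllabification are not shown because the abstract does not specify them.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn an ordered list of BPE merges from a list of training words."""
    # Start from character-level segmentation of each word.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols
```

For example, `learn_bpe(["low", "low", "lower", "lowest"], 2)` learns two merges that build the subword `low`, after which `segment("lowly", merges)` splits the unseen word into `low`, `l`, `y`. Rare name components can then be represented by smaller known units instead of a single unknown token.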

Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System

International Journal of Neural Systems ◽

10.1142/s0129065718500077 ◽

2018 ◽

Vol 28 (09) ◽

pp. 1850007

Author(s):

Francisco Zamora-Martinez ◽

Maria Jose Castro-Bleda

Keyword(s):

Neural Network ◽

Machine Translation ◽

Language Processing ◽

Traditional Approach ◽

Computational Cost ◽

Integrated Approach ◽

Language Models ◽

Translation System ◽

Neural Net ◽

Network Language

Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks such as Machine Translation. In this work we introduce a Statistical Machine Translation (SMT) system that fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on N-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, which allows them to influence translation quality more strongly. Computational issues were solved with a novel idea based on memorization and smoothing of the softmax constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and N-gram-based systems. The results suggest that the integrated approach is more promising for N-gram-based systems, even with NNLMs that are not of full quality.
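The idea of storing softmax normalization constants can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the system described in the abstract: the class `MemoizedSoftmax`, the `logit_fn` callback, and the fallback that averages cached constants for unseen contexts are all hypothetical, and the paper's actual smoothing scheme may differ. The point is only that scoring one word then costs a single logit evaluation plus a table lookup instead of a sum over the whole vocabulary.

```python
import math


class MemoizedSoftmax:
    """Toy NNLM output layer that caches softmax normalization constants."""

    def __init__(self, logit_fn, vocab_size):
        self.logit_fn = logit_fn      # hypothetical: (context, word_id) -> float
        self.vocab_size = vocab_size
        self.log_z = {}               # context -> cached log normalization constant

    def precompute(self, contexts):
        """Pay the full-vocabulary sum once per context, ahead of decoding."""
        for ctx in contexts:
            logits = [self.logit_fn(ctx, w) for w in range(self.vocab_size)]
            top = max(logits)  # subtract the max for numerical stability
            self.log_z[ctx] = top + math.log(sum(math.exp(x - top) for x in logits))

    def smoothed_log_z(self):
        """Assumed fallback for unseen contexts: mean of the cached constants."""
        if not self.log_z:
            raise ValueError("call precompute() before scoring unseen contexts")
        return sum(self.log_z.values()) / len(self.log_z)

    def log_prob(self, ctx, word_id):
        """Score one word with a single logit evaluation plus a table lookup."""
        log_z = self.log_z.get(ctx)
        if log_z is None:
            log_z = self.smoothed_log_z()  # approximate: may not sum to one
        return self.logit_fn(ctx, word_id) - log_z
```

For cached contexts the probabilities are exact. For unseen contexts they are only approximately normalized, which is one way to picture the trade-off between LM quality and computational cost that the abstract mentions.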

The data-driven Bulgarian WordNet: BTBWN

Cognitive Studies | Études cognitives ◽

10.11649/cs.1713 ◽

2018 ◽

Author(s):

Petya Osenova ◽

Kiril Simov

Keyword(s):

Information Retrieval ◽

Machine Translation ◽

Semantic Information ◽

Data Driven ◽

Lexical Resources ◽

Multilingual Information Retrieval ◽

Cross Lingual ◽

Princeton Wordnet ◽

Word Senses

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with the aim of ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.

The Critical Technology Development Status of Machine Translation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.791-793.1622 ◽

2013 ◽

Vol 791-793 ◽

pp. 1622-1625

Author(s):

Dan Han ◽

Zhi Han Yu

Keyword(s):

Machine Translation ◽

Technology Development ◽

Statistical Machine Translation ◽

Word Segmentation ◽

Translation System ◽

Chinese Word ◽

Segmentation Method ◽

Chinese Word Segmentation ◽

Critical Technology ◽

Translation Machine

In this article, we introduce some basic concepts of machine translation. Machine translation means translating text from one natural language into another by software. It can be divided into two categories: rule-based and corpus-based. IBM's statistical machine translation, Microsoft's multi-language machine translation project, AT&T's voice translation system and CMU's PANGLOSS system are four typical machine translation systems. Because Chinese sentences are written as continuous sequences of words without delimiters, Chinese word segmentation is essential. Three methods of Chinese word segmentation are covered: segmentation based on string matching, segmentation based on understanding, and segmentation based on statistics.
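Of the three segmentation families listed, string matching is the simplest to show concretely. The sketch below implements forward maximum matching, one common string-matching strategy. The abstract does not say which variant the article describes, so this choice, the function name `forward_max_match`, the `max_len` parameter, and the tiny lexicon used in the note that follows are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: always take the longest lexicon match."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted
        # so that out-of-lexicon characters still produce a token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With the lexicon `{"机器", "翻译", "机器翻译", "系统"}`, the call `forward_max_match("机器翻译系统很好", lexicon)` picks the four-character entry `机器翻译` over the shorter `机器`, then `系统`, and finally falls back to the single characters `很` and `好`. The greedy choice is fast but can mis-segment ambiguous strings, which is one reason understanding-based and statistical methods exist.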

Corpus based Machine Translation System with Deep Neural Network for Sanskrit to Hindi Translation

Procedia Computer Science ◽

10.1016/j.procs.2020.03.306 ◽

2020 ◽

Vol 167 ◽

pp. 2534-2544

Author(s):

Muskaan Singh ◽

Ravinder Kumar ◽

Inderveer Chana

Keyword(s):

Neural Network ◽

Machine Translation ◽

Deep Neural Network ◽

Translation System ◽

Machine Translation System

A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

Computational Intelligence and Neuroscience ◽

10.1155/2016/9821608 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Phuoc Tran ◽

Dien Dinh ◽

Hien T. Nguyen

Keyword(s):

Machine Translation ◽

Hybrid Approach ◽

Sparse Data ◽

Word Segmentation ◽

Experimental Results ◽

Translation System ◽

Word Level ◽

Data Problem ◽

Sparse Data Problem ◽

Language Pair

Chinese and Vietnamese are both isolating languages in which words are not delimited by spaces. In machine translation, word segmentation is usually performed first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese, it is an open question whether the words should be segmented at all. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in translation systems for this pair, which makes the decision whether to segment even more important. In this paper, we propose a new method for translating Chinese to Vietnamese that combines the advantages of character-level and word-level translation. At the word level, a hybrid approach that combines statistics and rules is used; at the character level, statistical translation is used. The experimental results show that our method improves machine translation performance over that of character-level or word-level translation alone.

Linguistically enhanced word segmentation for better neural machine translation of low resource agglutinative languages

International Journal of Speech Technology ◽

10.1007/s10772-021-09865-5 ◽

2021 ◽

Author(s):

Santwana Chimalamarri ◽

Dinkar Sitaram

Keyword(s):

Machine Translation ◽

Word Segmentation ◽

Neural Machine Translation ◽

Low Resource ◽

Agglutinative Languages

Deep Neural Network--based Machine Translation System Combination

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3389791 ◽

2020 ◽

Vol 19 (5) ◽

pp. 1-19

Author(s):

Long Zhou ◽

Jiajun Zhang ◽

Xiaomian Kang ◽

Chengqing Zong

Keyword(s):

Neural Network ◽

Machine Translation ◽

Deep Neural Network ◽

Translation System ◽

System Combination ◽

Machine Translation System

Design of English Automatic Translation System Based on Machine Intelligent Translation and Secure Internet of Things

Mobile Information Systems ◽

10.1155/2021/8670739 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Haidong Ban ◽

Jing Ning

Keyword(s):

Neural Network ◽

Machine Translation ◽

Rapid Development ◽

Model Performance ◽

Internet Technology ◽

Target Language ◽

Translation System ◽

Improvement Rate ◽

Translation Model ◽

Automatic Translation

With the rapid development of Internet technology and economic globalization, international exchanges in various fields have become increasingly active, and the need for communication between languages has become increasingly clear. As an effective tool, automatic translation can perform equivalent translation between different languages while preserving the original semantics, which is very important in practice. This paper focuses on a Chinese-English machine translation model based on deep neural networks. We use the end-to-end encoder-decoder framework to build a neural machine translation model: the machine learns its function automatically, the data are converted into distributed word vectors, and the neural network directly performs the mapping between the source language and the target language. Experiments show that adding part of the voice information improves the performance of the translation model. As the number of network layers increases from two to four, the improvement ratios of each model are 5.90%, 6.1%, 6.0%, and 7.0%, respectively. Among them, the model with an independent recurrent neural network as the network structure has the largest improvement rate, so the system has high availability.

Domain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-642-36973-5_80 ◽

2013 ◽

pp. 768-771 ◽

Cited By ~ 1

Author(s):

Vassilina Nikoulina ◽

Stéphane Clinchant

Keyword(s):

Information Retrieval ◽

Machine Translation ◽

Domain Adaptation ◽

Statistical Machine Translation ◽

Cross Lingual

Research on Mongolian-Chinese machine translation based on the end-to-end neural network

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691319410030 ◽

2019 ◽

Vol 18 (01) ◽

pp. 1941003 ◽

Cited By ~ 1

Author(s):

Ren Qing-Dao-Er-Ji ◽

Yila Su ◽

Nier Wu

Keyword(s):

Neural Network ◽

Machine Translation ◽

Language Processing ◽

Conditional Random Field ◽

Target Language ◽

Word Alignment ◽

Neural Machine Translation ◽

Attention Model ◽

End To End ◽

Improved Model

With the development of natural language processing and neural machine translation, end-to-end (E2E) neural machine translation has gradually become the focus of research because of its high translation accuracy and the strong semantic quality of its translations. However, problems remain, such as limited vocabulary and low translation fidelity. In this paper, a discriminant method and a Conditional Random Field (CRF) model were used to segment and label the stems and affixes of Mongolian in the preprocessing stage of the Mongolian-Chinese bilingual corpus. To address the low translation fidelity problem, a model combining a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) was constructed, with the GRU performing the target language decoding. A global attention model was used to obtain bilingual word alignment information. Finally, the quality of the translation was evaluated by Bilingual Evaluation Understudy (BLEU) values and Perplexity (PPL) values. The improved model yields a BLEU value of 25.13 and a PPL value of [Formula: see text]. The experimental results show that the E2E Mongolian-Chinese neural machine translation model improves translation quality and reduces semantic confusion compared with traditional statistical methods and machine translation models based on Recurrent Neural Networks (RNN).
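For readers unfamiliar with the perplexity metric mentioned above, the standard definition is the exponential of the average negative log-probability that a model assigns to each token. The sketch below computes it from a list of per-token natural-log probabilities. The function name `perplexity` and its input format are assumptions for illustration; the abstract does not describe how the authors computed their PPL value.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-probability per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_prob)
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, meaning it is as uncertain as a uniform choice among four options. Lower values indicate a model that is less "confused" by the reference text.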
