A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation

Information ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 492
Author(s):  
Aishan Wumaier ◽  
Cuiyun Xu ◽  
Zaokere Kadeer ◽  
Wenqi Liu ◽  
Yingbo Wang ◽  
...  

The recognition and translation of organization names (ONs) are challenging because of the complex structures and high variability involved. ONs consist not only of common generic words but also of names, rare words, abbreviations, and business and industry jargon. ONs are a subclass of named entity (NE) phrases, which convey key information in text. As such, the correct translation of ONs is critical for machine translation and cross-lingual information retrieval. Existing Chinese–Uyghur neural machine translation systems perform poorly on ON translation tasks. As there are no publicly available Chinese–Uyghur ON translation corpora, an ON translation corpus containing 191,641 ON translation pairs is developed. A word segmentation approach involving characterization, tagged characterization, byte pair encoding (BPE), and syllabification is proposed for ON translation tasks. A recurrent neural network (RNN) attention framework and the transformer are adapted for ON translation tasks with different sequence granularities. The experimental results indicate that the transformer model not only outperforms the RNN attention model but also benefits from the proposed word segmentation approach. In addition, a Chinese–Uyghur ON translation system is developed to automatically generate new translation pairs. This work significantly improves Chinese–Uyghur ON translation and can be applied to improve Chinese–Uyghur machine translation and cross-lingual information retrieval. It can also easily be extended to other agglutinative languages.
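
To make the segmentation granularities concrete, here is a minimal Python sketch of three of the four schemes named above: plain character splitting, character splitting with word-position tags, and a toy byte pair encoding learner. The B/M/E/S tag set, the function names, and the toy corpus are assumptions for illustration, not the authors' implementation; Uyghur syllabification is not shown.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, as in common BPE implementations

def characterize(words):
    """Character-level segmentation: every character becomes a token."""
    return [ch for word in words for ch in word]

def tagged_characterize(words):
    """Character-level segmentation with word-position tags.

    Assumed tag set: B (begin), M (middle), E (end), S (single-character word).
    """
    tokens = []
    for word in words:
        if len(word) == 1:
            tokens.append(word + "/S")
            continue
        tokens.append(word[0] + "/B")
        tokens.extend(ch + "/M" for ch in word[1:-1])
        tokens.append(word[-1] + "/E")
    return tokens

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with its concatenation."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges by repeatedly joining the most frequent pair."""
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower"], num_merges=3)
print(apply_bpe("lower", merges))  # ['low', 'e', 'r', '</w>']
```

On this toy corpus the frequent word "low" becomes a single symbol, while the rarer "lower" is split into "low" plus smaller units. That sharing of subwords between frequent and rare forms is what makes BPE attractive for the rare words and abbreviations found in ONs.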

2018 ◽  
Vol 28 (09) ◽  
pp. 1850007
Author(s):  
Francisco Zamora-Martinez ◽  
Maria Jose Castro-Bleda

Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks such as Machine Translation. In this work we introduce a Statistical Machine Translation (SMT) system that fully integrates NNLMs in the decoding stage, departing from the traditional approach based on N-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, which allows them to have a stronger influence on translation quality. Computational issues were solved with a novel idea based on memorization and smoothing of the softmax normalization constants to avoid computing them, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and n-gram-based systems. The results show that the integrated approach seems more promising for n-gram-based systems, even with NNLMs of less than full quality.
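
The expensive part being avoided is the softmax denominator, which requires scoring every vocabulary word for each context. The sketch below illustrates the general idea under stated assumptions: log normalization constants are memorized ahead of time for a set of contexts, and unseen contexts fall back to a smoothed value. The averaging fallback, the `score_fn` interface, and the class name are hypothetical and not taken from the paper; the sketch only shows where the quality/cost trade-off comes from.

```python
import math

def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) over a list of scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

class CachedSoftmaxLM:
    """Toy LM whose softmax normalizers are precomputed ("memorized") per context.

    score_fn(context, word) -> unnormalized log score (hypothetical interface).
    """

    def __init__(self, score_fn, vocab):
        self.score_fn = score_fn
        self.vocab = list(vocab)
        self.log_z = {}            # context -> memorized log normalization constant
        self.fallback_log_z = 0.0  # used for contexts that were never memorized

    def memorize(self, contexts):
        """Pay the full-vocabulary cost once, offline, for the given contexts."""
        for ctx in contexts:
            self.log_z[ctx] = log_sum_exp([self.score_fn(ctx, w) for w in self.vocab])
        if self.log_z:
            # Hypothetical smoothing: average of the memorized constants.
            self.fallback_log_z = sum(self.log_z.values()) / len(self.log_z)

    def log_prob(self, ctx, word):
        """One score plus a table lookup; no sum over the vocabulary at decode time."""
        return self.score_fn(ctx, word) - self.log_z.get(ctx, self.fallback_log_z)
```

For memorized contexts the probabilities are exactly normalized; for unseen contexts they are only approximately normalized, which is the loss of LM quality traded for speed.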


Author(s):  
Petya Osenova ◽  
Kiril Simov

The data-driven Bulgarian WordNet: BTBWN

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a treebank manually annotated with semantic information. Such an approach requires synchronizing the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to those in the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but missing senses of a lemma have also been covered by exploring larger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered, both cross-lingually and with a view to ensuring maximum connectivity and the potential to incorporate language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.
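
One way to picture the alignment step is as typed links from Bulgarian synsets to Princeton WordNet synsets, with connectivity measured as the share of Bulgarian synsets that have at least one link. The sketch below is a hypothetical data model: the synset IDs, the relation labels "equivalent" and "hyponym_of", and the connectivity measure are illustrative assumptions, not the BTBWN format.

```python
from dataclasses import dataclass

@dataclass
class Synset:
    synset_id: str  # hypothetical ID, e.g. "bg-1"
    lemmas: list

@dataclass
class Mapping:
    bg_synset: str
    pwn_synset: str
    relation: str  # hypothetical labels: "equivalent" or "hyponym_of"

def connectivity(bg_synsets, mappings):
    """Fraction of Bulgarian synsets linked to at least one Princeton WordNet synset."""
    if not bg_synsets:
        return 0.0
    linked = {m.bg_synset for m in mappings}
    return sum(s.synset_id in linked for s in bg_synsets) / len(bg_synsets)

def unlinked(bg_synsets, mappings):
    """IDs of Bulgarian synsets that still need a cross-lingual mapping."""
    linked = {m.bg_synset for m in mappings}
    return [s.synset_id for s in bg_synsets if s.synset_id not in linked]

bg = [Synset("bg-1", ["куче"]), Synset("bg-2", ["баница"]), Synset("bg-3", ["лозе"])]
links = [
    Mapping("bg-1", "pwn-dog", "equivalent"),     # direct equivalent
    Mapping("bg-2", "pwn-pastry", "hyponym_of"),  # language-specific concept under a general one
]
print(unlinked(bg, links))  # ['bg-3']
```

The "hyponym_of" link shows how a language-specific concept with no exact English counterpart can still be connected to the Princeton hierarchy instead of being left isolated.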


2013 ◽  
Vol 791-793 ◽  
pp. 1622-1625
Author(s):  
Dan Han ◽  
Zhi Han Yu

In this article, we introduce some basic concepts of machine translation. Machine translation means translating text from one natural language into another by software. It can be divided into two categories: rule-based and corpus-based. IBM's statistical machine translation, Microsoft's multi-language machine translation project, AT&T's voice translation system and CMU's PANGLOSS system are typical machine translation systems. Because Chinese sentences are written as continuous strings of characters with no spaces between words, Chinese word segmentation is essential. There are three kinds of Chinese word segmentation methods: methods based on string matching, methods based on understanding, and methods based on statistics.
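
Of the three families, string matching is the simplest to illustrate. Below is a minimal sketch of forward maximum matching, a common string-matching method chosen here as an example (the article does not say which variant it covers). The toy dictionary is hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character if nothing matches."""
    if max_len is None:
        max_len = max([1] + [len(w) for w in dictionary])
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict))  # ['研究生', '命', '起源']
```

On 研究生命起源 ("study the origin of life"), greedy matching yields 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源. Ambiguities of this kind are what understanding-based and statistical methods try to resolve.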


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Phuoc Tran ◽  
Dien Dinh ◽  
Hien T. Nguyen

Chinese and Vietnamese are both isolating languages in which words are not delimited by spaces. In machine translation, word segmentation is usually performed first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese, it is an open question whether words should be segmented at all. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in translation systems for this pair, which makes the decision of whether to segment even more important. In this paper, we propose a new method for translating Chinese into Vietnamese that combines the advantages of character-level and word-level translation. At the word level, a hybrid approach combining statistics and rules is used; at the character level, statistical translation is used. The experimental results show that our method improves machine translation performance over both character-level and word-level translation alone.
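
A simple way to see why the two granularities complement each other is a backoff scheme: translate known words with a word-level table and fall back to character-level translation for out-of-vocabulary words. The sketch below is that swapped-in backoff strategy, not the authors' statistical/rule-based combination; the function name and both toy tables are hypothetical.

```python
def translate_hybrid(words, word_table, char_table):
    """Word-level lookup first; unknown words are translated character by character.

    Characters missing from char_table are copied through unchanged.
    """
    output = []
    for word in words:
        if word in word_table:
            output.append(word_table[word])
        else:
            output.extend(char_table.get(ch, ch) for ch in word)
    return output

word_table = {"学生": "học sinh"}         # toy word-level table (hypothetical)
char_table = {"大": "đại", "学": "học"}  # toy character-level table (hypothetical)
print(" ".join(translate_hybrid(["学生", "大学"], word_table, char_table)))
# học sinh đại học
```

Character-level backoff works well for this pair because many Chinese characters map to a single Sino-Vietnamese syllable, which softens the sparse-data problem for words missing from the word-level table.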


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Haidong Ban ◽  
Jing Ning

With the rapid development of Internet technology and economic globalization, international exchanges in various fields have become increasingly active, and the need for communication across languages has become increasingly apparent. As an effective tool, automatic translation can produce equivalent translations between languages while preserving the original semantics, which is very important in practice. This paper focuses on a Chinese-English machine translation model based on deep neural networks. We use an end-to-end encoder-decoder framework to build a neural machine translation model in which the machine learns the translation function automatically: the data are converted into distributed word vectors, and the neural network directly performs the mapping between the source language and the target language. Experiments show that adding part-of-speech information improves the performance of the translation model. As the number of stacked network layers increases from two to four, the improvement ratios of the respective models are 5.90%, 6.1%, 6.0%, and 7.0%. Among them, the model that uses an independent recurrent neural network as its network structure has the largest improvement rate, suggesting that the system has high availability.
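
Assuming the "independent recurrent neural network" above refers to the IndRNN architecture (an assumption; the abstract does not define it), its recurrence is h_t = ReLU(W x_t + u ⊙ h_{t-1} + b). Each hidden unit sees only its own previous value through the elementwise weight vector u, instead of the full recurrent matrix of a standard RNN. A minimal pure-Python sketch of that recurrence, with hypothetical function names:

```python
def relu(x):
    return max(0.0, x)

def indrnn_step(x, h_prev, W, u, b):
    """One IndRNN step. W: hidden_size rows of input_size weights;
    u, b, h_prev: length hidden_size. Unit i uses only its own h_prev[i]."""
    return [
        relu(sum(w_ij * x_j for w_ij, x_j in zip(w_row, x)) + u_i * h_i + b_i)
        for w_row, u_i, h_i, b_i in zip(W, u, h_prev, b)
    ]

def indrnn_run(xs, W, u, b):
    """Run the recurrence over a sequence of input vectors, starting from zeros."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = indrnn_step(x, h, W, u, b)
        states.append(h)
    return states

W = [[1.0, 0.0], [0.0, 1.0]]  # toy input weights (hypothetical)
u = [0.5, 0.5]                # per-unit recurrent weights
b = [0.0, 0.0]
print(indrnn_run([[1, 2], [1, -10]], W, u, b))  # [[1.0, 2.0], [1.5, 0.0]]
```

Because the recurrent weights are per unit, stacking several such layers stays cheap and the gradient through time for each unit is governed by a single scalar, which is the usual argument for building deeper recurrent networks with it.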


Author(s):  
Ren Qing-Dao-Er-Ji ◽  
Yila Su ◽  
Nier Wu

With the development of natural language processing and neural machine translation, end-to-end (E2E) neural network models for machine translation have gradually become a focus of research because of their high translation accuracy and the strong semantic quality of their output. However, problems such as limited vocabulary and low translation faithfulness remain. In this paper, a discriminant method and a Conditional Random Field (CRF) model were used to segment and label the stems and affixes of Mongolian words when preprocessing the Mongolian-Chinese bilingual corpus. To address low translation faithfulness, a decoding model combining a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) was constructed, with the GRU used for target-language decoding. A global attention model was used to obtain bilingual word alignment information during bilingual word alignment processing. Finally, translation quality was evaluated with Bilingual Evaluation Understudy (BLEU) and Perplexity (PPL) values. The improved model yields a BLEU value of 25.13 and a PPL value of [Formula: see text]. The experimental results show that the E2E Mongolian-Chinese neural machine translation model improves translation quality and reduces semantic confusion compared with traditional statistical methods and with machine translation models based on Recurrent Neural Networks (RNN).
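
Both evaluation measures are short formulas. The sketch below computes unsmoothed sentence-level BLEU (clipped n-gram precisions combined by a geometric mean, times a brevity penalty) and perplexity as the exponential of the average negative log-likelihood per token. It is a simplified illustration: the paper's exact BLEU configuration (corpus-level scoring, tokenization, smoothing) is not stated, and BLEU is reported there on a 0 to 100 scale, whereas this sketch returns values between 0 and 1.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())  # clipped counts
        if overlap == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precision += math.log(overlap / sum(cand.values())) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)

def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

ref = "the cat is on the mat".split()
hyp = "the cat sat on the mat".split()
print(sentence_bleu(hyp, ref, max_n=2))  # sqrt(5/6 * 3/5), about 0.707
```

Lower perplexity means the model assigns higher probability to the observed tokens; a model that is uniformly uncertain among four choices at every step has a perplexity of 4.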

