Leveraging Neural Machine Translation for Word Alignment

2021 ◽  
Vol 116 (1) ◽  
pp. 43-61
Author(s):  
Vilém Zouhar ◽  
Daria Pylypenko
2019 ◽  
Vol 9 (10) ◽  
pp. 2036
Author(s):  
Jinyi Zhang ◽  
Tadahiro Matsumoto

The translation quality of Neural Machine Translation (NMT) systems depends strongly on the size of the training data. However, sufficient parallel data are not available for many language pairs. This paper presents a corpus augmentation method with two variants: one applicable to all language pairs, and one specific to the Chinese-Japanese language pair. The method uses both the source and target sentences of an existing parallel corpus and generates multiple pseudo-parallel sentence pairs from a long parallel sentence pair containing punctuation marks, as follows: (1) split the sentence pair into parallel partial sentences; (2) back-translate the target partial sentences; and (3) replace each partial sentence in the source sentence with the back-translated target partial sentence to generate pseudo-source sentences. The word alignment information, which is used to determine the split points, is modified with “shared Chinese character rates” in segments of the sentence pairs. Experimental results on Japanese-Chinese and Chinese-Japanese translation with ASPEC-JC (Asian Scientific Paper Excerpt Corpus, Japanese-Chinese) show that the method substantially improves translation performance. We also supply code (see Supplementary Materials) that reproduces the proposed method.
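The three steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code. It assumes that the source and target sentences split at punctuation into the same number of segments and that those segments correspond one-to-one, whereas the paper determines split points from word alignment information. The names `split_segments`, `augment_pair`, and the `back_translate` callable are hypothetical; in practice `back_translate` would wrap a target-to-source NMT model.

```python
import re
from typing import Callable, List, Tuple

# Zero-width split point right after an ASCII or CJK comma/semicolon,
# so each punctuation mark stays attached to the segment it closes.
_SPLIT_AFTER_PUNCT = re.compile(r"(?<=[,;，、；])")


def split_segments(sentence: str) -> List[str]:
    """Step 1: split a sentence into partial sentences at punctuation marks."""
    return [part.strip() for part in _SPLIT_AFTER_PUNCT.split(sentence) if part.strip()]


def augment_pair(
    source: str,
    target: str,
    back_translate: Callable[[str], str],  # hypothetical target-to-source translator
    joiner: str = "",  # "" suits Chinese/Japanese; use " " for space-delimited text
) -> List[Tuple[str, str]]:
    """Return pseudo-parallel pairs (pseudo_source, original_target)."""
    src_parts = split_segments(source)
    tgt_parts = split_segments(target)
    # Simplifying assumption: segments are parallel only when the counts match.
    if len(src_parts) < 2 or len(src_parts) != len(tgt_parts):
        return []
    pseudo_pairs = []
    for i, tgt_part in enumerate(tgt_parts):
        replaced = list(src_parts)
        # Steps 2 and 3: back-translate one target segment and substitute it
        # for the corresponding source segment.
        replaced[i] = back_translate(tgt_part)
        pseudo_pairs.append((joiner.join(replaced), target))
    return pseudo_pairs
```

With a toy `back_translate` such as `str.upper`, calling `augment_pair("A, B, C", "x, y, z", str.upper, joiner=" ")` produces one pseudo-source sentence per segment, each paired with the unchanged target sentence.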


2019 ◽  
Author(s):  
Xintong Li ◽  
Guanlin Li ◽  
Lemao Liu ◽  
Max Meng ◽  
Shuming Shi

2020 ◽  
Author(s):  
Yun Chen ◽  
Yang Liu ◽  
Guanhua Chen ◽  
Xin Jiang ◽  
Qun Liu

Author(s):  
Ren Qing-Dao-Er-Ji ◽  
Yila Su ◽  
Nier Wu

With the development of natural language processing and neural machine translation, end-to-end (E2E) neural network models have gradually become the focus of research because of their high translation accuracy and the strong semantics of their translations. However, problems remain, such as limited vocabulary and low translation fidelity. In this paper, the discriminant method and the Conditional Random Field (CRF) model were used to segment and label the stems and affixes of Mongolian in the preprocessing stage of the Mongolian-Chinese bilingual corpus. To address the low translation fidelity, a decoding model combining a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) was constructed, with the GRU performing the target-language decoding. A global attention model was used to obtain the bilingual word alignment information. Finally, the quality of the translation was evaluated by Bilingual Evaluation Understudy (BLEU) and Perplexity (PPL) values. The improved model yields a BLEU value of 25.13 and a PPL value of [Formula: see text]. The experimental results show that the E2E Mongolian-Chinese neural machine translation model improves on traditional statistical methods and on machine translation models based on Recurrent Neural Networks (RNN) in terms of translation quality and semantic confusion.
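The abstract does not say how word alignments are read off the global attention model. One common heuristic, shown here as a sketch under that assumption, links each target position to the source position that receives the highest attention weight. The function name `alignments_from_attention` and the row-per-target-word matrix layout are illustrative choices, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def alignments_from_attention(
    attention: Sequence[Sequence[float]],
) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) links, one per target position.

    attention[t][s] is assumed to be the weight the decoder places on
    source position s while producing target position t.
    """
    links = []
    for t, row in enumerate(attention):
        # Pick the source position with the largest attention weight
        # (ties resolve to the leftmost position).
        s = max(range(len(row)), key=lambda idx: row[idx])
        links.append((s, t))
    return links
```

For a three-word toy example with rows `[0.7, 0.2, 0.1]`, `[0.1, 0.1, 0.8]`, and `[0.2, 0.6, 0.2]`, the function links target words 0, 1, and 2 to source words 0, 2, and 1 respectively. Argmax extraction yields exactly one link per target word, so it cannot represent unaligned words or many-to-one links without further thresholding.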


2020 ◽  
Vol 27 (3) ◽  
pp. 531-552
Author(s):  
Chunpeng Ma ◽  
Akihiro Tamura ◽  
Masao Utiyama ◽  
Tiejun Zhao ◽  
Eiichiro Sumita

2019 ◽  
Vol 28 (4) ◽  
pp. 1-29
Author(s):  
Michele Tufano ◽  
Cody Watson ◽  
Gabriele Bavota ◽  
Massimiliano Di Penta ◽  
Martin White ◽  
...  

Procedia CIRP ◽  
2021 ◽  
Vol 96 ◽  
pp. 9-14
Author(s):  
Uwe Dombrowski ◽  
Alexander Reiswich ◽  
Raphael Lamprecht
