Leveraging Neural Machine Translation for Word Alignment

Vilém Zouhar; Daria Pylypenko

doi:10.14712/00326585.014

Corpus Augmentation for Neural Machine Translation with Chinese-Japanese Parallel Corpora

Applied Sciences ◽

10.3390/app9102036 ◽

2019 ◽

Vol 9 (10) ◽

pp. 2036

Author(s):

Jinyi Zhang ◽

Tadahiro Matsumoto

Keyword(s):

Machine Translation ◽

Scientific Paper ◽

Training Data ◽

Word Alignment ◽

Sentence Pair ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Parallel Data ◽

Source Sentence

The translation quality of Neural Machine Translation (NMT) systems depends strongly on the size of the training data. Sufficient amounts of parallel data are, however, not available for many language pairs. This paper presents a corpus augmentation method with two variations: one applies to all language pairs, and the other is specific to the Chinese-Japanese language pair. The method uses both the source and target sentences of an existing parallel corpus and generates multiple pseudo-parallel sentence pairs from a long parallel sentence pair containing punctuation marks, as follows: (1) split the sentence pair into parallel partial sentences; (2) back-translate the target partial sentences; and (3) replace each partial sentence in the source sentence with the back-translated target partial sentence to generate pseudo-source sentences. The word alignment information, which is used to determine the split points, is modified with “shared Chinese character rates” in segments of the sentence pairs. Experimental results for Japanese-Chinese and Chinese-Japanese translation on ASPEC-JC (Asian Scientific Paper Excerpt Corpus, Japanese-Chinese) show that the method substantially improves translation performance. We also supply code (see Supplementary Materials) that reproduces the proposed method.
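Steps (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: it assumes step (1) has already produced aligned partial sentences, it does not model the "shared Chinese character rates" adjustment, and it reads step (3) as producing one pseudo-source sentence per replaced partial sentence. The function name `pseudo_pairs` and the `back_translate` callable are hypothetical placeholders for illustration.

```python
from typing import Callable, List, Tuple


def pseudo_pairs(
    src_parts: List[str],
    tgt_parts: List[str],
    back_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build pseudo-parallel pairs from one already-split sentence pair.

    src_parts[i] and tgt_parts[i] are assumed to be aligned partial
    sentences (step 1 is taken as given). back_translate is a hypothetical
    target-to-source translator supplied by the caller.
    """
    if len(src_parts) != len(tgt_parts):
        raise ValueError("source and target must have the same number of parts")

    target = "".join(tgt_parts)  # the target sentence is kept unchanged
    pairs = []
    for i, tgt_part in enumerate(tgt_parts):
        # Step 2: back-translate the i-th target partial sentence.
        back = back_translate(tgt_part)
        # Step 3: substitute it for the i-th source partial sentence.
        pseudo_source = "".join(src_parts[:i] + [back] + src_parts[i + 1:])
        pairs.append((pseudo_source, target))
    return pairs
```

Under this reading, a sentence pair split into n partial sentences yields n pseudo-source sentences, each paired with the original target sentence. A real back-translation model would take the place of the placeholder callable.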


On the Word Alignment from Neural Machine Translation

10.18653/v1/p19-1124 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xintong Li ◽

Guanlin Li ◽

Lemao Liu ◽

Max Meng ◽

Shuming Shi

Keyword(s):

Machine Translation ◽

Word Alignment ◽

Neural Machine Translation


Accurate Word Alignment Induction from Neural Machine Translation

10.18653/v1/2020.emnlp-main.42 ◽

2020 ◽

Author(s):

Yun Chen ◽

Yang Liu ◽

Guanhua Chen ◽

Xin Jiang ◽

Qun Liu

Keyword(s):

Machine Translation ◽

Word Alignment ◽

Neural Machine Translation


Research on Mongolian-Chinese machine translation based on the end-to-end neural network

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691319410030 ◽

2019 ◽

Vol 18 (01) ◽

pp. 1941003 ◽

Cited By ~ 1

Author(s):

Ren Qing-Dao-Er-Ji ◽

Yila Su ◽

Nier Wu

Keyword(s):

Neural Network ◽

Machine Translation ◽

Language Processing ◽

Conditional Random Field ◽

Target Language ◽

Word Alignment ◽

Neural Machine Translation ◽

Attention Model ◽

End To End ◽

Improved Model

With the development of natural language processing and neural machine translation, end-to-end (E2E) neural network models have gradually become the focus of research because of their high translation accuracy and strong semantic quality. However, problems such as limited vocabulary and low translation fidelity remain. In this paper, the discriminant method and the Conditional Random Field (CRF) model were used to segment and label Mongolian stems and affixes in the preprocessing stage of the Mongolian-Chinese bilingual corpus. To address the low translation fidelity, a decoding model combining a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) was constructed, with target-language decoding performed by the GRU. A global attention model was used to obtain bilingual word alignment information. Finally, translation quality was evaluated with Bilingual Evaluation Understudy (BLEU) and Perplexity (PPL) values. The improved model yields a BLEU value of 25.13 and a PPL value of [Formula: see text]. The experimental results show that the E2E Mongolian-Chinese neural machine translation model improves on traditional statistical methods and on machine translation models based on Recurrent Neural Networks (RNN) in terms of translation quality and semantic confusion.
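The abstract does not state how its PPL value was computed. As a point of reference only, the sketch below uses the standard definition of perplexity: the exponential of the negative mean per-token log-probability, with natural logarithms. The function name `perplexity` and the input format are assumptions for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log probability.

    token_log_probs holds the model's log-probability for each reference
    token. This input format is an assumption, not taken from the paper.
    """
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)
```

Lower values mean the model assigns higher probability to the reference tokens. A model that gives every token probability 0.25 has a perplexity of 4.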


Saliency-driven Word Alignment Interpretation for Neural Machine Translation

10.18653/v1/w19-5201 ◽

2019 ◽

Cited By ~ 1

Author(s):

Shuoyang Ding ◽

Hainan Xu ◽

Philipp Koehn

Keyword(s):

Machine Translation ◽

Word Alignment ◽

Neural Machine Translation


An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

10.18653/v1/2021.emnlp-main.664 ◽

2021 ◽

Author(s):

Alessandro Raganato ◽

Raúl Vázquez ◽

Mathias Creutz ◽

Jörg Tiedemann

Keyword(s):

Machine Translation ◽

Empirical Investigation ◽

Word Alignment ◽

Neural Machine Translation


Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation

Journal of Natural Language Processing ◽

10.5715/jnlp.27.531 ◽

2020 ◽

Vol 27 (3) ◽

pp. 531-552

Author(s):

Chunpeng Ma ◽

Akihiro Tamura ◽

Masao Utiyama ◽

Tiejun Zhao ◽

Eiichiro Sumita

Keyword(s):

Machine Translation ◽

Axiomatic Method ◽

Word Alignment ◽

Neural Machine Translation ◽

Word Alignments


Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation

Journal of Natural Language Processing ◽

10.5715/jnlp.28.694 ◽

2021 ◽

Vol 28 (2) ◽

pp. 694-699

Author(s):

Chunpeng Ma

Keyword(s):

Machine Translation ◽

Axiomatic Method ◽

Word Alignment ◽

Neural Machine Translation ◽

Word Alignments


An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

ACM Transactions on Software Engineering and Methodology ◽

10.1145/3340544 ◽

2019 ◽

Vol 28 (4) ◽

pp. 1-29 ◽

Cited By ~ 2

Author(s):

Michele Tufano ◽

Cody Watson ◽

Gabriele Bavota ◽

Massimiliano Di Penta ◽

Martin White ◽

...

Keyword(s):

Empirical Study ◽

Machine Translation ◽

Neural Machine Translation ◽

Bug Fixing ◽

In The Wild


Neural Machine Translation for Semantic-Driven Q&A Systems in the Factory Planning

Procedia CIRP ◽

10.1016/j.procir.2021.01.044 ◽

2021 ◽

Vol 96 ◽

pp. 9-14

Author(s):

Uwe Dombrowski ◽

Alexander Reiswich ◽

Raphael Lamprecht

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Factory Planning
