Synchronous Bidirectional Neural Machine Translation

Author(s):  
Long Zhou ◽  
Jiajun Zhang ◽  
Chengqing Zong

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token, from left to right. This unidirectional decoding framework cannot make full use of the target-side future contexts that can be produced by right-to-left decoding, and thus suffers from unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs but also on the future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on the large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model improves over the strong Transformer baseline by 3.92, 1.49, and 1.04 BLEU points, respectively, and obtains state-of-the-art performance on the Chinese-English and English-German translation tasks.
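A minimal sketch of the interactive decoding loop described above, with a dummy scorer standing in for the two decoding directions of the single model (the vocabulary and scoring below are illustrative only, not the authors' implementation):

```python
# Toy greedy synchronous bidirectional decoding, batch of one sentence.
VOCAB = ["we", "translate", "sentences", "well", "<eos>"]

def dummy_scores(own_prefix, other_prefix):
    """Placeholder decoder step: scores every vocabulary token given this
    direction's own history AND the other direction's partial output
    (the 'future' context that the interactive model exploits)."""
    seen = set(own_prefix) | set(other_prefix)
    return {tok: (0.0 if tok in seen else 1.0) for tok in VOCAB}

def sb_decode(max_len=4):
    l2r, r2l = [], []  # left-to-right / right-to-left partial outputs
    for _ in range(max_len):
        # Score both directions from the *current* state so the two
        # steps happen simultaneously, then commit both tokens.
        s_fwd = dummy_scores(l2r, r2l)
        s_bwd = dummy_scores(r2l, l2r)
        l2r.append(max(s_fwd, key=s_fwd.get))
        r2l.append(max(s_bwd, key=s_bwd.get))
    # The real model keeps the better direction's hypothesis; this toy
    # run simply returns the left-to-right side.
    return l2r

print(sb_decode())
```

The point of the sketch is only the data flow: each direction's next token is scored against its own history and the other direction's partial output before either token is committed.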

Author(s):  
Yingce Xia ◽  
Tianyu He ◽  
Xu Tan ◽  
Fei Tian ◽  
Di He ◽  
...  

Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English-to-French or English-to-German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the whole encoder and decoder of an NMT model. We share the encoder and decoder of the Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named the Tied Transformer. Experimental results demonstrate that this simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German-to-English translation, 28.98/29.89 BLEU scores on WMT 2014 English-to-German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German-to-English translation.
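A minimal PyTorch sketch of what layer-level tying can look like (my own construction under stated assumptions, not the authors' exact scheme): the self-attention and feed-forward sublayers are single modules reused by both the encoder and decoder passes, while cross-attention, which has no encoder counterpart, stays decoder-only.

```python
import torch
import torch.nn as nn

class TiedLayer(nn.Module):
    """One layer whose self-attention and feed-forward sublayers are
    shared between the encoder and decoder passes (causal masking is
    omitted for brevity). Dimensions are illustrative."""

    def __init__(self, d_model=64, nhead=4, d_ff=256):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def encode(self, x):                 # encoder pass
        x = self.n1(x + self.self_attn(x, x, x)[0])
        return self.n3(x + self.ffn(x))

    def decode(self, y, memory):         # decoder pass, same tied weights
        y = self.n1(y + self.self_attn(y, y, y)[0])
        y = self.n2(y + self.cross_attn(y, memory, memory)[0])
        return self.n3(y + self.ffn(y))

layer = TiedLayer()
src, tgt = torch.randn(2, 7, 64), torch.randn(2, 5, 64)
print(layer.decode(tgt, layer.encode(src)).shape)  # torch.Size([2, 5, 64])
```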


Author(s):  
Melvin Johnson ◽  
Mike Schuster ◽  
Quoc V. Le ◽  
Maxim Krikun ◽  
Yonghui Wu ◽  
...  

We propose a simple solution for using a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture of a standard NMT system; instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we present analyses that hint at a universal interlingua representation in our models and show some interesting examples of mixing languages.
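The whole mechanism is a preprocessing step; a sketch (the exact token format, here `<2xx>`, is illustrative):

```python
def add_target_token(source: str, target_lang: str) -> str:
    """Prepend an artificial token telling the multilingual model which
    language to translate into; the architecture itself is unchanged."""
    return f"<2{target_lang}> {source}"

# The same English sentence routed to two different target languages:
print(add_target_token("How are you?", "de"))  # <2de> How are you?
print(add_target_token("How are you?", "fr"))  # <2fr> How are you?
```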


2020 ◽  
Vol 34 (01) ◽  
pp. 115-122 ◽  
Author(s):  
Baijun Ji ◽  
Zhirui Zhang ◽  
Xiangyu Duan ◽  
Min Zhang ◽  
Boxing Chen ◽  
...  

Transfer learning between different language pairs has shown its effectiveness for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language-space mismatch between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space, enabling a smooth transition to zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on it is trained with large-scale annotated data and then applied directly in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.
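A toy sketch of the three-stage pipeline (the tiny encoder/decoder below and all names are stand-ins for illustration, not the paper's models):

```python
import torch
import torch.nn as nn

class UniversalEncoder(nn.Module):
    """Stand-in for the cross-lingually pre-trained encoder that maps
    every source language into one shared feature space."""
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
    def forward(self, x):
        return self.rnn(self.emb(x))[0]

class NMTModel(nn.Module):
    def __init__(self, encoder, vocab=1000, d=64):
        super().__init__()
        self.encoder = encoder           # the shared feature space lives here
        self.proj = nn.Linear(d, vocab)  # toy decoder head
    def forward(self, x):
        return self.proj(self.encoder(x))

# Stage 1: pre-train the universal encoder cross-lingually (omitted here).
encoder = UniversalEncoder()

# Stage 2: build the parent model on that encoder and train it on the
# large-scale annotated pair (e.g. De->En); training loop omitted.
parent = NMTModel(encoder)

# Stage 3: zero-shot use. Because all source languages share the
# encoder's feature space, the parent is applied directly to a new
# source language, with no child-model fine-tuning.
fr_batch = torch.randint(0, 1000, (2, 7))  # fake token ids of a new language
with torch.no_grad():
    print(parent(fr_batch).shape)          # torch.Size([2, 7, 1000])
```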


2020 ◽  
Vol 34 (04) ◽  
pp. 6291-6298
Author(s):  
Yiren Wang ◽  
Lijun Wu ◽  
Yingce Xia ◽  
Tao Qin ◽  
ChengXiang Zhai ◽  
...  

Ensemble learning, which aggregates multiple diverse models for inference, is a common practice for improving the accuracy of machine learning tasks. However, it has been observed that conventional ensemble methods bring only marginal improvement for neural machine translation (NMT) when the individual models are strong or numerous. In this paper, we study how to effectively aggregate multiple NMT models under the transductive setting, where the source sentences of the test set are known. We propose a simple yet effective approach named transductive ensemble learning (TEL), in which we use all individual models to translate the source test set into the target language space and then fine-tune a strong model on the translated synthetic corpus. We conduct extensive experiments across different settings (with/without monolingual data) and different language pairs (English↔{German, Finnish}). The results show that our approach significantly boosts strong individual models and benefits further from larger numbers of individual models. In particular, we achieve state-of-the-art performance on the WMT 2016-2018 English↔German translation tasks.
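The TEL procedure is pipeline glue, so a short sketch captures it; the `ToyModel` class and its `translate`/`finetune` methods are assumed interfaces, not a real NMT API:

```python
class ToyModel:
    """Stand-in for a trained NMT model; real systems would wrap a
    Transformer here."""
    def __init__(self, name):
        self.name = name
    def translate(self, src):
        return f"[{self.name}] {src}"   # fake translation
    def finetune(self, pairs):
        pass                            # fine-tuning on the pairs goes here

def transductive_ensemble(models, strongest, test_sources):
    # 1. Every individual model translates the known test-set sources,
    #    yielding a synthetic parallel corpus in the target space.
    synthetic = [(s, m.translate(s)) for m in models for s in test_sources]
    # 2. The strongest single model is fine-tuned on that corpus ...
    strongest.finetune(synthetic)
    # 3. ... and then produces the final translations of the same test set.
    return [strongest.translate(s) for s in test_sources]

models = [ToyModel(f"m{i}") for i in range(3)]
print(transductive_ensemble(models, models[0], ["ein Satz", "noch einer"]))
```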


Author(s):  
Rui Wang ◽  
Xu Tan ◽  
Renqian Luo ◽  
Tao Qin ◽  
Tie-Yan Liu

Neural approaches have achieved state-of-the-art accuracy in machine translation but suffer from the high cost of collecting large-scale parallel data. Thus, a great deal of research has been conducted on neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey of low-resource NMT and classify related work into three categories according to the auxiliary data used: (1) exploiting monolingual data of the source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey helps researchers better understand this field and inspires them to design better algorithms, and helps industry practitioners choose appropriate algorithms for their applications.
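As one concrete instance of category (1), a sketch of back-translation, a canonical way to exploit target-side monolingual data (the `ReverseStub` and its `translate` method are placeholders, not taken from the survey's text):

```python
def back_translate(target_monolingual, reverse_model):
    """A reverse (target-to-source) model turns target-side monolingual
    text into pseudo-sources; the resulting synthetic pairs augment the
    forward model's training data."""
    return [(reverse_model.translate(t), t) for t in target_monolingual]

class ReverseStub:
    def translate(self, text):  # stand-in for a trained target->source model
        return f"<pseudo-source of: {text}>"

print(back_translate(["a target-language sentence"], ReverseStub()))
```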


2020 ◽  
Vol 34 (05) ◽  
pp. 8285-8292
Author(s):  
Yanyang Li ◽  
Qiang Wang ◽  
Tong Xiao ◽  
Tongran Liu ◽  
Jingbo Zhu

Though the early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units (e.g., alignment), recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes these interactions, for the sake of efficiency. In this paper, we employ Joint Representation, which fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new sequence-to-sequence modelling paradigm beyond the encoder-decoder framework and outperform the Transformer baseline on both the small-scale IWSLT14 German-English, English-German, and IWSLT15 Vietnamese-English tasks and the large-scale NIST12 Chinese-English translation task by about 1 BLEU point. We also propose a systematic model-scaling approach that allows the Reformer model to beat the state-of-the-art Transformer on IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.
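To see why refinement of a joint representation needs a cheaper attention, consider a toy sketch (shapes and the axis-wise factorization below are my own illustration of the general idea, not the paper's exact operation): the joint representation is a (target x source) grid of vectors, and attending over all grid cells at once would be quartic in sentence length, so attention is applied along one axis at a time.

```python
import torch
import torch.nn as nn

class SeparableJointAttention(nn.Module):
    """Refine a joint (target x source) representation by attending
    across source positions, then across target positions."""
    def __init__(self, d=64, nhead=4):
        super().__init__()
        self.src_attn = nn.MultiheadAttention(d, nhead, batch_first=True)
        self.tgt_attn = nn.MultiheadAttention(d, nhead, batch_first=True)

    def forward(self, joint):                    # joint: (T, S, d)
        # Across source positions: each target row is one sequence.
        x = self.src_attn(joint, joint, joint)[0]        # (T, S, d)
        # Across target positions: each source column is one sequence.
        y = x.transpose(0, 1)                            # (S, T, d)
        y = self.tgt_attn(y, y, y)[0]
        return y.transpose(0, 1)                         # (T, S, d)

joint = torch.randn(5, 7, 64)  # one (target=5, source=7) sentence pair
print(SeparableJointAttention()(joint).shape)  # torch.Size([5, 7, 64])
```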


2020 ◽  
Vol 34 (05) ◽  
pp. 8311-8318
Author(s):  
Zuchao Li ◽  
Rui Wang ◽  
Kehai Chen ◽  
Masao Utiyama ◽  
Eiichiro Sumita ◽  
...  

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow the standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of the sentence, is not given specific focus. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information of a sentence. We propose three ways of integrating the compressed sentence into NMT: backbone source-side fusion, target-side fusion, and both-side fusion. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
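A toy sketch of what source-side fusion can look like (the pooling-and-gating scheme below is my own illustration under stated assumptions, not necessarily the paper's fusion operator): the compressed "backbone" sentence is encoded, pooled into a gist vector, and gated into every position of the full source representation.

```python
import torch
import torch.nn as nn

class BackboneFusion(nn.Module):
    """Gate a pooled backbone representation into the source states."""
    def __init__(self, d=64):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, src_repr, backbone_repr):
        # src_repr: (S, d) full sentence; backbone_repr: (C, d) compressed.
        gist = backbone_repr.mean(dim=0, keepdim=True)     # (1, d)
        gist = gist.expand_as(src_repr)                    # (S, d)
        g = torch.sigmoid(self.gate(torch.cat([src_repr, gist], -1)))
        return g * src_repr + (1 - g) * gist               # fused (S, d)

fused = BackboneFusion()(torch.randn(9, 64), torch.randn(4, 64))
print(fused.shape)  # torch.Size([9, 64])
```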


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6509
Author(s):  
Laith H. Baniata ◽  
Isaac K. E. Ampomah ◽  
Seyoung Park

Languages that allow free word order, such as Arabic dialects, are particularly difficult for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Unknown-word (UNK) tokens represent out-of-vocabulary words, since NMT systems operate with a fixed-size vocabulary. Rare words are instead encoded entirely as sequences of subword pieces using the WordPiece model. This paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution is based on the recently introduced Transformer model. Using subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the behavior of the encoder's multi-head attention sublayers by capturing the overall dependencies between the words of an Arabic-vernacular input sentence. Experiments are carried out on Levantine Arabic vernacular (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic vernacular (MAG) to MSA, Gulf to MSA, Nile to MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the proposed model adequately addresses the unknown-word issue and improves the quality of translation from Arabic vernaculars to MSA.
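A sketch of the shared-subword setup using the open-source HuggingFace `tokenizers` library (this paper's own WordPiece configuration may differ; the corpus file name and vocabulary size below are placeholders):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordPieceTrainer

# Train one WordPiece vocabulary over a *combined* dialect+MSA corpus so
# source and target share the same subword inventory.
tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordPieceTrainer(vocab_size=32000,
                           special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(files=["shared_dialect_msa.txt"], trainer=trainer)

# A rare word now decomposes into known subword pieces instead of
# collapsing to a single [UNK] token.
print(tokenizer.encode("example sentence").tokens)
```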


2020 ◽  
Vol 30 (01) ◽  
pp. 2050001
Author(s):  
Takumi Maruyama ◽  
Kazuhide Yamamoto

Inspired by the machine translation task, recent text simplification approaches regard the task as monolingual text-to-text generation, and neural machine translation models have significantly improved the performance of simplification. Although such models require a large-scale parallel corpus, parallel corpora for text simplification are few in number and much smaller than those for machine translation. Therefore, we attempt to facilitate the training of simplification rewritings by pre-training on a large-scale monolingual corpus such as Wikipedia articles. In addition, we propose a translation language model that allows fine-tuning for text simplification to proceed seamlessly from language-model pre-training. The experimental results show that the translation language model substantially outperforms a state-of-the-art model under a low-resource setting. Moreover, a pre-trained translation language model with only 3,000 supervised examples can achieve performance comparable to that of the state-of-the-art model trained on 30,000 supervised examples.
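A toy sketch of the two-stage recipe (the tiny single-stack architecture below is a deliberate stand-in, not the paper's translation language model): the same parameters are first trained with a language-modelling objective on monolingual text, then fine-tuned on concatenated (complex, simple) pairs.

```python
import torch
import torch.nn as nn

d, vocab = 64, 1000
shared_stack = nn.GRU(d, d, batch_first=True)  # stand-in for Transformer layers
emb, head = nn.Embedding(vocab, d), nn.Linear(d, vocab)

def lm_logits(token_ids):
    """Stage 1 objective: next-token language modelling on large
    monolingual text (e.g. Wikipedia) to warm up the shared stack."""
    return head(shared_stack(emb(token_ids))[0])

def simplify_logits(complex_ids, simple_prefix_ids):
    """Stage 2: fine-tune the *same* parameters on (complex, simple)
    pairs by conditioning the simple prefix on the complex sentence."""
    joint = torch.cat([complex_ids, simple_prefix_ids], dim=1)
    return lm_logits(joint)[:, complex_ids.size(1):]

out = simplify_logits(torch.randint(0, vocab, (2, 8)),
                      torch.randint(0, vocab, (2, 3)))
print(out.shape)  # torch.Size([2, 3, 1000])
```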


Digital ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 86-102
Author(s):  
Akshai Ramesh ◽  
Venkatesh Balavadhani Parthasarathy ◽  
Rejwanul Haque ◽  
Andy Way

Phrase-based statistical machine translation (PB-SMT) was the dominant paradigm in machine translation (MT) research for more than two decades. Deep neural MT models have been producing state-of-the-art performance across many translation tasks for four to five years; in other words, neural MT (NMT) took the place of PB-SMT a few years ago and currently represents the state-of-the-art in MT research. Translation to or from under-resourced languages has historically been seen as a challenging task. Despite producing state-of-the-art results on many translation tasks, NMT still faces problems, such as poor performance on many low-resource language pairs, mainly because of the data-demanding nature of its learning task. MT researchers have tried to address this problem with various techniques, e.g., exploiting source- and/or target-side monolingual data for training, augmenting bilingual training data, and transfer learning. Despite some success, none of the present-day techniques has entirely overcome the problem of translation in low-resource scenarios for many languages. In this work, we investigate the performance of PB-SMT and NMT on two rarely tested under-resourced language pairs, English-to-Tamil and Hindi-to-Tamil, in a specialised data domain. We present our findings, including the rankings of our MT systems obtained via a social-media-based human evaluation scheme.

