On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov; Nadir Durrani; Fahim Dalvi; Hassan Sajjad; James Glass

doi:10.1162/coli_a_00367

Computational Linguistics ◽

10.1162/coli_a_00367 ◽

2020 ◽

Vol 46 (1) ◽

pp. 1-52

Author(s):

Yonatan Belinkov ◽

Nadir Durrani ◽

Fahim Dalvi ◽

Hassan Sajjad ◽

James Glass

Keyword(s):

Machine Translation ◽

Language Processing ◽

Lexical Semantics ◽

Linguistic Information ◽

Neural Machine Translation ◽

Recent Success ◽

Part Of Speech ◽

Morphologically Rich Languages ◽

Representational Power ◽

Semantic Dependencies

Despite the recent success of deep neural networks in natural language processing and other spheres of artificial intelligence, their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word structure captured within the learned representations, which is an important aspect in translating morphologically rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information. 
Notable findings include the following observations: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers of the model; (iii) Representations learned using characters are more informed about word-morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are particularly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the Transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.

A Survey on Hybrid Machine Translation

E3S Web of Conferences ◽

10.1051/e3sconf/202018401061 ◽

2020 ◽

Vol 184 ◽

pp. 01061

Author(s):

Anusha Anugu ◽

Gajula Ramesh

Keyword(s):

Machine Translation ◽

Language Processing ◽

Literature Survey ◽

Neural Machine Translation ◽

Translation Tools ◽

Translation Techniques ◽

Hybrid Machine ◽

Hybrid Machine Translation ◽

Translation Systems ◽

Evaluation Techniques

Machine translation has developed gradually since the 1940s. It has gained more and more attention because it is effective and efficient, producing translations automatically without human effort. This paper summarizes the distinct models of machine translation, including Neural Machine Translation (NMT). Researchers have already done a great deal of work on machine translation techniques and their evaluation. We therefore present an analysis of the existing techniques for machine translation, including NMT, their differences, and the translation tools associated with them. Combining two machine translation systems makes it possible to exploit features of both, which has attracted attention in the natural language processing community. The paper therefore also includes a literature survey of Hybrid Machine Translation (HMT).

Effect of linguistic information in neural machine translation

2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA) ◽

10.1109/icaicta.2017.8090975 ◽

2017 ◽

Cited By ~ 1

Author(s):

Naomichi Nakamura ◽

Hitoshi Isahara

Keyword(s):

Machine Translation ◽

Linguistic Information ◽

Neural Machine Translation

PhraseAttn: Dynamic Slot Capsule Networks for phrase representation in Neural Machine Translation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212101 ◽

2021 ◽

pp. 1-8

Author(s):

Binh Nguyen ◽

Binh Le ◽

Long H.B. Nguyen ◽

Dien Dinh

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Vital Role ◽

Attention Mechanism ◽

Neural Machine Translation ◽

Translation Model ◽

Word Representation

Word representation plays a vital role in most Natural Language Processing systems, especially in Neural Machine Translation. It tends to capture the semantics and similarity of individual words well, but struggles to represent the meaning of phrases or multi-word expressions. In this paper, we investigate a method to generate and use phrase information in a translation model. To generate phrase representations, a Primary Phrase Capsule network is first employed, and its output is then iteratively enhanced with a Slot Attention mechanism. Experiments on the IWSLT English to Vietnamese, French, and German datasets show that our proposed method consistently outperforms the baseline Transformer and attains results competitive with the scaled Transformer using half as many parameters.

A comparative study of neural machine translation models for Turkish language

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211453 ◽

2021 ◽

pp. 1-11

Author(s):

Özgür Özdemir ◽

Emre Salih Akın ◽

Rıza Velioğlu ◽

Tuğba Dalyan

Keyword(s):

Machine Translation ◽

Computational Linguistics ◽

Attention Mechanism ◽

Neural Machine Translation ◽

English Translations ◽

Benchmark Datasets ◽

Important Challenge ◽

Morphologically Rich Languages ◽

Transformer Model ◽

Ted Talks

Machine translation (MT) is an important challenge in the field of Computational Linguistics. In this study, we conducted neural machine translation (NMT) experiments on two different architectures. First, a Sequence to Sequence (Seq2Seq) architecture, along with a variation that utilizes an attention mechanism, is applied to the translation task. Second, an architecture that is fully based on the self-attention mechanism, namely the Transformer, is employed to perform a comprehensive comparison. In addition, the contributions of employing Byte Pair Encoding (BPE) and Gumbel Softmax distributions are examined for both architectures. The experiments are conducted on two different datasets: TED Talks, one of the popular benchmark datasets for NMT, especially among morphologically rich languages like Turkish, and the WMT18 News dataset, provided by the Third Conference on Machine Translation (WMT) for shared tasks on various aspects of machine translation. The evaluation of the Turkish-to-English translations demonstrates that the Transformer model combining BPE and Gumbel Softmax achieved a BLEU score of 22.4 on TED Talks and 38.7 on the WMT18 News dataset. The empirical results support that using the Gumbel Softmax distribution improves the quality of translations for both architectures.

“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models

Chemical Science ◽

10.1039/c8sc02339e ◽

2018 ◽

Vol 9 (28) ◽

pp. 6091-6098 ◽

Cited By ~ 78

Author(s):

Philippe Schwaller ◽

Théophile Gaudin ◽

Dávid Lányi ◽

Costas Bekas ◽

Teodoro Laino

Keyword(s):

Organic Chemistry ◽

Machine Translation ◽

Chemical Reactions ◽

Language Processing ◽

Neural Machine Translation ◽

Translation Model ◽

Complex Organic

Using a text-based representation of molecules, chemical reactions are predicted with a neural machine translation model borrowed from language processing.

Natural language processing for similar languages, varieties, and dialects: A survey

Natural Language Engineering ◽

10.1017/s1351324920000492 ◽

2020 ◽

Vol 26 (6) ◽

pp. 595-612

Author(s):

Marcos Zampieri ◽

Preslav Nakov ◽

Yves Scherrer

Keyword(s):

Natural Language Processing ◽

Data Collection ◽

Natural Language ◽

Machine Translation ◽

Computational Methods ◽

Language Processing ◽

Language Varieties ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim to improve the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.

Sublemma-Based Neural Machine Translation

Complexity ◽

10.1155/2021/5935958 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Thien Nguyen ◽

Huu Nguyen ◽

Phuoc Tran

Keyword(s):

Machine Translation ◽

Quality Data ◽

Human Judgment ◽

Linguistic Features ◽

Neural Machine Translation ◽

Low Resource ◽

Part Of Speech ◽

Proposed Model ◽

Translation Systems ◽

Ted Talks

Powerful deep learning approaches free us from feature engineering in many artificial intelligence tasks. Such approaches can extract efficient representations from the input data, provided the data are large enough. Unfortunately, it is not always possible to collect large, high-quality data. For tasks in low-resource contexts, such as Russian ⟶ Vietnamese machine translation, insights into the data can compensate for their humble size. In this study of modelling Russian ⟶ Vietnamese translation, we leverage the input Russian words by decomposing them into not only features but also subfeatures. First, we break down a Russian word into a set of linguistic features: part-of-speech, morphology, dependency labels, and lemma. Second, the lemma feature is further divided into subfeatures labelled with tags corresponding to their positions in the lemma. Consistent with the source side, Vietnamese target sentences are represented as sequences of subtokens. Sublemma-based neural machine translation proves itself in our experiments on Russian-Vietnamese bilingual data collected from TED talks. Experimental results reveal that the proposed model outperforms the best available Russian ⟶ Vietnamese model by 0.97 BLEU. In addition, the automatic evaluation of the experimental results is verified by human judgment. The proposed sublemma-based model provides an alternative to existing models when we build translation systems from an inflectionally rich language, such as Russian, Czech, or Bulgarian, in low-resource contexts.

Latent Part-of-Speech Sequences for Neural Machine Translation

10.18653/v1/d19-1072 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xuewen Yang ◽

Yingru Liu ◽

Dongliang Xie ◽

Xin Wang ◽

Niranjan Balasubramanian

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Part Of Speech

Analyzing Architectures for Neural Machine Translation using Low Computational Resources

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2021.10502 ◽

2021 ◽

Vol 10 (5) ◽

pp. 9-16

Author(s):

Aditya Mandke ◽

Onkar Litake ◽

Dipali Kadam

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Time Constraints ◽

Neural Machine Translation ◽

Recent Developments ◽

Computationally Expensive ◽

Computational Resources

With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are computationally very expensive to train. Not everyone has access to setups with high-end GPUs and other such resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers with more encoders and decoders took more time to train but achieved lower BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable for situations with time constraints.
