Unsupervised Quality Estimation for Neural Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00330 ◽

2020 ◽

Vol 8 ◽

pp. 539-555

Author(s):

Marina Fomicheva ◽

Shuo Sun ◽

Lisa Yankovskaya ◽

Frédéric Blain ◽

Francisco Guzmán ◽

...

Keyword(s):

Machine Translation ◽

Real World ◽

State Of The Art ◽

Black Box ◽

Test Time ◽

Quality Estimation ◽

Neural Machine Translation ◽

Real World Applications ◽

Unsupervised Approach

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user about the quality of the MT output at test time. Existing approaches require large amounts of expert-annotated data, computation, and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Unlike most current work, which treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By utilizing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivaling state-of-the-art supervised QE models. To evaluate our approach, we collect the first dataset that enables work on both black-box and glass-box approaches to QE.
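
As a rough illustration of what "information extracted from the MT system as a by-product of translation" can look like, the sketch below computes two simple glass-box confidence indicators from decoder probabilities: the length-normalized log-probability of the emitted tokens and the mean entropy of the per-step output distributions. The abstract does not list the paper's actual indicators, so the function names, the toy probabilities, and the choice of these two indicators are assumptions made for illustration only.

```python
import math

def mean_log_prob(token_probs):
    """Length-normalized log-probability of the tokens the decoder emitted.

    token_probs: probability the model assigned to each chosen token.
    Values closer to 0 indicate a more confident translation.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def mean_entropy(step_distributions):
    """Average entropy of the decoder's output distribution per step.

    step_distributions: one probability distribution (list of floats
    summing to 1) per decoding step. Higher values mean more uncertainty.
    """
    if not step_distributions:
        raise ValueError("step_distributions must not be empty")
    total = 0.0
    for dist in step_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(step_distributions)

# Hypothetical per-token probabilities for two translations (illustrative only).
confident_probs = [0.95, 0.90, 0.97, 0.92]
hesitant_probs = [0.40, 0.35, 0.60, 0.25]

confident_score = mean_log_prob(confident_probs)
hesitant_score = mean_log_prob(hesitant_probs)
```

Scores like these need no QE training data; under the assumptions above, the translation with the higher (less negative) score would be ranked as more likely to be good.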


How to evaluate machine translation: A review of automated and human metrics

Natural Language Engineering ◽

10.1017/s1351324919000469 ◽

2019 ◽

Vol 26 (2) ◽

pp. 137-161

Author(s):

Eirini Chatzikoumi

Keyword(s):

Machine Translation ◽

Subjective Evaluation ◽

Evaluation Methods ◽

Quality Estimation ◽

Neural Machine Translation ◽

Mt Evaluation ◽

Error Classification ◽

Better Than ◽

Detailed Presentation

This article presents the most up-to-date, influential automated, semiautomated and human metrics used to evaluate the quality of machine translation (MT) output and provides the necessary background for MT evaluation projects. Evaluation is, as has been repeatedly acknowledged, highly relevant for the improvement of MT. This article is divided into three parts: the first one is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part includes reference translation–based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to whether or not human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’; in error classification, for example, they do not. The former methods are based on directly expressed judgment (DEJ); therefore, they are called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of the previously mentioned metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective to provide useful methodology tools for interdisciplinary research groups that evaluate MT systems.


Is Neural Machine Translation the New State of the Art?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0013 ◽

2017 ◽

Vol 108 (1) ◽

pp. 109-120 ◽

Cited By ~ 37

Author(s):

Sheila Castilho ◽

Joss Moorkens ◽

Federico Gaspari ◽

Iacer Calixto ◽

John Tinsley ◽

...

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Evaluation Methods ◽

Automatic Evaluation ◽

New Paradigm ◽

Neural Machine Translation ◽

Human Evaluation ◽

Statistical Mt

This paper discusses neural machine translation (NMT), a new paradigm in the MT field, comparing the quality of NMT systems with statistical MT by describing three studies using automatic and human evaluation methods. Automatic evaluation results presented for NMT are very promising; however, human evaluations show mixed results. We report increases in fluency but inconsistent results for adequacy and post-editing effort. NMT undoubtedly represents a step forward for the MT field, but one that the community should be careful not to oversell.


Estimating word-level quality of statistical machine translation output using monolingual information alone

Natural Language Engineering ◽

10.1017/s1351324919000111 ◽

2019 ◽

Vol 26 (1) ◽

pp. 73-94

Author(s):

Arda Tezcan ◽

Véronique Hoste ◽

Lieve Macken

Keyword(s):

Machine Translation ◽

Network Architecture ◽

State Of The Art ◽

Statistical Machine Translation ◽

Neural Network Architecture ◽

Quality Estimation ◽

Word Level ◽

Syntactic Features ◽

Grammatical Errors

Various studies show that statistical machine translation (SMT) systems suffer from fluency errors, especially in the form of grammatical errors and errors related to idiomatic word choices. In this study, we investigate the effectiveness of using monolingual information contained in the machine-translated text to estimate word-level quality of SMT output. We propose a recurrent neural network architecture which uses morpho-syntactic features and word embeddings as word representations within surface and syntactic n-grams. We test the proposed method on two language pairs and for two tasks, namely detecting fluency errors and predicting overall post-editing effort. Our results show that this method is effective for capturing all types of fluency errors at once. Moreover, on the task of predicting post-editing effort, while relying solely on monolingual information, it achieves results on par with state-of-the-art quality estimation systems that use both bilingual and monolingual information.


A Survey on Bias and Fairness in Machine Learning

ACM Computing Surveys ◽

10.1145/3457607 ◽

2021 ◽

Vol 54 (6) ◽

pp. 1-35

Author(s):

Ninareh Mehrabi ◽

Fred Morstatter ◽

Nripsuta Saxena ◽

Kristina Lerman ◽

Aram Galstyan

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Real World ◽

State Of The Art ◽

Future Directions ◽

Discriminatory Behavior ◽

Real World Applications ◽

Near Future ◽

Different Sources

With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, work has been developed in traditional machine learning and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them. In this survey, we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems. In addition, we examined different domains and subdomains in AI, showing what researchers have observed with regard to unfair outcomes in state-of-the-art methods and the ways they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We hope that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.


Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach which performs much better than statistical machine translation (SMT) models when parallel data is abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are heavily affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.


Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016292 ◽

2019 ◽

Vol 33 ◽

pp. 6292-6299 ◽

Cited By ~ 2

Author(s):

Raj Dabre ◽

Atsushi Fujita

Keyword(s):

Machine Translation ◽

Single Layer ◽

Training Data ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Sequence Generation ◽

Sequence Modeling ◽

Back Translation

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way of improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing, where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.
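
The parameter-sharing idea in this abstract can be sketched in a few lines. The toy below is not the authors' model: the layer (an affine map with ReLU and a residual connection), the dimension of 8, and all function names are assumptions chosen only to show that reusing one layer object 6 times keeps the same depth while storing the parameters of a single layer.

```python
import random

def make_layer(dim, rng):
    """Toy feed-forward layer: a dim x dim weight matrix plus a bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return {"weights": weights, "bias": bias}

def apply_layer(layer, x):
    """Affine map followed by ReLU, added back to the input (residual)."""
    out = []
    for row, b, xi in zip(layer["weights"], layer["bias"], x):
        pre = sum(w * v for w, v in zip(row, x)) + b
        out.append(xi + max(0.0, pre))
    return out

def forward(layers, x):
    """Apply the layers in order; the list may repeat the same layer object."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x

def count_parameters(layers):
    """Count parameters once per distinct layer object."""
    unique = {id(layer): layer for layer in layers}
    return sum(
        len(l["weights"]) * len(l["weights"][0]) + len(l["bias"])
        for l in unique.values()
    )

rng = random.Random(0)
dim, depth = 8, 6

# Conventional stacking: 6 layers, each with its own parameters.
vanilla_stack = [make_layer(dim, rng) for _ in range(depth)]

# Recurrent stacking: the same layer (same parameters) applied 6 times.
shared_layer = make_layer(dim, rng)
recurrent_stack = [shared_layer] * depth

vanilla_params = count_parameters(vanilla_stack)      # 6 * (8*8 + 8) = 432
recurrent_params = count_parameters(recurrent_stack)  # 1 * (8*8 + 8) = 72
```

Both stacks perform six layer applications in `forward`, so the computation depth is unchanged; only the number of stored parameters differs.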


Knowledge Graphs Enhanced Neural Machine Translation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/559 ◽

2020 ◽

Author(s):

Yang Zhao ◽

Jiajun Zhang ◽

Yu Zhou ◽

Chengqing Zong

Keyword(s):

Machine Translation ◽

Semantic Space ◽

Neural Machine Translation ◽

Translation Quality ◽

Structured Information ◽

Knowledge Graphs ◽

Japanese Translation

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KG-enhanced NMT method. Specifically, we first induce new translation results for these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo-parallel sentence pairs that contain these induced entity pairs. Finally, the NMT model is jointly trained on the original and pseudo sentence pairs. Extensive experiments on Chinese-to-English and English-to-Japanese translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling the induced entities.


Sliding-Window Thompson Sampling for Non-Stationary Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11407 ◽

2020 ◽

Vol 68 ◽

pp. 311-364

Author(s):

Francesco Trovo ◽

Stefano Paladino ◽

Marcello Restelli ◽

Nicola Gatti

Keyword(s):

Real World ◽

State Of The Art ◽

Sliding Window ◽

Upper Bounds ◽

Decision Problems ◽

Sequential Decision ◽

Thompson Sampling ◽

The Past ◽

Real World Applications ◽

Window Approach

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in the past decades. However, non-stationary settings -- very common in real-world applications -- have received little attention so far, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for non-stationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two different forms of non-stationarity studied separately so far: abruptly changing and smoothly changing. In the former, the reward distributions are constant during sequences of rounds, and their change may be arbitrary and happen at unknown rounds, while, in the latter, the reward distributions smoothly evolve over rounds according to unknown dynamics. Under mild assumptions, we provide upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both forms of non-stationarity are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature.
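
A minimal sketch of the sliding-window idea may help make the abstract concrete. It assumes Bernoulli rewards, a uniform Beta(1, 1) prior, and a fixed window length, none of which are specified in the abstract; the class name, the `simulate` helper, and all parameter values are hypothetical. At each round the posterior of every arm is built only from the most recent `window` observations, so stale rewards stop influencing the choice after a change.

```python
import random
from collections import deque

class SlidingWindowTS:
    """Thompson Sampling over Bernoulli arms using only recent observations."""

    def __init__(self, n_arms, window, seed=None):
        self.n_arms = n_arms
        # deque(maxlen=...) silently drops the oldest (arm, reward) pair.
        self.history = deque(maxlen=window)
        self.rng = random.Random(seed)

    def select_arm(self):
        successes = [0] * self.n_arms
        failures = [0] * self.n_arms
        for arm, reward in self.history:
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1
        # Sample each arm's mean from its windowed Beta posterior.
        samples = [
            self.rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, int(reward)))

def simulate(means_before, means_after, change_point, horizon, window, seed=0):
    """Run the agent on arms whose means switch abruptly at change_point."""
    env_rng = random.Random(seed + 1)
    agent = SlidingWindowTS(len(means_before), window, seed=seed)
    pulls = []
    for t in range(horizon):
        means = means_before if t < change_point else means_after
        arm = agent.select_arm()
        reward = 1 if env_rng.random() < means[arm] else 0
        agent.update(arm, reward)
        pulls.append(arm)
    return pulls

# Abrupt change: arm 0 is best for the first 500 rounds, arm 1 afterwards.
pulls = simulate([0.9, 0.1], [0.1, 0.9], change_point=500, horizon=1000, window=100)
```

The window length trades off reactivity against estimation noise: a short window adapts quickly after a change but yields noisier posteriors, while a long window behaves more like standard Thompson Sampling.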


Comparative Quality Estimation for Machine Translation: Observations on Machine Learning and Features

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0029 ◽

2017 ◽

Vol 108 (1) ◽

pp. 307-318 ◽

Cited By ~ 1

Author(s):

Eleftherios Avramidis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Translation ◽

State Of The Art ◽

Linear Method ◽

The State ◽

Quality Estimation ◽

Art Methods ◽

Improved Performance

A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached through the examination of the contribution of the features in the models, and it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features make a good contribution and a few adequacy features make some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


Insertion-based Decoding with Automatically Inferred Generation Order

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00292 ◽

2019 ◽

Vol 7 ◽

pp. 661-676 ◽

Cited By ~ 3

Author(s):

Jiatao Gu ◽

Qi Liu ◽

Kyunghyun Cho

Keyword(s):

Machine Translation ◽

Real World ◽

Word Order ◽

Code Generation ◽

State Of The Art ◽

Generation Model ◽

Beam Search ◽

Input Information ◽

Sequence Generation ◽

Image Caption

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning, and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
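
To illustrate what generation "through insertion operations" means, the toy below builds the same sentence under two different generation orders. It is a deliberate simplification and not InDIGO itself: each step is given as a hypothetical (token, absolute slot index) pair, whereas the abstract does not say how the model represents insertion positions, and no neural network is involved.

```python
def decode_by_insertion(operations):
    """Build a sequence from (token, position) insertion steps.

    Each step inserts `token` so that it ends up at index `position`
    of the partial sequence produced so far.
    """
    sequence = []
    for token, position in operations:
        if not 0 <= position <= len(sequence):
            raise ValueError(f"invalid insertion position {position}")
        sequence.insert(position, token)
    return sequence

# Left-to-right order: every token is appended at the end.
left_to_right = [("the", 0), ("cat", 1), ("sat", 2)]

# A different order: start from the content word, then fill in around it.
middle_out = [("cat", 0), ("the", 0), ("sat", 2)]

assert decode_by_insertion(left_to_right) == ["the", "cat", "sat"]
assert decode_by_insertion(middle_out) == ["the", "cat", "sat"]
```

Left-to-right decoding is the special case in which every insertion position equals the current sequence length.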
