Translation Quality and Error Recognition in Professional Neural Machine Translation Post-Editing

Informatics ◽  
2019 ◽  
Vol 6 (3) ◽  
pp. 41 ◽  
Author(s):  
Jennifer Vardaro ◽  
Moritz Schaeffer ◽  
Silvia Hansen-Schirra

This study aims to analyse how translation experts from the German department of the European Commission’s Directorate-General for Translation (DGT) identify and correct different error categories in neural machine-translated texts (NMT) and their post-edited versions (NMTPE). The term translation expert encompasses translators, post-editors, and revisors. Even though we focus on neural machine-translated segments, translator and post-editor are used synonymously because of the combined workflow using CAT tools as well as machine translation. Only the distinction between post-editor, a DGT translation expert correcting the neural machine translation output, and revisor, a DGT translation expert correcting the post-edited version of the neural machine translation output, is important and is made clear whenever relevant. Using an automatic error annotation tool and a more fine-grained manual error annotation framework to identify characteristic error categories in the DGT texts, a corpus analysis revealed that quality assurance measures by DGT post-editors and revisors are most often necessary for lexical errors. More specifically, the corpus analysis showed that, if post-editors correct mistranslations, terminology errors, or stylistic errors in an NMT sentence, revisors are likely to correct the same error type in the same post-edited sentence, suggesting that the DGT experts were primed by the NMT output. Subsequently, we designed a controlled eye-tracking and key-logging experiment to compare participants’ eye movements for test sentences containing the three identified error categories (mistranslations, terminology errors, or stylistic errors) and for control sentences without errors. We examined the three error types’ effects on early eye-movement measures (first fixation durations, first pass durations) and late ones (e.g., total reading times and regression path durations). Linear mixed-effects regression models predict which behaviour of the DGT experts is associated with the correction of the different error types during the post-editing process.
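As a sketch of how such an analysis can be set up, the snippet below fits a linear mixed-effects regression predicting a late eye-movement measure from error type, with random intercepts per participant. The column names, factor levels, and synthetic data are illustrative assumptions, not the authors’ materials or exact model specification.

```python
# Minimal sketch: predicting a late eye-movement measure (total reading time)
# from error type with a linear mixed-effects model, random intercepts per
# participant. Column names and factor levels are assumptions for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "participant": rng.integers(1, 21, n).astype(str),   # 20 synthetic participants
    "error_type": rng.choice(
        ["control", "mistranslation", "terminology", "style"], n),
    "total_reading_time": rng.gamma(shape=4, scale=300, size=n),  # ms, synthetic
})

# Treatment coding contrasts each error type against error-free control sentences.
model = smf.mixedlm("total_reading_time ~ C(error_type, Treatment('control'))",
                    df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

The treatment coding mirrors the comparison described above: each of the three error categories is contrasted against the control sentences without errors.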

2017 ◽  
Vol 108 (1) ◽  
pp. 121-132 ◽  
Author(s):  
Filip Klubička ◽  
Antonio Toral ◽  
Víctor M. Sánchez-Cartagena

We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems’ outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%.
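For readers who want to reproduce the two headline numbers in spirit, the sketch below computes chance-corrected inter-annotator agreement (Cohen’s kappa) over MQM-style labels and a relative error reduction between two systems; the label names and counts are invented for illustration and are not the paper’s data.

```python
# Sketch: agreement over MQM-style error labels and relative error reduction.
# Labels and counts are illustrative, not the paper's annotations.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["mistranslation", "omission", "none", "grammar", "none", "terminology"]
annotator_2 = ["mistranslation", "none", "none", "grammar", "none", "terminology"]
print("Cohen's kappa:", cohen_kappa_score(annotator_1, annotator_2))

# Error reduction of the best system (neural) relative to the worst (phrase-based).
errors_phrase_based = 500   # hypothetical total error count
errors_neural = 230         # hypothetical total error count
reduction = (errors_phrase_based - errors_neural) / errors_phrase_based
print(f"Relative error reduction: {reduction:.0%}")
```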


2020 ◽  
Vol 8 ◽  
pp. 393-408
Author(s):  
Xuan Zhang ◽  
Kevin Duh

Hyperparameter selection is a crucial part of building neural machine translation (NMT) systems across both academia and industry. Fine-grained adjustments to a model’s architecture or training recipe can mean the difference between a positive and a negative research result, or between a state-of-the-art and an underperforming system. While recent literature has proposed methods for automatic hyperparameter optimization (HPO), there has been limited work on applying these methods to NMT, due in part to the high costs associated with experiments that train large numbers of model variants. To facilitate research in this space, we introduce a lookup-based approach that uses a library of pre-trained models for fast, low-cost HPO experimentation. Our contributions include (1) the release of a large collection of trained NMT models covering a wide range of hyperparameters, (2) the proposal of targeted metrics for evaluating HPO methods on NMT, and (3) a reproducible benchmark of several HPO methods against our model library, including novel graph-based and multi-objective methods.
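The core idea of the lookup-based approach can be illustrated as follows: an HPO method proposes configurations and, instead of training a model, reads the metric of the matching pre-trained model from a results table. The table contents, column names, and the toy random-search loop below are assumptions for illustration, not the released library’s actual format.

```python
# Sketch of lookup-based HPO: query a table of pre-trained model results
# instead of training. Columns, values, and the search loop are illustrative.
import pandas as pd

library = pd.DataFrame([
    {"num_layers": 2, "hidden_size": 512,  "lr": 3e-4, "bleu": 27.1},
    {"num_layers": 4, "hidden_size": 512,  "lr": 3e-4, "bleu": 28.4},
    {"num_layers": 4, "hidden_size": 1024, "lr": 1e-4, "bleu": 29.0},
    {"num_layers": 6, "hidden_size": 1024, "lr": 3e-4, "bleu": 28.8},
])

def lookup_bleu(config):
    """Return the stored BLEU for a configuration instead of training it."""
    row = library.loc[(library[list(config)] == pd.Series(config)).all(axis=1)]
    return float(row["bleu"].iloc[0]) if not row.empty else None

# Toy random search restricted to the grid covered by the library.
best = None
for _ in range(10):
    config = library.sample(1).drop(columns="bleu").iloc[0].to_dict()
    score = lookup_bleu(config)
    if score is not None and (best is None or score > best[1]):
        best = (config, score)
print("Best found:", best)
```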


2018 ◽  
Vol 284 ◽  
pp. 171-176 ◽  
Author(s):  
Heeyoul Choi ◽  
Kyunghyun Cho ◽  
Yoshua Bengio

2020 ◽  
Vol 10 (4) ◽  
pp. 43
Author(s):  
Linda Alkhawaja ◽  
Hanan Ibrahim ◽  
Fida’ Ghnaim ◽  
Sirine Awwad

The neural machine translation (NMT) revolution is upon us. Since 2016, an increasing number of scientific publications have examined the improvements in the quality of machine translation (MT) systems. However, much remains to be done for specific language pairs, such as Arabic and English. This raises the question of whether NMT is a useful tool for translating text from English to Arabic. For this purpose, 100 English passages were obtained from different broadcasting websites and translated using NMT in Google Translate. The NMT outputs were reviewed by three professional bilingual evaluators specializing in linguistics and translation, who scored the translations based on the translation quality assessment (QA) model. First, the evaluators identified the most common errors that appeared in the translated text. Next, they evaluated the adequacy and fluency of the MT output on a 5-point scale. Our results indicate that mistranslation is the most common type of error, followed by corruption of the overall meaning of the sentence and orthographic errors. Nevertheless, the adequacy and fluency of the translated text are of acceptable quality. The results of our research can be used to improve the quality of Google NMT output.
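A minimal sketch of the evaluation bookkeeping described here might look like the following, aggregating 5-point adequacy and fluency ratings from three evaluators and counting error types; the segment data and labels are invented for illustration, not the study’s ratings.

```python
# Sketch: aggregate adequacy/fluency ratings (1-5) from three evaluators and
# count error types per segment. All data below are illustrative.
from collections import Counter
from statistics import mean

segments = [
    {"adequacy": [4, 5, 4], "fluency": [4, 4, 3], "errors": ["mistranslation"]},
    {"adequacy": [5, 5, 5], "fluency": [5, 4, 5], "errors": []},
    {"adequacy": [3, 4, 3], "fluency": [3, 3, 4], "errors": ["orthography", "mistranslation"]},
]

adequacy = mean(mean(s["adequacy"]) for s in segments)
fluency = mean(mean(s["fluency"]) for s in segments)
error_counts = Counter(e for s in segments for e in s["errors"])

print(f"Mean adequacy: {adequacy:.2f} / 5")
print(f"Mean fluency:  {fluency:.2f} / 5")
print("Most common errors:", error_counts.most_common())
```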


2021 ◽  
Vol 9 ◽  
pp. 740-755
Author(s):  
Gongbo Tang ◽  
Philipp Rönchen ◽  
Rico Sennrich ◽  
Joakim Nivre

In this paper, we evaluate the translation of negation both automatically and manually, in English–German (EN–DE) and English–Chinese (EN–ZH). We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks, although the performance varies between language pairs and translation directions. The accuracy of manual evaluation in EN→DE, DE→EN, EN→ZH, and ZH→EN is 95.7%, 94.8%, 93.4%, and 91.7%, respectively. In addition, we show that under-translation is the most significant error type in NMT, which contrasts with the more diverse error profile previously observed for statistical machine translation. To better understand the root of the under-translation of negation, we study the model’s information flow and training data. While our information flow analysis does not reveal any deficiencies that could be used to detect or fix the under-translation of negation, we find that negation is often rephrased during training, which could make it more difficult for the model to learn a reliable link between source and target negation. We finally conduct intrinsic analysis and extrinsic probing tasks on negation, showing that NMT models can distinguish negation from non-negation tokens very well and encode a lot of information about negation in their hidden states, but nevertheless leave room for improvement.
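As an illustration of the probing setup mentioned at the end, the sketch below trains a linear probe to separate negation from non-negation tokens; the hidden states here are synthetic stand-ins rather than activations from an actual NMT model.

```python
# Sketch of an intrinsic probing task: a linear probe on (synthetic) encoder
# hidden states distinguishing negation from non-negation tokens. In the paper
# the states come from trained NMT models; here they are random vectors with
# an injected offset, purely to illustrate the setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
dim, n = 64, 2000
labels = rng.integers(0, 2, n)            # 1 = negation token, 0 = other
states = rng.normal(size=(n, dim))
states[labels == 1] += 0.5                # stand-in for a negation signal

X_train, X_test, y_train, y_test = train_test_split(
    states, labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Probe accuracy:", probe.score(X_test, y_test))
```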


2019 ◽  
Vol 28 (4) ◽  
pp. 1-29 ◽  
Author(s):  
Michele Tufano ◽  
Cody Watson ◽  
Gabriele Bavota ◽  
Massimiliano Di Penta ◽  
Martin White ◽  
...  

Procedia CIRP ◽  
2021 ◽  
Vol 96 ◽  
pp. 9-14
Author(s):  
Uwe Dombrowski ◽  
Alexander Reiswich ◽  
Raphael Lamprecht
