Tradução automática, ma non troppo

Automatically assessing the quality of machine translation is an important yet challenging task for machine translation research. Translation quality assessment is understood as predicting translation quality without reference to the source text. Translation quality depends on the specific machine translation system and often requires post-editing. Manual editing is a long and expensive process. Since the need to quickly determine the quality of translation increases, its automation is required. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The bilingual corpus WMT 2019 for the EnglishRussian language pair was used as data. The text data volume is 17089 sentences, 85% of the data was used for training, and 15% for testing the model. Linguistic functions extracted from the text in the source and target languages were used as features for training the system, since it is these characteristics that can most accurately characterize the translation in terms of quality. The following tools were used for feature extraction: a free language modeling tool based on SRILM and a Stanford POS Tagger parts of speech tagger. Before training the system, the text was preprocessed. The model was trained using three regression methods: Bagging, Extra Tree, and Random Forest. The algorithms were implemented in the Python programming language using the Scikit learn library. The parameters of the random forest method have been optimized using a grid search. The performance of the model was assessed by the mean absolute error MAE and the root mean square error RMSE, as well as by the Pearsоn coefficient, which determines the correlation with human judgment. Testing was carried out using three machine translation systems: Google and Bing neural systems, Mouses statistical machine translation systems based on phrases and based on syntax. Based on the results of the work, the method of additional trees showed itself best. In addition, for all categories of indicators under consideration, the best results are achieved using the Google machine translation system. The developed method showed good results close to human judgment. The system can be used for further research in the task of assessing the quality of translation.

Download Full-text

Modeling Past and Future for Neural Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00011 ◽

2018 ◽

Vol 6 ◽

pp. 145-157 ◽

Cited By ~ 2

Author(s):

Zaixiang Zheng ◽

Hao Zhou ◽

Shujian Huang ◽

Lili Mou ◽

Xinyu Dai ◽

...

Keyword(s):

Machine Translation ◽

Alignment Error ◽

Source Information ◽

Neural Machine Translation ◽

Attention Model ◽

Translation Quality ◽

The Past ◽

Proposed Model ◽

Coverage Model ◽

Translation Systems

Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which provides Neural Machine Translation (NMT) systems with the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves the performance in Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both the translation quality and the alignment error rate.

Download Full-text

Investigating usability in postediting neural machine translation: Evidence from translation trainees' self-perception and performance

Across Languages and Cultures ◽

10.1556/084.2021.00006 ◽

2021 ◽

Vol 22 (1) ◽

pp. 100-123

Author(s):

Xiangling Wang ◽

Tingting Wang ◽

Ricardo Muñoz Martín ◽

Yanfang Jia

Keyword(s):

Machine Translation ◽

Processing Speed ◽

Text Processing ◽

Ease Of Use ◽

Chinese Translation ◽

Neural Machine Translation ◽

Translation Quality ◽

Retrospective Reports ◽

And Performance ◽

Translation Systems

AbstractThis is a report on an empirical study on the usability for translation trainees of neural machine translation systems when post-editing (mtpe). Sixty Chinese translation trainees completed a questionnaire on their perceptions of mtpe's usability. Fifty of them later performed both a post-editing task and a regular translation task, designed to examine mtpe's usability by comparing their performance in terms of text processing speed, effort, and translation quality. Contrasting data collected by the questionnaire, keylogging, eyetracking and retrospective reports we found that, compared with regular, unaided translation, mtpe's usefulness in performance was remarkable: (1) it increased translation trainees' text processing speed and also improved their translation quality; (2) mtpe's ease of use in performance was partly proved in that it significantly reduced informants' effort as measured by (a) fixation duration and fixation counts; (b) total task time; and (c) the number of insertion keystrokes and total keystrokes. However, (3) translation trainees generally perceived mtpe to be useful to increase productivity, but they were skeptical about its use to improve quality. They were neutral towards the ease of use of mtpe.

Download Full-text

Machine Translation Quality of Khalil Gibran’s The Prophet

10.31235/osf.io/9mvrf ◽

2017 ◽

Author(s):

AWEJ for Translation & Literary Studies ◽

Zakaryia Mustafa Almahasees

Keyword(s):

Machine Translation ◽

Low Cost ◽

Literary Texts ◽

Future Research ◽

Output Analysis ◽

Translation Quality ◽

Automatic Translation ◽

Word Level ◽

Full Analysis ◽

Translation Systems

Machine translation (MT) systems are widely used throughout the world freely or at low cost. The spread of MT entails a thorough analysis of translation produced by such translation systems. The present study evaluates the capacity of two MT systems-Google Translate and Microsoft Bing translator- in translation from Arabic into English of Khalil Gibran’s literary masterpiece - The Prophet (2000). The question that arises in the study is could we trust MT in the translation of literary masterpieces across languages and particularly from Arabic to English? How close does MT output to human translation? To conduct that, the study is adopted Bilingual Evaluation Understudy (BLEU) of Papineni (2000). MT output analysis showed that MT is not accurate, intelligible and natural in translating literary texts due to the difficulty of literary texts, as they are full of metaphors and cultural specifications. Besides, there are some linguistic errors: lexical, syntactic and misinformation. The study also found that both systems provided similar translation for the same input due to either the use of similar MT approach or learning from previous translated texts. Moreover, both systems in some instances, achieve good results at the word level, but bad results at collocation units. The study also showed that automatic translation is insufficient for providing a full analysis of MT output because all automatic metrics are misleading due to dependence on text similarity to a reference human translation. For future research, the study recommended conducting a correlative study that combines manual and automatic evaluation methods to ensure best analysis of MT output. Machine Translation (MT) is still far from reaching fully automatic translation of a quality obtained by human translators.

Download Full-text

Automatic evaluation of the quality of machine translation of a scientific text: the results of a five-year-long experiment

E3S Web of Conferences ◽

10.1051/e3sconf/202128408001 ◽

2021 ◽

Vol 284 ◽

pp. 08001

Author(s):

Ilya Ulitkin ◽

Irina Filippova ◽

Natalia Ivanova ◽

Alexey Poroykov

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Dramatic Improvement ◽

Automatic Evaluation ◽

Neural Machine Translation ◽

Translation Quality ◽

Automatic Translation ◽

Translation Systems ◽

Qualitative Changes

We report on various approaches to automatic evaluation of machine translation quality and describe three widely used methods. These methods, i.e. methods based on string matching and n-gram models, make it possible to compare the quality of machine translation to reference translation. We employ modern metrics for automatic evaluation of machine translation quality such as BLEU, F-measure, and TER to compare translations made by Google and PROMT neural machine translation systems with translations obtained 5 years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. The evaluation of the translation quality of candidate texts generated by Google and PROMT with reference translation using an automatic translation evaluation program reveal significant qualitative changes as compared with the results obtained 5 years ago, which indicate a dramatic improvement in the work of the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems of automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable the improvement of the quality of translation by these systems in the future.

Download Full-text

Dual-Source Transformer Model for Neural Machine Translation with Linguistic Knowledge

10.20944/preprints202002.0273.v1 ◽

2020 ◽

Author(s):

Yirong Pan ◽

Xiao Li ◽

Yating Yang ◽

Rui Dong

Keyword(s):

Machine Translation ◽

Linguistic Knowledge ◽

Linguistic Features ◽

Neural Machine Translation ◽

Popular Method ◽

Translation Quality ◽

Knowledge Based ◽

Syntactic Information ◽

Dual Source ◽

Transformer Model

Incorporating source-side linguistic knowledge into the neural machine translation (NMT) model has recently achieved impressive performance on machine translation tasks. One popular method is to generalize the word embedding layer of the encoder to encode each word and its linguistic features. The other method is to change the architecture of the encoder to encode syntactic information. However, the former cannot explicitly balance the contribution from the word and its linguistic features. The latter cannot flexibly utilize various types of linguistic information. Focusing on the above issues, this paper proposes a novel NMT approach that models the words in parallel to the linguistic knowledge by using two separate encoders. Compared with the single encoder based NMT model, the proposed approach additionally employs the knowledge-based encoder to specially encode linguistic features. Moreover, it shares parameters across encoders to enhance the model representation ability of the source-side language. Extensive experiments show that the approach achieves significant improvements of up to 2.4 and 1.1 BLEU points on Turkish→English and English→Turkish machine translation tasks, respectively, which indicates that it is capable of better utilizing the external linguistic knowledge and effective improving the machine translation quality.

Download Full-text

Lost in Machine Translation: Contextual Linguistic Uncertainty

Vestnik Volgogradskogo gosudarstvennogo universiteta Serija 2 Jazykoznanije ◽

10.15688/jvolsu2.2019.4.10 ◽

2019 ◽

pp. 129-144

Author(s):

Anton Sukhoverkhov ◽

Dorothy DeWitt ◽

Ioannis Manasidi ◽

Keiko Nitta ◽

Vladimir Krstić

Keyword(s):

Machine Translation ◽

Native Speakers ◽

Open Systems ◽

Test Results ◽

Translation Quality ◽

Different Types ◽

Idiomatic Phrase ◽

National Languages ◽

Translation Systems

The article considers the issues related to the semantic, grammatical, stylistic and technical difficulties currently present in machine translation and compares its four main approaches: Rule-based (RBMT), Corpora-based (CBMT), Neural (NMT), and Hybrid (HMT). It also examines some "open systems", which allow the correction or augmentation of content by the users themselves ("crowdsourced translation"). The authors of the article, native speakers presenting different countries (Russia, Greece, Malaysia, Japan and Serbia), tested the translation quality of the most representative phrases from the English, Russian, Greek, Malay and Japanese languages by using different machine translation systems: PROMT (RBMT), Yandex. Translate (HMT) and Google Translate (NMT). The test results presented by the authors show low "comprehension level" of semantic, linguistic and pragmatic contexts of translated texts, mistranslations of rare and culture-specific words,unnecessary translation of proper names, as well as a low rate of idiomatic phrase and metaphor recognition. It is argued that the development of machine translation requires incorporation of literal, conceptual, and content-and-contextual forms of meaning processing into text translation expansion of metaphor corpora and contextological dictionaries, and implementation of different types and styles of translation, which take into account gender peculiarities, specific dialects and idiolects of users. The problem of untranslatability ('linguistic relativity') of the concepts, unique to a particular culture, has been reviewed from the perspective of machine translation. It has also been shown, that the translation of booming Internet slang, where national languages merge with English, is almost impossible without human correction.

Download Full-text

Can machine translation systems be evaluated by the crowd alone

Natural Language Engineering ◽

10.1017/s1351324915000339 ◽

2015 ◽

Vol 23 (1) ◽

pp. 3-30 ◽

Cited By ~ 10

Author(s):

YVETTE GRAHAM ◽

TIMOTHY BALDWIN ◽

ALISTAIR MOFFAT ◽

JUSTIN ZOBEL

Keyword(s):

Machine Translation ◽

Large Scale ◽

Statistical Machine Translation ◽

Crowd Sourcing ◽

Direct Estimate ◽

Translation Quality ◽

Relative Preference ◽

Human Evaluation ◽

Estimate Method ◽

Translation Systems

AbstractCrowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of Workshop on Statistical Machine Translation shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrate that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.

Download Full-text

The Varibility of Reproduction: Emotive Units in a Literary Text (On the material of Ukrainian, Russian and Chinese) Variation of reflection of English-speaking emotional units in the translation of an artistic work (in Ukrainian, Russian, Chinese)

Naukovy Visnyk of South Ukrainian National Pedagogical University named after K D Ushynsky Linguistic Sciences ◽

10.24195/2616-5317-2018-27-24 ◽

2019 ◽

Vol 26 (27) ◽

pp. 211-222

Author(s):

Yana Fedorko ◽

Tetiana Yablonskaya

Keyword(s):

Key Words ◽

Machine Translation ◽

Political Discourse ◽

Literary Text ◽

Translation Quality ◽

Automatic Translation ◽

Advantages And Disadvantages ◽

English Speaking ◽

Translation Systems

The article is focused on peculiarities of English and Chinese political discourse translation into Ukrainian. The advantages and disadvantages of machine translation are described on the basis of linguistic analysis of online Google Translate and M-Translate systems. The reasons of errors in translation are identified and the need of post-correction to improve the quality of translation is wanted. Key words: political discourse, automatic translation, online machine translation systems, machine translation quality assessment.

Download Full-text

POTENTIAL FOR THE USE OF ICT IN TEACHING SCIENTIFIC AND TECHNICAL TRANSLATION

Information Technologies and Learning Tools ◽

10.33407/itlt.v76i2.2812 ◽

2020 ◽

Vol 76 (2) ◽

pp. 86-95

Author(s):

Оксана Борисівна Бігич ◽

Валентина Василівна Стрілець

Keyword(s):

Machine Translation ◽

Professional Training ◽

General Purpose ◽

Word Processor ◽

Linguistic Knowledge ◽

Advantages And Disadvantages ◽

Text Corpora ◽

Technical Translation ◽

Ict Tools ◽

Translation Systems

The article analyzes the features of both general-purpose and translation ICT tools, which can be applied at different stages of scientific and technical translation and, accordingly, in the process of would-be translators’ professional training. General-purpose ICT tools include a Word processor as a means of the target text formatting and finalizing; search engines necessary for widening background, subject-matter and linguistic knowledge at the information-reference and terminology search stages; means of electronic communication, in particular email, providing interaction among the translation project participants. The paper focuses on advantages and disadvantages of translation ICT tools: translation databases (Reverso context); electronic dictionaries (ABBYY Lingvо, multitran) necessary for improving and expanding would-be translators’ vocabulary; electronic text corpora as sources of authentic linguistic materials, machine translation systems (Google Translate) and CAT tools (Trados, SmartCAT) capable of accelerating target text drafting. In the context of machine translation analysis, the necessity to determine the indicators of machine translatability and to carry out source text pre-editing based on controlled language rules and target text post-editing is emphasized. It is pointed out that CAT tools can increase translators’ performance, provide terminology consistency and enable translation project management; at the same time, their application requires developing special skills. A number of issues that need further research are outlined: concretizing the translator training stages appropriate for introducing specific ICT tools; the ratio of the training time allotted for pen and paper translation and ICT-based one; the expediency of applying machine translation systems, in particular Google Translate, in each particular case.

Download Full-text