Proceedings of the workshop on Data-driven methods in machine translation -

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Download Full-text

Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts data driven machine translation and beyond -

10.3115/1118905 ◽

2003 ◽

Keyword(s):

Machine Translation ◽

Data Driven ◽

Parallel Texts

Download Full-text

The data-driven Bulgarian WordNet: BTBWN

Cognitive Studies | Études cognitives ◽

10.11649/cs.1713 ◽

2018 ◽

Author(s):

Petya Osenova ◽

Kiril Simov

Keyword(s):

Information Retrieval ◽

Machine Translation ◽

Semantic Information ◽

Data Driven ◽

Lexical Resources ◽

Multilingual Information Retrieval ◽

Cross Lingual ◽

Princeton Wordnet ◽

Word Senses

The data-driven Bulgarian WordNet: BTBWNThe paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both - syntactic and lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma also have been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet. Then they have been aligned to the Princeton WordNet synsets. Various types of mappings are considered between both resources in a cross-lingual aspect and with respect to ensuring maximum connectivity and potential for incorporating the language specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval. Oparty na danych WordNet bułgarski: BTBWNW artykule przedstawiono naszą pracę na rzecz jednoczesnej budowy opartego na danych wordnetu dla języka bułgarskiego oraz ręcznie oznaczonego informacjami semantycznymi banku drzew. Takie podejście wymaga uzgodnienia znaczeń słów zarówno w zasobach składniowych, jak i leksykalnych, bez ograniczania znaczeń umieszczanych w wordnecie do tych obecnych w korpusie, jak i odwrotnie. Nasza strategia koncentruje się na identyfikacji znaczeń stosowanych w BulTreeBank, przy czym brakujące znaczenia lematu zostały również zbadane przez zgłębienie większych korpusów. Zidentyfikowane znaczenia zostały zorganizowane w synsety bułgarskiego wordnetu, a następnie powiązane z synsetami Princeton WordNet. Rozmaite rodzaje rzutowań są rozpatrywane pomiędzy obydwoma zasobami w kontekście międzyjęzykowym, a także w odniesieniu do zapewnienia maksymalnej łączności i możliwości uwzględnienia pojęć specyficznych dla języka bułgarskiego. Rzutowanie między dwoma wordnetami (angielskim i bułgarskim) jest podstawą dla aplikacji, takich jak tłumaczenie maszynowe i wielojęzyczne wyszukiwanie informacji.

Download Full-text

Machine Translation of Online Product Support Articles Using a Data-Driven MT System

Machine Translation: From Real Users to Research - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30194-3_27 ◽

2004 ◽

pp. 246-251

Author(s):

Stephen D. Richardson

Keyword(s):

Machine Translation ◽

Data Driven

Download Full-text

Deep Learning-based Roman-Urdu to Urdu Transliteration

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421520017 ◽

2020 ◽

pp. 2152001

Author(s):

Mehreen Alam ◽

Sibt ul Hussain

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Research Problem ◽

Attention Mechanism ◽

Data Driven ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Source Language ◽

Data Driven Approach ◽

Modern Machine

Attention-based encoder-decoder models have superseded conventional techniques due to their unmatched performance on many neural machine translation problems. Usually, the encoders and decoders are two recurrent neural networks where the decoder is directed to focus on relevant parts of the source language using attention mechanism. This data-driven approach leads to generic and scalable solutions with no reliance on manual hand-crafted features. To the best of our knowledge, none of the modern machine translation approaches has been applied to address the research problem of Urdu machine transliteration. Ours is the first attempt to apply the deep neural network-based encoder-decoder using attention mechanism to address the aforementioned problem using Roman-Urdu and Urdu parallel corpus. To this end, we present (i) the first ever Roman-Urdu to Urdu parallel corpus of 1.1 million sentences, (ii) three state of the art encoder-decoder models, and (iii) a detailed empirical analysis of these three models on the Roman-Urdu to Urdu parallel corpus. Overall, attention-based model gives state-of-the-art performance with the benchmark of 70 BLEU score. Our qualitative experimental evaluation shows that our models generate coherent transliterations which are grammatically and logically correct.

Download Full-text

A comparative evaluation of data-driven models in translation selection of machine translation

10.3115/1072228.1072300 ◽

2002 ◽

Cited By ~ 2

Author(s):

Yu-Seop Kim ◽

Jeong-Ho Chang ◽

Byoung-Tak Zhang

Keyword(s):

Machine Translation ◽

Comparative Evaluation ◽

Data Driven ◽

Evaluation Of Data ◽

Selection Of

Download Full-text

Towards Data-Driven Machine Translation for Lumasaaba

Advances in Intelligent Systems and Computing - Digital Science ◽

10.1007/978-3-030-02351-5_1 ◽

2018 ◽

pp. 3-11 ◽

Cited By ~ 1

Author(s):

Peter Nabende

Keyword(s):

Machine Translation ◽

Data Driven

Download Full-text

Bidirectional Machine Translation Between Turkish and Turkish Sign Language : A Data-Driven Approach

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2017.6303 ◽

2017 ◽

Vol 6 (3) ◽

pp. 33-46 ◽

Cited By ~ 2

Author(s):

Merve Selcuk-Simsek ◽

Ilyas Cicekli

Keyword(s):

Sign Language ◽

Machine Translation ◽

Data Driven ◽

Data Driven Approach

Download Full-text

Preface

Natural Language Engineering ◽

10.1017/s1351324916000103 ◽

2016 ◽

Vol 22 (4) ◽

pp. 497-500

Author(s):

REINHARD RAPP ◽

SERGE SHAROFF ◽

PIERRE ZWEIGENBAUM

Keyword(s):

Machine Translation ◽

Statistical Approach ◽

Learning Systems ◽

Data Driven ◽

Paradigm Change ◽

High Quality ◽

Rule Based ◽

Self Learning ◽

High Quality Sample ◽

Automatic Systems

After several decades of work on rule-based machine translation (MT) where linguists try to manually encode their knowledge about language, the time around 1990 brought a paradigm change towards automatic systems which try to learn how to translate by looking at large collections of high-quality sample translations as produced by professional translators. The first such attempts were called example- or analogy-based translation, and somewhat later the so-called statistical approach to MT was introduced. Both can be subsumed under the label data-driven approaches to MT. It took about 10 years until these self-learning systems became serious competitors of the traditional rule-based systems, and by now some of the most successful MT systems, such as Google Translate and Moses, are based on the statistical approach.

Download Full-text

Hybrid data-driven models of machine translation

Machine Translation ◽

10.1007/s10590-006-9015-5 ◽

2006 ◽

Vol 19 (3-4) ◽

pp. 301-323 ◽

Cited By ~ 5

Author(s):

Declan Groves ◽

Andy Way

Keyword(s):

Machine Translation ◽

Data Driven ◽

Hybrid Data

Download Full-text