Co-occurrence Weight Selection in Generation of Word Embeddings for Low Resource Languages

Author(s):  
Veysel Yücesoy ◽  
Aykut Koç
2021 ◽  
Author(s):  
Tobias Eder ◽  
Viktor Hangya ◽  
Alexander Fraser

2020 ◽  
pp. 1-21
Author(s):  
Ahmet Üstün ◽  
Burcu Can

Abstract: We investigate the use of semantic information for morphological segmentation, since words that are derived from each other remain semantically related. We use mathematical models such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised; it requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps in morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.


2021 ◽  
Author(s):  
Takashi Wada ◽  
Tomoharu Iwata ◽  
Yuji Matsumoto ◽  
Timothy Baldwin ◽  
Jey Han Lau

2017 ◽  
Author(s):  
Oliver Adams ◽  
Adam Makarucha ◽  
Graham Neubig ◽  
Steven Bird ◽  
Trevor Cohn

2021 ◽  
Author(s):  
Arthur T. Estrella ◽  
João B. O. Souza Filho

Neural machine translation (NMT) now requires ever more data and computational power, so succeeding at this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some of the issues faced in low-resource experimental settings, but their impact on translation quality is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU: errors on the Portuguese-English pair are categorized with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.

