Co-occurrence Weight Selection in Generation of Word Embeddings for Low Resource Languages

Veysel Yücesoy; Aykut Koç

doi:10.1145/3282443


ACM Transactions on Asian and Low-Resource Language Information Processing

10.1145/3282443

2019

Vol 18 (3)

pp. 1-18

Cited By ~ 1

Author(s):

Veysel Yücesoy

Aykut Koç

Keyword(s):

Word Embeddings

Low Resource


Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages

ACM Transactions on Asian and Low-Resource Language Information Processing

10.1145/3390298

2020

Vol 19 (5)

pp. 1-15

Author(s):

Santwana Chimalamarri

Dinkar Sitaram

Ashritha Jain

Keyword(s):

Word Embeddings

Low Resource

Morphological Segmentation


Anchor-based Bilingual Word Embeddings for Low-Resource Languages

10.18653/v1/2021.acl-short.30

2021

Author(s):

Tobias Eder

Viktor Hangya

Alexander Fraser

Keyword(s):

Word Embeddings

Low Resource


Morphological Word Embeddings for Arabic Neural Machine Translation in Low-Resource Settings

10.18653/v1/w18-1201

2018

Cited By ~ 3

Author(s):

Pamela Shapiro

Kevin Duh

Keyword(s):

Machine Translation

Word Embeddings

Neural Machine Translation

Low Resource Settings

Low Resource


Word Embeddings in Low Resource Gujarati Language

2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)

10.1109/icdarw.2019.40090

2019

Author(s):

Ishani Joshi

Purvi Koringa

Suman Mitra

Keyword(s):

Word Embeddings

Low Resource

Gujarati Language


Incorporating word embeddings in unsupervised morphological segmentation

Natural Language Engineering

10.1017/s1351324920000406

2020

pp. 1-21

Author(s):

Ahmet Üstün

Burcu Can

Keyword(s):

Semantic Information

Maximum A Posteriori

Word Embeddings

A Posteriori

Low Resource

A Posteriori Estimate

Morphological Segmentation

Vector Representations

Turkish Language

Maximum A Posteriori Estimate

Abstract: We investigate the use of semantic information for morphological segmentation, since words that are derived from each other will remain semantically related. We use mathematical models such as maximum likelihood estimate (MLE) and maximum a posteriori estimate (MAP), incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised, and requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps in morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for the Turkish language. Our proposed models could also be used for any other low-resource language with concatenative morphology.


Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora

10.18653/v1/2021.mrl-1.2

2021

Author(s):

Takashi Wada

Tomoharu Iwata

Yuji Matsumoto

Timothy Baldwin

Jey Han Lau

Keyword(s):

Word Embeddings

Parallel Corpora

Low Resource

Cross Lingual


Cross-Lingual Word Embeddings for Low-Resource Language Modeling

10.18653/v1/e17-1088

2017

Cited By ~ 7

Author(s):

Oliver Adams

Adam Makarucha

Graham Neubig

Steven Bird

Trevor Cohn

Keyword(s):

Language Modeling

Word Embeddings

Low Resource

Cross Lingual


Named Entity Recognition with Word Embeddings and Wikipedia Categories for a Low-Resource Language

ACM Transactions on Asian and Low-Resource Language Information Processing

10.1145/3015467

2017

Vol 16 (3)

pp. 1-19

Cited By ~ 12

Author(s):

Arjun Das

Debasis Ganguly

Utpal Garain

Keyword(s):

Named Entity Recognition

Entity Recognition

Word Embeddings

Low Resource

Named Entity


Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages

10.21437/interspeech.2019-3119

2019

Cited By ~ 1

Author(s):

Zixiaofan Yang

Julia Hirschberg

Keyword(s):

Word Embeddings

Low Resource


Tackling neural machine translation in low-resource settings: a Portuguese case study

10.5753/stil.2021.17807

2021

Author(s):

Arthur T. Estrella

João B. O. Souza Filho

Keyword(s):

Machine Translation

Data Augmentation

Word Embeddings

Effective Solution

Computational Power

Limited Data

Neural Machine Translation

Low Resource

Neural machine translation (NMT) nowadays requires an increasing amount of data and computational power, so succeeding in this task with limited data and a single GPU can be challenging. Strategies such as pre-trained word embeddings, subword embeddings, and data augmentation can potentially address some issues faced in low-resource experimental settings, but their impact on translation quality is unclear. This work evaluates some of these strategies in two low-resource experiments and goes beyond reporting BLEU: errors are categorized on the Portuguese-English pair with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach proved to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.
