Arabic Gloss WSD Using BERT

Applied Sciences ◽

10.3390/app11062567 ◽

2021 ◽

Vol 11 (6) ◽

pp. 2567

Author(s):

Mohammed El-Razzaz ◽

Mohamed Waleed Fakhr ◽

Fahima A. Maghraby

Keyword(s):

Target Word ◽

Semantic Similarity ◽

Test Data ◽

Word Sense Disambiguation ◽

Word Sense ◽

Written Text ◽

Knowledge Based ◽

Training Samples ◽

Sense Disambiguation ◽

Definition Of

Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous; 43% of diacritized words have multiple interpretations and the percentage increases to 72% for non-diacritized words. Nevertheless, most Arabic written text does not have diacritical marks. Gloss-based WSD methods measure the semantic similarity or the overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (gloss of the word). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the celebrated Bidirectional Encoder Representations from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they utilize BERT models that were pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD systems when we test them against the same test data used to evaluate our model. Additionally, our model achieves an F1-score of 89% compared to the best-reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is introducing a context-gloss benchmark that may help to overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
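
As a rough illustration of the gloss-overlap idea described above (not the BERT-based models the paper proposes), the following sketch scores each candidate gloss by the number of content words it shares with the context. The sense labels, glosses, stopword list, and the English example sentence are all hypothetical placeholders chosen for readability.

```python
# Toy gloss-overlap (Lesk-style) disambiguation. All data below is hypothetical.
STOPWORDS = {"a", "an", "the", "of", "on", "and", "or", "that", "she"}

def tokenize(text):
    """Lowercase, strip surrounding punctuation, drop stopwords and empty tokens."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}

def overlap_wsd(context, glosses):
    """Return the sense whose gloss shares the most words with the context."""
    ctx = tokenize(context)
    scores = {sense: len(ctx & tokenize(gloss)) for sense, gloss in glosses.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-sense inventory for the English word "bank".
GLOSSES = {
    "bank_river": "sloping land beside a river or lake",
    "bank_finance": "an institution that accepts deposits and lends money",
}

best, scores = overlap_wsd("She sat on the bank of the river.", GLOSSES)
print(best, scores)
```

Plain word overlap fails when the context and the gloss express the same meaning with different words, which is one motivation for measuring semantic similarity with a pretrained model instead.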

A Knowledge Based Word Sense Disambiguation in Telugu Language

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1911.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 440-445

Keyword(s):

Computational Linguistics ◽

Word Sense Disambiguation ◽

The Other ◽

Word Sense ◽

Knowledge Based ◽

Ambiguous Words ◽

Sense Disambiguation ◽

The Senses ◽

Definition Of ◽

Polysemous Words

Telugu (తెలుగు) is one of the morphologically rich Dravidian languages. Like other languages, it contains ambiguous words and phrases that have different meanings in different contexts. Such words are referred to as polysemous words, i.e., words having more than one sense. A knowledge-based approach is proposed for disambiguating Telugu polysemous words using the computational linguistics tool IndoWordNet. The task of word sense disambiguation (WSD) requires finding the similarity between the target word and the nearby words. In this approach, the similarity is calculated either by counting the words shared (intersection) between the glosses (definitions) of the target and nearby words, or by finding an exact occurrence of the nearby word's sense in the hierarchy (hypernyms/hyponyms) of the target word's senses. These parameters are extended by taking the intersection over not only the glosses but also the related words. Additionally, there is a third parameter, 'distance', which measures the distance between the target and nearby words. The proposed method thus uses more parameters for calculating similarity. It scores the senses based on the combined effect of the parameters, i.e., intersection, hierarchy, and distance, and then chooses the sense with the best score. The correct meaning of a Telugu polysemous word can be identified with this technique.
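
The abstract does not say how the three parameters are combined, so the following is only a minimal sketch under stated assumptions: intersection is a count of shared gloss words, hierarchy is a 0/1 membership check, and distance is applied as an inverse-distance weight. The sense inventory is a hypothetical English stand-in, not IndoWordNet data.

```python
# Sketch of a combined intersection / hierarchy / distance score.
# The combination rule (weight * (overlap + in_hierarchy)) is an assumption.

def score_sense(sense, target_pos, nearby):
    """Score one target sense against the senses of nearby words."""
    total = 0.0
    for pos, nearby_senses in nearby:
        # Closer words count more; pos must differ from target_pos.
        weight = 1.0 / abs(target_pos - pos)
        for ns in nearby_senses:
            overlap = len(set(sense["gloss"].split()) & set(ns["gloss"].split()))
            in_hierarchy = 1 if ns["id"] in sense["hierarchy"] else 0
            total += weight * (overlap + in_hierarchy)
    return total

def disambiguate(target_senses, target_pos, nearby):
    """Return the id of the target sense with the highest combined score."""
    return max(target_senses, key=lambda s: score_sense(s, target_pos, nearby))["id"]

# Hypothetical senses of "bat"; "hierarchy" holds hypernym/hyponym sense ids.
BAT_SENSES = [
    {"id": "bat_animal", "gloss": "nocturnal flying mammal",
     "hierarchy": {"mammal", "animal"}},
    {"id": "bat_sports", "gloss": "wooden club used to hit the ball in cricket",
     "hierarchy": {"club", "sports_equipment"}},
]

# (word position, candidate senses of that nearby word); the target is at position 3.
NEARBY = [
    (2, [{"id": "sports_equipment", "gloss": "gear needed for a sport"}]),
    (4, [{"id": "cricket_game", "gloss": "a game played with a ball and a wooden bat"}]),
    (5, [{"id": "ball_round", "gloss": "round object used in a game such as cricket"}]),
]

chosen = disambiguate(BAT_SENSES, 3, NEARBY)
print(chosen)
```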

Enhancing Word Sense Disambiguation Using A Hybrid Knowledge-Based Technique

Natural Language Processing and Cognitive Science ◽

10.1515/9781501501289.15 ◽

2015 ◽

Author(s):

Eniafe Festus Ayetiran ◽

Guido Boella ◽

Luigi Di Caro ◽

Livio Robaldo

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Knowledge Based ◽

Sense Disambiguation ◽

Hybrid Knowledge

Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2012-001350 ◽

2013 ◽

Vol 20 (5) ◽

pp. 882-886 ◽

Cited By ~ 15

Author(s):

Vijay N Garla ◽

Cynthia Brandt

Keyword(s):

Word Sense Disambiguation ◽

Document Classification ◽

Word Sense ◽

Knowledge Based ◽

Clinical Document ◽

Sense Disambiguation

DISTRIBUTIONAL ANALYSIS OF RELATED SYNSETS IN WordNet FOR A WORD SENSE DISAMBIGUATION TASK

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002478 ◽

2005 ◽

Vol 14 (06) ◽

pp. 919-934 ◽

Cited By ~ 1

Author(s):

KOSTAS FRAGOS ◽

YANIS MAISTROS

Keyword(s):

Target Word ◽

Goodness Of Fit ◽

Hypothesis Test ◽

Word Sense Disambiguation ◽

The Other ◽

Reasonable Assumption ◽

P Value ◽

Word Sense ◽

Chi Square ◽

Sense Disambiguation

This work presents a new method for an unsupervised word sense disambiguation task using WordNet semantic relations. In this method we expand the context of a word being disambiguated with related synsets from the available WordNet relations and study within this set the distribution of the related synsets that correspond to each sense of the target word. A single-sample Pearson chi-square goodness-of-fit hypothesis test is used to determine whether the null hypothesis of a composite normality PDF is a reasonable assumption for the set of related synsets corresponding to a sense. The p-value calculated from this test is the critical value for deciding the correct sense. The target word is assigned the sense whose related synsets are distributed more "abnormally" relative to the sets of the other senses. Our algorithm is evaluated on English lexical sample data from the Senseval-2 word sense disambiguation competition. Three WordNet relations, antonymy, hyponymy, and hypernymy, give a distributional set of related synsets for the context that proved to be quite a good word sense discriminator, achieving results comparable with those of the best-performing system among the competing participants.
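
To make the goodness-of-fit step concrete, the sketch below computes a Pearson chi-square statistic against a normal distribution fitted to a sample, using equiprobable bins. It is an illustration only: the abstract does not state which numeric quantity is measured for the related synsets, so the two samples are hypothetical, and the sketch compares statistics rather than p-values (with equal sample sizes and bin counts, a larger statistic corresponds to a smaller p-value).

```python
from statistics import NormalDist, mean, stdev

def chi_square_normality(sample, bins=4):
    """Pearson chi-square statistic of a sample against its own fitted normal."""
    fitted = NormalDist(mean(sample), stdev(sample))
    # Interior bin edges at the fitted quantiles, so every bin is equally likely.
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in sample:
        observed[sum(1 for e in edges if x > e)] += 1
    expected = len(sample) / bins
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical per-sense samples (placeholder values, not data from the paper).
samples = {
    "sense_1": [-2, -1, -1, -0.5, -0.5, 0.5, 0.5, 1, 1, 2],  # roughly bell-shaped
    "sense_2": [-2, -2, -2, -2, -2, 2, 2, 2, 2, 2],          # strongly bimodal
}

stats = {sense: chi_square_normality(values) for sense, values in samples.items()}
# Pick the sense whose sample deviates most from normality.
most_abnormal = max(stats, key=stats.get)
print(stats, most_abnormal)
```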

Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings

2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE) ◽

10.1109/bibe.2017.00-61 ◽

2017 ◽

Cited By ~ 5

Author(s):

Akm Sabbir ◽

Antonio Jimeno-Yepes ◽

Ramakanth Kavuluru

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Knowledge Based ◽

Sense Disambiguation

Word Sense Disambiguation

Emerging Applications of Natural Language Processing ◽

10.4018/978-1-4666-2169-5.ch002 ◽

2013 ◽

pp. 22-51

Author(s):

Pushpak Bhattacharyya ◽

Mitesh Khapra

Keyword(s):

State Of The Art ◽

Word Sense Disambiguation ◽

Current Trend ◽

General Purpose ◽

Word Sense ◽

Domain Specific ◽

Knowledge Based ◽

Current State ◽

Sense Disambiguation ◽

State Of Affairs

This chapter discusses the basic concepts of Word Sense Disambiguation (WSD) and the approaches to solving this problem. Both general purpose WSD and domain specific WSD are presented. The first part of the discussion focuses on existing approaches for WSD, including knowledge-based, supervised, semi-supervised, unsupervised, hybrid, and bilingual approaches. The accuracy value for general purpose WSD as the current state of affairs seems to be pegged at around 65%. This has motivated investigations into domain specific WSD, which is the current trend in the field. In the latter part of the chapter, we present a greedy neural network inspired algorithm for domain specific WSD and compare its performance with other state-of-the-art algorithms for WSD. Our experiments suggest that for domain-specific WSD, simply selecting the most frequent sense of a word does as well as any state-of-the-art algorithm.
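
The most-frequent-sense baseline mentioned at the end of the abstract can be sketched in a few lines. The tagged examples and sense labels below are hypothetical, and this is a generic version of the baseline rather than the chapter's exact experimental setup.

```python
from collections import Counter, defaultdict

def train_mfs(tagged):
    """Map each word to the sense it carries most often in the tagged data."""
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Hypothetical sense-tagged corpus from a single (financial) domain.
TAGGED = [
    ("bank", "finance"), ("bank", "finance"), ("bank", "river"),
    ("interest", "money"), ("interest", "money"), ("interest", "curiosity"),
]

mfs = train_mfs(TAGGED)
print(mfs["bank"], mfs["interest"])
```

The baseline ignores context entirely, so it tends to do well when one sense dominates within a domain.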

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy189 ◽

2019 ◽

Vol 26 (5) ◽

pp. 438-446 ◽

Cited By ~ 3

Author(s):

Ahmad Pesaranghader ◽

Stan Matwin ◽

Marina Sokolova ◽

Ali Pesaranghader

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Training Data ◽

Biomedical Text ◽

Word Sense ◽

Vocabulary Size ◽

Unified Medical Language System ◽

Knowledge Based ◽

Sense Disambiguation

Objective: In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train one separate classifier for each ambiguous term, necessitating a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods: Built on recent advances in deep learning, our deepBioWSD model leverages a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed from their text definitions; then, after the network is initialized with these embeddings, it is trained on all available training data collectively. This method also considers a novel technique for automatic collection of training data from PubMed to (pre)train the network in an unsupervised manner. Results: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving the state-of-the-art performance of 96.82% for macro accuracy. Conclusions: Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably fewer expert-labeled data, as it learns the target and the context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.
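
The difference between the two evaluation metrics can be shown with a small sketch. It assumes the common definitions, where micro accuracy pools all test instances and macro accuracy averages the per-term accuracies; the terms and predictions below are hypothetical and unrelated to the MSH WSD results.

```python
def micro_macro_accuracy(results):
    """results maps each ambiguous term to a list of (predicted, gold) pairs."""
    correct = total = 0
    per_term = []
    for pairs in results.values():
        hits = sum(1 for predicted, gold in pairs if predicted == gold)
        correct += hits
        total += len(pairs)
        per_term.append(hits / len(pairs))
    micro = correct / total                # every instance weighs the same
    macro = sum(per_term) / len(per_term)  # every term weighs the same
    return micro, macro

# Hypothetical predictions: a frequent term handled well, a rare one missed.
RESULTS = {
    "cold": [("temp", "temp"), ("temp", "temp"), ("illness", "illness"), ("temp", "illness")],
    "discharge": [("release", "fluid")],
}

micro, macro = micro_macro_accuracy(RESULTS)
print(micro, macro)
```

Here the micro accuracy is 3/5 while the macro accuracy is (3/4 + 0)/2, which shows how rare terms pull the macro figure down more strongly.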

A Word Sense Disambiguation Approach for English-Thai Translation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.287 ◽

2013 ◽

Vol 411-414 ◽

pp. 287-290

Author(s):

Nantapong Keandoungchun ◽

Nithinant Thammakoranonta

Keyword(s):

Target Word ◽

Word Sense Disambiguation ◽

Local Context ◽

Word Sense ◽

Test Statistic ◽

Maximum Probability ◽

Novel Approach ◽

Sense Disambiguation ◽

Stored Information ◽

Paired T Test

This paper proposes a novel approach for word sense disambiguation (WSD) in English-to-Thai translation. The approach generates a knowledge base that stores information about local context and then applies this information to estimate the probabilities of the several meanings of a target word. The meaning with the maximum probability is taken as the Thai translation of the English target word. The approach has been evaluated by analyzing the percentage accuracy of the target word translation in each paper, and its accuracy has been compared with that of Google Translate. The experimental results indicate that the proposed approach is more accurate than Google Translate, with a paired t-test statistic of 6.628 and sig. = 0.00 (< 0.05).
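
For readers unfamiliar with the test reported above, the paired t statistic is the mean of the per-item differences divided by its standard error. The sketch below computes it on hypothetical per-document accuracy figures; these numbers are placeholders and are not the paper's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical accuracy (%) on the same five documents for two systems.
proposed = [80, 75, 90, 85, 70]
baseline = [75, 72, 86, 79, 68]

t_stat = paired_t_statistic(proposed, baseline)
print(round(t_stat, 3))
```

The statistic has n - 1 degrees of freedom; turning it into a significance level requires the t distribution, which this sketch leaves out.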
