Using syntactic dependency as local context to resolve word sense ambiguity

10.3115/979617.979626 ◽

1997 ◽

Cited By ~ 5

Author(s):

Dekang Lin

Keyword(s):

Local Context ◽

Word Sense ◽

Word Sense Ambiguity ◽

Syntactic Dependency

Download Full-text

A Word Sense Disambiguation Approach for English-Thai Translation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.287 ◽

2013 ◽

Vol 411-414 ◽

pp. 287-290

Author(s):

Nantapong Keandoungchun ◽

Nithinant Thammakoranonta

Keyword(s):

Target Word ◽

Word Sense Disambiguation ◽

Local Context ◽

Word Sense ◽

Test Statistic ◽

Maximum Probability ◽

Novel Approach ◽

Sense Disambiguation ◽

Stored Information ◽

Paired T Test

This paper proposes a novel approach for word sense disambiguation (WSD) in English to Thai. The approach generated a knowledge base which stored information of local context and then applied this information to analyze probabilities of several meanings of a target word. The meanings with the maximum probability are translated as Thai meaning of that English target word. The approach has been evaluated by analyzing the percentage of accuracy of the target word translation in each paper. It also compared the accuracy with Google translation. The experimental results indicate that the proposed approach is more accuracy than Google Translation by using paired T-test statistic equals to 6.628 with sig. = 0.00 (< 0.05)

Download Full-text

A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00122 ◽

2015 ◽

Vol 3 ◽

pp. 59-71 ◽

Cited By ~ 8

Author(s):

Jing Wang ◽

Mohit Bansal ◽

Kevin Gimpel ◽

Brian D. Ziebart ◽

Clement T. Yu

Keyword(s):

Latent Variables ◽

Topic Model ◽

Model Performance ◽

Ambiguous Word ◽

Local Context ◽

Word Sense ◽

Word Sense Induction ◽

Improve Model ◽

Previous State ◽

Improve Model Performance

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.

Download Full-text

Unsupervised Acquisition of Predominant Word Senses

Computational Linguistics ◽

10.1162/coli.2007.33.4.553 ◽

2007 ◽

Vol 33 (4) ◽

pp. 553-590 ◽

Cited By ~ 37

Author(s):

Diana McCarthy ◽

Rob Koeling ◽

Julie Weeds ◽

John Carroll

Keyword(s):

Word Sense Disambiguation ◽

Semantic Interpretation ◽

Word Sense ◽

Domain Specific ◽

Sense Disambiguation ◽

Word Sense Ambiguity ◽

Word Token ◽

Low Coverage ◽

Word Senses ◽

Better Than

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of “star” for example would be different in the popular press and scientific journals). In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as nouns and adjectives, but produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when using domain specific corpora as input and where the input can either be hand-labeled for domain or automatically classified.

Download Full-text