Semantic based Information Retrieval System by using WSD and DICE Coefficient

Author(s):  
Prof Thwe ◽  
Thi Thi Tun ◽  
Ohnmar Aung

Word sense disambiguation (WSD) is an important technique in many NLP applications, such as machine translation, content analysis, and information retrieval. In an information retrieval (IR) system, ambiguous words have a damaging effect on precision. In this situation, the WSD process is useful for automatically identifying the correct meaning of an ambiguous word. Therefore, this system proposes a word sense disambiguation algorithm to increase the precision of the IR system. The system disambiguates the meaning of each keyword in the query and, with the help of glosses, adds conceptually related words to it as additional semantics. It uses WordNet as the lexical resource that encodes the concepts of each term. The senses provided by the WSD algorithm are used as semantics for indexing the documents to improve the performance of the IR system. Using both keyword and sense, the system retrieves relevant information according to the Dice similarity method.
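The Dice similarity step can be illustrated with a minimal sketch. It assumes that the query and each document are represented as sets of keyword and sense terms; the function names, the example query, and the two toy documents are hypothetical and are not taken from the paper.

```python
def dice_coefficient(a, b):
    """Dice similarity between two term collections: 2*|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def rank_documents(query_terms, documents):
    """Return (doc_id, score) pairs sorted by descending Dice score."""
    scored = [
        (doc_id, dice_coefficient(query_terms, terms))
        for doc_id, terms in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical query: the keyword "bank" expanded with sense-related terms.
query = ["bank", "financial", "institution", "deposit"]
docs = {
    "d1": ["bank", "deposit", "loan", "financial"],
    "d2": ["river", "bank", "water", "slope"],
}
ranking = rank_documents(query, docs)
print(ranking)
```

With these toy inputs, the document that shares the financial sense terms scores higher than the one that shares only the surface keyword.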

2011 ◽  
Vol 135-136 ◽  
pp. 160-166 ◽  
Author(s):  
Xin Hua Fan ◽  
Bing Jun Zhang ◽  
Dong Zhou

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words through statistical quantities (co-occurrence frequency, average distance, and information entropy) taken from the corpus. Second, we treat the context words that have a larger correlation value with the ambiguous word than the other words in the context as the important words, use them to reconstruct the context, and then take the reconstructed context as the new context of the ambiguous word. Finally, we use the method of sememe co-occurrence data [10] for word sense disambiguation. The experimental results have proved the feasibility of this method.
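Two of the statistics named above, co-occurrence frequency and average distance, can be sketched as follows. The abstract does not give the exact definitions, so this sketch makes its own assumptions: co-occurrence is counted per token within a sentence, and distance is measured in tokens to the nearest occurrence of the target word. The function name and the toy corpus are hypothetical.

```python
from collections import defaultdict


def cooccurrence_stats(sentences, target):
    """Map each word co-occurring with `target` to (count, average distance)."""
    counts = defaultdict(int)
    distance_sums = defaultdict(int)
    for tokens in sentences:
        target_positions = [i for i, tok in enumerate(tokens) if tok == target]
        if not target_positions:
            continue  # sentence does not contain the target word
        for j, tok in enumerate(tokens):
            if tok == target:
                continue
            # Assumed definition: distance to the nearest target occurrence.
            nearest = min(abs(j - i) for i in target_positions)
            counts[tok] += 1
            distance_sums[tok] += nearest
    return {w: (counts[w], distance_sums[w] / counts[w]) for w in counts}


# Toy corpus, already tokenised.
corpus = [
    ["the", "bank", "approved", "the", "loan"],
    ["loan", "from", "the", "bank"],
]
stats = cooccurrence_stats(corpus, "bank")
print(stats["loan"])
```

Words with a high count and a small average distance would be candidates for the "important words" used to rebuild the context; how the paper combines these values with entropy is not specified in the abstract.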


Nowadays digital documents play a major role in all areas, including the web, as all information is digitalised. Search engines use queries to retrieve information. The query plays a major role in an information retrieval system, and as a result both relevant and non-relevant documents are retrieved. Query expansion techniques improve the performance of the information retrieval system. Our proposed query expansion technique is Word Sense Disambiguation, which finds the correct sense of an ambiguous word in the regional Telugu language. In query expansion, if the added query term is an ambiguous word, the accuracy of the retrieved documents will be very low. To avoid this, the proposed method uses Word Sense Disambiguation (WSD), which is related to Natural Language Processing (NLP) and Artificial Intelligence (AI). WSD improves the accuracy of the information retrieval system.


2014 ◽  
Vol 981 ◽  
pp. 153-156
Author(s):  
Chun Xiang Zhang ◽  
Long Deng ◽  
Xue Yao Gao ◽  
Li Li Guo

Word sense disambiguation is key to many application problems in natural language processing. In this paper, a specific word sense disambiguation classifier is introduced into a machine translation system in order to improve the quality of the output translation. First, the translation of the ambiguous word is deleted from the machine translation of the Chinese sentence. Second, the ambiguous word is disambiguated, where the classification labels are translations of the ambiguous word. Third, these two translations are combined. Fifty Chinese sentences containing ambiguous words are collected for test experiments. Experimental results show that the translation quality is improved after the proposed method is applied.


Author(s):  
Farza Nurifan ◽  
Riyanarto Sarno ◽  
Cahyaningtyas Sekar Wahyuni

Word Sense Disambiguation (WSD) is one of the most difficult problems in the artificial intelligence field, known as AI-hard or AI-complete. Many problems can be addressed using word sense disambiguation approaches, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we address the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between the sentence and the corpora using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, we use Lesk algorithms and Wu-Palmer similarity to handle cases where no word from the sentence appears in the corpora (we call this semantic similarity). The results of our research show an 86.94% accuracy rate, and the semantic similarity improves the accuracy rate by 12.96% in determining the meaning of ambiguous words.
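The cosine similarity step can be sketched as below. This is a simplification under stated assumptions: it compares raw term-count vectors rather than the Word2vec-based corpora the paper builds, and the sense labels, the two small sense corpora, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-count vectors of two token lists."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty token list has no direction
    return dot / (norm_a * norm_b)


def pick_sense(sentence_tokens, sense_corpora):
    """Return the sense whose corpus is most similar to the sentence."""
    return max(
        sense_corpora,
        key=lambda sense: cosine_similarity(sentence_tokens, sense_corpora[sense]),
    )


# Hypothetical sense corpora for the ambiguous word "bank".
sense_corpora = {
    "bank/finance": ["money", "deposit", "loan", "account"],
    "bank/river": ["river", "water", "shore", "slope"],
}
sentence = ["she", "opened", "an", "account", "to", "deposit", "money"]
chosen = pick_sense(sentence, sense_corpora)
print(chosen)
```

When the sentence shares no word with any corpus, every score is zero and the choice is arbitrary; that is the case the abstract says is handled with Lesk and Wu-Palmer similarity, which this sketch does not implement.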


2019 ◽  
Vol 26 (4) ◽  
pp. 413-432 ◽  
Author(s):  
Goonjan Jain ◽  
D.K. Lobiyal

Humans proficiently interpret the true sense of an ambiguous word by establishing association among words in a sentence. The complete sense of text is also based on implicit information, which is not explicitly mentioned. The absence of this implicit information is a significant problem for a computer program that attempts to determine the correct sense of ambiguous words. In this paper, we propose a novel method to uncover the implicit information that links the words of a sentence. We reveal this implicit information using a graph, which is then used to disambiguate the ambiguous word. The experiments show that the proposed algorithm interprets the correct sense for both homonyms and polysemous words. Our proposed algorithm has performed better than the approaches presented in the SemEval-2013 task for word sense disambiguation and has shown an accuracy of 79.6 percent, which is 2.5 percent better than the best unsupervised approach in SemEval-2007.


2001 ◽  
Vol 10 (01n02) ◽  
pp. 5-21 ◽  
Author(s):  
RADA F. MIHALCEA ◽  
DAN I. MOLDOVAN

In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
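The bootstrapping loop can be illustrated with a toy sketch. It is not the authors' procedure: it assumes that words with a single candidate sense seed the disambiguated set, and that a pending word is tagged only when exactly one of its senses is related to an already assigned sense. The sense identifiers and the relatedness pairs are hypothetical stand-ins for WordNet relations.

```python
def bootstrap_disambiguate(words, senses, related):
    """
    words:   list of words to disambiguate
    senses:  dict mapping each word to its candidate sense ids
    related: set of frozenset({sense_a, sense_b}) pairs considered related
    Returns (tagged, pending): assigned senses and words left untagged.
    """
    # Seed the disambiguated set with words that have only one sense.
    tagged = {w: senses[w][0] for w in words if len(senses[w]) == 1}
    pending = [w for w in words if w not in tagged]
    changed = True
    while changed:
        changed = False
        for w in list(pending):
            matches = [
                s for s in senses[w]
                if any(frozenset((s, t)) in related for t in tagged.values())
            ]
            # Tag only unambiguous cases, to favour precision over coverage.
            if len(matches) == 1:
                tagged[w] = matches[0]
                pending.remove(w)
                changed = True
    return tagged, pending


words = ["deposit", "bank", "interest", "pitch"]
senses = {
    "deposit": ["deposit.money"],
    "bank": ["bank.finance", "bank.river"],
    "interest": ["interest.finance", "interest.curiosity"],
    "pitch": ["pitch.sound", "pitch.throw"],
}
related = {
    frozenset(("deposit.money", "bank.finance")),
    frozenset(("bank.finance", "interest.finance")),
}
tagged, pending = bootstrap_disambiguate(words, senses, related)
print(tagged, pending)
```

In this toy run "pitch" stays untagged because nothing links it to the disambiguated set, which mirrors the idea of tagging only a subset of words in exchange for high precision.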


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Xin Wang ◽  
Wanli Zuo ◽  
Ying Wang

Word sense disambiguation (WSD) is a fundamental problem in natural language processing, the objective of which is to identify the most appropriate sense for an ambiguous word in a given context. Although WSD has been researched over the years, the performance of existing algorithms in terms of accuracy and recall is still unsatisfactory. In this paper, we propose a novel approach to word sense disambiguation based on topical and semantic association. For a given document, supposing that its topic category is accurately discriminated, the correct sense of the ambiguous term is identified through the corresponding topic and semantic contexts. We first extract topic-discriminative terms from the document and construct a topical graph based on topic span intervals to implement topic identification. We then exploit syntactic features, topic span features, and semantic features to disambiguate nouns and verbs in the context of the ambiguous word. Finally, we conduct experiments on the standard data set SemCor to evaluate the performance of the proposed method, and the results indicate that our approach achieves relatively better performance than existing approaches.


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Adrian-Gabriel Chifu ◽  
Radu-Tudor Ionescu

Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful in the case of queries which are difficult due to sense ambiguity. We are interested in improving the precision after 5, 10, and 30 retrieved documents (P@5, P@10, P@30, respectively) for such lowest-precision queries.
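The P@k measures mentioned above follow the standard definition: the fraction of the top k retrieved documents that are relevant. A minimal sketch follows; the function name, the document ids, and the relevance judgements are hypothetical.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalises rankings shorter than k.
    return hits / k


ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
p_at_2 = precision_at_k(ranked, relevant, 2)
p_at_5 = precision_at_k(ranked, relevant, 5)
print(p_at_2, p_at_5)
```

A re-ranking method improves P@k when it moves relevant documents into the first k positions, even if the overall set of retrieved documents is unchanged.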

