What’s the Matter? Knowledge Acquisition by Unsupervised Multi-Topic Labeling for Spoken Utterances

Author(s):  
Sebastian Weigelt

Systems such as Alexa, Cortana, and Siri appear rather smart. However, they only react to predefined wordings and do not actually grasp the user’s intent. To overcome this limitation, a system must understand the topics the user is talking about. Therefore, we apply unsupervised multi-topic labeling to spoken utterances. Although topic labeling is a well-studied task on textual documents, its potential for spoken input is almost unexplored. Our approach to topic labeling is tailored to spoken utterances; it copes with short and ungrammatical input. The approach is two-tiered. First, we disambiguate word senses. We utilize Wikipedia as a pre-labeled corpus to train a naive Bayes classifier. Second, we build topic graphs based on DBpedia relations. We use two strategies to determine central terms in the graphs, i.e., the shared topics. One focuses on the dominant senses in the utterance and the other covers as many distinct senses as possible. Our approach creates multiple distinct topics per utterance and ranks the results. The evaluation shows that the approach is feasible; the word sense disambiguation achieves a recall of 0.799. Concerning topic labeling, in a user study subjects assessed that in 90.9% of the cases at least one proposed topic label among the first four is a good fit. With regard to precision, the subjects judged that 77.2% of the top-ranked labels are a good fit or good but somewhat too broad (Fleiss’ kappa κ = 0.27). We illustrate areas of application of topic labeling in the field of programming in spoken language. With topic labeling applied to the spoken input, as well as ontologies that model the situational context, we are able to select the most appropriate ontologies with an F1-score of 0.907.
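The second tier's "central terms as shared topics" idea can be sketched minimally. The terms, the edge list, and the use of plain degree centrality below are invented stand-ins for the paper's DBpedia-derived topic graphs and its two centrality strategies:

```python
from collections import defaultdict

# Toy stand-in for DBpedia relations between disambiguated senses
# (all terms and edges here are invented for illustration).
edges = [
    ("Coffee", "Espresso"), ("Coffee", "Caffeine"),
    ("Espresso", "Milk"), ("Coffee", "Milk"),
    ("Caffeine", "Stimulant"),
]

def shared_topics(edges, top_k=2):
    """Rank candidate topic labels by degree centrality in the sense graph."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=degree.get, reverse=True)[:top_k]

print(shared_topics(edges))  # "Coffee" has the highest degree, so it ranks first
```

A real implementation would replace degree centrality with the paper's two strategies (dominant-sense focus vs. maximal sense coverage) and fetch edges from DBpedia.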

2005, Vol. 14 (06), pp. 919-934
Author(s):
Kostas Fragos
Yanis Maistros

This work presents a new method for an unsupervised word sense disambiguation task using WordNet semantic relations. In this method, we expand the context of the word being disambiguated with related synsets from the available WordNet relations and study within this set the distribution of the related synsets that correspond to each sense of the target word. A single-sample Pearson chi-square goodness-of-fit hypothesis test is used to determine whether the null hypothesis of a composite normality PDF is a reasonable assumption for the set of related synsets corresponding to a sense. The p-value calculated from this test is the criterion for deciding the correct sense: the target word is assigned the sense whose related synsets are distributed most "abnormally" relative to the sets of the other senses. Our algorithm is evaluated on the English lexical sample data from the Senseval-2 word sense disambiguation competition. Three WordNet relations, antonymy, hyponymy and hypernymy, give a distributional set of related synsets for the context that proved to be quite a good word sense discriminator, achieving results comparable with those of the best-performing system among the competing participants.
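The "most abnormal distribution wins" criterion can be illustrated with a toy sketch. The counts below are invented, and for simplicity the sense with the largest chi-square statistic against a uniform expectation is chosen directly; the paper instead derives p-values from a goodness-of-fit test against a composite normality hypothesis:

```python
def chi_square_stat(observed):
    """Pearson chi-square statistic against a uniform expected distribution."""
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts of WordNet-related synsets observed in the context,
# one list per candidate sense of the target word.
sense_counts = {
    "bank/finance": [9, 1, 0, 0],   # highly skewed, i.e. "abnormal"
    "bank/river":   [3, 2, 3, 2],   # close to uniform
}

best = max(sense_counts, key=lambda s: chi_square_stat(sense_counts[s]))
print(best)  # the skewed sense wins: bank/finance
```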


2015, Vol. 54, pp. 83-122
Author(s):
Ruben Izquierdo
Armando Suarez
German Rigau

As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have only been used for WSD at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie somewhere between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we also demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples. Finally, we demonstrate the robustness of our supervised semantic-class-based WSD system when tested on an out-of-domain corpus.
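The idea of abstracting fine-grained senses to an intermediate level between senses and SuperSenses can be sketched with a toy hypernym hierarchy. The entries and the fixed-depth climbing rule are illustrative assumptions, not the paper's actual (automatic) derivation method:

```python
# Invented miniature hypernym hierarchy; real work would use WordNet.
hypernym = {
    "poodle": "dog", "dog": "canine", "canine": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def semantic_class(sense, max_climbs=2):
    """Abstract a fine-grained sense by climbing at most max_climbs hypernym links."""
    for _ in range(max_climbs):
        if sense not in hypernym:
            break
        sense = hypernym[sense]
    return sense

print(semantic_class("poodle"))   # canine: coarser than the sense, finer than "entity"
print(semantic_class("sparrow"))  # animal
```

Varying `max_climbs` moves the classes between the fine-grained sense level and the coarse SuperSense level, which is exactly the axis the paper explores.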


2002, Vol. 8 (4), pp. 359-373
Author(s):
Bernardo Magnini
Carlo Strapparava
Giovanni Pezzulo
Alfio Gliozzo

This paper explores the role of domain information in word sense disambiguation. The underlying hypothesis is that domain labels, such as MEDICINE, ARCHITECTURE and SPORT, provide a useful way to establish semantic relations among word senses, which can be profitably used during the disambiguation process. Results obtained at the SENSEVAL-2 initiative confirm that for a significant subset of words domain information can be used to disambiguate with a very high level of precision.
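The domain-driven disambiguation idea can be sketched as a simple vote: pick the sense whose domain label dominates the context. The senses, domain labels, and word-domain table below are invented, in the spirit of domain annotations such as those described above:

```python
from collections import Counter

# Hypothetical domain annotations (all entries invented for illustration).
sense_domain = {"bank/finance": "ECONOMY", "bank/river": "GEOGRAPHY"}
word_domains = {
    "loan": ["ECONOMY"], "interest": ["ECONOMY"],
    "water": ["GEOGRAPHY"], "deposit": ["ECONOMY", "GEOLOGY"],
}

def disambiguate(context):
    """Choose the sense whose domain label collects the most votes from the context."""
    votes = Counter(d for w in context for d in word_domains.get(w, []))
    return max(sense_domain, key=lambda s: votes[sense_domain[s]])

print(disambiguate(["loan", "interest", "deposit"]))  # bank/finance
```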


2007, Vol. 33 (4), pp. 553-590
Author(s):
Diana McCarthy
Rob Koeling
Julie Weeds
John Carroll

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of “star” for example would be different in the popular press and scientific journals). In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as nouns and adjectives, but produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when using domain specific corpora as input and where the input can either be hand-labeled for domain or automatically classified.
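A heavily simplified sketch of the predominant-sense idea: rank senses by similarity-weighted support from distributional neighbours. The neighbour list and the neighbour-sense affinity scores below are invented; the actual method derives neighbours from a distributional thesaurus built over raw text and scores them with a WordNet similarity measure:

```python
# Toy distributional neighbours of "star" with similarity scores, and a
# hypothetical neighbour-to-sense affinity table (all numbers invented).
neighbours = {"celebrity": 0.8, "planet": 0.6, "actor": 0.7}
affinity = {
    ("celebrity", "star/person"): 0.9, ("celebrity", "star/astronomy"): 0.1,
    ("planet", "star/person"): 0.05,   ("planet", "star/astronomy"): 0.9,
    ("actor", "star/person"): 0.8,     ("actor", "star/astronomy"): 0.1,
}

def predominant_sense(senses):
    """Rank senses by similarity-weighted support from distributional neighbours."""
    def score(sense):
        return sum(sim * affinity[(n, sense)] for n, sim in neighbours.items())
    return max(senses, key=score)

print(predominant_sense(["star/person", "star/astronomy"]))  # star/person
```

Swapping in neighbours drawn from a domain-specific corpus would shift the ranking, which is the domain-adaptation effect the article investigates.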


Author(s):
Zijian Hu
Fuli Luo
Yutong Tan
Wenxin Zeng
Zhifang Sui

Word Sense Disambiguation (WSD), a tough task in Natural Language Processing (NLP), aims to identify the correct sense of an ambiguous word in a given context. There are two mainstreams in WSD. Supervised methods mainly utilize labeled contexts to train a classifier that generates the right probability distribution over word senses. Meanwhile, knowledge-based (unsupervised) methods, which focus on glosses (word sense definitions), calculate the similarity of each context-gloss pair as a score to find the right word sense. In this paper, we propose a generative adversarial framework, WSD-GAN, which combines the two mainstream methods in WSD. The generative model, based on supervised methods, tries to generate a probability distribution over the word senses. Meanwhile, the discriminative model, based on knowledge-based methods, focuses on predicting the relevancy of the context-gloss pairs and identifies the correct pairs over the others. Furthermore, in order to optimize both models, we leverage policy gradient to mutually enhance their performance. Our experimental results show that WSD-GAN achieves competitive results on several English all-words WSD datasets.
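A toy, deterministic sketch of the policy-gradient component: a softmax "generator" over three candidate senses is trained against a fixed stand-in "discriminator" reward, using the expected REINFORCE gradient with a baseline. All rewards and constants are invented, and unlike the real framework the discriminator here is frozen rather than learned jointly:

```python
import math

# Fixed stand-in for the knowledge-based context-gloss scorer:
# it rewards sense 1 the most (numbers invented).
def discriminator_reward(sense):
    return [0.2, 0.9, 0.1][sense]

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

theta = [0.0, 0.0, 0.0]  # generator logits, one per candidate sense
LR = 0.5

for _ in range(100):
    probs = softmax(theta)
    baseline = sum(p * discriminator_reward(i) for i, p in enumerate(probs))
    for i in range(3):
        # expected REINFORCE gradient for a softmax policy: p_i * (r_i - baseline)
        theta[i] += LR * probs[i] * (discriminator_reward(i) - baseline)

print(max(range(3), key=lambda i: theta[i]))  # the generator learns to favour sense 1
```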


2015, pp. 269-292
Author(s):
Paweł Kędzia
Maciej Piasecki
Marlena Orlińska

Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources

Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure that serves as a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on PageRank: text words are mapped onto their senses in the plWordNet graph and the PageRank algorithm is run to find the senses with the highest scores. The method yields results lower than, but comparable to, those reported for English. The error analysis showed that the main problems are the fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on the combined networks of plWordNet and SUMO. In the former, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network that is then used for sense disambiguation. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
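Lesk's algorithm, mentioned above for comparing sense descriptions with text contexts, reduces in its simplified form to counting gloss-context word overlap. The glosses below are invented English stand-ins for plWordNet entries:

```python
def simplified_lesk(context_words, glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(context_words)
    return max(glosses, key=lambda s: len(context & set(glosses[s].split())))

# Toy example for the ambiguous Polish word "zamek" (castle vs. lock);
# glosses are invented for illustration.
glosses = {
    "zamek/castle": "fortified building with towers and walls",
    "zamek/lock":   "mechanism fastened to a door opened with a key",
}
context = "the old building had high walls and towers".split()
print(simplified_lesk(context, glosses))  # zamek/castle
```

A production version would lemmatize and drop stop words before intersecting, since function words inflate the overlap counts.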


Author(s):  
Oleg Kalinin

The article dwells on the modern cognitive-discursive study of metaphors. Drawing on an analysis and synthesis of information from foreign and domestic papers, the researcher delves into their classification from the ontological, axiological and epistemological points of view. The ontological level breaks down into two basic approaches, namely the metaphorical nature of discourse and the discursive nature of metaphors. The former analyses metaphors to fathom characteristics of discourse, while the latter provides for the study of metaphorical features in the context of discursive communication. The axiological aspect covers critical and descriptive studies, and the epistemological angle comprises quantitative and qualitative methods in metaphor studies. Other issues covered in the paper include a thorough review of methods for the identification of metaphors, including computer-assisted solutions (Word Sense Disambiguation, Categorisation, Metaphor Clusters) and numerical analysis of the metaphorical nature of discourse: descriptor analysis, metaphor power index, cluster analysis, and complex metaphor power analysis. On the one hand, the conceptualization of research papers boils down to the major features of the discursive approach to metaphors; on the other, multiple studies of metaphors in the context of discourse pave the way for a discursive trend in cognitive metaphorology.

