Word Sense Induction Using Correlated Topic Model

Author(s):  
Thanh Tung Hoang ◽  
Phuong Thai Nguyen


Author(s):  
Jing Wang ◽  
Mohit Bansal ◽  
Kevin Gimpel ◽  
Brian D. Ziebart ◽  
Clement T. Yu

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.
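One enrichment idea described above is expanding each instance's local context using neural word embeddings, so the sense variable has more evidence to condition on. The sketch below illustrates the general technique with a tiny hand-made embedding table; the vectors, vocabulary, and function names are illustrative assumptions, not the authors' implementation, which would use pretrained embeddings over a large corpus.

```python
import math

# Toy word vectors. In practice these would be pretrained neural
# embeddings; the values here are purely illustrative.
EMBEDDINGS = {
    "bank":  [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "loan":  [0.7, 0.3, 0.0],
    "river": [0.1, 0.9, 0.2],
    "water": [0.0, 0.8, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_context(context_words, k=2):
    """Append the k nearest vocabulary words (by cosine similarity)
    for each context word, enlarging the local context around the
    ambiguous target word."""
    expanded = list(context_words)
    for w in context_words:
        if w not in EMBEDDINGS:
            continue
        sims = sorted(
            ((cosine(EMBEDDINGS[w], EMBEDDINGS[v]), v)
             for v in EMBEDDINGS if v != w),
            reverse=True,
        )
        expanded.extend(v for _, v in sims[:k])
    return expanded

print(expand_context(["money"]))  # → ['money', 'bank', 'loan']
```

With real embeddings the expansion pulls in distributionally similar words, which sharpens the sense posterior when the observed context window is short.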


Author(s):  
Reinald Kim Amplayo ◽  
Seung-won Hwang ◽  
Min Song

Word sense induction (WSI), or the task of automatically discovering multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which vary widely across words, from aardvark with one sense to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, and we find both to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring words. These observations alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show substantial improvements over state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense clearly outperforms competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.
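The "garbage sense" idea above can be sketched as follows: start with a generous maximum number of candidate senses and keep only those that actually win instance assignments, so the surviving sense count adapts per word. This is a minimal illustration under assumed inputs, not the authors' AutoSense inference code; the function name and threshold are hypothetical.

```python
def prune_garbage_senses(posteriors, min_instances=1):
    """posteriors: per-instance distributions over the same candidate
    senses (list of lists of probabilities). Keeps only the senses
    that are the argmax assignment for at least `min_instances`
    instances; the rest are treated as garbage senses and dropped."""
    counts = [0] * len(posteriors[0])
    for dist in posteriors:
        best = max(range(len(dist)), key=dist.__getitem__)
        counts[best] += 1
    return [s for s, c in enumerate(counts) if c >= min_instances]

# Four instances, three candidate senses; sense 2 never wins an
# assignment, so it is pruned and the word ends up with two senses.
posts = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
print(prune_garbage_senses(posts))  # → [0, 1]
```

Starting wide and pruning avoids both per-word hyperparameter tuning and nonparametric machinery for choosing the number of senses.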


2014 ◽  
Vol 04 (11) ◽  
pp. 879-888
Author(s):  
Xingchen Yu ◽  
Ernest Fokoué

2018 ◽  
Vol 52 (3) ◽  
pp. 733-770 ◽  
Author(s):  
Flavio Massimiliano Cecchini ◽  
Martin Riedl ◽  
Elisabetta Fersini ◽  
Chris Biemann
