topic word
Recently Published Documents


TOTAL DOCUMENTS: 18 (FIVE YEARS: 0)

H-INDEX: 2 (FIVE YEARS: 0)

Author(s):  
Zehao Yu

Topic word extraction is the task of identifying single- or multi-word expressions that represent the main topics of a document. In this paper, two improved algorithms for extracting and discovering topic words are proposed: the Rapid Topic-word Detection (RTD) algorithm and the CategoryTextRank (CTextRank) algorithm, which effectively obtain information by extracting and filtering the topic words in a text. The algorithms overcome the shortcoming of traditional topic-word discovery algorithms that require deep linguistic knowledge or domain- or language-specific annotated corpora. Both proposed algorithms can process short as well as long texts. Their biggest advantage is that they are unsupervised machine learning algorithms: they need no training and can process text directly to obtain topic words. Accuracy, recall and F-measure improve markedly with the two algorithms, and the results compare favorably with previously published results on the Inspec and SemEval datasets. The first algorithm, Rapid Topic-word Detection, improves these metrics over PositionRank and TextRank; the second, CategoryTextRank, improves them over TextRank, SingleRank and TF-IDF.
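To make the graph-ranking family that RTD and CTextRank build on concrete (this is a generic TextRank-style sketch, not the paper's own implementations), an unsupervised keyword scorer can rank words by a random walk over a co-occurrence graph; the whitespace tokenization, window size and damping factor below are simplifying assumptions:

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iters=50):
    """Rank candidate topic words with a simple TextRank-style
    random walk over an undirected, window-based co-occurrence graph."""
    # Link every pair of words that co-occur within the sliding window.
    neighbors = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[i] != words[j]:
                neighbors[words[i]].add(words[j])
                neighbors[words[j]].add(words[i])
    # Iterative PageRank-style scoring: no training data needed.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {w: (1 - damping) + damping * sum(
                     score[u] / len(neighbors[u]) for u in neighbors[w])
                 for w in score}
    return sorted(score, key=score.get, reverse=True)

tokens = ("topic word extraction finds topic words that represent "
          "the main topics of a document").split()
print(textrank_keywords(tokens)[:3])
```

Because the score comes only from the text's own co-occurrence structure, the method needs no annotated corpus, which is the property the abstract highlights.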


2020, Vol 10 (11), pp. 3831
Author(s): Sang-Min Park, Sung Joon Lee, Byung-Won On

Detecting the main aspects of a particular product from a collection of review documents is challenging in real applications. To address this problem, we focus on utilizing existing topic models, which can briefly summarize large text documents. Unlike existing approaches, which are limited because they modify the topic model or use seed opinion words as prior knowledge, we propose a novel approach that (1) identifies starting points for learning, (2) cleans dirty topic results through word embedding and unsupervised clustering, and (3) automatically generates the right aspects using topic and head-word embedding. Experimental results show that the proposed methods create cleaner topics, improving Rouge-1 by about 25% compared to the baseline method. In addition, through the three proposed methods, the main aspects suited to the given data are detected automatically.
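Step (2), cleaning dirty topics via word embeddings, can be sketched with a toy outlier filter: words whose embedding lies far from the topic centroid are dropped. The `clean_topic` helper, the 2-D vectors and the similarity threshold are all hypothetical stand-ins for real embeddings and unsupervised clustering:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clean_topic(topic_words, embeddings, threshold=0.6):
    """Drop 'dirty' words whose embedding is far from the topic centroid --
    a toy stand-in for the embedding-plus-clustering cleaning step."""
    dims = len(next(iter(embeddings.values())))
    centroid = [sum(embeddings[w][d] for w in topic_words) / len(topic_words)
                for d in range(dims)]
    return [w for w in topic_words if cosine(embeddings[w], centroid) >= threshold]

# Toy 2-D embeddings: 'battery' is unrelated noise in a camera-lens topic.
emb = {"lens": (0.9, 0.1), "zoom": (0.8, 0.2),
       "focus": (0.85, 0.15), "battery": (-0.1, 0.95)}
print(clean_topic(["lens", "zoom", "focus", "battery"], emb))
```

A real pipeline would use trained embeddings and a clustering algorithm rather than a fixed threshold, but the principle — coherence measured in embedding space decides which topic words survive — is the same.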


2019, Vol 7 (6), pp. 42-47
Author(s): Evgeniya Kovaleva

The article describes how younger schoolchildren develop the ability to cooperate in the course of educational interaction, with the aim of teaching each student successfully and building a cohesive class collective. It substantiates the relevance and effectiveness of this approach to the educational process, which is regarded as best suited to the tasks facing the modern school. Examples are given of Russian-language tasks on the topic “Word-formation” that schoolchildren perform in groups or in pairs.


2019, Vol 46 (1), pp. 23-40
Author(s): Yezheng Liu, Fei Du, Jianshan Sun, Yuanchun Jiang

User-generated content has become an increasingly important data source for analysing user interests in both industry and academic research. Since the proposal of the basic latent Dirichlet allocation (LDA) model, many LDA variants have been developed to learn knowledge from unstructured user-generated content. An intractable limitation of LDA and its variants is that they may generate low-quality topics whose meanings are confusing. To handle this problem, this article proposes an interactive strategy to generate high-quality topics with clear meanings by integrating subjective knowledge derived from human experts with objective knowledge learned by LDA. The proposed interactive latent Dirichlet allocation (iLDA) model develops deterministic and stochastic approaches to obtain a subjective topic-word distribution from human experts, combines the subjective and objective topic-word distributions by a linear weighted-sum method, and provides an inference process that draws topics and words from the comprehensive topic-word distribution. The proposed model is a significant effort to integrate human knowledge with LDA-based models through an interactive strategy. Experiments on two real-world corpora show that the proposed iLDA model can draw high-quality topics with the assistance of subjective knowledge from human experts. It is robust under various conditions and offers fundamental support for applications of LDA-based topic modelling.
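The linear weighted-sum combination of the two topic-word distributions can be illustrated with a small sketch; the `combine_distributions` helper and the toy probabilities are assumptions for illustration, not the paper's implementation:

```python
def combine_distributions(objective, subjective, weight=0.5):
    """Linearly blend an LDA-learned (objective) topic-word distribution
    with an expert-provided (subjective) one, then renormalize so the
    result is a proper probability distribution."""
    words = set(objective) | set(subjective)
    mixed = {w: (1 - weight) * objective.get(w, 0.0)
                + weight * subjective.get(w, 0.0)
             for w in words}
    total = sum(mixed.values())
    return {w: p / total for w, p in mixed.items()}

lda_topic = {"price": 0.5, "cost": 0.3, "noise": 0.2}  # learned by LDA
expert    = {"price": 0.6, "cost": 0.4}                # expert judgement
blend = combine_distributions(lda_topic, expert, weight=0.5)
print(blend)
```

Words the expert considers irrelevant (here `noise`) receive no subjective mass, so their probability shrinks in the blended distribution — the intuition behind cleaning confusing topics with human knowledge.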


IEEE Access, 2019, Vol 7, pp. 44748-44760
Author(s): Ya Xiao, Zhijie Fan, Chengxiang Tan, Qian Xu, Wenye Zhu, ...

2018, Vol 12 (03), pp. 399-423
Author(s): Shaheen Syed, Marco Spruit

Latent Dirichlet Allocation (LDA) has gained much attention from researchers and is increasingly being applied to uncover underlying semantic structures from a variety of corpora. However, nearly all researchers use symmetrical Dirichlet priors, often unaware of the practical implications they bear. This research is the first to explore the effect of symmetrical and asymmetrical Dirichlet priors on topic coherence and human topic ranking when uncovering latent semantic structures from scientific research articles. More specifically, we examine the practical effects of several classes of Dirichlet priors on 2000 LDA models created from abstract and full-text research articles. Our results show that symmetrical or asymmetrical priors on the document–topic distribution or the topic–word distribution for full-text data have little effect on topic coherence scores and human topic ranking. In contrast, asymmetrical priors on the document–topic distribution for abstract data yield a significant increase in topic coherence scores and improved human topic ranking compared to a symmetrical prior. Symmetrical or asymmetrical priors on the topic–word distribution show no real benefit for either abstract or full-text data.
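The difference between a symmetrical and an asymmetrical Dirichlet prior can be made concrete with a small sampling sketch (a Dirichlet draw is a vector of normalized Gamma draws); the concentration values below are illustrative, not the values used in the study:

```python
import random

def dirichlet_sample(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing
    independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)
k = 5  # number of topics

# Symmetrical prior: the same concentration for every topic.
symmetric = dirichlet_sample([0.1] * k, rng)

# Asymmetrical prior: topic 0 is given more prior mass a priori.
asymmetric = dirichlet_sample([1.0] + [0.1] * (k - 1), rng)

print(symmetric)
print(asymmetric)
```

With a symmetrical prior no topic is favoured before seeing data, while the asymmetrical prior encodes the expectation that some topics (often frequent, general ones) deserve more mass — the distinction whose practical impact the article measures.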


Author(s): Renai Chen, Qing Gao, Weiliang Ji, Fei Long, Qiang Ling
