A Review of Unsupervised Keyphrase Extraction Methods Using Within-Collection Resources

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1864
Author(s):  
Chengyu Sun ◽  
Liang Hu ◽  
Shuai Li ◽  
Tuohang Li ◽  
Hongtu Li ◽  
...  

An essential part of a text generation task is extracting critical information from the text. People usually obtain this critical information via manual extraction; however, the asymmetry between the speed at which people can process information manually and the speed at which information grows makes manual extraction infeasible. This problem can be solved by automatic keyphrase extraction. In this paper, we summarize the mainstream unsupervised keyphrase extraction methods, analyze in detail the reasons for the differences in their performance, and propose some solutions.
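As a minimal illustration of the simplest end of the unsupervised spectrum the survey covers, the sketch below ranks single-word candidates by raw frequency. It is a toy baseline under assumed choices (a tiny stopword list, words only, no phrase chunking), not any specific method from the paper.

```python
import re
from collections import Counter

# Small stopword list for illustration; real systems use larger lists or POS filters.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for",
             "on", "with", "that", "this", "are", "be", "by", "it"}

def extract_keyphrases(text, top_n=5):
    """Rank single-word candidates by raw frequency, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(top_n)]
```

Graph-based methods such as TextRank differ mainly in the scoring step: instead of raw counts, candidates are ranked by centrality in a word co-occurrence graph.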

2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats such as text, audio, video, and much more. Web scraping is one way to obtain this information: a set of strategies for getting information from a website instead of copying the data manually. Many web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate web scraping; unfortunately, the appropriateness and ethics of using these tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python, and Ruby, including both open source and commercial software. Web-based software such as Yahoo Pipes, Google Web Scrapers, and the OutWit extension for Firefox are good tools for beginners in web scraping. Web extraction is basically used to replace this manual extraction and editing process and to provide an easy and better way to collect data from a web page, convert it into the desired format, and save it to a local or archive directory. In this paper, among the kinds of scraping, we focus on techniques that extract the content of a web page. In particular, we use scraping techniques to collect information on a variety of diseases, together with their symptoms and precautions.
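The content-extraction step described above can be sketched with Python's standard-library HTML parser: visible text is collected while script and style content is skipped. This is a minimal stand-in for the scraping tools the abstract surveys, not the authors' pipeline.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

In practice the HTML would be fetched first (e.g. with `urllib.request`) and the extracted text then filtered for the disease names, symptoms, and precautions of interest.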


2022 ◽  
Vol 40 (3) ◽  
pp. 1-29
Author(s):  
Peijie Sun ◽  
Le Wu ◽  
Kun Zhang ◽  
Yu Su ◽  
Meng Wang

Review-based recommendation utilizes both users' rating records and the associated reviews for recommendation. Recently, with the rapid demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, such auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that reviews contain abundant semantic information expressing users' feelings about various aspects of items, but that this information is not fully explored in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial, since the aspects are hidden and unlabeled. It is also very challenging to inject aspect information when generating explanation text from noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn an aspect-aware representation of each review sentence. Users and items can thus be represented in the aspect space based on their historically associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.
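The sentence-weighting idea, assigning larger weights to review sentences with a larger proportion of aspect words, can be sketched as below. The scoring rule and the aspect-word set are illustrative assumptions; the paper's model learns aspects with a neural extractor rather than matching a fixed word list.

```python
def sentence_weights(sentences, aspect_words):
    """Weight each sentence by its proportion of aspect words, normalized to sum to 1."""
    raw = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        frac = sum(t in aspect_words for t in tokens) / max(len(tokens), 1)
        raw.append(frac)
    total = sum(raw) or 1.0  # avoid division by zero when no aspect words occur
    return [w / total for w in raw]
```

Weights like these can then gate how strongly each sentence influences the decoder during explanation generation.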


Author(s):  
Ratish Puduppully ◽  
Li Dong ◽  
Mirella Lapata

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWIRE dataset.
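The two-stage decomposition can be illustrated with a toy pipeline: a selection-and-ordering step produces the content plan, and a realization step verbalizes it. Both stages here are hand-written stand-ins (a filter rule and a template) for what the paper implements as trained neural components.

```python
def make_content_plan(records, keys_of_interest):
    """Stage 1: select which records to mention and fix their order (illustrative rule)."""
    return [r for key in keys_of_interest for r in records if r["key"] == key]

def realize(plan):
    """Stage 2: surface realization; a template stands in for the neural decoder."""
    return " ".join(f"{r['entity']} recorded {r['value']} {r['key']}." for r in plan)
```

Keeping the plan explicit is what lets the model be inspected and trained end-to-end at the same time: the plan is an intermediate output, not a hidden state.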


2019 ◽  
Vol 3 (3) ◽  
pp. 58 ◽  
Author(s):  
Tim Haarman ◽  
Bastiaan Zijlema ◽  
Marco Wiering

Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differs from normal text documents, as it is structured using HTML elements and often consists of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
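One simple way to exploit the structural information the abstract refers to is to boost a candidate phrase's relevance score when it appears inside a prominent HTML element. The tag weights and scoring rule below are assumptions for illustration only; they are not WebEmbedRank's actual formulation.

```python
# Assumed tag weights for illustration; a real system would tune or learn these.
TAG_BOOST = {"title": 3.0, "h1": 2.0, "h2": 1.5}

def score_candidates(base_scores, tag_of_phrase):
    """Scale each phrase's base relevance score by a boost for the HTML tag it occurs in.

    base_scores: phrase -> relevance score (e.g. from an embedding-based ranker).
    tag_of_phrase: phrase -> most prominent HTML tag it appears in (may omit phrases).
    """
    return {phrase: score * TAG_BOOST.get(tag_of_phrase.get(phrase, ""), 1.0)
            for phrase, score in base_scores.items()}
```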


2021 ◽  
Author(s):  
Jiawen Zhu ◽  
Jinye Ran ◽  
Roy Ka-wei Lee ◽  
Kenny Choo ◽  
...  

2021 ◽  
Author(s):  
Jian Liu ◽  
Zhiyang Teng ◽  
Leyang Cui ◽  
Hanmeng Liu ◽  
Yue Zhang

Author(s):  
Isabella Gagliardi ◽  
Maria Teresa Artese

Keyword/keyphrase extraction is an important research activity in text mining, natural language processing, and information retrieval. A large number of algorithms, divided into supervised and unsupervised methods, have been designed and developed to solve the problem of automatic keyphrase extraction. The aim of the chapter is to critically discuss unsupervised automatic keyphrase extraction algorithms, analyzing their characteristics in depth. The methods presented will be tested on different datasets, presenting in detail the data, the algorithms, and the different options tested in the runs. Moreover, most studies and experiments have been conducted on texts in English, while there are few experiments concerning other languages, such as Italian. Particular attention will be paid to the evaluation of the results of the methods in two different languages, English and Italian.


2008 ◽  
Vol 10 (4) ◽  
pp. 311-316 ◽  
Author(s):  
Nicola Dundas ◽  
N. Kristine Leos ◽  
Midori Mitui ◽  
Paula Revell ◽  
Beverly Barton Rogers
