A Review of Unsupervised Keyphrase Extraction Methods Using Within-Collection Resources

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1864
Author(s):  
Chengyu Sun ◽  
Liang Hu ◽  
Shuai Li ◽  
Tuohang Li ◽  
Hongtu Li ◽  
...  

An essential part of a text generation task is extracting critical information from the text. People usually obtain this critical information via manual extraction; however, the asymmetry between the speed at which people can process information manually and the speed at which information grows makes manual extraction infeasible. This problem can be solved by automatic keyphrase extraction. In this paper, we summarize the mainstream unsupervised keyphrase extraction methods, analyze in detail the reasons for the differences in their performance, and propose some solutions.
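As a minimal illustration of the simplest end of the unsupervised spectrum the survey covers, the sketch below ranks single-word candidates by raw frequency. It is a toy baseline under assumed choices (a tiny stopword list, words only, no phrase chunking), not any specific method from the paper.

```python
import re
from collections import Counter

# Small stopword list for illustration; real systems use larger lists or POS filters.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for",
             "on", "with", "that", "this", "are", "be", "by", "it"}

def extract_keyphrases(text, top_n=5):
    """Rank single-word candidates by raw frequency, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(top_n)]
```

Graph-based methods such as TextRank differ mainly in the scoring step: instead of raw counts, candidates are ranked by centrality in a word co-occurrence graph.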

2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats such as text, audio, video, and much more. Web scraping is one way to obtain this information: a set of strategies for getting information from a website instead of copying the data manually. Many web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate web scraping; unfortunately, the appropriateness and ethics of using these tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python, and Ruby, including both open source and commercial software. Web-based software such as Yahoo Pipes, Google Web Scrapers, and the OutWit extension for Firefox are good tools for beginners in web scraping. Web extraction is basically used to replace this manual extraction and editing process and to provide an easy and better way to collect data from a web page, convert it into the desired format, and save it to a local or archive directory. In this paper, among the kinds of scraping, we focus on techniques that extract the content of a web page. In particular, we use scraping techniques to collect information on a variety of diseases, together with their symptoms and precautions.
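The content-extraction step described above can be sketched with Python's standard-library HTML parser: visible text is collected while script and style content is skipped. This is a minimal stand-in for the scraping tools the abstract surveys, not the authors' pipeline.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

In practice the HTML would be fetched first (e.g. with `urllib.request`) and the extracted text then filtered for the disease names, symptoms, and precautions of interest.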


2022 ◽  
Vol 40 (3) ◽  
pp. 1-29
Author(s):  
Peijie Sun ◽  
Le Wu ◽  
Kun Zhang ◽  
Yu Su ◽  
Meng Wang

Review-based recommendation utilizes both users' rating records and the associated reviews for recommendation. Recently, with the rapid demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, such auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that reviews contain abundant semantic information expressing users' feelings about various aspects of items, but that this information is not fully explored in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial, since the aspects are hidden and unlabeled. It is also very challenging to inject aspect information when generating explanation text from noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn an aspect-aware representation of each review sentence. Users and items can thus be represented in the aspect space based on their historically associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.
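The sentence-weighting idea, assigning larger weights to review sentences with a larger proportion of aspect words, can be sketched as below. The scoring rule and the aspect-word set are illustrative assumptions; the paper's model learns aspects with a neural extractor rather than matching a fixed word list.

```python
def sentence_weights(sentences, aspect_words):
    """Weight each sentence by its proportion of aspect words, normalized to sum to 1."""
    raw = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        frac = sum(t in aspect_words for t in tokens) / max(len(tokens), 1)
        raw.append(frac)
    total = sum(raw) or 1.0  # avoid division by zero when no aspect words occur
    return [w / total for w in raw]
```

Weights like these can then gate how strongly each sentence influences the decoder during explanation generation.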


Author(s):  
Ratish Puduppully ◽  
Li Dong ◽  
Mirella Lapata

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWIRE dataset.
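The two-stage decomposition can be illustrated with a toy pipeline: a selection-and-ordering step produces the content plan, and a realization step verbalizes it. Both stages here are hand-written stand-ins (a filter rule and a template) for what the paper implements as trained neural components.

```python
def make_content_plan(records, keys_of_interest):
    """Stage 1: select which records to mention and fix their order (illustrative rule)."""
    return [r for key in keys_of_interest for r in records if r["key"] == key]

def realize(plan):
    """Stage 2: surface realization; a template stands in for the neural decoder."""
    return " ".join(f"{r['entity']} recorded {r['value']} {r['key']}." for r in plan)
```

Keeping the plan explicit is what lets the model be inspected and trained end-to-end at the same time: the plan is an intermediate output, not a hidden state.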


2019 ◽  
Vol 3 (3) ◽  
pp. 58 ◽  
Author(s):  
Tim Haarman ◽  
Bastiaan Zijlema ◽  
Marco Wiering

Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often only applied to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differs from normal text documents, as it is structured using HTML elements and often consists of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
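One simple way to exploit the structural information the abstract refers to is to boost a candidate phrase's relevance score when it appears inside a prominent HTML element. The tag weights and scoring rule below are assumptions for illustration only; they are not WebEmbedRank's actual formulation.

```python
# Assumed tag weights for illustration; a real system would tune or learn these.
TAG_BOOST = {"title": 3.0, "h1": 2.0, "h2": 1.5}

def score_candidates(base_scores, tag_of_phrase):
    """Scale each phrase's base relevance score by a boost for the HTML tag it occurs in.

    base_scores: phrase -> relevance score (e.g. from an embedding-based ranker).
    tag_of_phrase: phrase -> most prominent HTML tag it appears in (may omit phrases).
    """
    return {phrase: score * TAG_BOOST.get(tag_of_phrase.get(phrase, ""), 1.0)
            for phrase, score in base_scores.items()}
```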


2021 ◽  
Author(s):  
Jiawen Zhu ◽  
Jinye Ran ◽  
Roy Ka-wei Lee ◽  
Kenny Choo ◽  
...  

2021 ◽  
Author(s):  
Jian Liu ◽  
Zhiyang Teng ◽  
Leyang Cui ◽  
Hanmeng Liu ◽  
Yue Zhang

Author(s):  
Isabella Gagliardi ◽  
Maria Teresa Artese

Keyword/keyphrase extraction is an important research activity in text mining, natural language processing, and information retrieval. A large number of algorithms, divided into supervised and unsupervised methods, have been designed and developed to solve the problem of automatic keyphrase extraction. The aim of the chapter is to critically discuss unsupervised automatic keyphrase extraction algorithms, analyzing their characteristics in depth. The methods presented will be tested on different datasets, presenting in detail the data, the algorithms, and the different options tested in the runs. Moreover, most studies and experiments have been conducted on texts in English, while there are few experiments concerning other languages, such as Italian. Particular attention will be paid to the evaluation of the results of the methods in two different languages, English and Italian.


2008 ◽  
Vol 10 (4) ◽  
pp. 311-316 ◽  
Author(s):  
Nicola Dundas ◽  
N. Kristine Leos ◽  
Midori Mitui ◽  
Paula Revell ◽  
Beverly Barton Rogers
