Generating Extractive Summaries of Scientific Paradigms

Journal of Artificial Intelligence Research ◽

10.1613/jair.3732 ◽

2013 ◽

Vol 46 ◽

pp. 165-201 ◽

Cited By ~ 22

Author(s):

V. Qazvinian ◽

D. R. Radev ◽

S. M. Mohammad ◽

B. Dorr ◽

D. Zajic ◽

...

Keyword(s):

Text Mining ◽

Community Detection ◽

Question Answering ◽

Scientific Literature ◽

Dependency Parsing ◽

Scientific Paradigms ◽

Unique Information ◽

Technical Material ◽

Scientific Topic

Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how we can use citations to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient, information-rich sentences. Next, we extend our experiments to summarize a set of papers that cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences, and show that citations have unique information amenable to creating a summary.

Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types

Journal of Translational Medicine ◽

10.1186/s12967-021-02941-z ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Chris Bauer ◽

Ralf Herwig ◽

Matthias Lienhard ◽

Paul Prasse ◽

Tobias Scheffer ◽

...

Keyword(s):

Text Mining ◽

Knowledge Base ◽

Survival Data ◽

Scientific Literature ◽

Entity Recognition ◽

Literature Mining ◽

Cancer Drugs ◽

Classical Text ◽

Anti Cancer ◽

Cancer Types

Background: There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually.

Methods: To cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with three independent methods: first, using data from FDA approvals; second, using experimentally measured IC-50 cell line data; and third, using clinical patient survival data.

Results: We demonstrated that the automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: https://knowledgebase.microdiscovery.de/heatmap.

Conclusions: Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.

PaperBLAST: Text-mining papers for information about homologs

10.1101/133041 ◽

2017 ◽

Author(s):

Morgan N. Price ◽

Adam P. Arkin

Keyword(s):

Text Mining ◽

Genome Sequencing ◽

Full Text ◽

Large Scale ◽

Scientific Literature ◽

Protein Sequences ◽

Protein Coding ◽

Link Protein ◽

Protein Coding Genes ◽

Link Type

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.

A Text Mining Approach for Definition Question Answering

Advances in Natural Language Processing - Lecture Notes in Computer Science ◽

10.1007/11816508_10 ◽

2006 ◽

pp. 76-86 ◽

Cited By ~ 11

Author(s):

Claudia Denicia-Carral ◽

Manuel Montes-y-Gómez ◽

Luis Villaseñor-Pineda ◽

René García Hernández

Keyword(s):

Text Mining ◽

Question Answering

Events Automatic Extraction from Arabic Texts

Natural Language Processing ◽

10.4018/978-1-7998-0951-7.ch078 ◽

2020 ◽

pp. 1686-1704

Author(s):

Emna Hkiri ◽

Souheyl Mallat ◽

Mounir Zrigui

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Text Mining ◽

Machine Translation ◽

Language Processing ◽

Question Answering ◽

Arabic Language ◽

Event Extraction ◽

Mining Machine ◽

Open Domain

The event extraction task consists of determining and classifying events within an open-domain text. It is very new for the Arabic language, whereas it has reached maturity for some languages such as English and French. Event extraction has also been shown to improve the performance of Natural Language Processing tasks such as information retrieval, question answering, text mining, and machine translation. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.

Automatic Genre-Specific Text Classification

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch020 ◽

2011 ◽

pp. 120-127

Author(s):

Xiaoyan Yu ◽

Manas Tungare ◽

Weiguo Fan ◽

Manuel Pérez-Quiñones ◽

Edward A. Fox ◽

...

Keyword(s):

Text Mining ◽

Text Classification ◽

Information Needs ◽

Question Answering ◽

Class Schedule ◽

Semistructured Documents ◽

Linkage Information ◽

Filter Noise ◽

Topic Tracking ◽

Course Syllabus

Starting with a vast number of unstructured or semistructured documents, text mining tools analyze and sift through them to present to users more valuable information specific to their information needs. The technologies in text mining include information extraction, topic tracking, summarization, categorization/classification, clustering, concept linkage, information visualization, and question answering [Fan, Wallace, Rich, & Zhang, 2006]. In this chapter, we share our hands-on experience with one specific text mining task: text classification [Sebastiani, 2002]. Information occurs in various formats, and some formats have a specific structure or specific information that they contain: we refer to these as ‘genres’. Examples of information genres include news items, reports, academic articles, etc. In this chapter, we deal with a specific genre type, the course syllabus. A course syllabus is such a genre, with the following commonly occurring fields: title, description, instructor’s name, textbook details, class schedule, etc. In essence, a course syllabus is the skeleton of a course. Free and fast access to a collection of syllabi in a structured format could have a significant impact on education, especially for educators and life-long learners. Educators can borrow ideas from others’ syllabi to organize their own classes. It will also be easy for life-long learners to find popular textbooks and even important chapters when they would like to study a course on their own. Unfortunately, searching for a syllabus on the Web using Information Retrieval [Baeza-Yates & Ribeiro-Neto, 1999] techniques employed by a generic search engine often yields too many non-relevant search result pages (i.e., noise): some of these only provide guidelines on syllabus creation; some only provide a schedule for a course event; some have outgoing links to syllabi (e.g., a course list page of an academic department). Therefore, a well-designed classifier for the search results is needed, one that would help not only to filter out noise but also to identify more relevant and useful syllabi.

Analysis of Protein Phosphorylation and Its Functional Impact on Protein–Protein Interactions via Text Mining of the Scientific Literature

Protein Bioinformatics - Methods in Molecular Biology ◽

10.1007/978-1-4939-6783-4_10 ◽

2017 ◽

pp. 213-232 ◽

Cited By ~ 3

Author(s):

Qinghua Wang ◽

Karen E. Ross ◽

Hongzhan Huang ◽

Jia Ren ◽

Gang Li ◽

...

Keyword(s):

Text Mining ◽

Protein Phosphorylation ◽

Protein Interactions ◽

Scientific Literature ◽

Protein Protein Interactions ◽

Functional Impact

Using Machine Learning and Text Mining in Question Answering

Evaluation of Multilingual and Multi-modal Information Retrieval - Lecture Notes in Computer Science ◽

10.1007/978-3-540-74999-8_49 ◽

2007 ◽

pp. 415-423 ◽

Cited By ~ 3

Author(s):

Antonio Juárez-González ◽

Alberto Téllez-Valero ◽

Claudia Denicia-Carral ◽

Manuel Montes-y-Gómez ◽

Luis Villaseñor-Pineda

Keyword(s):

Machine Learning ◽

Text Mining ◽

Question Answering

Extracting kinetic information from literature with KineticRE

Journal of Integrative Bioinformatics ◽

10.1515/jib-2015-282 ◽

2015 ◽

Vol 12 (4) ◽

pp. 56-68

Author(s):

Ana Alão Freitas ◽

Hugo Costa ◽

Isabel Rocha

Keyword(s):

Text Mining ◽

Metabolic Networks ◽

Scientific Literature ◽

Kluyveromyces Lactis ◽

Relevant Information ◽

Text Documents ◽

Kinetic Information ◽

Mining Tool ◽

Text Mining Tool

To better understand the dynamic behavior of metabolic networks in a wide variety of conditions, the field of Systems Biology has increased its interest in the use of kinetic models. The databases currently available do not contain enough data on this topic. Given that a significant part of the relevant information for the development of such models is still widely dispersed in the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, the main objective of this work is the development of a text mining tool to extract, from scientific literature, kinetic parameters, their respective values, and their relations with enzymes and metabolites. The proposed approach integrates the development of a novel plug-in for the text mining framework @Note2. Finally, the developed pipeline was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full-text documents.

Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer

Bioinformatics ◽

10.1093/bioinformatics/btx454 ◽

2017 ◽

Vol 33 (24) ◽

pp. 3973-3981 ◽

Cited By ~ 38

Author(s):

Simon Baker ◽

Imran Ali ◽

Ilona Silins ◽

Sampo Pyysalo ◽

Yufan Guo ◽

...

Keyword(s):

Text Mining ◽

Scientific Literature ◽

Cancer Hallmarks

Terminological resources for text mining over biomedical scientific literature

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2011.04.011 ◽

2011 ◽

Vol 52 (2) ◽

pp. 107-114 ◽

Cited By ~ 9

Author(s):

Fabio Rinaldi ◽

Kaarel Kaljurand ◽

Rune Sætre

Keyword(s):

Text Mining ◽

Scientific Literature
