Large-Scale Relation Extraction from Web Documents and Knowledge Graphs with Human-in-the-Loop

Journal of Web Semantics ◽

10.1016/j.websem.2019.100546 ◽

2020 ◽

Vol 60 ◽

pp. 100546

Author(s):

Petar Ristoski ◽

Anna Lisa Gentile ◽

Alfredo Alba ◽

Daniel Gruhl ◽

Steven Welch

Keyword(s):

Large Scale ◽

Relation Extraction ◽

Web Documents ◽

Human In The Loop ◽

Knowledge Graphs

Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs

Proceedings of The Web Conference 2020 ◽

10.1145/3366423.3380014 ◽

2020 ◽

Author(s):

Trung-Kien Tran ◽

Mohamed H. Gad-Elrab ◽

Daria Stepanova ◽

Evgeny Kharlamov ◽

Jannik Strötgen

Keyword(s):

Large Scale ◽

Fast Computation ◽

Knowledge Graphs

Populating Web-Scale Knowledge Graphs Using Distantly Supervised Relation Extraction and Validation

Information ◽

10.3390/info12080316 ◽

2021 ◽

Vol 12 (8) ◽

pp. 316

Author(s):

Sarthak Dash ◽

Michael R. Glass ◽

Alfio Gliozzo ◽

Mustafa Canim ◽

Gaetano Rossiello

Keyword(s):

Deep Learning ◽

Relation Extraction ◽

Automated System ◽

External Information ◽

Learning Approach ◽

Wide Margin ◽

Structure Information ◽

Adaptation Cost ◽

Knowledge Graphs ◽

Scale Experiment

In this paper, we propose a fully automated system to extend knowledge graphs using external information from web-scale corpora. The system leverages a deep-learning-based technology for relation extraction that can be trained with distant supervision. In addition, the system uses a deep learning approach to knowledge base completion, exploiting the global structure information of the induced KG to further refine the confidence of the newly discovered relations. The system requires no effort to adapt to new languages and domains, as it does not use any hand-labeled data, NLP analytics, or inference rules. Our experiments, performed on a popular academic benchmark, demonstrate that the suggested system boosts the performance of relation extraction by a wide margin, reporting error reductions of 50%, resulting in relative improvement of up to 100%. Furthermore, a web-scale experiment conducted to extend DBpedia with knowledge from Common Crawl shows that our system is not only scalable but also incurs no adaptation cost, while yielding a substantial accuracy gain.
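As a rough illustration of the distant supervision mentioned in this abstract, the sketch below labels sentences automatically by matching entity pairs from known KG triples, so no hand-labeled data is involved. The triples, sentences, and the function name `distant_label` are hypothetical and not taken from the paper, and plain substring matching stands in for whatever entity matching the actual system uses.

```python
from typing import List, Tuple

# Hypothetical seed triples (subject, relation, object); not from the paper.
KG_TRIPLES = [
    ("Barack Obama", "birthPlace", "Honolulu"),
    ("Marie Curie", "birthPlace", "Warsaw"),
]


def distant_label(
    sentences: List[str], triples: List[Tuple[str, str, str]]
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, subject, relation, object) training examples.

    A sentence is labeled with a relation whenever it mentions both the
    subject and the object of a known triple. This is the distant
    supervision assumption, and it produces noisy labels.
    """
    examples = []
    for sentence in sentences:
        for subj, rel, obj in triples:
            # Assumption: naive substring matching instead of entity linking.
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, rel, obj))
    return examples


SENTENCES = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last year.",  # noisy: still labeled birthPlace
    "Marie Curie moved to Paris in 1891.",       # no matching pair: left unlabeled
]

LABELED = distant_label(SENTENCES, KG_TRIPLES)
for example in LABELED:
    print(example)
```

The second sentence is labeled `birthPlace` even though it does not express that relation, which shows the kind of noise a downstream confidence-refinement step would need to handle.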

Neural Relation Extraction on Wikipedia Tables for Augmenting Knowledge Graphs

Proceedings of the 29th ACM International Conference on Information & Knowledge Management ◽

10.1145/3340531.3412164 ◽

2020 ◽

Author(s):

Erin Macdonald ◽

Denilson Barbosa

Keyword(s):

Relation Extraction ◽

Knowledge Graphs

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

Information ◽

10.3390/info11020079 ◽

2020 ◽

Vol 11 (2) ◽

pp. 79 ◽

Cited By ~ 2

Author(s):

Xiaoyu Han ◽

Yue Zhang ◽

Wenkai Zhang ◽

Tinglei Huang

Keyword(s):

Language Processing ◽

Large Scale ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Additional Information ◽

Named Entity ◽

Proposed Model ◽

The Relationship ◽

Crucial Part

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to help relation extraction. Existing sources of such information, namely entity types obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base, both have their limitations. In Chinese relation extraction, however, there is another source of additional information that can overcome these limitations. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has never been verified before; the main obstacle is the lack of large-scale Chinese relation datasets. In this paper, we first generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model that uses the characters composing the entities. Results on the generated dataset show that these characters provide useful information for the Chinese relation extraction task. Using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms the baseline models on our Chinese relation extraction dataset.
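To make the idea of attending over a sentence with entity-character information more concrete, here is a minimal sketch of dot-product attention in plain Python. It is not the model from the paper: the averaging of character vectors into a query, the toy 2-dimensional embeddings, and the names `attention_weights` and `mean_vector` are all assumptions chosen for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def attention_weights(
    entity_char_vecs: List[List[float]], token_vecs: List[List[float]]
) -> List[float]:
    """Weight each sentence token by its similarity to the entity characters.

    Assumption: the query is simply the mean of the entity's character
    embeddings, and similarity is an unscaled dot product.
    """
    query = mean_vector(entity_char_vecs)
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in token_vecs]
    return softmax(scores)


# Toy embeddings (hypothetical values, 2 dimensions for readability).
ENTITY_CHARS = [[1.0, 0.0], [0.8, 0.2]]
TOKENS = [[0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

WEIGHTS = attention_weights(ENTITY_CHARS, TOKENS)
print(WEIGHTS)
```

In this toy setup the first token is most similar to the entity characters, so it receives the largest weight; a trained model would learn the embeddings rather than use fixed values.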

Constructing a Large-Scale English-Persian Parallel Corpus

Meta: Journal des traducteurs ◽

10.7202/029804ar ◽

2009 ◽

Vol 54 (1) ◽

pp. 181-188 ◽

Cited By ~ 10

Author(s):

Tayebeh Mosavi Miangah

Keyword(s):

Large Scale ◽

Target Language ◽

Translation Memory ◽

Web Documents ◽

Parallel Corpus ◽

Translation Quality ◽

Text Corpora ◽

Develop Software ◽

General Translation ◽

The Web

In recent years, the exploitation of large text corpora to solve various kinds of linguistic problems, including those of translation, has become commonplace. Yet a large-scale English-Persian corpus is still unavailable, because of certain difficulties and the amount of work required to overcome them. The project reported here is an attempt to build an English-Persian parallel corpus composed of digital texts and Web documents containing little or no noise. The Internet is useful because translations of existing texts are often published on the Web. The task is to find parallel pages in English and Persian, to judge their translation quality, and to download and align them. The resulting corpus is open; that is, more material can be added as the need arises. One of the main activities associated with building such a corpus is developing software for parallel concordancing, in which a user can enter a search string in one language and see all the citations for that string in that language together with the corresponding sentences in the target language. Our intention is to construct general translation memory software using the present English-Persian parallel corpus.
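The parallel concordancing described in this abstract can be sketched in a few lines, assuming the corpus has already been aligned sentence by sentence. The sample sentence pairs (with the Persian side given as rough transliterations), the list name `ALIGNED`, and the function `concordance` are hypothetical and serve only to illustrate the lookup.

```python
from typing import List, Tuple

# Hypothetical sentence-aligned corpus: (English sentence, Persian sentence).
# The Persian side is written as an illustrative transliteration.
ALIGNED: List[Tuple[str, str]] = [
    ("The book is on the table.", "ketab ruye miz ast."),
    ("She bought a new book.", "u yek ketab-e jadid kharid."),
    ("The weather is cold today.", "emruz hava sard ast."),
]


def concordance(query: str, corpus: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every aligned pair whose source sentence contains the query.

    Matching is a case-insensitive substring search, which is an
    assumption; a real concordancer would likely tokenize and index.
    """
    needle = query.lower()
    return [(src, tgt) for src, tgt in corpus if needle in src.lower()]


HITS = concordance("book", ALIGNED)
for source, target in HITS:
    print(f"{source}  |  {target}")
```

A linear scan is enough for a sketch; for a corpus of any real size an inverted index over the source side would be the more natural choice.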

BioRel: A Large-Scale Dataset for Biomedical Relation Extraction

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8983057 ◽

2019 ◽

Author(s):

Rui Xing ◽

Jie Luo ◽

Tengwei Song

Keyword(s):

Large Scale ◽

Relation Extraction ◽

Large Scale Dataset ◽

Biomedical Relation Extraction

A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning

Database ◽

10.1093/database/baz116 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 2

Author(s):

Tao Chen ◽

Mingfen Wu ◽

Hexi Li

Keyword(s):

Deep Learning ◽

Large Scale ◽

Relation Extraction ◽

Training Model ◽

Biomedical Literature ◽

Training Data ◽

Fine Tuning ◽

Learning Approaches ◽

Additional Time ◽

Clinical Records

The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. First, we show the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, a traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (giving relative improvements of 22.2%, 7.77%, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.

Large-Scale Multi-Modal Data Exploration with Human in the Loop

Conquering Big Data with High Performance Computing ◽

10.1007/978-3-319-33742-5_12 ◽

2016 ◽

pp. 253-268 ◽

Cited By ~ 1

Author(s):

Guangchen Ruan ◽

Hui Zhang

Keyword(s):

Large Scale ◽

Data Exploration ◽

Modal Data ◽

Human In The Loop

Causal Knowledge Extraction through Large-Scale Text Mining

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i09.7092 ◽

2020 ◽

Vol 34 (09) ◽

pp. 13610-13611

Author(s):

Oktie Hassanzadeh ◽

Debarun Bhattacharjya ◽

Mark Feblowitz ◽

Kavitha Srinivas ◽

Michael Perrone ◽

...

Keyword(s):

Large Scale ◽

Causal Relation ◽

Relation Extraction ◽

Commercial Application ◽

Causal Knowledge ◽

Text Documents ◽

Semantic Constraints ◽

Enterprise Risk ◽

Supervised Methods ◽

Weakly Supervised

In this demonstration, we present a system for mining causal knowledge from large corpora of text documents, such as millions of news articles. Our system provides a collection of APIs for causal analysis and retrieval. These APIs enable searching for the effects of a given cause and the causes of a given effect, as well as analyzing whether a causal relation exists between a given pair of phrases. The analysis includes a score that indicates the likelihood that a causal relation exists. It also provides evidence from an input corpus supporting the existence of a causal relation between the input phrases. Our system uses generic unsupervised and weakly supervised methods of causal relation extraction that do not impose semantic constraints on causes and effects. We show example use cases developed for a commercial application in enterprise risk management.
