FTRLIM: Distributed Instance Matching Framework for Large-Scale Knowledge Graph Fusion

Hongming Zhu; Xiaowen Wang; Yizhi Jiang; Hongfei Fan; Bowen Du; Qin Liu

doi:10.3390/e23050602

Entropy ◽

10.3390/e23050602 ◽

2021 ◽

Vol 23 (5) ◽

pp. 602

Author(s):

Hongming Zhu ◽

Xiaowen Wang ◽

Yizhi Jiang ◽

Hongfei Fan ◽

Bowen Du ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Linear Time ◽

Knowledge Graph ◽

Multiple Objects ◽

Instance Matching ◽

Distributed Framework ◽

Knowledge Graphs ◽

Real World Datasets ◽

Data Collections

Instance matching is a key task in knowledge graph fusion, and improving its efficiency is critical given the increasing scale of knowledge graphs. Blocking algorithms, which select candidate instance pairs for comparison, are one effective way to achieve this goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects’ features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regularized-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM participated in OAEI 2019 and achieved the best matching quality with significant efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experimental results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.
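
The abstract does not spell out how MultiObJ builds its indexes, so the following is only a minimal sketch of generic key-based blocking, assuming each instance is a dictionary from predicate to a list of object values. The function names (`block_key`, `build_index`, `candidate_pairs`) and the toy data are hypothetical and are not taken from the paper. The key is formed by joining the normalized objects of several predicates in a fixed order, and only instances that share a key become candidate pairs.

```python
from collections import defaultdict
from itertools import product


def block_key(instance, predicates):
    """Join the normalized object values of the chosen predicates, in a fixed order."""
    parts = []
    for predicate in predicates:
        values = sorted(str(v).strip().lower() for v in instance.get(predicate, []))
        parts.append("|".join(values))
    return "#".join(parts)


def build_index(instances, predicates):
    """Map each block key to the ids of the instances that produce it."""
    index = defaultdict(list)
    for instance_id, instance in instances.items():
        index[block_key(instance, predicates)].append(instance_id)
    return index


def candidate_pairs(source, target, predicates):
    """Pair up only those source and target instances that fall in the same block."""
    source_index = build_index(source, predicates)
    target_index = build_index(target, predicates)
    pairs = []
    for key in source_index.keys() & target_index.keys():
        pairs.extend(product(source_index[key], target_index[key]))
    return sorted(pairs)


# Made-up toy instances, for illustration only.
source = {
    "s1": {"name": ["Alpha University"], "city": ["Springfield"]},
    "s2": {"name": ["Beta University"], "city": ["Springfield"]},
}
target = {
    "t1": {"name": ["alpha university"], "city": ["springfield "]},
    "t2": {"name": ["Gamma University"], "city": ["Shelbyville"]},
}
pairs = candidate_pairs(source, target, ["name", "city"])
print(pairs)
```

In this toy case only one of the four possible source-target pairs survives blocking. Each instance is indexed once, which is why blocking of this kind can keep the comparison cost close to linear when blocks stay small.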


Bootstrapping Entity Alignment with Knowledge Graph Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/611 ◽

2018 ◽

Cited By ~ 35

Author(s):

Zequn Sun ◽

Wei Hu ◽

Qingheng Zhang ◽

Yuzhong Qu

Keyword(s):

Performance Improvement ◽

Real World ◽

State Of The Art ◽

Graph Embedding ◽

Training Data ◽

Knowledge Graph ◽

Error Accumulation ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Embedding-based entity alignment represents different knowledge graphs (KGs) as low-dimensional embeddings and finds entity alignment by measuring the similarities between entity embeddings. Existing approaches have achieved promising results; however, they are still challenged by the lack of sufficient prior alignment as labeled training data. In this paper, we propose a bootstrapping approach to embedding-based entity alignment. It iteratively labels likely entity alignment as training data for learning alignment-oriented KG embeddings. Furthermore, it employs an alignment editing method to reduce error accumulation during iterations. Our experiments on real-world datasets showed that the proposed approach significantly outperformed the state-of-the-art embedding-based ones for entity alignment. The proposed alignment-oriented KG embedding, bootstrapping process and alignment editing method all contributed to the performance improvement.
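
One labeling round of such a bootstrapping loop might look like the sketch below. It is a greedy stand-in under stated assumptions, not the paper's labeling or alignment editing method, which the abstract does not detail: similarities are assumed to be precomputed, a fixed threshold decides what counts as "likely", and each entity may be labeled at most once. The function name `label_likely_alignment` and the scores are hypothetical.

```python
def label_likely_alignment(similarity, threshold):
    """Greedily pick one-to-one entity pairs whose similarity reaches the threshold."""
    candidates = sorted(
        ((score, a, b) for (a, b), score in similarity.items() if score >= threshold),
        reverse=True,
    )
    used_a, used_b, labeled = set(), set(), []
    for score, a, b in candidates:
        # Keep the pair only if neither entity was already labeled this round.
        if a not in used_a and b not in used_b:
            labeled.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return labeled


# Made-up similarity scores between entities of two KGs.
similarity = {
    ("a1", "b1"): 0.95,
    ("a1", "b2"): 0.90,
    ("a2", "b2"): 0.85,
    ("a3", "b3"): 0.40,
}
new_training_pairs = label_likely_alignment(similarity, threshold=0.8)
print(new_training_pairs)
```

In a full loop, the newly labeled pairs would be added to the training data, the embeddings retrained, and the similarities recomputed before the next round.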


Multi-view Knowledge Graph Embedding for Entity Alignment

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/754 ◽

2019 ◽

Cited By ~ 22

Author(s):

Qingheng Zhang ◽

Zequn Sun ◽

Wei Hu ◽

Muhao Chen ◽

Lingbing Guo ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

State Of The Art ◽

Graph Embedding ◽

Knowledge Graph ◽

Multiple Views ◽

Combination Strategies ◽

Knowledge Graphs ◽

Real World Datasets ◽

Inference Methods

We study the problem of embedding-based entity alignment between knowledge graphs (KGs). Previous works mainly focus on the relational structure of entities. Some further incorporate another type of feature, such as attributes, for refinement. However, a vast number of entity features are still unexplored or not treated equally together, which impairs the accuracy and robustness of embedding-based entity alignment. In this paper, we propose a novel framework that unifies multiple views of entities to learn embeddings for entity alignment. Specifically, we embed entities based on the views of entity names, relations and attributes, with several combination strategies. Furthermore, we design some cross-KG inference methods to enhance the alignment between two KGs. Our experiments on real-world datasets show that the proposed framework significantly outperforms the state-of-the-art embedding-based entity alignment methods. The selected views, cross-KG inference and combination strategies all contribute to the performance improvement.


TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is utilized to map the head entity and tail entity to type-specific representations, and then a translation-based score function is used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks of link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
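
For readers unfamiliar with translation-based scoring, the sketch below shows the plain TransE-style distance ||h + r - t||, where a lower value means a more plausible triple. It deliberately omits TransET's type-specific mapping, which the abstract does not specify in enough detail to reproduce; the function name `transe_score` and the two-dimensional vectors are illustrative assumptions.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; lower means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Made-up 2-dimensional embeddings.
head = [1.0, 0.0]
relation = [0.0, 1.0]
plausible_tail = [1.0, 1.0]    # exactly head + relation
implausible_tail = [4.0, 5.0]  # far from head + relation

good = transe_score(head, relation, plausible_tail)
bad = transe_score(head, relation, implausible_tail)
print(good, bad)
```

Link prediction then amounts to ranking candidate tails (or heads) by this score, and triple classification to thresholding it.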


Attention Enhanced Serial Unet++ Network for Removing Unevenly Distributed Haze

Electronics ◽

10.3390/electronics10222868 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2868

Author(s):

Wenxuan Zhao ◽

Yaqin Zhao ◽

Liqi Feng ◽

Jiaxi Tang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Real World ◽

Large Scale ◽

Learning Strategy ◽

Contextual Information ◽

Small Scale ◽

Image Dehazing ◽

Atmospheric Scattering ◽

Real World Datasets

The purpose of image dehazing is to reduce the image degradation caused by suspended particles in order to support high-level visual tasks. Besides the atmospheric scattering model, convolutional neural networks (CNNs) have been used for image dehazing. However, existing image dehazing algorithms are limited in the face of unevenly distributed haze and dense haze in real-world scenes. In this paper, we propose a novel end-to-end convolutional neural network called attention enhanced serial Unet++ dehazing network (AESUnet) for single image dehazing. We attempt to build a serial Unet++ structure that adopts a serial strategy of two pruned Unet++ blocks based on residual connection. Compared with the simple Encoder–Decoder structure, the serial Unet++ module can better use the features extracted by encoders and promote contextual information fusion at different resolutions. In addition, we make several improvements to the Unet++ module, such as pruning, introducing a convolutional module with ResNet structure, and a residual learning strategy. Thus, the serial Unet++ module can generate more realistic images with less color distortion. Furthermore, following the serial Unet++ blocks, an attention mechanism is introduced to pay different attention to haze regions with different concentrations by learning weights in the spatial domain and channel domain. Experiments are conducted on two representative kinds of datasets: the large-scale synthetic dataset RESIDE and the small-scale real-world datasets I-HAZY and O-HAZY. The experimental results show that the proposed dehazing network is not only comparable to state-of-the-art methods on the RESIDE synthetic dataset, but also surpasses them by a very large margin on the I-HAZY and O-HAZY real-world datasets.


Self-Paced Robust Learning for Leveraging Clean Labels in Noisy Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6166 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6853-6860

Author(s):

Xuchao Zhang ◽

Xian Wu ◽

Fanglan Chen ◽

Liang Zhao ◽

Chang-Tien Lu

Keyword(s):

Real World ◽

Large Scale ◽

Learning Algorithm ◽

Noisy Data ◽

Training Set ◽

Robust Learning ◽

Robust Model ◽

Small Set ◽

Real World Datasets ◽

Theoretical Analyses

The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model in a process from more reliable (clean) data instances to less reliable (noisy) ones under the supervision of well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses on the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed approach can achieve a considerable improvement in effectiveness and robustness over existing methods.
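
The "easy to hard" idea behind self-paced learning can be illustrated with the classic selection rule: train only on samples whose current loss is below a pace threshold, then raise the threshold. The sketch below assumes that generic rule and is not the SPRL algorithm itself, whose use of clean-label supervision is not detailed in the abstract. The function name, the losses, and the schedule are made up, and the losses are held fixed here, whereas a real loop would recompute them after each model update.

```python
def self_paced_select(losses, threshold):
    """Return the indices of samples whose current loss is below the pace threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


# Made-up per-sample losses; large values stand in for likely-noisy labels.
losses = [0.1, 2.5, 0.4, 1.2, 0.05]
# The pace threshold grows, so harder (less reliable) samples enter later.
schedule = [0.5, 1.5, 3.0]

selected_per_round = [self_paced_select(losses, lam) for lam in schedule]
for lam, selected in zip(schedule, selected_per_round):
    print(lam, selected)
```

The sample with the largest loss (index 1) is admitted only in the final round, which is how the procedure limits the influence of likely-corrupted labels early in training.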


Entity Alignment between Knowledge Graphs Using Attribute Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301297 ◽

2019 ◽

Vol 33 ◽

pp. 297-304 ◽

Cited By ~ 26

Author(s):

Bayu Distiawan Trisedya ◽

Jianzhong Qi ◽

Rui Zhang

Keyword(s):

Real World ◽

Graph Embedding ◽

Knowledge Bases ◽

Knowledge Graph ◽

World Knowledge ◽

Large Numbers ◽

Proposed Model ◽

Alignment Task ◽

Transitivity Rule ◽

Knowledge Graphs

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity to enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
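
Since the reported improvement is stated in terms of hits@1, a short sketch of how hits@k is usually computed may help: it is the fraction of source entities whose correct counterpart appears among the top-k ranked candidates. The function name `hits_at_k` and the rankings below are illustrative assumptions, not data from the paper.

```python
def hits_at_k(ranked_candidates, gold, k=1):
    """Fraction of source entities whose gold match is among the top-k candidates."""
    hits = sum(
        1 for entity, ranking in ranked_candidates.items() if gold[entity] in ranking[:k]
    )
    return hits / len(ranked_candidates)


# Made-up rankings: for each source entity, target entities sorted by similarity.
ranked_candidates = {
    "e1": ["f1", "f2"],
    "e2": ["f2", "f3"],
    "e3": ["f4", "f3"],  # correct match f3 is only ranked second
    "e4": ["f4", "f1"],
}
gold = {"e1": "f1", "e2": "f2", "e3": "f3", "e4": "f4"}

hits1 = hits_at_k(ranked_candidates, gold, k=1)
hits2 = hits_at_k(ranked_candidates, gold, k=2)
print(hits1, hits2)
```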


Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5329 ◽

2020 ◽

Vol 34 (01) ◽

pp. 19-26 ◽

Cited By ~ 5

Author(s):

Chong Chen ◽

Min Zhang ◽

Yongfeng Zhang ◽

Weizhi Ma ◽

Yiqun Liu ◽

...

Keyword(s):

Collaborative Filtering ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Heterogeneous Data ◽

Model Parameters ◽

Online Systems ◽

Practical Applications ◽

Real World Datasets ◽

Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite their effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve the optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.


Knowledge Graphs

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73796 ◽

2021 ◽

Vol 5 ◽

Author(s):

Roderic Page

Keyword(s):

Knowledge Management ◽

Large Scale ◽

Personal Knowledge ◽

Knowledge Graph ◽

Specific Knowledge ◽

Management Tools ◽

Global Projects ◽

Knowledge Graphs ◽

Constructing Knowledge ◽

Knowledge Management Tools

Knowledge graphs embody the idea of "everything connected to everything else." As attractive as this seems, there is a substantial gap between the dream of fully interconnected knowledge and the reality of data that is still mostly siloed, or weakly connected by shared strings such as taxonomic names. How do we move forward? Do we focus on building our own domain- or project-specific knowledge graphs, or do we engage with global projects such as Wikidata? Do we construct knowledge graphs, or focus on making our data "knowledge graph ready" by adopting structured markup in the hope that knowledge graphs will spontaneously self-assemble from that data? Do we focus on large-scale, database-driven projects (e.g., triple stores in the cloud), or do we rely on more localised and distributed approaches, such as annotations (e.g., hypothes.is), "content-hash" systems where a cryptographic hash of the data is also its identifier (Elliott et al. 2020), or the growing number of personal knowledge management tools (e.g., Roam, Obsidian, LogSeq)? This talk will share experiences (the good, bad, and the ugly) as I have tried to transition from naïve advocacy to constructing knowledge graphs (Page 2019), or participating in their construction (Page 2021).
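
The "content-hash" idea mentioned above, where a cryptographic hash of the data is also its identifier, can be sketched in a few lines. This is only an illustration using SHA-256 from Python's standard library; the `sha256:` prefix and the function name `content_id` are assumptions made for this example and are not claimed to match the scheme of Elliott et al. (2020).

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves, so equal content gets an equal id."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


record = b"Homo sapiens Linnaeus, 1758"
identifier = content_id(record)
same_again = content_id(b"Homo sapiens Linnaeus, 1758")
changed = content_id(b"Homo sapiens Linnaeus 1758")  # one character removed

print(identifier)
print(identifier == same_again, identifier == changed)
```

Because the identifier depends only on the content, two parties holding the same bytes derive the same identifier without consulting a central registry, while any change to the bytes yields a different one.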


Large-Scale Heterogeneous Feature Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013878 ◽

2019 ◽

Vol 33 ◽

pp. 3878-3885 ◽

Cited By ~ 5

Author(s):

Xiao Huang ◽

Qingquan Song ◽

Fan Yang ◽

Xia Hu

Keyword(s):

Real World ◽

Large Scale ◽

Single Type ◽

Heterogeneous Information ◽

Multiview Learning ◽

Efficiency And Effectiveness ◽

Joint Embedding ◽

Real World Datasets ◽

Low Dimensional ◽

Vector Representations

Feature embedding aims to learn a low-dimensional vector representation for each instance to preserve the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of feature have been well studied, real-world instances often contain multiple types of correlated features or even information within a different modality such as networks. Existing studies such as multiview learning show that it is promising to learn unified vector representations from all sources. However, the high computational costs of incorporating heterogeneous information limit the applications of existing algorithms. The number of instances and the dimensions of features in practice are often large. To bridge the gap, we propose a scalable framework FeatWalk, which can model and incorporate instance similarities in terms of different types of features into a unified embedding representation. To enable scalability, FeatWalk does not directly calculate any similarity measure, but provides an alternative way to simulate the similarity-based random walks among instances to extract the local instance proximity and preserve it in a set of instance index sequences. These sequences are homogeneous with each other. A scalable word embedding algorithm is applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.
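
One way to walk among instances without ever computing a pairwise similarity matrix is to hop from an instance to one of its features and then to another instance that has that feature. The sketch below assumes this two-step scheme with weight-proportional sampling; it is an illustration of the general idea, not FeatWalk's actual procedure, and the function name `feature_walk` and the toy features are hypothetical.

```python
import random


def feature_walk(features, start, length, rng):
    """Walk over instance indices by alternating instance -> feature -> instance hops.

    `features` is a list with one dict per instance, mapping feature name to a
    positive weight. Instances sharing heavily weighted features are visited
    together more often, without any explicit similarity computation.
    """
    # Inverted index: feature name -> list of (instance index, weight).
    by_feature = {}
    for index, feats in enumerate(features):
        for name, weight in feats.items():
            by_feature.setdefault(name, []).append((index, weight))

    walk = [start]
    current = start
    for _ in range(length - 1):
        names = list(features[current])
        weights = [features[current][name] for name in names]
        feature = rng.choices(names, weights=weights)[0]
        holders = by_feature[feature]
        current = rng.choices(
            [index for index, _ in holders], weights=[w for _, w in holders]
        )[0]
        walk.append(current)
    return walk


# Made-up instances: 0 and 1 share a feature, 2 shares nothing with them.
features = [
    {"kg": 1.0, "embedding": 1.0},
    {"kg": 1.0},
    {"haze": 1.0},
]
rng = random.Random(0)
walk = feature_walk(features, start=0, length=5, rng=rng)
print(walk)
```

The resulting index sequences play the role of sentences, so a word embedding algorithm can be trained on them to obtain instance embeddings.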


End-to-End Argumentation Knowledge Graph Construction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6231 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7367-7374

Author(s):

Khalid Al-Khatib ◽

Yufang Hou ◽

Henning Wachsmuth ◽

Charles Jochim ◽

Francesca Bonin ◽

...

Keyword(s):

Large Scale ◽

Question Answering ◽

Knowledge Graph ◽

Exploratory Search ◽

Text Generation ◽

Fake News ◽

High Quality ◽

Web Based ◽

Knowledge Graphs ◽

End To End

This paper studies the end-to-end construction of an argumentation knowledge graph that is intended to support argument synthesis, argumentative question answering, or fake news detection, among others. The study is motivated by the proven effectiveness of knowledge graphs for interpretable and controllable text generation and exploratory search. Original in our work is that we propose a model of the knowledge encapsulated in arguments. Based on this model, we build a new corpus that comprises about 16k manual annotations of 4740 claims with instances of the model's elements, and we develop an end-to-end framework that automatically identifies all modeled types of instances. The results of experiments show the potential of the framework for building a web-based argumentation graph that is of high quality and large scale.
