XLORE2: Large-scale Cross-lingual Knowledge Graph Construction and Application

Hailong Jin; Chengjiang Li; Jing Zhang; Lei Hou; Juanzi Li; Peng Zhang

doi:10.1162/dint_a_00003

XLORE2: Large-scale Cross-lingual Knowledge Graph Construction and Application

Data Intelligence ◽

10.1162/dint_a_00003 ◽

2019 ◽

Vol 1 (1) ◽

pp. 77-98 ◽

Cited By ~ 3

Author(s):

Hailong Jin ◽

Chengjiang Li ◽

Jing Zhang ◽

Lei Hou ◽

Juanzi Li ◽

...

Keyword(s):

Large Scale ◽

Knowledge Bases ◽

Type Inference ◽

Knowledge Graph ◽

Entity Linking ◽

Fine Grained ◽

Cross Lingual

Knowledge bases (KBs) are often greatly incomplete, necessitating a demand for KB completion. Although XLORE is an English-Chinese bilingual knowledge graph, there are only 423,974 cross-lingual links between English instances and Chinese instances. We present XLORE2, an extension of the XLORE that is built automatically from Wikipedia, Baidu Baike and Hudong Baike. We add more facts by making cross-lingual knowledge linking, cross-lingual property matching and fine-grained type inference. We also design an entity linking system to demonstrate the effectiveness and broad coverage of XLORE2.

Download Full-text

Modelling and Factorizing Large-Scale Knowledge Graph (DBPedia) for Fine-Grained Entity Type Inference

Lecture Notes in Computer Science - Databases Theory and Applications ◽

10.1007/978-3-030-69377-0_17 ◽

2021 ◽

pp. 204-219

Author(s):

A. B. M. Moniruzzaman

Keyword(s):

Large Scale ◽

Type Inference ◽

Knowledge Graph ◽

Fine Grained

Download Full-text

OpenKG Chain: A Blockchain Infrastructure for Open Knowledge Graphs

Data Intelligence ◽

10.1162/dint_a_00095 ◽

2021 ◽

pp. 1-18

Author(s):

Huajun Chen ◽

Ning Hu ◽

Guilin Qi ◽

Haofen Wang ◽

Zhen Bi ◽

...

Keyword(s):

Open Data ◽

Knowledge Bases ◽

Knowledge Graph ◽

Cumulative Number ◽

Trust Network ◽

Fine Grained ◽

Knowledge Based ◽

Granularity Level ◽

Knowledge Graphs ◽

Structured Graphs

Abstract The early concept of knowledge graph originates from the idea of the Semantic Web, which aims at using structured graphs to model the knowledge of the world and record the relationships that exist between things. Currently publishing knowledge bases as open data on the Web has gained significant attention. In China, CIPS(Chinese Information Processing Society) launched the OpenKG in 2015 to foster the development of Chinese Open Knowledge Graphs. Unlike existing open knowledge-based programs, OpenKG chain is envisioned as a blockchain-based open knowledge infrastructure. This article introduces the first attempt at the implementation of sharing knowledge graphs on OpenKG chain, a blockchain-based trust network. We have completed the test of the underlying blockchain platform, as well as the on-chain test of OpenKG's dataset and toolset sharing as well as fine-grained knowledge crowdsourcing at the triple level. We have also proposed novel definitions: K-Point and OpenKG Token, which can be considered as a measurement of knowledge value and user value. 1033 knowledge contributors have been involved in two months of testing on the blockchain, and the cumulative number of on-chain recordings triggered by real knowledge consumers has reached 550,000 with an average daily peak value of more than 10,000. For the first time, We have tested and realized on-chain sharing of knowledge at entity/triple granularity level. At present, all operations on the datasets and toolset in OpenKG.CN, as well as the triplets in OpenBase, are recorded on the chain, and corresponding value will also be generated and assigned in a trusted mode. Via this effort, OpenKG chain looks to provide a more credible and traceable knowledge-sharing platform for the knowledge graph community.

Download Full-text

A large and evolving cognate database

Language Resources and Evaluation ◽

10.1007/s10579-021-09544-6 ◽

2021 ◽

Author(s):

Khuyagbaatar Batsuren ◽

Gábor Bella ◽

Fausto Giunchiglia

Keyword(s):

Quantitative Analysis ◽

Transfer Learning ◽

Large Scale ◽

Knowledge Bases ◽

Language Diversity ◽

Writing Systems ◽

Current Version ◽

Cross Lingual

AbstractWe present CogNet, a large-scale, automatically-built database of sense-tagged cognates—words of common origin and meaning across languages. CogNet is continuously evolving: its current version contains over 8 million cognate pairs over 338 languages and 35 writing systems, with new releases already in preparation. The paper presents the algorithm and input resources used for its computation, an evaluation of the result, as well as a quantitative analysis of cognate data leading to novel insights on language diversity. Furthermore, as an example on the use of large-scale cross-lingual knowledge bases for improving the quality of multilingual applications, we present a case study on the use of CogNet for bilingual lexicon induction in the framework of cross-lingual transfer learning.

Download Full-text

Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/209 ◽

2017 ◽

Cited By ~ 42

Author(s):

Muhao Chen ◽

Yingtao Tian ◽

Mohan Yang ◽

Carlo Zaniolo

Keyword(s):

Knowledge Base ◽

Knowledge Bases ◽

Loss Functions ◽

Knowledge Graph ◽

Graph Embeddings ◽

Linear Transformations ◽

Human Labor ◽

Knowledge Graphs ◽

Cross Lingual ◽

Entity Relationships

Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separated embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart.

Download Full-text

Knowledge-driven drug repurposing using a comprehensive drug knowledge graph

Health Informatics Journal ◽

10.1177/1460458220937101 ◽

2020 ◽

Vol 26 (4) ◽

pp. 2737-2750 ◽

Cited By ~ 2

Author(s):

Yongjun Zhu ◽

Chao Che ◽

Bo Jin ◽

Ningrui Zhang ◽

Chang Su ◽

...

Keyword(s):

Large Scale ◽

De Novo ◽

Rapid Development ◽

Drug Repurposing ◽

Data Representation ◽

Knowledge Bases ◽

Knowledge Graph ◽

Multiple Drug ◽

Treatment Information ◽

Drug Knowledge

Due to the huge costs associated with new drug discovery and development, drug repurposing has become an important complement to the traditional de novo approach. With the increasing number of public databases and the rapid development of analytical methodologies, computational approaches have gained great momentum in the field of drug repurposing. In this study, we introduce an approach to knowledge-driven drug repurposing based on a comprehensive drug knowledge graph. We design and develop a drug knowledge graph by systematically integrating multiple drug knowledge bases. We describe path- and embedding-based data representation methods of transforming information in the drug knowledge graph into valuable inputs to allow machine learning models to predict drug repurposing candidates. The evaluation demonstrates that the knowledge-driven approach can produce high predictive results for known diabetes mellitus treatments by only using treatment information on other diseases. In addition, this approach supports exploratory investigation through the review of meta paths that connect drugs with diseases. This knowledge-driven approach is an effective drug repurposing strategy supporting large-scale prediction and the investigation of case studies.

Download Full-text

Error Detection in a Large-Scale Lexical Taxonomy

Information ◽

10.3390/info11020097 ◽

2020 ◽

Vol 11 (2) ◽

pp. 97 ◽

Cited By ~ 1

Author(s):

Yinan An ◽

Sifan Liu ◽

Hongzhi Wang

Keyword(s):

Artificial Intelligence ◽

Knowledge Base ◽

Error Detection ◽

Large Scale ◽

Knowledge Bases ◽

Experimental Results ◽

Knowledge Graph ◽

Significant Challenge

Knowledge base (KB) is an important aspect in artificial intelligence. One significant challenge faced by KB construction is that it contains many noises, which prevent its effective usage. Even though some KB cleansing algorithms have been proposed, they focus on the structure of the knowledge graph and neglect the relation between the concepts, which could be helpful to discover wrong relations in KB. Motived by this, we measure the relation of two concepts by the distance between their corresponding instances and detect errors within the intersection of the conflicting concept sets. For efficient and effective knowledge base cleansing, we first apply a distance-based model to determine the conflicting concept sets using two different methods. Then, we propose and analyze several algorithms on how to detect and repair the errors based on our model, where we use a hash method for an efficient way to calculate distance. Experimental results demonstrate that the proposed approaches could cleanse the knowledge bases efficiently and effectively.

Download Full-text

LATTE: Latent Type Modeling for Biomedical Entity Linking

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6526 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9757-9764

Author(s):

Ming Zhu ◽

Busra Celikkaya ◽

Parminder Bhatia ◽

Chandan K. Reddy

Keyword(s):

Large Scale ◽

Biomedical Literature ◽

Biomedical Domain ◽

Entity Linking ◽

Fine Grained ◽

Performance Improvements ◽

Entity Disambiguation ◽

Clinical Records ◽

Type Information ◽

Latent Type

Entity linking is the task of linking mentions of named entities in natural language text, to entities in a curated knowledge-base. This is of significant importance in the biomedical domain, where it could be used to semantically annotate a large volume of clinical records and biomedical literature, to standardized concepts described in an ontology such as Unified Medical Language System (UMLS). We observe that with precise type information, entity disambiguation becomes a straightforward task. However, fine-grained type information is usually not available in biomedical domain. Thus, we propose LATTE, a LATent Type Entity Linking model, that improves entity linking by modeling the latent fine-grained type information about mentions and entities. Unlike previous methods that perform entity linking directly between the mentions and the entities, LATTE jointly does entity disambiguation, and latent fine-grained type learning, without direct supervision. We evaluate our model on two biomedical datasets: MedMentions, a large scale public dataset annotated with UMLS concepts, and a de-identified corpus of dictated doctor's notes that has been annotated with ICD concepts. Extensive experimental evaluation shows our model achieves significant performance improvements over several state-of-the-art techniques.

Download Full-text

Drug-Drug Interactions Prediction via Knowledge Graph and Text Embedding (Preprint)

10.2196/preprints.28277 ◽

2021 ◽

Author(s):

Meng Wang ◽

Haofen Wang ◽

Xing Liu ◽

Xinyu Ma ◽

Beilun Wang

Keyword(s):

Drug Interactions ◽

Large Scale ◽

Dimensional Space ◽

Knowledge Bases ◽

Knowledge Graph ◽

Drug Knowledge ◽

Real World Datasets ◽

Low Dimensional ◽

Different Sources ◽

Opening Up

UNSTRUCTURED Minimizing adverse reactions caused by drug-drug interactions has always been a momentous research topic in clinical pharmacology. Detecting all possible interactions through clinical studies before a drug is released to the market is a demanding task. The power of big data is opening up new approaches to discover various drug-drug interactions. However, these discoveries contain a huge amount of noise and provide knowledge bases far from complete and trustworthy ones to be utilized. Most existing studies focus on predicting binary drug-drug interactions between drug pairs and ignore other interactions. In this paper, we propose a novel framework, called PRD, to predict drug-drug interactions. The framework uses the graph embedding that can overcome data incompleteness and sparsity issues to achieve multiple DDI label prediction. First, a large-scale drug knowledge graph is generated from different sources. Then, the knowledge graph is embedded with comprehensive biomedical text into a common low dimensional space. Finally, the learned embeddings are used to efficiently compute rich DDI information through a link prediction process. To validate the effectiveness of the proposed framework, extensive experiments were conducted on real-world datasets. The results demonstrate that our model outperforms several state-of-the-art baseline methods in terms of capability and accuracy.

Download Full-text

Strategic resources and smallholder performance at the bottom of the pyramid

International Food and Agribusiness Management Review ◽

10.22434/ifamr2018.0111 ◽

2019 ◽

Vol 22 (3) ◽

pp. 365-380 ◽

Cited By ~ 1

Author(s):

Matthias Olthaar ◽

Wilfred Dolfsma ◽

Clemens Lutz ◽

Florian Noseleit

Keyword(s):

Large Scale ◽

Business Environment ◽

Agricultural Economics ◽

Local Market ◽

Bottom Of The Pyramid ◽

Fine Grained ◽

Strategic Resources ◽

Relative Contribution ◽

The Relationship ◽

Resource Based Theory

In a competitive business environment at the Bottom of the Pyramid smallholders supplying global value chains may be thought to be at the whims of downstream large-scale players and local market forces, leaving no room for strategic entrepreneurial behavior. In such a context we test the relationship between the use of strategic resources and firm performance. We adopt the Resource Based Theory and show that seemingly homogenous smallholders deploy resources differently and, consequently, some do outperform others. We argue that the ‘resource-based theory’ results in a more fine-grained understanding of smallholder performance than approaches generally applied in agricultural economics. We develop a mixed-method approach that allows one to pinpoint relevant, industry-specific resources, and allows for empirical identification of the relative contribution of each resource to competitive advantage. The results show that proper use of quality labor, storage facilities, time of selling, and availability of animals are key capabilities.

Download Full-text

Inverse Infiltration Modeling of Dike Covers Made of Dredged Material Using PEST and AMALGAM

Geosciences ◽

10.3390/geosciences11020041 ◽

2021 ◽

Vol 11 (2) ◽

pp. 41

Author(s):

Tim Jurisch ◽

Stefan Cantré ◽

Fokke Saathoff

Keyword(s):

Large Scale ◽

Measurement Data ◽

Fine Grained ◽

Dredged Materials ◽

Measurement Results ◽

Large Scale Field ◽

Eigenvalue Ratio ◽

Ill Posed ◽

Materials Used ◽

Installation Technology

A variety of studies recently proved the applicability of different dried, fine-grained dredged materials as replacement material for erosion-resistant sea dike covers. In Rostock, Germany, a large-scale field experiment was conducted, in which different dredged materials were tested with regard to installation technology, stability, turf development, infiltration, and erosion resistance. The infiltration experiments to study the development of a seepage line in the dike body showed unexpected measurement results. Due to the high complexity of the problem, standard geo-hydraulic models proved to be unable to analyze these results. Therefore, different methods of inverse infiltration modeling were applied, such as the parameter estimation tool (PEST) and the AMALGAM algorithm. In the paper, the two approaches are compared and discussed. A sensitivity analysis proved the presumption of a non-linear model behavior for the infiltration problem and the Eigenvalue ratio indicates that the dike infiltration is an ill-posed problem. Although this complicates the inverse modeling (e.g., termination in local minima), parameter sets close to an optimum were found with both the PEST and the AMALGAM algorithms. Together with the field measurement data, this information supports the rating of the effective material properties of the applied dredged materials used as dike cover material.

Download Full-text