Enriching Knowledge Base by Parse Tree Pattern and Semantic Filter

Hee-Geun Yoon; Seyoung Park; Seong-Bae Park

doi:10.3390/app10186209


Applied Sciences ◽

10.3390/app10186209 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6209

Author(s):

Hee-Geun Yoon ◽

Seyoung Park ◽

Seong-Bae Park

Keyword(s):

Knowledge Base ◽

Word Embedding ◽

Parse Tree ◽

Long Distance ◽

Tree Pattern ◽

Tree Patterns ◽

Average Accuracy

This paper proposes a simple knowledge base enrichment method based on parse tree patterns with a semantic filter. Parse tree patterns are superior to the lexical patterns commonly used in many previous studies in that they can manage long-distance dependencies among words. In addition, the proposed semantic filter, which is a combination of WordNet-based similarity and word embedding similarity, removes parse tree patterns that are semantically irrelevant to the meaning of a target relation. According to our experiments using the DBpedia ontology and Wikipedia corpus, the average accuracy of the top 100 parse tree patterns for ten relations is 68%, which is 16% higher than that of lexical patterns, and the average accuracy of the newly extracted triples is 60.1%. These results show that the proposed method produces more relevant patterns for the relations of seed knowledge, and thus more accurate triples are generated by the patterns.
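The filtering step in this abstract can be pictured with a minimal sketch. The abstract does not state how the two similarities are combined, so the weighted average, the `alpha` weight, the `threshold`, and the toy scores below are all assumptions made for illustration. The WordNet-based similarity is represented by a precomputed number rather than a real WordNet lookup.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_filter(patterns, relation_vec, alpha=0.5, threshold=0.5):
    """Keep patterns whose combined similarity to the relation is high enough.

    Each pattern is (name, wordnet_sim, embedding_vec). The weighted average
    is an assumed combination rule, not the one used in the paper.
    """
    kept = []
    for name, wordnet_sim, vec in patterns:
        score = alpha * wordnet_sim + (1 - alpha) * cosine(vec, relation_vec)
        if score >= threshold:
            kept.append((name, round(score, 3)))
    return kept

# Hypothetical toy data for a birthplace-like relation.
relation_vec = [1.0, 0.0]
patterns = [
    ("X was born in Y", 0.9, [0.9, 0.1]),
    ("X visited Y", 0.2, [0.1, 0.9]),
]
kept = semantic_filter(patterns, relation_vec)
```

With these toy inputs only the first pattern passes the threshold, because both of its similarity scores are high, while the second pattern scores low on both.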


Word-Order Issues in English-to-Urdu Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-011-0007-0 ◽

2011 ◽

Vol 95 (1) ◽

pp. 87-106 ◽

Cited By ~ 3

Author(s):

Bushra Jawaid ◽

Daniel Zeman

Keyword(s):

Machine Translation ◽

Word Order ◽

Statistical Machine Translation ◽

Parse Tree ◽

Hard Problem ◽

Long Distance ◽

Translation Process ◽

English Sentence ◽

European Languages

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
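To make the idea of reordering phrases in a source-side parse tree concrete, here is a toy sketch. It relies on the fact that Urdu is verb-final while English places the verb before its object. The single rule below (move the verb to the end of its verb phrase), the tuple-based tree encoding, and the function names are hypothetical illustrations, not the rule set from the paper.

```python
def reorder_verb_final(tree):
    """Recursively move the verb of every VP to the end of that VP.

    A tree is (label, children) for phrases or (tag, word) for leaves.
    """
    label, body = tree
    if isinstance(body, str):  # leaf: (POS tag, word)
        return tree
    children = [reorder_verb_final(child) for child in body]
    if label == "VP":
        verbs = [c for c in children if c[0].startswith("VB")]
        others = [c for c in children if not c[0].startswith("VB")]
        children = others + verbs
    return (label, children)

def leaves(tree):
    """Read the words of a tree from left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]

# Toy parse of "she read the book".
sentence = ("S", [
    ("NP", [("PRP", "she")]),
    ("VP", [("VBD", "read"), ("NP", [("DT", "the"), ("NN", "book")])]),
])
reordered = leaves(reorder_verb_final(sentence))
```

The reordered word sequence places the verb after its object, which is closer to the target-side order a phrase-based system then has to produce.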


Dual CNN for Relation Extraction with Knowledge-Based Attention and Word Embeddings

Computational Intelligence and Neuroscience ◽

10.1155/2019/6789520 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Jun Li ◽

Guimin Huang ◽

Jianheng Chen ◽

Yabing Wang

Keyword(s):

Knowledge Base ◽

Relation Extraction ◽

Background Knowledge ◽

Word Embedding ◽

Entity Recognition ◽

Instance Selection ◽

Attention Model ◽

Training Tool ◽

Knowledge Based ◽

Proposed Model

Relation extraction is a critical underlying task of text understanding. However, existing methods have defects in instance selection and lack background knowledge for entity recognition. In this paper, we propose a knowledge-based attention model, which can make full use of supervised information from a knowledge base, to select an entity. We also design a method of dual convolutional neural networks (CNNs), considering that the word embedding of each word is restricted when a single training tool is used. The proposed model combines a CNN with an attention mechanism. The model inserts the word embedding and supervised information from the knowledge base into the CNN, performs convolution and pooling, and combines the knowledge base and CNN in the fully connected layer. Based on these processes, the model not only obtains better entity representations but also improves the performance of relation extraction with the help of rich background knowledge. The experimental results demonstrate that the proposed model achieves competitive performance.


QLUT at SemEval-2017 Task 2: Word Similarity Based on Word Embedding and Knowledge Base

10.18653/v1/s17-2036 ◽

2017 ◽

Cited By ~ 1

Author(s):

Fanqing Meng ◽

Wenpeng Lu ◽

Yuteng Zhang ◽

Ping Jian ◽

Shumin Shi ◽

...

Keyword(s):

Knowledge Base ◽

Word Embedding ◽

Word Similarity


Continual Word Embedding Based for Matching Lightweight Ontologies

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.6281 ◽

2014 ◽

Vol 556-562 ◽

pp. 6281-6285

Author(s):

Zhen Le Wu ◽

Ying Li ◽

Yong Bin Wang ◽

Yan Jiao Zang

Keyword(s):

Knowledge Base ◽

Heterogeneous Data ◽

Word Embedding ◽

Ontology Matching ◽

Ontology Alignment ◽

Ontology Alignment Evaluation Initiative

Ontology matching is the task of finding alignments between two different ontologies. It has become a key step in building knowledge bases and integrating heterogeneous data. In this paper, a novel ontology matching approach that is based on continual word embedding is proposed. We describe in detail how the skip-gram model is adapted to capture the semantics of words to learn the word embedding. After computing the name similarity of concepts, the similarity flooding algorithm is used to fix the initial similarity. Experiments on the Ontology Alignment Evaluation Initiative (OAEI) benchmark without instances show that the proposed method significantly improves the quality of mappings.


Expanding Our Understanding of COVID-19 from Biomedical Literature Using Word Embedding

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18063005 ◽

2021 ◽

Vol 18 (6) ◽

pp. 3005

Author(s):

Heyoung Yang ◽

Eunsoo Sohn

Keyword(s):

Knowledge Base ◽

Biomedical Literature ◽

Word Embedding ◽

Pharmaceutical Companies ◽

Biomedical Knowledge ◽

Health Crisis ◽

The Past ◽

Related Information ◽

Embedding Technique ◽

Clear Solution

A better understanding of the clinical characteristics of coronavirus disease 2019 (COVID-19) is urgently required to address this health crisis. Numerous researchers and pharmaceutical companies are working on developing vaccines and treatments; however, a clear solution has yet to be found. The current study proposes the use of artificial intelligence methods to comprehend biomedical knowledge and infer the characteristics of COVID-19. A biomedical knowledge base was established via FastText, a word embedding technique, using PubMed literature from the past decade. Subsequently, a new knowledge base was created using recently published COVID-19 articles. Using this newly constructed knowledge base from the word embedding model, a list of anti-infective drugs and proteins of either human or coronavirus origin were inferred to be related, because they are located close to COVID-19 on the knowledge base. This study attempted to form a method to quickly infer related information about COVID-19 using the existing knowledge base, before sufficient knowledge about COVID-19 is accumulated. With COVID-19 not completely overcome, machine learning-based research in the PubMed literature will provide a broad guideline for researchers and pharmaceutical companies working on treatments for COVID-19.


Indexing ordered trees for (nonlinear) tree pattern matching by pushdown automata

Computer Science and Information Systems ◽

10.2298/csis111220024t ◽

2012 ◽

Vol 9 (3) ◽

pp. 1125-1153

Author(s):

J. Travnícek ◽

J. Janousek ◽

B. Melichar

Keyword(s):

Pattern Matching ◽

Data Structures ◽

Input Pattern ◽

Pushdown Automaton ◽

Ordered Trees ◽

Tree Pattern ◽

Tree Patterns ◽

Tree Pattern Matching ◽

Ordered Tree ◽

Pushdown Automata

Trees are one of the fundamental data structures used in Computer Science. We present a new kind of acyclic pushdown automata, the tree pattern pushdown automaton and the nonlinear tree pattern pushdown automaton, constructed for an ordered tree. These automata accept all tree patterns and nonlinear tree patterns, respectively, which match the tree and represent a full index of the tree for such patterns. Given a tree with $n$ nodes, the numbers of these distinct tree patterns and nonlinear tree patterns can be at most $2^{n-1} + n$ and at most $(2+v)^{n-1} + 2$, respectively, where $v$ is the maximal number of nonlinear variables allowed in nonlinear tree patterns. The total sizes of nondeterministic versions of the two pushdown automata are $O(n)$ and $O(n^2)$, respectively. We discuss the time complexities and show timings of our implementations using the bit-parallelism technique. The timings show that for a given tree the running time is linear in the size of the input pattern.
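A short calculation shows how quickly the stated upper bounds grow. The sketch reads the bounds as 2^(n-1) + n for tree patterns and (2+v)^(n-1) + 2 for nonlinear tree patterns, which is an interpretation of the abstract's formulas. The function names are hypothetical.

```python
def max_tree_patterns(n):
    """Upper bound on distinct tree patterns for a tree with n nodes."""
    return 2 ** (n - 1) + n

def max_nonlinear_tree_patterns(n, v):
    """Upper bound on distinct nonlinear tree patterns with at most v
    nonlinear variables, for a tree with n nodes."""
    return (2 + v) ** (n - 1) + 2

# (n, linear bound, nonlinear bound with v = 1) for a few tree sizes.
bounds = [
    (n, max_tree_patterns(n), max_nonlinear_tree_patterns(n, 1))
    for n in (1, 5, 10)
]
```

Both bounds are exponential in the number of nodes, which is why an index whose nondeterministic automaton has only linear or quadratic size is notable.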


Context-aware Path Ranking for Knowledge Base Completion

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/166 ◽

2017 ◽

Cited By ~ 2

Author(s):

Sahisnu Mazumder ◽

Bing Liu

Keyword(s):

Random Walk ◽

Knowledge Base ◽

Predictive Performance ◽

Word Embedding ◽

Experimental Results ◽

Context Aware ◽

Exploration Strategy

Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performance and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (training on an exponentially large number of features) problems. This paper proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems by introducing a selective path exploration strategy. C-PR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by C-PR not only improve predictive performance but also are more interpretable than existing baselines.
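The path features that PR algorithms rely on can be illustrated with a small sketch. It performs plain exhaustive enumeration of relation paths between one entity pair, which is the kind of unselective search that leads to feature explosion. It is not the selective, embedding-guided bidirectional random walk of C-PR. The toy triples, the `max_len` cutoff, and the function name are assumptions.

```python
from collections import defaultdict

def enumerate_paths(triples, source, target, max_len=3):
    """Return all relation paths (tuples of relation names) from source to
    target that use at most max_len edges and visit no entity twice."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    paths = []
    stack = [(source, (), {source})]
    while stack:
        node, rels, seen = stack.pop()
        if node == target and rels:
            paths.append(rels)
            continue
        if len(rels) == max_len:
            continue
        for rel, nxt in graph[node]:
            if nxt not in seen:
                stack.append((nxt, rels + (rel,), seen | {nxt}))
    return sorted(paths)

# Hypothetical toy knowledge base.
triples = [
    ("alice", "bornIn", "paris"),
    ("paris", "cityOf", "france"),
    ("alice", "livesIn", "lyon"),
    ("lyon", "cityOf", "france"),
    ("alice", "nationality", "france"),
]
features = enumerate_paths(triples, "alice", "france")
```

Each returned tuple would serve as one feature for the entity pair. On a large KB the number of such tuples grows rapidly with `max_len`, which motivates a selective exploration strategy.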


Non-Contiguous Pattern Avoidance in Binary Trees

The Electronic Journal of Combinatorics ◽

10.37236/2099 ◽

2012 ◽

Vol 19 (3) ◽

Cited By ~ 3

Author(s):

Michael Dairyko ◽

Lara Pudwell ◽

Samantha Tyner ◽

Casey Wynn

Keyword(s):

Positive Integer ◽

Generating Function ◽

Binary Tree ◽

Pattern Avoidance ◽

Binary Trees ◽

Tree Pattern ◽

Tree Patterns ◽

Number Of Leaves ◽

Pattern Avoiding Permutations

In this paper we consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by computing closed formulas for the number of trees avoiding a single binary tree pattern with 4 or fewer leaves and compare these results to analogous work for contiguous tree patterns. Next, we give an explicit generating function that counts binary trees avoiding a single non-contiguous tree pattern according to number of leaves and show that there is exactly one Wilf class of k-leaf tree patterns for any positive integer k. In addition, we give a bijection between certain sets of pattern-avoiding trees and sets of pattern-avoiding permutations. Finally, we enumerate binary trees that simultaneously avoid more than one tree pattern.


The Fine Structure of Phloem Cells

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100119867 ◽

1985 ◽

Vol 43 ◽

pp. 632-635

Author(s):

James Cronshaw

Keyword(s):

Structural Studies ◽

High Concentration ◽

Long Distance ◽

Phloem Tissue ◽

Sieve Elements ◽

Long Distance Transport ◽

P Protein ◽

Companion Cells ◽

Electron Microscopical ◽

Distance Transport

Long distance transport in plants takes place in phloem tissue which has characteristic cells, the sieve elements. At maturity these cells have sieve areas in their end walls with specialized perforations. They are associated with companion cells, parenchyma cells, and in some species, with transfer cells. The protoplast of the functioning sieve element contains a high concentration of sugar, and consequently a high hydrostatic pressure, which makes it extremely difficult to fix mature sieve elements for electron microscopical observation without the formation of surge artifacts. Despite many structural studies which have attempted to prevent surge artifacts, several features of mature sieve elements, such as the distribution of P-protein and the nature of the contents of the sieve area pores, remain controversial.


Assessing the Knowledge Base of Students: An Information Theoretic Approach to Testing

Measurement and Evaluation in Counseling and Development ◽

10.1080/07481756.1986.12022831 ◽

1986 ◽

Vol 19 (3) ◽

pp. 116-130 ◽

Cited By ~ 10

Author(s):

James E. Bruno

Keyword(s):

Knowledge Base ◽

Theoretic Approach ◽

Information Theoretic ◽

Information Theoretic Approach
