Data-Driven Metaphor Recognition and Explanation

2013 ◽  
Vol 1 ◽  
pp. 379-390 ◽  
Author(s):  
Hongsong Li ◽  
Kenny Q. Zhu ◽  
Haixun Wang

Recognizing metaphors and identifying their source-target mappings is an important task, as metaphorical text poses a significant challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using these knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.
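
A minimal sketch of the kind of inference described above, assuming toy isA and metaphor knowledge bases with hypothetical entries and a hypothetical probability threshold; the paper's actual knowledge bases are mined from billions of web pages.

```python
# Minimal sketch (not the authors' implementation): classify an "X is a Y"
# statement as literal or metaphorical using two probabilistic knowledge
# bases, and explain a metaphor by its most likely target concept.

# Hypothetical toy knowledge bases: P(y is a literal hypernym of x) and
# P(target concept | metaphorical source concept).
ISA_KB = {("lawyer", "profession"): 0.92, ("lawyer", "shark"): 0.01}
METAPHOR_KB = {"shark": {"aggressive person": 0.71, "predator": 0.18}}

LITERAL_THRESHOLD = 0.5  # assumed cutoff; tuned on held-out data in practice


def interpret(x: str, y: str):
    """Classify 'x is a y' and, if metaphorical, propose an explanation."""
    if ISA_KB.get((x, y), 0.0) >= LITERAL_THRESHOLD:
        return {"type": "literal", "statement": f"{x} isA {y}"}
    targets = METAPHOR_KB.get(y)
    if targets:
        best = max(targets, key=targets.get)
        return {"type": "metaphor", "source": y, "target": best,
                "confidence": targets[best]}
    return {"type": "unknown"}


print(interpret("lawyer", "shark"))
# {'type': 'metaphor', 'source': 'shark', 'target': 'aggressive person', ...}
```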

Author(s):  
Kangqi Luo ◽  
Xusheng Luo ◽  
Xianyang Chen ◽  
Kenny Q. Zhu

This paper studies the problem of discovering structured knowledge representations of binary natural language relations. The representation, known as a schema, generalizes the traditional predicate path to support more complex semantics. We present a search algorithm to generate schemas over a knowledge base and propose a data-driven learning approach to discover the most suitable representations for a relation. Evaluation results show that the inferred schemas represent precise semantics and can be used to enrich manually crafted knowledge bases.
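
As a rough illustration of schema generation, the sketch below enumerates predicate paths, the simplest form of schema, between an entity pair in a toy triple store; the triples and the bounded breadth-first search are assumptions for illustration, not the paper's search algorithm.

```python
# Illustrative sketch only: enumerate predicate paths connecting a source
# entity to a target entity in a toy triple store via bounded BFS.
from collections import deque

TRIPLES = [
    ("Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Hawaii", "part_of", "USA"),
]

def neighbors(node):
    return [(p, o) for s, p, o in TRIPLES if s == node]

def predicate_paths(source, target, max_len=3):
    """Return predicate sequences that connect source to target."""
    paths, queue = [], deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            paths.append(tuple(path))
            continue
        if len(path) < max_len:
            for pred, nxt in neighbors(node):
                queue.append((nxt, path + [pred]))
    return paths

print(predicate_paths("Obama", "USA"))
# [('born_in', 'located_in', 'part_of')]
```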


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learning representations of relations expressed by their textual mentions. Our assumption is that if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings which encode relational information through the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings that provide a valuable signal when integrated into an existing neural model, outperforming state-of-the-art methods on a relation extraction task.
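
A minimal, non-hierarchical sketch of the siamese idea, written in PyTorch with toy sizes and random entity ids; the paper's model is hierarchical and is trained on textual mentions gathered by distant supervision.

```python
# Sketch only: encode an entity pair as an embedding, pull analogous pairs
# together, and push non-analogous pairs apart with a margin-based loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    def __init__(self, num_entities=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(num_entities, dim)
        self.proj = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))

    def forward(self, heads, tails):
        pair = torch.cat([self.emb(heads), self.emb(tails)], dim=-1)
        return F.normalize(self.proj(pair), dim=-1)

def analogy_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: an analogous pair should be closer than a random pair."""
    pos = F.cosine_similarity(anchor, positive)
    neg = F.cosine_similarity(anchor, negative)
    return F.relu(margin - pos + neg).mean()

encoder = PairEncoder()
# Toy batches of entity ids standing in for (anchor, analogous, negative) pairs.
h1, t1 = torch.randint(0, 1000, (32,)), torch.randint(0, 1000, (32,))
h2, t2 = torch.randint(0, 1000, (32,)), torch.randint(0, 1000, (32,))
h3, t3 = torch.randint(0, 1000, (32,)), torch.randint(0, 1000, (32,))
loss = analogy_loss(encoder(h1, t1), encoder(h2, t2), encoder(h3, t3))
loss.backward()
```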


Author(s):  
Mehreen Alam ◽  
Sibt ul Hussain

Attention-based encoder-decoder models have superseded conventional techniques due to their unmatched performance on many neural machine translation problems. Usually, the encoder and decoder are two recurrent neural networks, where the decoder is directed to focus on relevant parts of the source language using an attention mechanism. This data-driven approach leads to generic and scalable solutions with no reliance on hand-crafted features. To the best of our knowledge, none of the modern machine translation approaches has been applied to the research problem of Urdu machine transliteration. Ours is the first attempt to apply a deep neural encoder-decoder with an attention mechanism to this problem using a Roman-Urdu and Urdu parallel corpus. To this end, we present (i) the first ever Roman-Urdu to Urdu parallel corpus of 1.1 million sentences, (ii) three state-of-the-art encoder-decoder models, and (iii) a detailed empirical analysis of these three models on the Roman-Urdu to Urdu parallel corpus. Overall, the attention-based model gives state-of-the-art performance with a benchmark BLEU score of 70. Our qualitative evaluation shows that our models generate coherent transliterations which are grammatically and logically correct.
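
The attention step described above can be sketched roughly as follows; the dot-product scoring and the dimensions are illustrative assumptions, not the exact models evaluated in the paper.

```python
# Toy sketch of the attention mechanism: the decoder state scores each encoder
# state, and a softmax over the scores weights the encoder states into a
# context vector used to predict the next target character.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """decoder_state: (d,), encoder_states: (src_len, d) -> context (d,)."""
    scores = encoder_states @ decoder_state          # dot-product scoring
    weights = softmax(scores)                        # attention distribution
    return weights @ encoder_states, weights         # weighted sum of states

rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 32))    # 7 Roman-Urdu source positions, dim 32
dec = rng.normal(size=32)         # current decoder state
context, weights = attention_context(dec, enc)
print(weights.round(3), context.shape)
```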


2014 ◽  
Vol 26 (2) ◽  
pp. 349-376 ◽  
Author(s):  
Motoaki Kawanabe ◽  
Wojciech Samek ◽  
Klaus-Robert Müller ◽  
Carmen Vidaurre

Electroencephalographic signals are known to be nonstationary and easily affected by artifacts; therefore, their analysis requires methods that can deal with noise. In this work, we present a way to robustify the popular common spatial patterns (CSP) algorithm under a maxmin approach. In contrast to standard CSP, which maximizes the variance ratio between two conditions based on a single estimate of the class covariance matrices, we propose to compute spatial filters robustly by maximizing the minimum variance ratio within a prefixed set of covariance matrices called the tolerance set. We show that this kind of maxmin optimization makes CSP robust to outliers and reduces its tendency to overfit. We also present a data-driven approach to constructing a tolerance set that captures the variability of the covariance matrices over time, and show that it reduces the nonstationarity of the extracted features and significantly improves classification accuracy. We test the spatial filters derived with this approach and compare them to standard CSP and a state-of-the-art method on a real-world brain-computer interface (BCI) data set in which we expect substantial fluctuations caused by environmental differences. Finally, we investigate the advantages and limitations of the maxmin approach with simulations.
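
In the usual CSP notation (an assumption about notation, not taken from the paper), with class covariance estimates \Sigma_1 and \Sigma_2, a spatial filter w, and a tolerance set \mathcal{S} of covariance pairs, the two objectives contrasted above can be written as:

```latex
% Standard CSP: maximize the variance ratio between the two conditions
w^{\ast} = \arg\max_{w} \frac{w^{\top}\Sigma_{1}\,w}{w^{\top}\Sigma_{2}\,w}

% Maxmin CSP: maximize the worst-case ratio over the tolerance set
w^{\ast} = \arg\max_{w} \; \min_{(\Sigma_{1},\,\Sigma_{2}) \in \mathcal{S}} \;
           \frac{w^{\top}\Sigma_{1}\,w}{w^{\top}\Sigma_{2}\,w}
```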


1992 ◽  
Vol 7 (2) ◽  
pp. 115-141 ◽  
Author(s):  
Alun D. Preece ◽  
Rajjan Shinghal ◽  
Aïda Batarekh

This paper surveys the verification of expert system knowledge bases by detecting anomalies. Such anomalies are highly indicative of errors in the knowledge base. The paper is in two parts. The first part describes four types of anomaly: redundancy, ambivalence, circularity, and deficiency. We consider rule bases which are based on first-order logic, and explain the anomalies in terms of the syntax and semantics of logic. The second part presents a review of five programs which have been built to detect various subsets of the anomalies. The four anomalies provide a framework for comparing the capabilities of the five tools, and we highlight the strengths and weaknesses of each approach. This paper therefore provides not only a set of underlying principles for performing knowledge base verification through anomaly detection, but also a survey of the state-of-the-art in building practical tools for carrying out such verification. The reader of this paper is expected to be familiar with first-order logic.
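
A toy illustration of two of the four anomaly types, assuming a propositional rule base encoded as antecedent-set/consequent pairs; the surveyed tools operate on first-order rule bases and cover all four anomalies.

```python
# Illustrative sketch (not one of the surveyed tools): detect redundancy and
# circularity in a toy rule base, where each rule maps a frozenset of
# antecedents to a single consequent.
RULES = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"cough", "fever"}), "flu"),   # redundant duplicate
    (frozenset({"flu"}), "fever"),            # closes a cycle: fever -> flu -> fever
]

def redundant_rules(rules):
    seen, dupes = set(), []
    for rule in rules:
        if rule in seen:
            dupes.append(rule)
        else:
            seen.add(rule)
    return dupes

def has_circularity(rules):
    """Detect a cycle in the antecedent -> consequent dependency graph."""
    edges = {(a, consequent) for antecedents, consequent in rules for a in antecedents}
    nodes = {n for e in edges for n in e}
    def reachable(start, goal, visited=frozenset()):
        return any(s == start and (t == goal or
                   (t not in visited and reachable(t, goal, visited | {t})))
                   for s, t in edges)
    return any(reachable(n, n) for n in nodes)

print(redundant_rules(RULES))   # [(frozenset({'cough', 'fever'}), 'flu')]
print(has_circularity(RULES))   # True
```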


2017 ◽  
Vol 2017 ◽  
pp. 1-17
Author(s):  
Chunhua Li ◽  
Pengpeng Zhao ◽  
Victor S. Sheng ◽  
Xuefeng Xian ◽  
Jian Wu ◽  
...  

Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work on developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how to use limited human resources to maximize the quality improvement of a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and perform inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refinement, which judiciously select the most beneficial candidate facts for crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods at a reasonable crowdsourcing cost.
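
A hedged sketch of the rank-based selection idea, using a made-up functional constraint ("a country has exactly one capital") and toy candidate facts; the paper's scoring, graph-based propagation, and pruning are more elaborate.

```python
# Sketch only: rank candidate facts by how many semantic-constraint conflicts
# they participate in, ask the crowd about the top-ranked ones, and prune facts
# contradicted by the answers.
from itertools import combinations

CANDIDATES = {                              # fact -> extractor confidence
    ("Rome", "capital_of", "Italy"): 0.9,
    ("Milan", "capital_of", "Italy"): 0.4,
    ("Rome", "located_in", "Italy"): 0.8,
}

def functional_conflicts(facts, predicate="capital_of"):
    """capital_of is functional: two capitals for one country is a conflict."""
    same_pred = [f for f in facts if f[1] == predicate]
    return [(a, b) for a, b in combinations(same_pred, 2) if a[2] == b[2]]

def rank_for_crowd(facts, budget=1):
    conflict_count = {f: 0 for f in facts}
    for a, b in functional_conflicts(facts):
        conflict_count[a] += 1
        conflict_count[b] += 1
    return sorted(facts, key=lambda f: -conflict_count[f])[:budget]

print(rank_for_crowd(CANDIDATES))  # the fact whose verification resolves the most conflicts
# If the crowd confirms ("Rome", "capital_of", "Italy"), the functional
# constraint lets us prune ("Milan", "capital_of", "Italy") without asking.
```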


2016 ◽  
Vol 2016 (4) ◽  
pp. 389-402 ◽  
Author(s):  
Minhui Xue ◽  
Gabriel Magno ◽  
Evandro Cunha ◽  
Virgilio Almeida ◽  
Keith W. Ross

Due to the recent “Right to be Forgotten” (RTBF) ruling, for queries about an individual, Google and other search engines now delist links to web pages that contain “inadequate, irrelevant or no longer relevant, or excessive” information about that individual. In this paper we take a data-driven approach to study the RTBF in traditional media outlets, its consequences, and its susceptibility to inference attacks. First, we perform a content analysis on 283 known delisted UK media pages, using both manual investigation and Latent Dirichlet Allocation (LDA). We find that the strongest topic themes are violent crime, road accidents, drugs, murder, prostitution, financial misconduct, and sexual assault. Informed by this content analysis, we then show how a third party can discover delisted URLs along with the requesters’ names, thereby putting the efficacy of the RTBF for delisted media links in question. As a proof of concept, we perform an experiment that discovers two previously unknown delisted URLs and their corresponding requesters. We also determine 80 requesters for the 283 known delisted media pages, and examine whether they suffer from the “Streisand effect,” a phenomenon whereby an attempt to hide a piece of information has the unintended consequence of publicizing the information more widely. To measure the presence (or absence) of a Streisand effect, we develop novel metrics and methodology based on Google Trends and Twitter data. Finally, we carry out a demographic analysis of the 80 known requesters. We hope the results and observations in this paper can inform lawmakers as they refine RTBF laws in the future.
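
A sketch of the LDA content-analysis step using scikit-learn; the page texts below are short placeholders, not the actual delisted UK media pages, and the number of topics is arbitrary.

```python
# Fit a topic model over (placeholder) delisted-page texts and inspect the top
# words of each topic, mirroring the kind of content analysis described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

delisted_pages = [
    "man jailed after violent assault outside pub",
    "driver convicted over fatal road accident",
    "banker fined for financial misconduct and fraud",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(delisted_pages)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}:", ", ".join(top))
```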


2016 ◽  
Vol 10 (01) ◽  
pp. 121-142
Author(s):  
Mehrnaz Ghashghaei ◽  
Ebrahim Bagheri ◽  
John Cuzzola ◽  
Ali A. Ghorbani ◽  
Zeinab Noorian

Semantic annotation techniques provide the basis for linking textual content with concepts in well-grounded knowledge bases. In spite of their many application areas, current semantic annotation systems have some limitations. A prominent limitation is that none of the existing semantic annotators is able to identify and disambiguate quantitative (numerical) content. In textual documents such as Web pages, especially technical content, there is a great deal of quantitative information, such as product specifications, that needs to be semantically qualified. In this paper, we propose an approach for annotating quantitative values in short textual content. In our approach, we identify numeric values in the text and link them to an existing property in a knowledge base. Based on this mapping, we are then able to find the concept that the property is associated with, thereby identifying both the concept and the specific property of that concept to which the numeric value belongs. Results obtained on the developed gold standard dataset show that the proposed automated semantic annotation platform is quite effective in detecting and disambiguating numerical content and connecting it to the associated properties in the external knowledge base. Our experiments show that our proposed approach reaches an accuracy of over 70% for semantically annotating quantitative content.
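
A rough sketch of the linking idea, with a hypothetical property catalogue keyed by unit and typical value range; the paper links values to properties in an external knowledge base rather than a hand-made table.

```python
# Sketch only: extract a numeric value and unit from short text, then link it
# to the property whose unit matches and whose typical value range contains it,
# thereby also identifying the concept the property belongs to.
import re

# Hypothetical property catalogue: (concept, property) -> (unit, min, max)
PROPERTIES = {
    ("Smartphone", "screenSize"): ("inch", 3.0, 8.0),
    ("Smartphone", "batteryCapacity"): ("mAh", 1000, 7000),
    ("Car", "engineDisplacement"): ("litre", 0.6, 8.0),
}

def annotate(text):
    annotations = []
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(inch|mAh|litre)", text):
        v = float(value)
        for (concept, prop), (u, lo, hi) in PROPERTIES.items():
            if unit == u and lo <= v <= hi:
                annotations.append((v, unit, concept, prop))
    return annotations

print(annotate("The new model has a 6.1 inch display and a 4500 mAh battery."))
# [(6.1, 'inch', 'Smartphone', 'screenSize'),
#  (4500.0, 'mAh', 'Smartphone', 'batteryCapacity')]
```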


AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 65-74 ◽  
Author(s):  
Jay Pujara ◽  
Hui Miao ◽  
Lise Getoor ◽  
William W. Cohen

Many information extraction and knowledge base construction systems are addressing the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics — using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification, collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions simultaneously, posing a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently-introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification and present state-of-the-art results for knowledge graph construction while performing an order of magnitude faster than competing methods.
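
A deliberately simplified illustration: the article's approach reasons jointly and softly with PSL, whereas the sketch below applies two ontological constraints, label mutual exclusion and relation domain, as hard filters over hypothetical candidate extractions.

```python
# Not the article's method; a toy hard-constraint filter over candidate facts.
CANDIDATE_LABELS = [            # (entity, label, extractor confidence)
    ("kyoto", "city", 0.9),
    ("kyoto", "bird", 0.3),
]
CANDIDATE_RELATIONS = [         # (subject, relation, object, confidence)
    ("kyoto", "locatedIn", "japan", 0.8),
]
MUTUALLY_EXCLUSIVE = {("city", "bird")}
DOMAIN = {"locatedIn": "city"}  # the subject of locatedIn must be a city

def resolve_labels(labels):
    """Among mutually exclusive labels for an entity, keep the most confident."""
    by_entity = {}
    for entity, label, conf in labels:
        by_entity.setdefault(entity, []).append((label, conf))
    resolved = {}
    for entity, cands in by_entity.items():
        kept = []
        for label, _ in sorted(cands, key=lambda lc: -lc[1]):
            if all((label, k) not in MUTUALLY_EXCLUSIVE and
                   (k, label) not in MUTUALLY_EXCLUSIVE for k in kept):
                kept.append(label)
        resolved[entity] = kept
    return resolved

def filter_relations(relations, entity_labels):
    """Drop relations whose subject violates the domain constraint."""
    return [(s, r, o, c) for s, r, o, c in relations
            if DOMAIN.get(r) in entity_labels.get(s, [])]

labels = resolve_labels(CANDIDATE_LABELS)
print(labels)                                         # {'kyoto': ['city']}
print(filter_relations(CANDIDATE_RELATIONS, labels))  # relation is kept
```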

