Syntax-aware Semantic Role Labeling without Parsing

Are Noisy Sentences Useless for Distant Supervised Relation Extraction?

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6407 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8799-8806

Author(s):

Yuming Shang ◽

He-Yan Huang ◽

Xian-Ling Mao ◽

Xin Sun ◽

Wei Wei

Keyword(s):

Selective Attention ◽

State Of The Art ◽

Relation Extraction ◽

The State ◽

Benchmark Dataset ◽

Experimental Results ◽

Training Data ◽

High Confidence ◽

Feature Representations ◽

Novel Method

The noisy labeling problem has been one of the major obstacles for distant supervised relation extraction. Existing approaches usually consider that the noisy sentences are useless and will harm the model's performance. Therefore, they mainly alleviate this problem by reducing the influence of noisy sentences, such as applying bag-level selective attention or removing noisy sentences from sentence-bags. However, the underlying cause of the noisy labeling problem is not the lack of useful information, but the missing relation labels. Intuitively, if we can allocate credible labels for noisy sentences, they will be transformed into useful training data and benefit the model's performance. Thus, in this paper, we propose a novel method for distant supervised relation extraction, which employs unsupervised deep clustering to generate reliable labels for noisy sentences. Specifically, our model contains three modules: a sentence encoder, a noise detector and a label generator. The sentence encoder is used to obtain feature representations. The noise detector detects noisy sentences from sentence-bags, and the label generator produces high-confidence relation labels for noisy sentences. Extensive experimental results demonstrate that our model outperforms the state-of-the-art baselines on a popular benchmark dataset, and can indeed alleviate the noisy labeling problem.
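The label-generation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract names unsupervised deep clustering, whereas the code below swaps in a plain nearest-centroid assignment with a cosine-similarity threshold. The function names, the relation labels, the vectors, and the 0.9 threshold are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relabel_noisy(sentence_vecs, centroids, threshold=0.9):
    """Give each noisy sentence the label of its closest cluster centroid.

    Assumption: a sentence is relabeled only when its similarity to the
    closest centroid reaches `threshold`; otherwise it stays unlabeled (None).
    """
    labels = []
    for vec in sentence_vecs:
        best_label, best_sim = max(
            ((label, cosine(vec, c)) for label, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        labels.append(best_label if best_sim >= threshold else None)
    return labels

# Hypothetical 2-d sentence representations and relation centroids.
centroids = {"born_in": [1.0, 0.0], "works_for": [0.0, 1.0]}
noisy = [[0.98, 0.05], [0.6, 0.6]]
print(relabel_noisy(noisy, centroids))  # ['born_in', None]
```

The second sentence sits equally far from both centroids, so it falls below the threshold and receives no label, which mirrors the abstract's emphasis on high-confidence labels only.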


Selectional Preferences for Semantic Role Classification

Computational Linguistics ◽

10.1162/coli_a_00145 ◽

2013 ◽

Vol 39 (3) ◽

pp. 631-663 ◽

Cited By ~ 5

Author(s):

Beñat Zapirain ◽

Eneko Agirre ◽

Lluís Màrquez ◽

Mihai Surdeanu

Keyword(s):

Error Analysis ◽

State Of The Art ◽

Error Reduction ◽

Semantic Role ◽

Semantic Role Labeling ◽

Syntactic Information ◽

Open Issue ◽

Selectional Preferences ◽

Post Hoc ◽

Labeling System

This paper focuses on a well-known open issue in Semantic Role Classification (SRC) research: the limited influence and sparseness of lexical features. We mitigate this problem using models that integrate automatically learned selectional preferences (SP). We explore a range of models based on WordNet and distributional-similarity SPs. Furthermore, we demonstrate that the SRC task is better modeled by SP models centered on both verbs and prepositions, rather than verbs alone. Our experiments with SP-based models in isolation indicate that they outperform a lexical baseline by 20 F1 points in domain and almost 40 F1 points out of domain. Furthermore, we show that a state-of-the-art SRC system extended with features based on selectional preferences performs significantly better, both in domain (17% error reduction) and out of domain (13% error reduction). Finally, we show that in an end-to-end semantic role labeling system we obtain small but statistically significant improvements, even though our modified SRC model affects only approximately 4% of the argument candidates. Our post hoc error analysis indicates that the SP-based features help mostly in situations where syntactic information is either incorrect or insufficient to disambiguate the correct role.
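The "error reduction" figures quoted above are relative quantities. As a small sketch of the usual arithmetic, assuming scores are percentages and error is defined as 100 minus the score, the function below computes the fraction of the baseline error that is removed. The input scores in the example are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_score, new_score):
    """Fraction of the baseline error removed by the new system.

    Assumption: both scores are percentages in [0, 100] and the error
    is taken to be 100 minus the score.
    """
    baseline_error = 100.0 - baseline_score
    if baseline_error <= 0:
        raise ValueError("baseline has no error left to reduce")
    new_error = 100.0 - new_score
    return (baseline_error - new_error) / baseline_error

# Hypothetical scores: moving from 80.0 to 83.4 removes 3.4 of 20 error points.
reduction = relative_error_reduction(80.0, 83.4)
print(f"{reduction:.0%}")  # 17%
```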


Text Rewriting Improves Semantic Role Labeling

Journal of Artificial Intelligence Research ◽

10.1613/jair.4431 ◽

2014 ◽

Vol 51 ◽

pp. 133-164 ◽

Cited By ~ 1

Author(s):

K. Woodsend ◽

M. Lapata

Keyword(s):

Gold Standard ◽

High Performance ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Semantic Role ◽

Semantic Role Labeling ◽

Comparable Corpora ◽

Rewrite Rules ◽

Model Training

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.


Text Rewriting Improves Semantic Role Labeling (Extended Abstract)

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/729 ◽

2017 ◽

Author(s):

Kristian Woodsend ◽

Mirella Lapata

Keyword(s):

Gold Standard ◽

High Performance ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Semantic Role ◽

Semantic Role Labeling ◽

Comparable Corpora ◽

Rewrite Rules ◽

Model Training

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.


Syntax Role for Neural Semantic Role Labeling

Computational Linguistics ◽

10.1162/coli_a_00408 ◽

2021 ◽

pp. 1-48

Author(s):

Zuchao Li ◽

Hai Zhao ◽

Shexia He ◽

Jiaxun Cai

Keyword(s):

Argument Structure ◽

Large Scale ◽

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Empirical Survey ◽

Learning Framework ◽

Syntactic Information ◽

Feature Based ◽

Predicate Argument Structure

Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies with traditional models have shown that syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones and suggest that syntax becomes much less important for neural semantic role labeling, especially when paired with recent deep neural networks and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation of the relevance of syntactic information in SRL, for both dependency and span SRL, in both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, which are accompanied by two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all languages available, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.


A Multilayer CARU Framework to Obtain Probability Distribution for Paragraph-Based Sentiment Analysis

Applied Sciences ◽

10.3390/app112311344 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11344

Author(s):

Wei Ke ◽

Ka-Hou Chan

Keyword(s):

Probability Distribution ◽

Information Extraction ◽

Sentiment Analysis ◽

State Of The Art ◽

Final Analysis ◽

The State ◽

Experimental Results ◽

Content Adaptive

Paragraph-based datasets are hard to analyze with a simple RNN, because long sequences suffer from the problem of long-term dependencies. In this work, we propose a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction. In addition, we present a type of CNN-based model as an extractor to explore and capture useful features in the hidden state, which represent the content of the entire paragraph. In particular, we attach Chebyshev pooling to the end of the CNN-based extractor in place of max pooling. This can project the features into a probability distribution so as to provide an interpretable evaluation for the final analysis. Experimental results demonstrate the superiority of the proposed approach compared with state-of-the-art models.


RDFuzz: Accelerating Directed Fuzzing with Intertwined Schedule and Optimized Mutation

Mathematical Problems in Engineering ◽

10.1155/2020/7698916 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Jiaxi Ye ◽

Ruilin Li ◽

Bin Zhang

Keyword(s):

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Exploration And Exploitation ◽

Balance Problem ◽

Evaluation Strategy ◽

Testing Schedule ◽

Available Resources

Directed fuzzing is a practical technique that concentrates its testing energy on reaching the target code areas while spending little on unrelated components. It is a promising way to make better use of available resources, especially in testing large-scale programs. However, by observing the state-of-the-art directed fuzzing engine (AFLGo), we argue that there are two universal limitations: the balance problem between exploration and exploitation, and the blindness in mutation toward the target code areas. In this paper, we present a new prototype, RDFuzz, to address these two limitations. In RDFuzz, we first introduce a frequency-guided strategy in the exploration stage and improve its accuracy by adopting branch-level instead of path-level frequency. Then, we introduce an input-distance-based evaluation strategy in the exploitation stage and present an optimized mutation to distinguish and protect the distance-sensitive input content. Moreover, an intertwined testing schedule is leveraged to perform exploration and exploitation in turn. We test RDFuzz on 7 benchmarks, and the experimental results demonstrate that RDFuzz is skilled at driving the program toward the target code areas and is not easily stuck by the balance problem of exploration and exploitation.
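The intertwined schedule described in this abstract can be pictured with a toy sketch. This is not RDFuzz's implementation: the seed records, the field names `branch_hits` and `distance`, and the strict one-round alternation are assumptions made only for illustration. Exploration here prefers the seed whose branches were exercised least often, and exploitation prefers the seed closest to the target.

```python
def pick_seed(seeds, phase):
    """Choose a seed for the given phase.

    Assumption: each seed is a dict with hypothetical fields
    'branch_hits' (how often its branches were exercised) and
    'distance' (estimated distance to the target code area).
    """
    if phase == "explore":
        # Frequency-guided: favor rarely exercised branches.
        return min(seeds, key=lambda s: s["branch_hits"])
    # Distance-guided: favor inputs closest to the target.
    return min(seeds, key=lambda s: s["distance"])

def intertwined_schedule(seeds, rounds):
    """Alternate exploration and exploitation, one round each, in turn."""
    chosen = []
    for i in range(rounds):
        phase = "explore" if i % 2 == 0 else "exploit"
        chosen.append((phase, pick_seed(seeds, phase)["name"]))
    return chosen

# Hypothetical seed pool.
seeds = [
    {"name": "a", "branch_hits": 3, "distance": 12.0},
    {"name": "b", "branch_hits": 9, "distance": 4.5},
]
print(intertwined_schedule(seeds, 4))
```

Alternating the two phases keeps either one from monopolizing the testing budget, which is the balance problem the abstract raises.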


FedOpt: Towards Communication Efficiency and Privacy Preservation in Federated Learning

Applied Sciences ◽

10.3390/app10082864 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2864 ◽

Cited By ~ 4

Author(s):

Muhammad Asad ◽

Ahmed Moustafa ◽

Takayuki Ito

Keyword(s):

Privacy Preservation ◽

State Of The Art ◽

Homomorphic Encryption ◽

Evaluation Criteria ◽

The State ◽

Compression Algorithm ◽

Experimental Results ◽

Local Learning ◽

Learning Models ◽

Communication Efficiency

Artificial Intelligence (AI) has been applied to solve various real-world problems in recent years. However, the emergence of new AI technologies has brought several problems, especially with regard to communication efficiency, security threats and privacy violations. Towards this end, Federated Learning (FL) has received widespread attention due to its ability to facilitate the collaborative training of local learning models without compromising the privacy of data. However, recent studies have shown that FL still consumes considerable amounts of communication resources. These communication resources are vital for updating the learning models. In addition, the privacy of data could still be compromised when the parameters of the local learning models are shared in order to update the global model. Towards this end, we propose a new approach, namely, Federated Optimisation (FedOpt), in order to promote communication efficiency and privacy preservation in FL. In order to implement FedOpt, we design a novel compression algorithm, namely, Sparse Compression Algorithm (SCA), for efficient communication, and then integrate additively homomorphic encryption with differential privacy to prevent data from being leaked. Thus, the proposed FedOpt smoothly trades off communication efficiency and privacy preservation in order to adopt the learning task. The experimental results demonstrate that FedOpt outperforms the state-of-the-art FL approaches. In particular, we consider three different evaluation criteria: model accuracy, communication efficiency and computation overhead. Then, we compare the proposed FedOpt with the baseline configurations and the state-of-the-art approaches, i.e., Federated Averaging (FedAvg) and the Paillier-encryption-based privacy-preserving deep learning (PPDL), on all three evaluation criteria. The experimental results show that FedOpt is able to converge within fewer training epochs and a smaller privacy budget.
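The abstract does not spell out how the Sparse Compression Algorithm works, so the sketch below substitutes a generic top-k sparsification of a model update to show why sparse updates save communication. It should not be read as SCA itself. The function names, the value of `k`, and the sample update are hypothetical.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of `update` as {index: value}.

    Stand-in technique (generic top-k sparsification), not the paper's SCA.
    """
    if k <= 0:
        return {}
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(ranked[:k])}

def densify(sparse, length):
    """Rebuild a dense update on the receiving side; dropped entries become 0."""
    dense = [0.0] * length
    for i, value in sparse.items():
        dense[i] = value
    return dense

# Hypothetical local update: only 2 of 5 values are transmitted.
update = [0.01, -0.8, 0.05, 0.3, -0.02]
sparse = top_k_sparsify(update, k=2)
print(sparse)                         # {1: -0.8, 3: 0.3}
print(densify(sparse, len(update)))   # [0.0, -0.8, 0.0, 0.3, 0.0]
```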


Saga of Superfluid Solids

Physics ◽

10.3390/physics2010006 ◽

2020 ◽

Vol 2 (1) ◽

pp. 49-66 ◽

Cited By ~ 4

Author(s):

Vyacheslav I. Yukalov

Keyword(s):

Numerical Simulations ◽

State Of The Art ◽

Theoretical Models ◽

The State ◽

Experimental Results ◽

Standing Problem

The article presents the state of the art and reviews the literature on the long-standing problem of the possibility for a sample to be at the same time solid and superfluid. Theoretical models, numerical simulations, and experimental results are discussed.


Context-aware Frame-Semantic Role Labeling

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00150 ◽

2015 ◽

Vol 3 ◽

pp. 449-460 ◽

Cited By ~ 5

Author(s):

Michael Roth ◽

Mirella Lapata

Keyword(s):

Social Network ◽

Question Answering ◽

State Of The Art ◽

Semantic Role ◽

Context Aware ◽

Semantic Role Labeling ◽

Current State ◽

Sentence Level ◽

Small Set ◽

Labeling System

Frame semantic representations have been useful in several applications, ranging from text-to-scene generation to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task, and corresponding models are typically trained on only a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features, which we motivate based on linguistic insights, and experimentally demonstrate that they lead to significant improvements over the current state of the art in FrameNet-based semantic role labeling.
