scholarly journals Text Rewriting Improves Semantic Role Labeling

2014 ◽  
Vol 51 ◽  
pp. 133-164 ◽  
Author(s):  
K. Woodsend ◽  
M. Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.

Author(s):  
Kristian Woodsend ◽  
Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.


2008 ◽  
Vol 34 (2) ◽  
pp. 225-255 ◽  
Author(s):  
Nianwen Xue

In this article we report work on Chinese semantic role labeling, taking advantage of two recently completed corpora, the Chinese PropBank, a semantically annotated corpus of Chinese verbs, and the Chinese Nombank, a companion corpus that annotates the predicate-argument structure of nominalized predicates. Because the semantic role labels are assigned to the constituents in a parse tree, we first report experiments in which semantic role labels are automatically assigned to hand-crafted parses in the Chinese Treebank. This gives us a measure of the extent to which semantic role labels can be bootstrapped from the syntactic annotation provided in the treebank. We then report experiments using automatic parses with decreasing levels of human annotation in the input to the syntactic parser: parses that use gold-standard segmentation and POS-tagging, parses that use only gold-standard segmentation, and fully automatic parses. These experiments gauge how successful semantic role labeling for Chinese can be in more realistic situations. Our results show that when hand-crafted parses are used, semantic role labeling accuracy for Chinese is comparable to what has been reported for the state-of-the-art English semantic role labeling systems trained and tested on the English PropBank, even though the Chinese PropBank is significantly smaller in size. When an automatic parser is used, however, the accuracy of our system is significantly lower than the English state of the art. This indicates that an improvement in Chinese parsing is critical to high-performance semantic role labeling for Chinese.


2019 ◽  
Vol 7 ◽  
pp. 343-356 ◽  
Author(s):  
Rui Cai ◽  
Mirella Lapata

In this paper we focus on learning dependency aware representations for semantic role labeling without recourse to an external parser. The backbone of our model is an LSTM-based semantic role labeler jointly trained with two auxiliary tasks: predicting the dependency label of a word and whether there exists an arc linking it to the predicate. The auxiliary tasks provide syntactic information that is specific to semantic role labeling and are learned from training data (dependency annotations) without relying on existing dependency parsers, which can be noisy (e.g., on out-of-domain data or infrequent constructions). Experimental results on the CoNLL-2009 benchmark dataset show that our model outperforms the state of the art in English, and consistently improves performance in other languages, including Chinese, German, and Spanish.


2012 ◽  
Vol 38 (1) ◽  
pp. 135-171 ◽  
Author(s):  
Hagen Fürstenau ◽  
Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance semantic role labeling systems. Unfortunately, such corpora are expensive to produce, limited in size, and may not be representative. Our work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning. The key idea of our approach is to find novel instances for classifier training based on their similarity to manually labeled seed instances. The underlying assumption is that sentences that are similar in their lexical material and syntactic structure are likely to share a frame semantic analysis. We formalize the detection of similar sentences and the projection of role annotations as a graph alignment problem, which we solve exactly using integer linear programming. Experimental results on semantic role labeling show that the automatic annotations produced by our method improve performance over using hand-labeled instances alone.


2017 ◽  
Vol 90 (2) ◽  
pp. 225-237 ◽  
Author(s):  
Zhijun Yang ◽  
Baochun Guo ◽  
Liqun Zhang

ABSTRACT Graphene has attracted a great deal of interest in recent years, illustrated by its potential in a variety of areas in physics, chemistry, and engineering. Specifically, graphene has opened up exciting possibilities for high-performance and functional rubber composites. Although copious literature deals with the fascinating properties related to graphene, its real (large scale) applications in rubber-based composites have not been approached. We discuss the state of the art in development in processing and the status in understanding of structure/performance relationships. Accordingly, the prospectives and challenges of some real applications of graphene-based rubber composites such as tires and sensors are surveyed and discussed.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehdi Srifi ◽  
Ahmed Oussous ◽  
Ayoub Ait Lahcen ◽  
Salma Mouline

AbstractVarious recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs from the literature were compared on English content. However, the research investigations about RSs when using contents in other languages such as Arabic are minimal. The researchers still neglect the field of Arabic RSs. Therefore, we aim through this study to fill this research gap by leveraging the benefit of recent advances in the English RSs field. Our main goal is to investigate recent RSs in an Arabic context. For that, we firstly selected five state-of-the-art RSs devoted originally to English content, and then we empirically evaluated their performance on Arabic content. As a result of this work, we first build four publicly available large-scale Arabic datasets for recommendation purposes. Second, various text preprocessing techniques have been provided for preparing the constructed datasets. Third, our investigation derived well-argued conclusions about the usage of modern RSs in the Arabic context. The experimental results proved that these systems ensure high performance when applied to Arabic content.


Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


1992 ◽  
Vol 36 (5) ◽  
pp. 821-828 ◽  
Author(s):  
K. H. Brown ◽  
D. A. Grose ◽  
R. C. Lange ◽  
T. H. Ning ◽  
P. A. Totta

2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

Field-programmable Gate Array (FPGA) is a high-performance computing platform for Convolution Neural Networks (CNNs) inference. Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain, however, resulting in irregular sparse patterns and leading to low parallelism and reduced utilization of resources. Besides, there are few works to discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern in the Winograd-based CNN, namely, Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of the irregular sparse pattern. Then, we develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement a mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that takes both the advantage of the SRBS pattern to eliminate low-parallelism computation and the irregular memory accesses, as well as the mixed precision quantization to get a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20] with negligible loss of model accuracy. We also show that our design has 4.11× speedup compared with the state-of-the-art sparse Winograd accelerator [19] on VGG16.


2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation Methodological advances in metagenome assembly are rapidly increasing in the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straight-forward, as well as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document