Improving identifier informativeness using part of speech information

Author(s):  
Dave Binkley ◽  
Matthew Hearn ◽  
Dawn Lawrie
2017 ◽  
pp. 35-46 ◽  
Author(s):  
Irene Doval

This paper reviews the author's experience of tokenizing and POS-tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts, as part of an ongoing effort to annotate the corpus with part-of-speech information. The study discusses the specific problems encountered so far: on the one hand, tagging performance degrades significantly on fictional data; on the other, pre-existing annotation schemes are all language-specific. To improve accuracy during post-editing, the author has developed a common tagset and identified the major error patterns.
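The common tagset described in the abstract can be illustrated as a mapping from language-specific tags onto shared categories. A minimal sketch, assuming a handful of German STTS and Spanish EAGLES-style tags; the mapping table itself is invented for illustration and is not the paper's actual tagset:

```python
# Hypothetical mapping from language-specific POS tags to a small common
# tagset, in the spirit of the shared scheme the abstract describes.
COMMON_TAGSET = {
    # German STTS tags
    "NN": "NOUN", "NE": "PROPN", "VVFIN": "VERB", "ADJA": "ADJ",
    # Spanish EAGLES-style tags
    "NC": "NOUN", "NP": "PROPN", "VMI": "VERB", "AQ": "ADJ",
}

def to_common(tag: str) -> str:
    """Map a language-specific tag to the common tagset; keep unknowns visible as 'X'."""
    return COMMON_TAGSET.get(tag, "X")

print(to_common("NN"))   # NOUN (German common noun)
print(to_common("VMI"))  # VERB (Spanish main verb, indicative)
```

Mapping both tagsets into one inventory lets errors from the two taggers be compared and post-edited under a single scheme.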


2016 ◽  
Vol 105 (1) ◽  
pp. 63-76
Author(s):  
Theresa Guinard

Morphological analysis (finding the component morphemes of a word and tagging morphemes with part-of-speech information) is a useful preprocessing step in many natural language processing applications, especially for synthetic languages. Compound words from the constructed language Esperanto are formed by straightforward agglutination, but for many words, there is more than one possible sequence of component morphemes. However, one segmentation is usually more semantically probable than the others. This paper presents a modified n-gram Markov model that finds the most probable segmentation of any Esperanto word, where the model’s states represent morpheme part-of-speech and semantic classes. The overall segmentation accuracy was over 98% for a set of presegmented dictionary words.
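The core idea — scoring candidate segmentations with transition probabilities between morpheme classes and picking the best one by dynamic programming — can be sketched as follows. The lexicon, classes, and log-probabilities below are invented toy values, not the paper's model:

```python
# Hedged sketch: most-probable segmentation of an agglutinative word via
# dynamic programming over morpheme splits, scored by bigram transitions
# between morpheme classes (toy values; not the paper's trained model).
LEXICON = {"sun": "N", "flor": "N", "o": "END", "flo": "N", "ro": "N"}
LOG_BIGRAM = {("<s>", "N"): -0.2, ("N", "N"): -1.5, ("N", "END"): -0.1}

def best_segmentation(word):
    # best[i] = (log-score, segmentation, class of last morpheme) for word[:i]
    best = {0: (0.0, [], "<s>")}
    for i in range(1, len(word) + 1):
        for j in range(i):
            if j in best and word[j:i] in LEXICON:
                cls = LEXICON[word[j:i]]
                score = best[j][0] + LOG_BIGRAM.get((best[j][2], cls), -10.0)
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word[j:i]], cls)
    return best[len(word)][1] if len(word) in best else None

# 'sun|flor|o' (N, N, ending) beats 'sun|flo|ro' because N->END is cheap.
print(best_segmentation("sunfloro"))  # ['sun', 'flor', 'o']
```

The class-based transitions are what disambiguate competing splits: a segmentation ending in a grammatical ending is rewarded over one ending in a bare root.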


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years. However, previous neural paraphrase-generation approaches ignore linguistic knowledge such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets can learn such information implicitly when given sufficient data, but it is difficult for them to do so when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
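One plausible fusion mechanism — not necessarily any of the paper's three — is to concatenate a POS-tag embedding with each word embedding at the encoder input. A minimal NumPy sketch with invented dimensions:

```python
import numpy as np

# Hedged sketch of POS fusion by concatenation: each token is represented
# by its word embedding joined with the embedding of its POS tag.
# Embedding tables are randomly initialized; sizes are illustrative.
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(1000, 64))  # vocab_size x word_dim
pos_emb = rng.normal(size=(20, 16))     # tagset_size x pos_dim

def fuse(word_ids, pos_ids):
    """Concatenate word and POS embeddings per token -> (seq_len, 80)."""
    return np.concatenate([word_emb[word_ids], pos_emb[pos_ids]], axis=-1)

fused = fuse([1, 2, 3], [4, 5, 6])
print(fused.shape)  # (3, 80)
```

The fused vectors would then feed the sequence-to-sequence encoder in place of plain word embeddings, giving the model explicit access to POS categories even when training data are scarce.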


2002 ◽  
Vol 8 (2-3) ◽  
pp. 193-207 ◽  
Author(s):  
TOKUNAGA TAKENOBU ◽  
KIMURA KENJI ◽  
OGIBAYASHI HIRONORI ◽  
TANAKA HOZUMI

This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: in the first, a conventional retrieval method (the Okapi system) is used; in the second, complex index terms such as syntactic relations and single words with part-of-speech information are introduced to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms through experiments using the TREC-7 test collection and 50 queries. The retrieval effectiveness was improved for 32 out of 50 queries. Based on this investigation, we then introduce a method to select effective index terms by using a decision tree. Further experiments with the same test collection showed that retrieval effectiveness was improved in 25 of the 50 queries.
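The two-phase scheme can be sketched abstractly: take the first-phase (conventional) scores and boost documents that match complex index terms such as syntactic relations. All scores, terms, and the weight below are invented for illustration:

```python
# Hedged sketch of second-phase reranking: reward documents whose complex
# index terms (e.g. syntactic-relation pairs) overlap with the query's.
def rerank(first_phase, query_terms, doc_terms, weight=0.5):
    """first_phase: {doc: score}; doc_terms: {doc: set of complex terms}."""
    reranked = {}
    for doc, score in first_phase.items():
        overlap = len(query_terms & doc_terms.get(doc, set()))
        reranked[doc] = score + weight * overlap
    return sorted(reranked, key=reranked.get, reverse=True)

first = {"d1": 1.0, "d2": 0.9}                  # phase-one (Okapi-style) scores
query_terms = {("retrieve", "information")}     # a syntactic-relation term
docs = {"d2": {("retrieve", "information")}}    # only d2 matches it
print(rerank(first, query_terms, docs))  # ['d2', 'd1']
```

The paper's decision-tree step would then decide, per query, which types of complex terms are worth applying in this second phase.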


Author(s):  
Ting Lu ◽  
Yan Xiang ◽  
Junge Liang ◽  
Li Zhang ◽  
Mingfang Zhang

The grand challenge of cross-domain sentiment analysis is that classifiers trained in a specific domain are very sensitive to the discrepancy between domains: a sentiment classifier trained in the source domain usually performs poorly in the target domain. One of the main strategies for this problem is the pivot-based strategy, in which the feature representation is an important component. However, previous pivot-based models did not use part-of-speech information to guide the learning of feature representations and feature mappings. We therefore present a fused part-of-speech vectors and attention-based model (FAM). In our model, we fuse part-of-speech vectors with feature word embeddings as the feature representation, giving deeper semantics to the mapped features, and we adopt a multi-head attention mechanism to train the cross-domain sentiment classifier and capture the connections between different features. Results from 12 groups of comparative experiments on the Amazon dataset demonstrate that our model outperforms all baseline models in this paper.
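The attention step can be illustrated with a single-head, scaled dot-product self-attention over feature vectors (stand-ins for the fused POS + word representations); the paper uses multi-head attention, and everything below is a simplified sketch:

```python
import numpy as np

# Hedged sketch: scaled dot-product self-attention over a matrix x of
# feature vectors, mixing each vector with its most similar neighbors.
# Single head for brevity; no learned projections.
def attention(x):
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # context-mixed features

x = np.ones((3, 8))  # 3 fused feature vectors of dimension 8
print(attention(x).shape)  # (3, 8)
```

In the full model, multiple such heads with learned query/key/value projections would let the classifier relate pivot and non-pivot features across domains.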

