Improving identifier informativeness using part of speech information

Author(s):  
Dave Binkley ◽  
Matthew Hearn ◽  
Dawn Lawrie
2017 ◽  
pp. 35-46 ◽  
Author(s):  
Irene Doval

This paper reviews the author's experience of tokenizing and POS-tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts, as part of an ongoing effort to annotate the corpus with part-of-speech information. The study discusses the specific problems encountered so far: on the one hand, tagging performance degrades significantly on fictional data; on the other, pre-existing annotation schemes are all language-specific. To improve accuracy during post-editing, the author has developed a common tagset and identified the major error patterns.
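The common tagset described in the abstract can be illustrated as a mapping from language-specific tags onto shared categories. A minimal sketch, assuming a handful of German STTS and Spanish EAGLES-style tags; the mapping table itself is invented for illustration and is not the paper's actual tagset:

```python
# Hypothetical mapping from language-specific POS tags to a small common
# tagset, in the spirit of the shared scheme the abstract describes.
COMMON_TAGSET = {
    # German STTS tags
    "NN": "NOUN", "NE": "PROPN", "VVFIN": "VERB", "ADJA": "ADJ",
    # Spanish EAGLES-style tags
    "NC": "NOUN", "NP": "PROPN", "VMI": "VERB", "AQ": "ADJ",
}

def to_common(tag: str) -> str:
    """Map a language-specific tag to the common tagset; keep unknowns visible as 'X'."""
    return COMMON_TAGSET.get(tag, "X")

print(to_common("NN"))   # NOUN (German common noun)
print(to_common("VMI"))  # VERB (Spanish main verb, indicative)
```

Mapping both tagsets into one inventory lets errors from the two taggers be compared and post-edited under a single scheme.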


2016 ◽  
Vol 105 (1) ◽  
pp. 63-76
Author(s):  
Theresa Guinard

Morphological analysis (finding the component morphemes of a word and tagging morphemes with part-of-speech information) is a useful preprocessing step in many natural language processing applications, especially for synthetic languages. Compound words from the constructed language Esperanto are formed by straightforward agglutination, but for many words, there is more than one possible sequence of component morphemes. However, one segmentation is usually more semantically probable than the others. This paper presents a modified n-gram Markov model that finds the most probable segmentation of any Esperanto word, where the model’s states represent morpheme part-of-speech and semantic classes. The overall segmentation accuracy was over 98% for a set of presegmented dictionary words.
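The core idea — scoring candidate segmentations with transition probabilities between morpheme classes and picking the best one by dynamic programming — can be sketched as follows. The lexicon, classes, and log-probabilities below are invented toy values, not the paper's model:

```python
# Hedged sketch: most-probable segmentation of an agglutinative word via
# dynamic programming over morpheme splits, scored by bigram transitions
# between morpheme classes (toy values; not the paper's trained model).
LEXICON = {"sun": "N", "flor": "N", "o": "END", "flo": "N", "ro": "N"}
LOG_BIGRAM = {("<s>", "N"): -0.2, ("N", "N"): -1.5, ("N", "END"): -0.1}

def best_segmentation(word):
    # best[i] = (log-score, segmentation, class of last morpheme) for word[:i]
    best = {0: (0.0, [], "<s>")}
    for i in range(1, len(word) + 1):
        for j in range(i):
            if j in best and word[j:i] in LEXICON:
                cls = LEXICON[word[j:i]]
                score = best[j][0] + LOG_BIGRAM.get((best[j][2], cls), -10.0)
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word[j:i]], cls)
    return best[len(word)][1] if len(word) in best else None

# 'sun|flor|o' (N, N, ending) beats 'sun|flo|ro' because N->END is cheap.
print(best_segmentation("sunfloro"))  # ['sun', 'flor', 'o']
```

The class-based transitions are what disambiguate competing splits: a segmentation ending in a grammatical ending is rewarded over one ending in a bare root.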


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years. However, previous neural paraphrase-generation approaches ignore linguistic knowledge such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets can learn such information implicitly when given sufficient data, but it is difficult for them to do so when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
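One plausible fusion mechanism — not necessarily any of the paper's three — is to concatenate a POS-tag embedding with each word embedding at the encoder input. A minimal NumPy sketch with invented dimensions:

```python
import numpy as np

# Hedged sketch of POS fusion by concatenation: each token is represented
# by its word embedding joined with the embedding of its POS tag.
# Embedding tables are randomly initialized; sizes are illustrative.
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(1000, 64))  # vocab_size x word_dim
pos_emb = rng.normal(size=(20, 16))     # tagset_size x pos_dim

def fuse(word_ids, pos_ids):
    """Concatenate word and POS embeddings per token -> (seq_len, 80)."""
    return np.concatenate([word_emb[word_ids], pos_emb[pos_ids]], axis=-1)

fused = fuse([1, 2, 3], [4, 5, 6])
print(fused.shape)  # (3, 80)
```

The fused vectors would then feed the sequence-to-sequence encoder in place of plain word embeddings, giving the model explicit access to POS categories even when training data are scarce.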


2002 ◽  
Vol 8 (2-3) ◽  
pp. 193-207 ◽  
Author(s):  
TOKUNAGA TAKENOBU ◽  
KIMURA KENJI ◽  
OGIBAYASHI HIRONORI ◽  
TANAKA HOZUMI

This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: in the first, a conventional retrieval method (the Okapi system) is used; in the second, complex index terms such as syntactic relations and single words with part-of-speech information are introduced to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms through experiments using the TREC-7 test collection and 50 queries. The retrieval effectiveness was improved for 32 out of 50 queries. Based on this investigation, we then introduce a method to select effective index terms by using a decision tree. Further experiments with the same test collection showed that retrieval effectiveness was improved in 25 of the 50 queries.
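The two-phase scheme can be sketched abstractly: take the first-phase (conventional) scores and boost documents that match complex index terms such as syntactic relations. All scores, terms, and the weight below are invented for illustration:

```python
# Hedged sketch of second-phase reranking: reward documents whose complex
# index terms (e.g. syntactic-relation pairs) overlap with the query's.
def rerank(first_phase, query_terms, doc_terms, weight=0.5):
    """first_phase: {doc: score}; doc_terms: {doc: set of complex terms}."""
    reranked = {}
    for doc, score in first_phase.items():
        overlap = len(query_terms & doc_terms.get(doc, set()))
        reranked[doc] = score + weight * overlap
    return sorted(reranked, key=reranked.get, reverse=True)

first = {"d1": 1.0, "d2": 0.9}                  # phase-one (Okapi-style) scores
query_terms = {("retrieve", "information")}     # a syntactic-relation term
docs = {"d2": {("retrieve", "information")}}    # only d2 matches it
print(rerank(first, query_terms, docs))  # ['d2', 'd1']
```

The paper's decision-tree step would then decide, per query, which types of complex terms are worth applying in this second phase.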


Author(s):  
Ting Lu ◽  
Yan Xiang ◽  
Junge Liang ◽  
Li Zhang ◽  
Mingfang Zhang

The grand challenge of cross-domain sentiment analysis is that classifiers trained in a specific domain are very sensitive to the discrepancy between domains: a sentiment classifier trained in the source domain usually performs poorly in the target domain. One of the main strategies for this problem is the pivot-based strategy, in which the feature representation is an important component. However, previous pivot-based models did not use part-of-speech information to guide the learning of feature representations and feature mappings. We therefore present a fused part-of-speech vectors and attention-based model (FAM). In our model, we fuse part-of-speech vectors with feature word embeddings as the feature representation, giving deeper semantics to the mapped features, and we adopt a multi-head attention mechanism to train the cross-domain sentiment classifier and capture the connections between different features. Results from 12 groups of comparative experiments on the Amazon dataset demonstrate that our model outperforms all baseline models in this paper.
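The attention step can be illustrated with a single-head, scaled dot-product self-attention over feature vectors (stand-ins for the fused POS + word representations); the paper uses multi-head attention, and everything below is a simplified sketch:

```python
import numpy as np

# Hedged sketch: scaled dot-product self-attention over a matrix x of
# feature vectors, mixing each vector with its most similar neighbors.
# Single head for brevity; no learned projections.
def attention(x):
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # context-mixed features

x = np.ones((3, 8))  # 3 fused feature vectors of dimension 8
print(attention(x).shape)  # (3, 8)
```

In the full model, multiple such heads with learned query/key/value projections would let the classifier relate pivot and non-pivot features across domains.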

