The Linguistic Annotation of Corpora

Jan Aarts; Hans van Halteren; Nelleke Oostdijk

doi:10.1075/ijcl.3.2.02aar

International Journal of Corpus Linguistics ◽

10.1075/ijcl.3.2.02aar ◽

1998 ◽

Vol 3 (2) ◽

pp. 189-210 ◽

Cited By ~ 1

Author(s):

Jan Aarts ◽

Hans van Halteren ◽

Nelleke Oostdijk

Keyword(s):

Language Processing ◽

Corpus Linguistics ◽

System Performance ◽

Annotation System ◽

Corpus Linguistic ◽

Linguistic Annotation ◽

Corpus Data ◽

Analysis System ◽

Performance Results

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.

Cross-linguistic automated detection of metaphors for poverty and cancer

Language and Cognition ◽

10.1017/langcog.2018.11 ◽

2018 ◽

Vol 10 (3) ◽

pp. 467-493 ◽

Cited By ~ 2

Author(s):

OANA DAVID ◽

TEENIE MATLOCK

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Social Problems ◽

Conceptual Metaphor ◽

Automated Detection ◽

Analytic Tool ◽

Corpus Linguistic ◽

Recent Developments ◽

Corpus Data

Conceptual metaphor research has benefited from advances in discourse analytic and corpus linguistic methodologies over the years, especially given recent developments with Natural Language Processing (NLP) technologies. Such technologies are now capable of identifying metaphoric expressions across large bodies of text. Here we focus on how one particular analytic tool, MetaNet, can be used to study everyday discourse about personal and social problems, in particular, poverty and cancer, by leveraging reusable networks of primary metaphors enhanced with specific metaphor subcases. We discuss the advantages of this approach in allowing us to gain valuable insights into cross-linguistic metaphor commonalities and variation. To demonstrate its utility, we analyze corpus data from English and Spanish.

Cognitive Corpus Linguistics: five points of debate on current theory and methodology

Corpora ◽

10.3366/cor.2010.0001 ◽

2010 ◽

Vol 5 (1) ◽

pp. 1-27 ◽

Cited By ~ 45

Author(s):

Antti Arppe ◽

Gaëtanelle Gilquin ◽

Dylan Glynn ◽

Martin Hilpert ◽

Arne Zeschel

Keyword(s):

Corpus Linguistics ◽

Cognitive Linguistics ◽

Language Use ◽

Current Theory ◽

Linguistic Data ◽

Corpus Linguistic ◽

Grammaticality Judgments ◽

Corpus Data ◽

Authentic Language ◽

Discussion Note

Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.

The Role of Native and Learner Corpora in Vocabulary Test Design

English Language Teaching ◽

10.5539/elt.v9n7p10 ◽

2016 ◽

Vol 9 (7) ◽

pp. 10

Author(s):

Eman Saleh Akeel

Keyword(s):

Corpus Linguistics ◽

Teaching And Learning ◽

Language Teaching ◽

Language Testing ◽

Test Design ◽

Vocabulary Test ◽

Language Pedagogy ◽

Learner Corpora ◽

Corpus Data

The growing field of corpus linguistics has been heavily engaged in language pedagogy during the last two decades. This has encouraged researchers to look for more applications of corpora in language teaching and learning and has led to the emergence of corpus use in language testing. The aim of this article is to provide an overview of using corpus data for the purpose of vocabulary test design. It presents some native and learner corpora which are available for item writers to use. It covers the benefits and limitations of using corpora in language testing and argues for the importance and usefulness of using native as well as learner corpora as tools for designing a vocabulary test. The article aims to illustrate how both native and learner corpora can be used in language testing in general and in the development of vocabulary tests in particular.

Corpus linguistics and language testing: Navigating uncharted waters

Language Testing ◽

10.1177/0265532217713045 ◽

2017 ◽

Vol 34 (4) ◽

pp. 555-564 ◽

Cited By ~ 1

Author(s):

Jesse Egbert

Keyword(s):

Corpus Linguistics ◽

Language Testing ◽

Great Promise ◽

Corpus Linguistic ◽

Methodological Rigor ◽

The Future ◽

Corpus Data ◽

High Standards ◽

Linguistic Methods ◽

Development And Validation

The use of corpora and corpus linguistic methods in language testing research is increasing at an accelerated pace. The growing body of language testing research that uses corpus linguistic data is a testament to their utility in test development and validation. Although there are many reasons to be optimistic about the future of using corpus data in language testing, the convergence of these two fields introduces uncharted waters that should be traversed carefully to ensure that high standards of methodological rigor are maintained. The objectives of this paper are as follows: (1) to describe and evaluate the ways corpora and corpus data have been used in language testing to date; and (2) to offer recommendations for best practices to encourage rigorous and appropriate corpus linguistic methods for language testing purposes. This is accomplished with the aid of examples from papers in this special issue, as well as other previous work in this area. The future holds great promise for a useful methodological synergy between corpus linguistics and language testing. The choices researchers make as they navigate the uncharted and challenging waters that lie ahead will ultimately determine whether that potential is fully realized.

Applying corpus linguistics to pedagogy

International Journal of Corpus Linguistics ◽

10.1075/ijcl.14.3.05flo ◽

2009 ◽

Vol 14 (3) ◽

pp. 393-417 ◽

Cited By ~ 62

Author(s):

Lynne Flowerdew

Keyword(s):

Corpus Linguistics ◽

Top Down ◽

Bottom Up ◽

Inductive Approach ◽

Corpus Linguistic ◽

Final Consideration ◽

Corpus Data

This article reviews and discusses four somewhat contentious issues in the application of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic techniques have been criticized on the grounds that they encourage a more bottom-up rather than top-down processing of text in which concordance lines are examined atomistically. One criticism levelled against corpus data is that a corpus presents language out of its original context. For this reason, some corpus linguists have underscored the importance of ‘pedagogic mediation’ to contextualize the data for the students’ own writing environment. Concerns relating to the inductive approach associated with corpus-based pedagogy have also been raised as this approach may not always be the most appropriate one. A final consideration relates to the issue of whether a corpus is always the most appropriate resource to use among the wealth of other resources available.

The Relation Between Financial Incentives, Motivation, and Performance

Journal of Personnel Psychology ◽

10.1027/1866-5888/a000182 ◽

2017 ◽

Vol 16 (2) ◽

pp. 61-76 ◽

Cited By ~ 12

Author(s):

Anaïs Thibault Landry ◽

Marylène Gagné ◽

Jacques Forest ◽

Sylvie Guerrero ◽

Michel Séguin ◽

...

Keyword(s):

Financial Incentives ◽

Work Performance ◽

Autonomous Motivation ◽

Compensation Plans ◽

Work Settings ◽

Path Analyses ◽

Three Samples ◽

And Performance ◽

Performance Results

To this day, researchers are debating the adequacy of using financial incentives to bolster performance in work settings. Our goal was to contribute to current understanding by considering the moderating role of distributive justice in the relation between financial incentives, motivation, and performance. Based on self-determination theory, we hypothesized that when bonuses are fairly distributed, using financial incentives makes employees feel more competent and autonomous, which in turn fosters greater autonomous motivation and lower controlled motivation, and better work performance. Results from path analyses in three samples supported our hypotheses, suggesting that the effect of financial incentives is contextual, and that compensation plans using financial incentives and bonuses can be effective when properly managed.

Role of Natural Language Processing in Social Media

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i12.287289 ◽

2017 ◽

Vol 5 (12) ◽

pp. 287-289

Author(s):

Aditya . ◽

◽

hare . ◽

...

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

The Role of Noun Animacy in Language Processing

Advances in Psychological Science ◽

10.3724/sp.j.1042.2013.01371 ◽

2013 ◽

Vol 21 (8) ◽

pp. 1371-1381

Author(s):

Guangzhen JIA ◽

Youyi LIU ◽

Hua SHU ◽

Xiaoping Fang

Keyword(s):

Language Processing

Role Of Taurine In Male Reproductive System Performance In Adult Male Rats Exposed To Oxidative Stress By Hydrogen Peroxide

Journal of Applied Veterinary Sciences ◽

10.21608/javs.2019.62661 ◽

2019 ◽

Vol 4 (2) ◽

pp. 71-79

Author(s):

Nadhem Al-kassim

Keyword(s):

Hydrogen Peroxide ◽

Adult Male ◽

Reproductive System ◽

System Performance ◽

Male Reproductive System ◽

Male Rats

Deep Structured Learning for Natural Language Processing

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3433538 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-14

Author(s):

Yong Li ◽

Xiaojun Yang ◽

Min Zuo ◽

Qingyu Jin ◽

Haisheng Li ◽

...

Keyword(s):

Public Opinion ◽

Food Safety ◽

Language Processing ◽

Early Warning ◽

Conditional Random Field ◽

Semantic Features ◽

Related Sequence ◽

The Public ◽

Network Public Opinion

The real-time nature and rapid dissemination of network information make net-mediated public opinion an increasingly important resource for food safety early warning, but data growth at petabyte (PB) scale also makes it very difficult to study and assess network public opinion, especially to extract event roles from these data and to analyze the sentiment tendency of public comments. First, this article takes network public opinion on food safety as its research subject and proposes a BLSTM-CRF model that automatically labels event roles by combining BLSTM with a conditional random field. Second, an attention mechanism based on food safety vocabulary is introduced, distance-related sequence semantic features are extracted by BLSTM, and sentiment classification of these features is performed with a CNN, yielding an Att-BLSTM-CNN model for analyzing public opinion and emotional tendency in the field of food safety. Finally, based on time series, this article combines the role extraction of food safety events with the analysis of emotional tendency and constructs a net-mediated public opinion early warning model for food safety according to the heat of the event and the intensity of public emotion toward food safety public opinion events.
