The Linguistic Annotation of Corpora

1998 ◽  
Vol 3 (2) ◽  
pp. 189-210 ◽  
Author(s):  
Jan Aarts ◽  
Hans van Halteren ◽  
Nelleke Oostdijk

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.

2018 ◽  
Vol 10 (3) ◽  
pp. 467-493 ◽  
Author(s):  
OANA DAVID ◽  
TEENIE MATLOCK

Conceptual metaphor research has benefited from advances in discourse analytic and corpus linguistic methodologies over the years, especially given recent developments with Natural Language Processing (NLP) technologies. Such technologies are now capable of identifying metaphoric expressions across large bodies of text. Here we focus on how one particular analytic tool, MetaNet, can be used to study everyday discourse about personal and social problems, in particular, poverty and cancer, by leveraging reusable networks of primary metaphors enhanced with specific metaphor subcases. We discuss the advantages of this approach in allowing us to gain valuable insights into cross-linguistic metaphor commonalities and variation. To demonstrate its utility, we analyze corpus data from English and Spanish.


Corpora ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. 1-27 ◽  
Author(s):  
Antti Arppe ◽  
Gaëtanelle Gilquin ◽  
Dylan Glynn ◽  
Martin Hilpert ◽  
Arne Zeschel

Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.


2016 ◽  
Vol 9 (7) ◽  
pp. 10
Author(s):  
Eman Saleh Akeel

The growing field of corpus linguistics has been engaged heavily in language pedagogy during the last two decades. This has encouraged researchers to look for more applications of corpora in language teaching and learning and has led to the emergence of corpus use in language testing. The aim of this article is to provide an overview of using corpus data for the purpose of vocabulary test design. It presents some native and learner corpora which are available for item writers to use. It covers the benefits and limitations of using corpora in language testing and argues for the importance and usefulness of using native as well as learner corpora as tools for designing a vocabulary test. The article aims to illustrate how both native and learner corpora can be used in language testing in general and in the development of vocabulary tests in particular.


2017 ◽  
Vol 34 (4) ◽  
pp. 555-564 ◽  
Author(s):  
Jesse Egbert

The use of corpora and corpus linguistic methods in language testing research is increasing at an accelerated pace. The growing body of language testing research that uses corpus linguistic data is a testament to their utility in test development and validation. Although there are many reasons to be optimistic about the future of using corpus data in language testing, the convergence of these two fields introduces uncharted waters that should be traversed carefully to ensure that high standards of methodological rigor are maintained. The objectives of this paper are as follows: (1) to describe and evaluate the ways corpora and corpus data have been used in language testing to date; and (2) to offer recommendations for best practices to encourage rigorous and appropriate corpus linguistic methods for language testing purposes. This is accomplished with the aid of examples from papers in this special issue, as well as other previous work in this area. The future holds great promise for a useful methodological synergy between corpus linguistics and language testing. The choices researchers make as they navigate the uncharted and challenging waters that lie ahead will ultimately determine whether that potential is fully realized.


2009 ◽  
Vol 14 (3) ◽  
pp. 393-417 ◽  
Author(s):  
Lynne Flowerdew

This article reviews and discusses four somewhat contentious issues in the application of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic techniques have been criticized on the grounds that they encourage a more bottom-up rather than top-down processing of text in which concordance lines are examined atomistically. One criticism levelled against corpus data is that a corpus presents language out of its original context. For this reason, some corpus linguists have underscored the importance of ‘pedagogic mediation’ to contextualize the data for the students’ own writing environment. Concerns relating to the inductive approach associated with corpus-based pedagogy have also been raised as this approach may not always be the most appropriate one. A final consideration relates to the issue of whether a corpus is always the most appropriate resource to use among the wealth of other resources available.


2017 ◽  
Vol 16 (2) ◽  
pp. 61-76 ◽  
Author(s):  
Anaïs Thibault Landry ◽  
Marylène Gagné ◽  
Jacques Forest ◽  
Sylvie Guerrero ◽  
Michel Séguin ◽  
...  

To this day, researchers are debating the adequacy of using financial incentives to bolster performance in work settings. Our goal was to contribute to current understanding by considering the moderating role of distributive justice in the relation between financial incentives, motivation, and performance. Based on self-determination theory, we hypothesized that when bonuses are fairly distributed, using financial incentives makes employees feel more competent and autonomous, which in turn fosters greater autonomous motivation and lower controlled motivation, and better work performance. Results from path analyses in three samples supported our hypotheses, suggesting that the effect of financial incentives is contextual, and that compensation plans using financial incentives and bonuses can be effective when properly managed.


2013 ◽  
Vol 21 (8) ◽  
pp. 1371-1381
Author(s):  
Guangzhen JIA ◽  
Youyi LIU ◽  
Hua SHU ◽  
Xiaoping Fang

Author(s):  
Yong Li ◽  
Xiaojun Yang ◽  
Min Zuo ◽  
Qingyu Jin ◽  
Haisheng Li ◽  
...  

Because network information spreads in real time, net-mediated public opinion has become an increasingly important resource for food safety early warning. However, data growth at the petabyte (PB) scale makes such opinion difficult to study and assess, in particular extracting event roles from the data and analyzing the sentiment tendency of public comments. First, taking food safety public opinion on the network as its focus, this article proposes a BLSTM-CRF model that automatically labels event roles by combining BLSTM with a conditional random field. Second, an attention mechanism based on food safety domain vocabulary is introduced; distance-related sequence semantic features are extracted by BLSTM, and sentiment classification of those features is performed by a CNN, yielding an Att-BLSTM-CNN model for analyzing public opinion and sentiment tendency in the food safety domain. Finally, based on time series, the article combines event role extraction with sentiment tendency analysis and constructs a net-mediated public opinion early warning model for food safety, driven by the heat of an event and the intensity of public sentiment toward it.

