Cognitive Corpus Linguistics: five points of debate on current theory and methodology

Corpora ◽

10.3366/cor.2010.0001 ◽

2010 ◽

Vol 5 (1) ◽

pp. 1-27 ◽

Cited By ~ 45

Author(s):

Antti Arppe ◽

Gaëtanelle Gilquin ◽

Dylan Glynn ◽

Martin Hilpert ◽

Arne Zeschel

Keyword(s):

Corpus Linguistics ◽

Cognitive Linguistics ◽

Language Use ◽

Current Theory ◽

Linguistic Data ◽

Corpus Linguistic ◽

Grammaticality Judgments ◽

Corpus Data ◽

Authentic Language ◽

Discussion Note

Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.

Corpus Linguistics and Second Language Acquisition – the use of ACORN in the teaching of Spanish Grammar

Lebende Sprachen ◽

10.1515/les.2010.008 ◽

2010 ◽

Vol 55 (1) ◽

Cited By ~ 1

Author(s):

Guadalupe Ruiz Yepes ◽

Ramesh Krishnamurthy

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Corpus Linguistics ◽

Language Use ◽

Potential Role ◽

Real Life ◽

Grammar Teaching ◽

Real Language ◽

Authentic Language

With increasing demands for the use of authentic language in the teaching of a second language, the potential role of corpora has been an important issue of discussion in the last two decades. Corpora have helped to reveal patterns of real language use and uncovered discrepancies between the language portrayed in textbooks and the language used in real life. This article focuses on corpus-driven as well as corpus-based grammar teaching, summarising the experience of applying ACORN (the Aston Corpus Network) in the teaching of Spanish Grammar to students in the School of Languages and Social Sciences at Aston University. Our main goals were to show the students a large number of examples taken from authentic language texts, in order to support the grammar explained in class, and to provide them with a very useful resource that they can use while writing essays, preparing for exams, etc.

The Linguistic Annotation of Corpora

International Journal of Corpus Linguistics ◽

10.1075/ijcl.3.2.02aar ◽

1998 ◽

Vol 3 (2) ◽

pp. 189-210 ◽

Cited By ~ 1

Author(s):

Jan Aarts ◽

Hans van Halteren ◽

Nelleke Oostdijk

Keyword(s):

Language Processing ◽

Corpus Linguistics ◽

System Performance ◽

Annotation System ◽

Corpus Linguistic ◽

Linguistic Annotation ◽

Corpus Data ◽

Analysis System ◽

Performance Results

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.

Corpus Linguistics

10.1093/oxfordhb/9780199276349.013.0024 ◽

2012 ◽

Cited By ~ 43

Author(s):

Tony McEnery

Keyword(s):

Corpus Linguistics ◽

Language Use ◽

Research Question ◽

Large Body ◽

Linguistic Information ◽

Raw Data ◽

Parallel Corpora ◽

Comparative Framework ◽

Linguistic Evidence ◽

Corpus Data

Corpus data have emerged as the raw data and benchmark for several NLP applications. A corpus is described as a large body of linguistic evidence composed of attested language use. It may be contrasted with sentences constructed from metalinguistic reflection upon language use, rather than produced as a result of communication in context. Corpora can be spoken or written. They can be categorized as follows: monolingual, representing one language; comparable, using multiple monolingual corpora to create a comparative framework; and parallel, in which a corpus in one language is accompanied by its translation into other languages. The choice of corpus depends on the research question or the chosen application. Adding linguistic information can enhance a corpus. Annotation is carried out by human analysts, by automatic tools, or by a combination of both. The modern computerized corpus has been in vogue only since the 1940s. Since then, the volume of corpus banks has risen steadily and assumed an increasingly multilingual nature.

Comparing languages and cultures: Parametrization of analytic criteria

Russian Journal of Linguistics ◽

10.22363/2687-0088-2021-25-2-343-368 ◽

2021 ◽

Vol 25 (2) ◽

pp. 343-368

Author(s):

Barbara Lewandowska-Tomaszczyk

Keyword(s):

English Translation ◽

Cognitive Linguistics ◽

Cultural Dimensions ◽

Parallel Corpus ◽

Linguistic Data ◽

The Core ◽

Corpus Linguistic ◽

Study Methodology ◽

Linguistic Analyses ◽

Core Part

The paper presents arguments in favour of a complex set of areas of reference in cross-linguistic analyses of meanings, aimed in particular at identifying a set of relevant analytic criteria for such a comparison. The arguments are based on lexicographic and corpus linguistic data, specifically on the polysemic concept of integrity in English and its lexical counterparts in Polish. It is generally assumed in Cognitive Linguistics, which is taken as the basic framework of the present study, that meanings, defined as convention-based conceptualizations, are not discrete entities that are fully determined, even in fuller context; rather, they are dynamic conventional conceptualizations [13]. Therefore, it is considered essential to identify first their basic, prototypical senses and then their broad meanings, which include, apart from the core part, their contextual, culture-specific, and connotational properties, defined in terms of a parametrized set of semasiological as well as onomasiological properties. The study methodology has also been adjusted towards this multifocused analysis of linguistic forms and considers interdisciplinary (linguistic, psychological, cultural and social) domains to identify the cultural conceptualizations of the analysed forms. In the present case, a cognitive corpus-based analysis in monolingual English contexts and in the English-to-Polish and Polish-to-English translation data of lexicographic and parallel corpus materials, as well as cultural dimensions, will be exemplified, concluding with a parametrized system of cognitive cross-linguistic tertia comparationis to more fully determine their broad linguistic meanings.

Corpus linguistics and language testing: Navigating uncharted waters

Language Testing ◽

10.1177/0265532217713045 ◽

2017 ◽

Vol 34 (4) ◽

pp. 555-564 ◽

Cited By ~ 1

Author(s):

Jesse Egbert

Keyword(s):

Corpus Linguistics ◽

Language Testing ◽

Great Promise ◽

Corpus Linguistic ◽

Methodological Rigor ◽

The Future ◽

Corpus Data ◽

High Standards ◽

Linguistic Methods ◽

Development And Validation

The use of corpora and corpus linguistic methods in language testing research is increasing at an accelerated pace. The growing body of language testing research that uses corpus linguistic data is a testament to their utility in test development and validation. Although there are many reasons to be optimistic about the future of using corpus data in language testing, the convergence of these two fields introduces uncharted waters that should be traversed carefully to ensure that high standards of methodological rigor are maintained. The objectives of this paper are as follows: (1) to describe and evaluate the ways corpora and corpus data have been used in language testing to date; and (2) to offer recommendations for best practices to encourage rigorous and appropriate corpus linguistic methods for language testing purposes. This is accomplished with the aid of examples from papers in this special issue, as well as other previous work in this area. The future holds great promise for a useful methodological synergy between corpus linguistics and language testing. The choices researchers make as they navigate the uncharted and challenging waters that lie ahead will ultimately determine whether that potential is fully realized.

Stereotype in Webkorpora: Strategien zur Suche in sehr großen Datenmengen [Stereotypes in web corpora: strategies for searching very large amounts of data]

Linguistik Online ◽

10.13092/lo.79.3349 ◽

2016 ◽

Vol 79 (5) ◽

Author(s):

Uwe Quasthoff ◽

Erla Hallsteinsdóttir

Keyword(s):

Large Scale ◽

Cognitive Linguistics ◽

Language Use ◽

Empirical Studies ◽

Written Language ◽

Corpus Linguistic ◽

Text Corpora ◽

Strong Focus ◽

Image Identity ◽

Linguistic Methods

Research on stereotypes (as well as on similar constructs such as prejudice, image, identity, etc.) goes back around a century. Most research on stereotypes belongs to the field of psychological, social and cultural studies, with a strong focus on cognitive and socio-cultural phenomena. Since Uta Quasthoff’s pioneering analysis in 1973, there have been several linguistic studies of stereotypes, i.e., of linguistic aspects of how stereotypes are realized in language use. Linguistic research on stereotypes uses methods from disciplines such as sociolinguistics, ethnography, and discourse and conversation analysis. Work in discursive psychology and cognitive linguistics, as well as individual studies on stereotypes, has used corpus-linguistic methods to explore stereotypes in language use. Nevertheless, we maintain that no large-scale empirical studies have yet investigated the linguistic realization of stereotypes using data from large text corpora. In this article, we explore the potential of corpus linguistic approaches for research on the occurrence of stereotypes in written language. By using an analysis of co-occurrences, we aim to describe the relation between associative semantic stereotypes and co-occurrences in a corpus. Our hypothesis is that the linguistic construction of stereotypes, as their realization in language use, reproduces stereotypic representations of cognitive conceptualizations and meaning patterns; thus, stereotypic meanings should be detectable in statistical patterns in large corpora.

Applying corpus linguistics to pedagogy

International Journal of Corpus Linguistics ◽

10.1075/ijcl.14.3.05flo ◽

2009 ◽

Vol 14 (3) ◽

pp. 393-417 ◽

Cited By ~ 62

Author(s):

Lynne Flowerdew

Keyword(s):

Corpus Linguistics ◽

Top Down ◽

Bottom Up ◽

Inductive Approach ◽

Corpus Linguistic ◽

Final Consideration ◽

Corpus Data

This article reviews and discusses four somewhat contentious issues in the application of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic techniques have been criticized on the grounds that they encourage a more bottom-up rather than top-down processing of text in which concordance lines are examined atomistically. One criticism levelled against corpus data is that a corpus presents language out of its original context. For this reason, some corpus linguists have underscored the importance of ‘pedagogic mediation’ to contextualize the data for the students’ own writing environment. Concerns relating to the inductive approach associated with corpus-based pedagogy have also been raised as this approach may not always be the most appropriate one. A final consideration relates to the issue of whether a corpus is always the most appropriate resource to use among the wealth of other resources available.

CORPUS LINGUISTICS, LANGUAGE CORPORA AND LANGUAGE TEACHING

ENGLISH JOURNAL ◽

10.32832/english.v14i2.3845 ◽

2020 ◽

Vol 14 (2) ◽

pp. 75

Author(s):

Eska Perdana Prasetya ◽

Anita Dewi Ekawati ◽

Deni Sapta Nugraha ◽

Ahmad Marzuq ◽

Tiara Saputri Darlis

Keyword(s):

Academic Writing ◽

Corpus Linguistics ◽

Language Use ◽

Language Teaching ◽

Applied Linguistics

This research is about corpus linguistics, language corpora, and language teaching. This field is relatively new and is associated with technology. The study discusses several areas: important parts of the corpus, the information generated in the corpus, four main characteristics of the corpus, types of corpora, corpora in language teaching, and several types of study that could be related to corpus research. Applications of corpus linguistics to language teaching may be direct or indirect. The field of applied linguistics analyses large collections of written and spoken texts, which have been carefully designed to represent specific domains of language use, such as informal speech or academic writing.

Pohľad Na Pomenovanie Cez Prizmu Teoretických Rámcov A Slovníkového Hesla [A view of naming through the prism of theoretical frameworks and the dictionary entry]

Journal of Linguistics/Jazykovedný casopis ◽

10.2478/jazcas-2019-0011 ◽

2018 ◽

Vol 69 (3) ◽

pp. 277-301

Author(s):

Alexandra Jarošová

Keyword(s):

Corpus Linguistics ◽

Cognitive Linguistics ◽

Linguistic Meaning ◽

Extended Model ◽

Theoretical Frameworks ◽

Departure Point ◽

Lexical Meaning ◽

Linguistic Pragmatics ◽

Situational Contexts ◽

Methodological Procedures

The first part of this paper outlines the relevant aspects of functional structuralism serving lexicographers as a departure point for building a model of lexical meaning useable in the Dictionary of Contemporary Slovak Language. This section also points to some aspects of Klára Buzássyová’s research on lexis and word-formation that have enriched the functional-structuralist paradigm. The second section shows other theoretical and methodological frameworks, such as linguistic pragmatics, cognitive linguistics and corpus linguistics (all of them departing in some respects from structuralism and, in other respects, being complementary to it) that can enhance the structuralist basis of the model. The third section outlines an extended model of lexical meaning that represents a synthesis of all those theoretical frameworks and, at the same time, reflects three language constituents: 1. The social constituent is present in the consideration of communicative functions of utterances, naming functions of lexical units, functional styles and registers, language norms, and situational contexts; 2. The psychological constituent takes the form of consideration of the prototype effect and the abolition of boundaries between linguistic meaning and other parts of cognition; 3. Thanks to the structural/systematic constituent, a description of the paradigmatic and syntagmatic behaviour of words can be performed, and an inventory of formal-content units and categories (lexemes, lexies, word-forming and grammatical structures) can be provided. In our dictionary practice, the above-mentioned model is reflected in the methodological procedures as follows: 1. Systemization of repetitive (regular, standardized) phenomena; 2. Prototypicalization of meaning description; 3. Contextualization/encyclopedization of meaning description; 4. Pragmatization of meaning description; 5. Continualized presentation of language phenomena, i.e., introduction of numerous phenomena of a transient and indeterminate nature, indicating the existence of a semantic-pragmatic and lexical-grammatical continuum; 6. “Discretization” of the combinatorial continuum, i.e., identification and description of entrenched word combinations with naming functions.

Evaluating Logistic Mixed-Effects Models of Corpus-Linguistic Data in Light of Lexical Diffusion

Quantitative Methods in the Humanities and Social Sciences - Mixed-Effects Regression Models in Linguistics ◽

10.1007/978-3-319-69830-4_6 ◽

2018 ◽

pp. 99-116 ◽

Cited By ~ 4

Author(s):

Danielle Barth ◽

Vsevolod Kapatsinski

Keyword(s):

Mixed Effects ◽

Mixed Effects Models ◽

Linguistic Data ◽

Corpus Linguistic ◽

Lexical Diffusion
