About the discourse meaning function of the phrase ‘-길래’ in corpus data

Multiword units lead to errors of commission in children's spontaneous production: “What corpus data can tell us?*”

Developmental Science ◽

10.1111/desc.13125 ◽

2021 ◽

Author(s):

Stewart M. McCauley ◽

Colin Bannard ◽

Anna Theakston ◽

Michelle Davis ◽

Thea Cameron‐Faulkner ◽

...

Keyword(s):

Spontaneous Production ◽

Multiword Units ◽

Corpus Data


Analyzing free variation with harmony – A case study of verb-cluster serialization

Zeitschrift für Sprachwissenschaft ◽

10.1515/zfs-2020-2020 ◽

2020 ◽

Vol 39 (3) ◽

pp. 407-437

Author(s):

Markus Bader

Keyword(s):

Native Speakers ◽

Initial Position ◽

Experimental Results ◽

Modal Verbs ◽

Weighted Constraints ◽

Corpus Data ◽

Frequency Relationship ◽

Zero Frequency ◽

Free Variation

Abstract In German, a verb selected by another verb normally precedes the selecting verb. Modal verbs in the perfect tense provide an exception to this generalization because they require the perfective auxiliary to occur in cluster-initial position according to prescriptive grammars. Bader and Schmid (2009b) have shown, however, that native speakers accept the auxiliary in all positions except the cluster-final one. Experimental results as well as corpus data indicate that verb cluster serialization is a case of free variation. I discuss how this variation can be accounted for, focusing on two mismatches between acceptability and frequency: First, slight acceptability advantages can turn into strong frequency advantages. Second, syntactic variants with basically zero frequency can still vary substantially in acceptability. These mismatches remain unaccounted for if acceptability is related to frequency on the level of whole sentence structures, as in Stochastic OT (Boersma and Hayes 2001). However, when the acceptability-frequency relationship is modeled on the level of individual weighted constraints, using harmony as link (see Pater 2009, for different harmony-based frameworks), the two mismatches follow given appropriate linking assumptions.


The traceback method and the early constructicon: theoretical and methodological considerations

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2020-0045 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Nikolas Koch ◽

Stefan Hartmann ◽

Antje Endesfelder Quick

Keyword(s):

Language Acquisition ◽

Case Studies ◽

Child Language ◽

Critical Evaluation ◽

Methodological Choices ◽

Corpus Data ◽

Methodological Considerations ◽

Language Corpus

Abstract Usage-based approaches assume that children’s early utterances are item-based. This has been demonstrated in a number of studies using the traceback method. In this approach, a small amount of “target utterances” from a child language corpus is “traced back” to earlier utterances. Drawing on a case study of German, this paper provides a critical evaluation of the method from a usage-based perspective. In particular, we check how factors inherent to corpus data as well as methodological choices influence the results of traceback studies. To this end, we present four case studies in which we change thresholds and the composition of the main corpus, use a cross-corpus approach tracing one child’s utterances back to another child’s corpus, and reverse and randomize the target utterances. Overall, the results show that the method can provide interesting insights, particularly regarding different pathways of language acquisition, but they also show the limitations of the method.


Transitivity prominence within and across modalities

Open Linguistics ◽

10.1515/opli-2019-0037 ◽

2019 ◽

Vol 5 (1) ◽

pp. 666-689

Author(s):

Carl Börstell ◽

Tommi Jantunen ◽

Vadim Kimmelman ◽

Vanja de Lint ◽

Johanna Mesch ◽

...

Keyword(s):

Sign Language ◽

Sign Languages ◽

Corpus Data ◽

Direct Objects ◽

Positive Correlations ◽

Verb Meaning

Abstract We investigate transitivity prominence of verbs across signed and spoken languages, based on data from both valency dictionaries and corpora. Our methodology relies on the assumption that dictionary data and corpus-based measures of transitivity are comparable, and we find evidence in support of this through the direct comparison of these two types of data across several spoken languages. For the signed modality, we measure the transitivity prominence of verbs in five sign languages based on corpus data and compare the results to the transitivity prominence hierarchy for spoken languages reported in Haspelmath (2015). For each sign language, we create a hierarchy for 12 verb meanings based on the proportion of overt direct objects per verb meaning. We use these hierarchies to calculate correlations between languages – both signed and spoken – and find positive correlations between transitivity hierarchies. Additional findings of this study include the observation that locative arguments seem to behave differently than direct objects judging by our measures of transitivity, and that relatedness among sign languages does not straightforwardly imply similarity in transitivity hierarchies. We conclude that our findings provide support for a modality-independent, semantic basis of transitivity.


3. Corpus data

The Present-day English Gerund System ◽

10.1515/9783110686661-003 ◽

2020 ◽

pp. 39-48

Keyword(s):

Corpus Data


‘Almost people’: A Learner Corpus Account of L2 Use and Misuse of Non-numerical Quantification

Open Linguistics ◽

10.1515/opli-2016-0015 ◽

2016 ◽

Vol 2 (1) ◽

Author(s):

Peter Crosthwaite ◽

Lavigne L.Y. Choy ◽

Yeonsuk Bae

Keyword(s):

English Learners ◽

English Speakers ◽

L2 Proficiency ◽

L1 Transfer ◽

Learner Corpus ◽

Proficiency Level ◽

Closed Class ◽

Corpus Data ◽

Noun Number ◽

L1 English

Abstract We present an Integrated Contrastive Model of non-numerical quantificational NPs (NNQs, e.g. ‘some people’) produced by L1 English speakers and Mandarin and Korean L2 English learners. Learner corpus data was sourced from the ICNALE (Ishikawa, 2011, 2013) across four L2 proficiency levels. An average of 10% of L2 NNQs were specific to L2 varieties, including noun number mismatches (*‘many child’), omitting obligatory quantifiers after adverbs (*‘almost people’), adding unnecessary particles (*‘all of people’) and non-L1 English-like quantifier/noun agreement (*‘many water’). L2 learners produce significantly fewer ‘open-class’ NNQs (e.g. ‘a number of people’), preferring ‘closed-class’ single lexical quantifiers (following L1-like use). While such production is predictable via L1 transfer, Korean L2 English learners produced significantly more L2-like NNQs at each proficiency level, which was not entirely predictable under a transfer account. We thus consider whether positive transfer of other linguistic forms (i.e. definiteness marking) aids the learnability of other L2 forms (i.e. expression of quantification).


Efficient Visualisation of the Relative Distribution of Keyword Search Results in a Corpus Data Cube

Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP - DOLAP '15 ◽

10.1145/2811222.2811234 ◽

2015 ◽

Cited By ~ 1

Author(s):

Mark Sifer ◽

Yutaka Watanobe ◽

Subhash Bhalla

Keyword(s):

Keyword Search ◽

Data Cube ◽

Relative Distribution ◽

Search Results ◽

Corpus Data


Tiger Woods as a linguistic role model?

English Today ◽

10.1017/s0266078408000114 ◽

2008 ◽

Vol 24 (1) ◽

pp. 59-61

Author(s):

Brian Poole

Keyword(s):

Role Model ◽

British English ◽

The Media ◽

Corpus Data ◽

Tiger Woods

Abstract International sports stars are often required to speak to the media after their performances. When Tiger Woods does so, it is noticeable that he makes use of the formulaic expression ‘I feel/felt like I’ as a means of introducing descriptions of, or generalizations about, his actions or motivations. Drawing on corpus data, this paper offers some observations about this expression in relation to its use by speakers (and to a lesser extent writers) of both American and British English, and also investigates the apparent disparity in frequency between instances of ‘she’ and ‘he’ when it is used.


Embracing multidimensionality in phonological analysis

The Linguistic Review ◽

10.1515/tlr-2021-2060 ◽

2021 ◽

Vol 38 (1) ◽

pp. 101-139

Author(s):

Abigail C. Cohn ◽

Margaret E. L. Renwick

Keyword(s):

Native Speaker ◽

Experimental Studies ◽

Spoken Language ◽

Iterative Approach ◽

Full Spectrum ◽

Multiple Factors ◽

Multiple Data ◽

Language Data ◽

Corpus Data ◽

Insight Into

Abstract We pursue the idea, implicit in much current phonological research, that understanding the multiple factors that shape speech production and perception is within the purview of phonology. In particular, increased access to naturalistic data has highlighted the multidimensional reality of variation in spoken language. At the same time, longstanding methods of doing phonology – including impressionistic analysis, and laboratory and experimental studies – remain crucial to understanding native speaker competence and grammar. We advocate for an expanded methodological toolbox in phonological analysis, using an iterative approach that crucially includes naturalistic corpus data. Integrating across multiple data sources offers fuller insight into the nature of the phonological system and native speaker-hearer ability. Several case studies highlight findings gained through linked, iterative studies, showing the importance of naturalistic data for a richer understanding of phonological phenomena, and leading us to reflect on desiderata for corpora to reveal speaker-specific patterns in fine phonetic detail and variability, which we argue are part of a speaker-hearer’s phonological competence. Phonological analysis that embraces the full spectrum of variation in spoken language data (from categorical to gradient, and systematic to sporadic) contributes to a deeper understanding of phonology in this richer sense.


The L2 syllabus: corpus or contrivance?

Corpora ◽

10.3366/cor.2007.2.2.157 ◽

2007 ◽

Vol 2 (2) ◽

pp. 157-185 ◽

Cited By ~ 3

Author(s):

Terry Shortall

Keyword(s):

Natural Frequency ◽

Language Learners ◽

Target Language ◽

Frequency Data ◽

Present Perfect ◽

Corpus Studies ◽

Verb Forms ◽

Growing Body ◽

Real Language ◽

Corpus Data

Corpus linguists have argued that corpora allow us to present lexical and grammatical patterns to language learners as they occur in real language, thereby exposing the learner to authentic target language (Mindt, 1996; Biber et al., 2002; Sinclair, 2004). And there is now a growing body of empirical research into how corpus studies can benefit ELT materials design and development (Ljung, 1990, 1991; Römer, 2004, 2005). This study investigates how the present perfect is represented in a spoken corpus and in ELT textbooks. The objective is to see whether corpus frequency data can make textbook present perfect presentation represent reality more accurately, and also whether there are sometimes pedagogic aims that may override frequency considerations. Results show that textbooks fail to represent adequately how present perfect interacts with other verb forms to create hybrid tenses such as the present perfect passive. Textbooks also over-represent the frequency of structures such as the present perfect continuous. Adverbs such as yet and already are much more frequent in textbooks than in the corpus. Textbook writers seem to deliberately exaggerate the frequency of such adverbs, and arguably use them as tense markers or flagging devices so that learners will expect to see present perfect when they see yet and already. This suggests that disregard for natural frequency data may be justifiable if pedagogic considerations of this kind are taken into account. So, while corpus data provides important and useful frequency data for the teaching of grammar, pedagogic objectives may sometimes require that frequency data is disregarded.
