Quantifying phonetic variation: A large-scale corpus analysis of coronal segments in English infant-directed speech

Ekaterina A. Khlystova; Adam J. Chong; Megha Sundara

doi:10.1121/10.0007947

Consistency and Variability in Children’s Word Learning Across Languages

Open Mind ◽

10.1162/opmi_a_00026 ◽

2019 ◽

Vol 3 ◽

pp. 52-67 ◽

Cited By ~ 10

Author(s):

Mika Braginsky ◽

Daniel Yurovsky ◽

Virginia A. Marchman ◽

Michael C. Frank

Keyword(s):

Word Learning ◽

Large Scale ◽

Parent Report ◽

Corpus Analysis ◽

Function Words ◽

Lexical Categories ◽

Derived Properties ◽

Report Data

Why do children learn some words earlier than others? The order in which words are acquired can provide clues about the mechanisms of word learning. In a large-scale corpus analysis, we use parent-report data from over 32,000 children to estimate the acquisition trajectories of around 400 words in each of 10 languages, predicting them on the basis of independently derived properties of the words’ linguistic environment (from corpora) and meaning (from adult judgments). We examine the consistency and variability of these predictors across languages, by lexical category, and over development. The patterning of predictors across languages is quite similar, suggesting similar processes in operation. In contrast, the patterning of predictors across different lexical categories is distinct, in line with theories that posit different factors at play in the acquisition of content words and function words. By leveraging data at a significantly larger scale than previous work, our analyses identify candidate generalizations about the processes underlying word learning across languages.


Punctuation and syntactic structure in obwohl and weil clauses in nonstandard written German

Written Language & Literacy ◽

10.1075/wll.19.2.04sch ◽

2016 ◽

Vol 19 (2) ◽

pp. 212-245 ◽

Cited By ~ 1

Author(s):

Roland Schäfer ◽

Ulrike Sayatz

Keyword(s):

Large Scale ◽

Syntactic Structure ◽

Second Order ◽

Corpus Analysis ◽

Verb Second ◽

Corpus Study ◽

Theoretical Approaches ◽

Corpus Studies ◽

Constituent Order

In this paper, we analyze written sentences containing the German particles obwohl (“although”) and weil (“because”). In standard written German, these particles embed clauses in verb-last constituent order, which is characteristic of subordinated clauses. In spoken and, as we show, nonstandard written German, they embed clauses in verb-second constituent order, which is characteristic of independent sentences. Our usage-based approach to the syntax–graphemics interface includes a large-scale corpus analysis of the patterns of punctuation in the nonstandard variants that provides clues to the syntactic structure and degree of sentential independence of the nonstandard variants. Our corpus study confirms and refines hypotheses from existing theoretical approaches by clearly showing that writers mark obwohl clauses with verb-second order systematically as independent sentences, whereas weil clauses with verb-second order are much less strongly marked as independent. This work suggests that similar corpus studies could provide deeper insight into the interplay between syntax and graphemics.


Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques

Language Learning ◽

10.1111/lang.12232 ◽

2017 ◽

Vol 67 (S1) ◽

pp. 180-208 ◽

Cited By ~ 33

Author(s):

Theodora Alexopoulou ◽

Marije Michel ◽

Akira Murakami ◽

Detmar Meurers

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Corpus Analysis ◽

Linguistic Complexity ◽

Learner Corpus ◽

Task Effects ◽

Learner Corpus Analysis ◽

Processing Techniques


Large Scale Corpus Analysis and Recent Applications

PRICAI 2008: Trends in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-540-89197-0_4 ◽

2008 ◽

pp. 4-4

Author(s):

Yuji Matsumoto

Keyword(s):

Large Scale ◽

Corpus Analysis


Towards a New Approach to Some and Any Based on Large-Scale Corpus Analysis. Report on My Research into Non-Assertive Contexts with a Special Emphasis on Conditional Sentences

Procedia - Social and Behavioral Sciences ◽

10.1016/j.sbspro.2015.07.471 ◽

2015 ◽

Vol 198 ◽

pp. 500-508

Author(s):

Chris Turner

Keyword(s):

Large Scale ◽

Corpus Analysis ◽

New Approach


Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2014-0049 ◽

2018 ◽

Vol 14 (1) ◽

pp. 1-34 ◽

Cited By ~ 10

Author(s):

Alexander Koplenig

Keyword(s):

Time Series ◽

Maximum Likelihood ◽

Time Series Analysis ◽

Large Scale ◽

Corpus Analysis ◽

Full Spectrum ◽

Linguistic Change ◽

Linguistic Differences ◽

Diachronic Changes ◽

Zipf Law

Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.
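
As a rough illustration of the kind of estimation this abstract mentions, the sketch below fits the two Zipf–Mandelbrot parameters to rank-ordered word counts by maximizing the likelihood. The law assigns rank r a probability proportional to (r + beta)^(-alpha). The function names, the coarse grid search, and the treatment of ranks as the observed variable are assumptions made for illustration; the abstract does not describe the paper's actual estimation procedure, and a numerical optimizer would normally replace the grid.

```python
import math


def zipf_mandelbrot_pmf(n_ranks, alpha, beta):
    """Return p(r) for r = 1..n_ranks, with p(r) proportional to (r + beta)^(-alpha)."""
    weights = [(r + beta) ** -alpha for r in range(1, n_ranks + 1)]
    total = sum(weights)  # normalizing constant over the finite vocabulary
    return [w / total for w in weights]


def log_likelihood(freqs, alpha, beta):
    """Log-likelihood of rank-ordered counts (freqs[0] belongs to rank 1)."""
    pmf = zipf_mandelbrot_pmf(len(freqs), alpha, beta)
    return sum(f * math.log(p) for f, p in zip(freqs, pmf))


def fit_grid(freqs, alphas, betas):
    """Hypothetical helper: return the (alpha, beta) grid point with the highest likelihood."""
    candidates = ((a, b) for a in alphas for b in betas)
    return max(candidates, key=lambda ab: log_likelihood(freqs, ab[0], ab[1]))
```

Calling `fit_grid(counts, alphas, betas)` once per time slice of a corpus would give one (alpha, beta) pair per period, which is the sort of parameter series whose change over time the abstract describes.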


How arbitrary is language?

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2013.0299 ◽

2014 ◽

Vol 369 (1651) ◽

pp. 20130299 ◽

Cited By ~ 83

Author(s):

Padraic Monaghan ◽

Richard C. Shillcock ◽

Morten H. Christiansen ◽

Simon Kirby

Keyword(s):

Language Acquisition ◽

Language Development ◽

Language Learning ◽

Large Scale ◽

Vocabulary Development ◽

Corpus Analysis ◽

Early Language ◽

Early Stages ◽

Early Language Development

It is a long established convention that the relationship between sounds and meanings of words is essentially arbitrary—typically the sound of a word gives no hint of its meaning. However, there are numerous reported instances of systematic sound–meaning mappings in language, and this systematicity has been claimed to be important for early language development. In a large-scale corpus analysis of English, we show that sound–meaning mappings are more systematic than would be expected by chance. Furthermore, this systematicity is more pronounced for words involved in the early stages of language acquisition and reduces in later vocabulary development. We propose that the vocabulary is structured to enable systematicity in early language learning to promote language acquisition, while also incorporating arbitrariness for later language in order to facilitate communicative expressivity and efficiency.


Corpus Analysis with Antconc

The Programming Historian ◽

10.46430/phen0043 ◽

2015 ◽

Cited By ~ 1

Author(s):

Heather Froehlich

Keyword(s):

Text Analysis ◽

Large Scale ◽

Corpus Analysis ◽

Distant Reading

Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called 'distant reading').


By the Numbers

Boom A Journal of California ◽

10.1525/boom.2014.4.1.46 ◽

2014 ◽

Vol 4 (1) ◽

pp. 46-53

Author(s):

David L. Ulin ◽

Joshua Comer

Keyword(s):

Los Angeles ◽

Longitudinal Studies ◽

San Francisco ◽

Large Scale ◽

Southern California ◽

Corpus Analysis ◽

Northern California ◽

Google Books

In this article the authors explore the use of “California,” its translations, and associated phrases in the nine languages collected in the Google Books corpora since 1525. The article graphs the use across time, analyzes the data, and considers some of the reasons behind the peaks and troughs of the usage of “California” and related phrases. For those new to computational corpus analysis, this article introduces the techniques and concepts of corpus analysis, explains the strengths and weaknesses of large-scale, longitudinal studies of language, and describes the specific methods applied in this analysis. Across all languages, the frequency of “California” increases steadily until the late 1990s. The article also examines the use of the notable but infrequently used phrase “California dream.” Visualizations of the analyses accompany the article, as well as additional graphs comparing the use of Los Angeles to San Francisco and Northern California to Southern California since 1800.


What Do North American Babies Hear? A large‐scale cross‐corpus analysis

Developmental Science ◽

10.1111/desc.12724 ◽

2018 ◽

Vol 22 (1) ◽

pp. e12724 ◽

Cited By ~ 18

Author(s):

Elika Bergelson ◽

Marisa Casillas ◽

Melanie Soderstrom ◽

Amanda Seidl ◽

Anne S. Warlaumont ◽

...

Keyword(s):

North American ◽

Large Scale ◽

Corpus Analysis
