Quantifying phonetic variation: A large-scale corpus analysis of coronal segments in English infant-directed speech

2021 ◽  
Vol 150 (4) ◽  
pp. A150-A151
Author(s):  
Ekaterina A. Khlystova ◽  
Adam J. Chong ◽  
Megha Sundara
Open Mind ◽  
2019 ◽  
Vol 3 ◽  
pp. 52-67 ◽  
Author(s):  
Mika Braginsky ◽  
Daniel Yurovsky ◽  
Virginia A. Marchman ◽  
Michael C. Frank

Why do children learn some words earlier than others? The order in which words are acquired can provide clues about the mechanisms of word learning. In a large-scale corpus analysis, we use parent-report data from over 32,000 children to estimate the acquisition trajectories of around 400 words in each of 10 languages, predicting them on the basis of independently derived properties of the words’ linguistic environment (from corpora) and meaning (from adult judgments). We examine the consistency and variability of these predictors across languages, by lexical category, and over development. The patterning of predictors across languages is quite similar, suggesting similar processes in operation. In contrast, the patterning of predictors across different lexical categories is distinct, in line with theories that posit different factors at play in the acquisition of content words and function words. By leveraging data at a significantly larger scale than previous work, our analyses identify candidate generalizations about the processes underlying word learning across languages.


2016 ◽  
Vol 19 (2) ◽  
pp. 212-245 ◽  
Author(s):  
Roland Schäfer ◽  
Ulrike Sayatz

In this paper, we analyze written sentences containing the German particles obwohl (“although”) and weil (“because”). In standard written German, these particles embed clauses in verb-last constituent order, which is characteristic of subordinated clauses. In spoken and – as we show – nonstandard written German, they embed clauses in verb-second constituent order, which is characteristic of independent sentences. Our usage-based approach to the syntax–graphemics interface includes a large-scale corpus analysis of the patterns of punctuation in the nonstandard variants that provides clues to the syntactic structure and degree of sentential independence of the nonstandard variants. Our corpus study confirms and refines hypotheses from existing theoretical approaches by clearly showing that writers mark obwohl clauses with verb-second order systematically as independent sentences, whereas weil clauses with verb-second order are much less strongly marked as independent. This work suggests that similar corpus studies could provide deeper insight into the interplay between syntax and graphemics.


2018 ◽  
Vol 14 (1) ◽  
pp. 1-34 ◽  
Author(s):  
Alexander Koplenig

Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.


2014 ◽  
Vol 369 (1651) ◽  
pp. 20130299 ◽  
Author(s):  
Padraic Monaghan ◽  
Richard C. Shillcock ◽  
Morten H. Christiansen ◽  
Simon Kirby

It is a long established convention that the relationship between sounds and meanings of words is essentially arbitrary—typically the sound of a word gives no hint of its meaning. However, there are numerous reported instances of systematic sound–meaning mappings in language, and this systematicity has been claimed to be important for early language development. In a large-scale corpus analysis of English, we show that sound–meaning mappings are more systematic than would be expected by chance. Furthermore, this systematicity is more pronounced for words involved in the early stages of language acquisition and reduces in later vocabulary development. We propose that the vocabulary is structured to enable systematicity in early language learning to promote language acquisition, while also incorporating arbitrariness for later language in order to facilitate communicative expressivity and efficiency.


Author(s):  
Heather Froehlich

Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called 'distant reading').


2014 ◽  
Vol 4 (1) ◽  
pp. 46-53
Author(s):  
David L. Ulin ◽  
Joshua Comer

In this article the authors explore the use of “California,” its translations, and associated phrases in the nine languages collected in the Google Books corpora since 1525. The article graphs the use across time, analyzes the data, and considers some of the reasons behind the peaks and troughs of the usage of “California” and related phrases. For those new to computational corpus analysis, this article introduces the techniques and concepts of corpus analysis, explains the strengths and weaknesses of large-scale, longitudinal studies of language, and describes the specific methods applied in this analysis. Across all languages, the frequency of “California” increases steadily until the late 1990s. The article also examines the use of the notable but infrequently used phrase “California dream.” Visualizations of the analyses accompany the article, as well as additional graphs comparing the use of Los Angeles to San Francisco and Northern California to Southern California since 1800.


2018 ◽  
Vol 22 (1) ◽  
pp. e12724 ◽  
Author(s):  
Elika Bergelson ◽  
Marisa Casillas ◽  
Melanie Soderstrom ◽  
Amanda Seidl ◽  
Anne S. Warlaumont ◽  
...  
