Expanding horizons in historical linguistics with the 400-million word Corpus of Historical American English

Mark Davies

doi:10.3366/cor.2012.0024

Corpora ◽

10.3366/cor.2012.0024 ◽

2012 ◽

Vol 7 (2) ◽

pp. 121-157 ◽

Cited By ~ 38

Author(s):

Mark Davies

Keyword(s):

Language Change ◽

American Culture ◽

Historical Linguistics ◽

American English ◽

Part Of Speech ◽

Non Fiction ◽

Wide Range ◽

Google Books ◽

Popular Magazines

The Corpus of Historical American English (COHA) contains 400 million words in more than 100,000 texts which date from the 1810s to the 2000s. The corpus contains texts from fiction, popular magazines, newspapers and non-fiction books, and is balanced by genre from decade to decade. It has been carefully lemmatised and tagged for part-of-speech, and uses the same architecture as the Corpus of Contemporary American English (COCA), BYU-BNC, the TIME Corpus and other corpora. COHA allows for a wide range of research on changes in lexis, morphology, syntax, semantics, and American culture and society (as viewed through language change), in ways that are probably not possible with any text archive (e.g., Google Books) or any other corpus of historical American English.

Making Google Books n-grams useful for a wide range of research on language change

International Journal of Corpus Linguistics ◽

10.1075/ijcl.19.3.04dav ◽

2014 ◽

Vol 19 (3) ◽

pp. 401-416 ◽

Cited By ~ 13

Author(s):

Mark Davies

Keyword(s):

Language Change ◽

American English ◽

Wide Range ◽

Google Books ◽

Standard Interface

The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words of data for the American English data alone. Unfortunately, the standard interface is far too simplistic to allow many types of useful research on this massive dataset. In this paper, I discuss an alternative “advanced” architecture and interface for these datasets, which is freely available at googlebooks.byu.edu. This resource allows for a wide range of research on lexical, phraseological, syntactic, and semantic changes in English, in ways that would not be possible with the standard interface. With this new resource, researchers now have access to hundreds of billions of words of data, and can map out changes in English in ways that were not previously possible.

The 385+ million word Corpus of Contemporary American English (1990–2008+)

International Journal of Corpus Linguistics ◽

10.1075/ijcl.14.2.02dav ◽

2009 ◽

Vol 14 (2) ◽

pp. 159-190 ◽

Cited By ~ 169

Author(s):

Mark Davies

Keyword(s):

Relational Databases ◽

American English ◽

Linguistic Variation ◽

Academic Journals ◽

Phrasal Verbs ◽

Wide Range ◽

Extended Analysis ◽

Popular Magazines

The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus, which contains more than 385 million words from 1990–2008 (20 million words each year), balanced between spoken, fiction, popular magazines, newspapers, and academic journals. We also discuss the unique relational database architecture, which allows for a wide range of queries that are not available (or are quite difficult) with other architectures and interfaces. To conclude, we consider insights from the corpus on a number of cases of genre-based variation and recent linguistic variation, including an extended analysis of phrasal verbs in contemporary American English.

An historical analysis of species references in American English

Corpora ◽

10.3366/cor.2019.0177 ◽

2019 ◽

Vol 14 (3) ◽

pp. 327-349

Author(s):

Craig Frayne

Keyword(s):

Environmental Change ◽

Sentiment Analysis ◽

Quantitative Methods ◽

English Language ◽

Language Use ◽

American English ◽

Historical Analysis ◽

The Past ◽

Corpus Studies ◽

Google Books

This study uses the two largest available American English language corpora, Google Books and the Corpus of Historical American English (COHA), to investigate relations between ecology and language. The paper introduces ecolinguistics as a promising theme for corpus research. While some previous ecolinguistic research has used corpus approaches, there is a case to be made for quantitative methods that draw on larger datasets. Building on other corpus studies that have made connections between language use and environmental change, this paper investigates whether linguistic references to other species have changed in the past two centuries and, if so, how. The methodology consists of two main parts: an examination of the frequency of common names of species followed by aspect-level sentiment analysis of concordance lines. Results point to both opportunities and challenges associated with applying corpus methods to ecolinguistic research.

Scales and Units

10.1093/oso/9780190657543.003.0011 ◽

2018 ◽

Author(s):

Kathryn M. de Luna

Keyword(s):

Language Change ◽

Historical Linguistics ◽

Bantu Languages ◽

The North ◽

First Case ◽

Linguistic Evidence ◽

River Region ◽

History Of ◽

Alternative Approaches

This chapter uses two case studies to explore how historians study language movement and change through comparative historical linguistics. The first case study stands as a short chapter in the larger history of the expansion of Bantu languages across eastern, central, and southern Africa. It focuses on the expansion of proto-Kafue, ca. 950–1250, from a linguistic homeland in the middle Kafue River region to lands beyond the Lukanga swamps to the north and the Zambezi River to the south. This expansion was made possible by a dramatic reconfiguration of ties of kinship. The second case study explores linguistic evidence for ridicule along the Lozi-Botatwe frontier in the mid- to late 19th century. Significantly, the units and scales of language movement and change in precolonial periods rendered visible through comparative historical linguistics bring to our attention alternative approaches to language change and movement in contemporary Africa.

Language Change and Movement as Seen by Historical Linguistics

10.1093/oso/9780190657543.003.0002 ◽

2018 ◽

Author(s):

Derek Nurse

Keyword(s):

West Africa ◽

20th Century ◽

Language Change ◽

Historical Linguistics ◽

Linguistic Features ◽

Time And Space ◽

Change Over Time ◽

Comparative Linguistics ◽

Language Boundaries ◽

Over Time

The focus of this chapter is on how languages move and change over time and space. The perceptions of historical linguists have been shaped by what they were observing. During the flowering of comparative linguistics, from the late 19th into the 20th century, the dominant view was that in earlier times when people moved, their languages moved with them, often over long distances, sometimes fast, and that language change was largely internal. That changed in the second half of the 20th century. We now recognize that in recent centuries and millennia, most movements of communities and individuals have been local and shorter. Constant contact between communities resulted in features flowing across language boundaries, especially in crowded and long-settled locations such as most of Central and West Africa. Although communities did mix and people did cross borders, it became clear that language and linguistic features could also move without communities moving.

Modernism and American Ballet

10.1093/oso/9780199342235.003.0002 ◽

2017 ◽

Author(s):

Andrea Harris

Keyword(s):

Mass Culture ◽

American Culture ◽

Modern Dance ◽

The Political ◽

Cultural Conditions ◽

The Arts ◽

Wide Range ◽

Political Economic

This chapter explores the international and interdisciplinary backdrop of Lincoln Kirstein’s efforts to form an American ballet in the early 1930s. The political, economic, and cultural conditions of the Depression reinvigorated the search for an “American” culture. In this context, new openings for a modernist theory of ballet were created as intellectuals and artists from a wide range of disciplines endeavored to define the role of the arts in protecting against the dangerous effects of mass culture. Chapter 1 sheds new light on well-known critical debates in dance history between Kirstein and John Martin over whether ballet, with its European roots, could truly become “American” in contrast to modern dance. Was American dance going to be conceived in nationalist or transnationalist terms? That was the deeper conflict that underlay the ballet vs. modern dance debates of the early 1930s.

Millennia of Language Change: Sociolinguistic Studies in Deep Historical Linguistics. Peter Trudgill. Cambridge: Cambridge University Press. 2020. 170 pp. Hb (9781108477390) £59.99, Pb (9781108708647) £19.99, Ebk (9781108852166) £14.50

Journal of Sociolinguistics ◽

10.1111/josl.12463 ◽

2021 ◽

Author(s):

Eline Lismont

Keyword(s):

Language Change ◽

Historical Linguistics ◽

Cambridge University

Using the Eighteenth-Century English Phonology Database (ECEP) as a teaching resource

English Language and Linguistics ◽

10.1017/s1360674320000143 ◽

2020 ◽

Vol 24 (3) ◽

pp. 591-606

Author(s):

Christine Wallis

Keyword(s):

Eighteenth Century ◽

Language Change ◽

Historical Linguistics ◽

Language Attitudes ◽

Source Material ◽

Historical Sociolinguistics ◽

Teaching Resource ◽

The University ◽

Late Modern

This article reports on the use of the Eighteenth-Century English Phonology Database (ECEP) as a teaching resource in historical sociolinguistics and historical linguistics courses at the University of Sheffield. Pronouncing dictionaries are an invaluable resource for students learning about processes of standardisation and language attitudes during the Late Modern English period (1700–1900); however, they are not easy to use in their original format. Each author uses their own notation system to indicate their recommended pronunciation, while the terminology used to describe the quality of the vowels and consonants differs from that used today, and provides an additional obstacle to the student wishing to interrogate such sources. ECEP thus provides a valuable intermediary between the students and the source material, as it includes IPA equivalents for the recommended pronunciations, as well as any metalinguistic commentary offered by the authors about a particular pronunciation. This article demonstrates a teaching approach that not only uses ECEP as a tool in its own right, but also explores how it can be usefully combined with other materials covering language change in the Late Modern English period to enable students to undertake their own investigations in research-led courses.

Personality traits, adjectives and gender

Journal of Language and Discrimination ◽

10.1558/jld.40370 ◽

2020 ◽

Vol 4 (1) ◽

pp. 16-50

Author(s):

Heiko Motschenbacher ◽

Eka Roivainen

Keyword(s):

Personality Traits ◽

Personality Trait ◽

Corpus Linguistics ◽

American English ◽

Psychological Analysis ◽

Usage Patterns ◽

Google Books ◽

And Gender ◽

Interdisciplinary Study ◽

The Relationship

There have been linguistic studies on the gendering mechanisms of adjectives and psychological studies on the relationship between personality traits and gender, but the two fields have never entered into a dialogue on these issues. This article seeks to address this gap by presenting an interdisciplinary study that explores the gendering mechanisms associated with personality traits and personality trait-denoting adjectives. The findings of earlier work in this area and basic gendering mechanisms relevant to adjectives and personality traits are outlined. This is followed by a linguistic and a psychological analysis of the usage patterns of a set of personality trait adjectives. The linguistic section draws on corpus linguistics to explore the distribution of these adjectives with female, male and gender-neutral personal nouns in the Corpus of Contemporary American English. The psychological analysis relates the usage frequencies of personality trait adjectives with the nouns man, woman and person in the Google Books corpus to desirability ratings of the adjectives.
