Expanding horizons in historical linguistics with the 400-million word Corpus of Historical American English
The Corpus of Historical American English (COHA) contains 400 million words in more than 100,000 texts which date from the 1810s to the 2000s. The corpus contains texts from fiction, popular magazines, newspapers and non-fiction books, and is balanced by genre from decade to decade. It has been carefully lemmatised and tagged for part-of-speech, and uses the same architecture as the Corpus of Contemporary American English (COCA), BYU-BNC, the TIME Corpus and other corpora. COHA allows for a wide range of research on changes in lexis, morphology, syntax, semantics, and American culture and society (as viewed through language change), in ways that are probably not possible with any text archive (e.g., Google Books) or any other corpus of historical American English.