On the Weak form of Zipf's law

1980 ◽  
Vol 17 (3) ◽  
pp. 611-622 ◽  
Author(s):  
Wen-Chen Chen

Zipf's laws are probability distributions on the positive integers which decay algebraically. Such laws have been shown empirically to describe a large class of phenomena, including frequency of word usage, populations of cities, distributions of personal incomes, and distributions of biological genera and species, to mention only a few. In this paper we present a Dirichlet–multinomial urn model for describing the above phenomena from a stochastic point of view. We derive Zipf's law under certain regularity conditions; some limit theorems are also obtained for the urn model under consideration.


1975 ◽  
Vol 12 (3) ◽  
pp. 425-434 ◽  
Author(s):  
Michael Woodroofe ◽  
Bruce Hill

A Zipf's law is a probability distribution on the positive integers which decays algebraically. Such laws describe (approximately) a large class of phenomena. We formulate a model for such phenomena and, in terms of our model, give necessary and sufficient conditions for a Zipf's law to hold.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 334 ◽  
Author(s):  
Steven A. Frank

In a language corpus, the probability that a word occurs n times is often proportional to 1/n^2. Assigning rank, s, to words according to their abundance, log s vs log n typically has a slope of minus one. That simple Zipf's law pattern also arises in the population sizes of cities, the sizes of corporations, and other patterns of abundance. By contrast, for the abundances of different biological species, the probability of a population of size n is typically proportional to 1/n, declining exponentially for larger n, the log series pattern. This article shows that the differing patterns of Zipf's law and the log series arise as the opposing endpoints of a more general theory. The general theory follows from the generic form of all probability patterns as a consequence of conserved average values and the associated invariances of scale. To understand the common patterns of abundance, the generic form of probability distributions plus the conserved average abundance is sufficient. The general theory includes cases that are between the Zipf and log series endpoints, providing a broad framework for analyzing widely observed abundance patterns.
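
The link between the two statements in the abstract above (an abundance probability that falls off as the inverse square of n, and a rank-abundance slope of minus one) can be sketched in a few lines. This is an illustrative large-n approximation, not the article's own derivation; the normalizing constant C and the replacement of the sum by an integral are assumptions made here for the sketch.

```latex
% Assumption: p(n) = C n^{-2} for abundances n = 1, 2, ...
% The rank s(n) of a word with abundance n is proportional to the
% number of words that are at least as abundant.
\begin{align}
  s(n) &\propto \sum_{k \ge n} p(k) = C \sum_{k \ge n} k^{-2}
        \approx C \int_{n}^{\infty} x^{-2}\,dx = \frac{C}{n}, \\
  \log s &\approx -\log n + \text{const}.
\end{align}
```

Under these assumptions, rank is inversely proportional to abundance, which is the slope of minus one on a log-log plot.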


Glottotheory ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 113-129
Author(s):  
Victor Davis

Heaps’ Law [1] states that in a large enough text corpus, the number of types as a function of tokens grows as N = K M^β for some free parameters K, β. Much has been written [2-6] about how this result and various generalizations can be derived from Zipf’s Law [7]. Here we derive from first principles a completely novel expression of the type-token curve and prove its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates for counting hapaxes and higher n-legomena.

[1] Heaps, H S 1978 Information Retrieval: Computational and Theoretical Aspects (Academic Press). https://dl.acm.org/citation.cfm?id=539986
[2] Font-Clos, Francesc 2013 A scaling law beyond Zipf’s law and its relation to Heaps’ law (New Journal of Physics 15 093033). http://iopscience.iop.org/article/10.1088/1367-2630/15/9/093033
[3] Bernhardsson S, da Rocha L E C and Minnhagen P 2009 The meta book and size-dependent properties of written language (New Journal of Physics 11 123015). http://iopscience.iop.org/article/10.1088/1367-2630/11/12/123015
[4] Bernhardsson S, Ki Baek and Minnhagen 2011 A paradoxical property of the monkey book (Journal of Statistical Mechanics: Theory and Experiment, Volume 2011). http://iopscience.iop.org/article/10.1088/1742-5468/2011/07/P07013
[5] Milička, Jiří 2009 Type-token & Hapax-token Relation: A Combinatorial Model (Glottotheory. International Journal of Theoretical Linguistics 2 (1), 99–110). http://milicka.cz/kestazeni/type-token_relation.pdf
[6] Petersen, Alexander 2012 Languages cool as they expand: Allometric scaling and the decreasing need for new words (Scientific Reports volume 2, Article number: 943). https://www.nature.com/articles/srep00943
[7] Zipf, George 1949 Human behavior and the principle of least effort (Reading: Addison-Wesley). http://dx.doi.org/10.1037/h0052442
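
The power-law form of Heaps' Law quoted in the abstract above can be illustrated with a short sketch. It is not the novel expression the paper derives; it only shows one common way to build a type-token curve and estimate K and β by ordinary least squares on the log-log scale. The function names `type_token_curve` and `fit_heaps` are hypothetical, chosen for this example.

```python
import math


def type_token_curve(words):
    """Return (tokens, types): after reading M tokens, N distinct types were seen."""
    seen = set()
    tokens, types = [], []
    for m, word in enumerate(words, start=1):
        seen.add(word)
        tokens.append(m)
        types.append(len(seen))
    return tokens, types


def fit_heaps(tokens, types):
    """Estimate (K, beta) in N = K * M**beta by least squares on
    log N = log K + beta * log M. Assumes all inputs are positive."""
    xs = [math.log(m) for m in tokens]
    ys = [math.log(n) for n in types]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta
```

For a real corpus, one would pass the token list to `type_token_curve` and feed the result to `fit_heaps`; a log-log fit of this kind weights small and large M equally on the log scale, which is a design choice rather than the only option.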


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Giordano De Marzo ◽  
Andrea Gabrielli ◽  
Andrea Zaccaria ◽  
Luciano Pietronero

2021 ◽  
Vol 7 (s3) ◽  
Author(s):  
Matthew Stave ◽  
Ludger Paschen ◽  
François Pellegrino ◽  
Frank Seifart

Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and the length of the carrier unit. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable, due to their frequency; for Menzerath, units are more likely to be reduced when there are more sub-units to contribute to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question regarding the length of morphemes in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, showing that Zipf’s Law is a stronger predictor, but that the two laws interact with one another. We also explore how this is affected by specific typological characteristics, such as morphological complexity.


1987 ◽  
Vol 23 (3) ◽  
pp. 171-182 ◽  
Author(s):  
Ye-Sho Chen ◽  
Ferdinand F. Leimkuhler
