word frequency distribution Latest Research Papers

From Boltzmann to Zipf through Shannon and Jaynes

Entropy ◽

10.3390/e22020179 ◽

2020 ◽

Vol 22 (2) ◽

pp. 179 ◽

Cited By ~ 1

Author(s):

Álvaro Corral ◽

Montserrat García del Muro

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Statistical Physics ◽

Maximum Entropy Principle ◽

Building Blocks ◽

Boltzmann Distribution ◽

Zipf's Law ◽

Entropy Principle ◽

Word Frequency Distribution

The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf’s law, at least approximately. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of the all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf’s power-law regime is obtained, but the probability of individual words shows considerable scattering. In this way, a pure statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
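
As an illustration of the model described in this abstract, the following is a minimal sketch of a Boltzmann distribution over words whose energy is the sum of all pairwise letter potentials. It is not the authors' code: the function names, the sign convention P(w) ∝ exp(-E(w)), the toy potential values, and the normalisation over a short candidate list (rather than over all letter strings of a given length) are assumptions made for the example.

```python
import math
from itertools import combinations


def energy(word, potentials):
    """Sum the pairwise potentials over all letter-position pairs i < j.

    `potentials` maps a position pair (i, j) to a dict keyed by letter
    pairs; pairs that are missing contribute 0.0 (an assumption).
    """
    return sum(
        potentials.get((i, j), {}).get((word[i], word[j]), 0.0)
        for i, j in combinations(range(len(word)), 2)
    )


def boltzmann_probabilities(words, potentials):
    """Return P(w) = exp(-E(w)) / Z, normalised over the given words only."""
    weights = {w: math.exp(-energy(w, potentials)) for w in words}
    z = sum(weights.values())
    return {w: weight / z for w, weight in weights.items()}


# Made-up potentials for three-letter words; not fitted to any corpus.
toy_potentials = {
    (0, 1): {("t", "h"): -2.0, ("c", "a"): -1.0},
    (1, 2): {("h", "e"): -2.0, ("a", "t"): -1.0},
    (0, 2): {("t", "e"): -0.5},
}

probs = boltzmann_probabilities(["the", "cat", "xqz"], toy_potentials)
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.4f}")
```

With these toy values the lowest-energy string ("the") receives the highest probability. In the paper the potentials are obtained from the empirical two-letter marginals by improved iterative scaling, which this sketch does not implement.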

Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics

Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval - NLPIR 2019 ◽

10.1145/3342827.3342856 ◽

2019 ◽

Author(s):

Hajime Tsubaki

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Native Speakers ◽

Word Frequency Distribution

An Improved TF-IDF algorithm based on word frequency distribution information and category distribution information

Proceedings of the 3rd International Conference on Intelligent Information Processing - ICIIP '18 ◽

10.1145/3232116.3232152 ◽

2018 ◽

Cited By ~ 1

Author(s):

Haoying Wu ◽

Na Yuan

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Word Frequency Distribution

Bayesian analysis of Word frequency distribution in context of Indian literature

Journal of Ultra Scientist of Physical Sciences Section A ◽

10.22147/jusps-a/300503 ◽

2018 ◽

Vol 30 (05) ◽

pp. 283-290

Author(s):

VASTOSHPATI SHASTRI ◽

RAKESH RANJAN ◽

PRAVEEN KUMAR TRIPATHI ◽

S.K UPADHYAY ◽

...

Keyword(s):

Bayesian Analysis ◽

Word Frequency ◽

Frequency Distribution ◽

Indian Literature ◽

Word Frequency Distribution

Mineral Species Frequency Distribution and Word Frequency Distribution Comparison

10.1130/abs/2018am-321482 ◽

2018 ◽

Author(s):

Myriam Paola Changoluisa Toapanta ◽

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Mineral Species ◽

Species Frequency ◽

Word Frequency Distribution

Word Frequency Distribution of Literature Information: Zipf’s Law

Informetrics ◽

10.1007/978-981-10-4032-0_5 ◽

2017 ◽

pp. 121-143 ◽

Cited By ~ 1

Author(s):

Junping Qiu ◽

Rongying Zhao ◽

Siluo Yang ◽

Ke Dong

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Literature Information ◽

Zipf's Law ◽

Word Frequency Distribution

Statistical Analysis of Word Frequency Distribution in Lithuanian Texts of Different Genres

Lietuvos statistikos darbai ◽

10.15388/ljs.2016.13868 ◽

2016 ◽

Vol 55 (1) ◽

pp. 61-69

Author(s):

Neringa Bružaitė ◽

Tomas Rekašius

Keyword(s):

Statistical Analysis ◽

Word Frequency ◽

Frequency Distribution ◽

Hierarchical Clustering ◽

Distance Measure ◽

Structural Type ◽

Clustering Method ◽

Jaccard Distance ◽

Word Frequencies ◽

Word Frequency Distribution

The paper examines Lithuanian texts of different authors and genres. The main points of interest are the number of words, the number of different words and the word frequencies. The structural type distribution and Zipf’s law are applied to describe the frequency distribution of words in the text. The lexical diversity of any text can be defined by the different words that are used in the text, also called its vocabulary. It is shown that the information contained in a reduced vocabulary is enough for dividing the texts analyzed in this article into groups by genre and author using a hierarchical clustering method. In this case, distances between clusters are measured using the Jaccard distance measure, and clusters are aggregated using the Ward method.
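
To make the distance measure named in this abstract concrete, here is a minimal sketch of the Jaccard distance between the vocabularies of two texts. The tokenisation (whitespace splitting, punctuation stripping, lower-casing) and the function names are assumptions for illustration, not the authors' preprocessing; the Ward aggregation step is not shown.

```python
def vocabulary(text):
    """Return the set of distinct lower-cased words in `text`.

    Tokenisation here is deliberately naive (an assumption): split on
    whitespace and strip common punctuation from each token.
    """
    tokens = (token.strip(".,;:!?\"'()").lower() for token in text.split())
    return {token for token in tokens if token}


def jaccard_distance(vocab_a, vocab_b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


text_one = "The cat sat."
text_two = "The dog sat!"
distance = jaccard_distance(vocabulary(text_one), vocabulary(text_two))
print(distance)
```

The two toy texts share two of four distinct words, so their distance is 0.5. A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would take.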

The Small-World of ‘Le Petit Prince’: Revisiting the Word Frequency Distribution

Digital Scholarship in the Humanities ◽

10.1093/llc/fqw005 ◽

2016 ◽

pp. fqw005

Author(s):

Daniel Gamermann ◽

Carmen Moret-Tatay ◽

Esperanza Navarro-Pardo ◽

Pedro Fernandez de Córdoba Castellá

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Small World ◽

Word Frequency Distribution

Quantitative analysis of Zamenhof’s Esenco kaj estonteco

Language Problems & Language Planning ◽

10.1075/lplp.35.1.04liu ◽

2011 ◽

Vol 35 (1) ◽

pp. 57-81 ◽

Cited By ~ 2

Author(s):

Haitao Liu

Keyword(s):

Length Distribution ◽

Small World ◽

Scale Free ◽

International Language ◽

Order Preference ◽

Normal Human ◽

The Mean ◽

Model Texts ◽

Word Frequency Distribution ◽

Syntactical Analysis

The formation of a language community in a planned language is one of the most important steps in its development. Esperanto is the only fully functional language with relative success among more than one thousand planned language projects. Model texts have played a very significant role in the development of Esperanto. Esenco kaj estonteco de la ideo de lingvo internacia (Essence and future of the international language idea) by the founder of Esperanto, L. L. Zamenhof, is an important document on Esperanto. Taking the essay as the research object, this paper carries out a lexical and syntactic analysis of Esperanto using the research methods of quantitative linguistics and complex networks. The results show that the morpheme distribution of Esperanto follows a power law, the word frequency distribution fits Zipf’s Law, the word length distribution is an exponential curve, and the word class distribution obeys a linear law. It is also clear that Esperanto is a language with SVO word order preference, the mean dependency distance of Esperanto is 3.85, and the distribution of the dependency distance tends to a minimum, making it typologically a head-middle language. 43.6% of the dependency relations occur between adjacent words. The complex syntactic networks of Esperanto display the characteristics of small-world, scale-free networks. All of these quantitative characteristics of Esperanto demonstrate that it is structurally a normal human language.
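
The dependency-distance figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the common definition in which the dependency distance of a word is the absolute difference between its position and the position of its head, so that adjacent words have distance 1. The input encoding (a list of 1-based head positions with 0 for the root), the function names, and the example sentence are assumptions, not taken from the paper.

```python
def dependency_distances(heads):
    """Return |head position - dependent position| for every non-root word.

    `heads[i]` is the 1-based position of the head of word i + 1;
    a value of 0 marks the root, which has no dependency distance.
    """
    return [abs(head - (i + 1)) for i, head in enumerate(heads) if head != 0]


def mean_dependency_distance(heads):
    """Average dependency distance over all non-root words."""
    distances = dependency_distances(heads)
    return sum(distances) / len(distances) if distances else 0.0


def adjacent_share(heads):
    """Fraction of dependency relations that link adjacent words."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0
    return sum(1 for d in distances if d == 1) / len(distances)


# Hypothetical parse of "She reads long books":
# She -> reads, reads = root, long -> books, books -> reads
example_heads = [2, 0, 4, 2]
print(mean_dependency_distance(example_heads))
print(adjacent_share(example_heads))
```

For the hypothetical parse, the distances are 1, 1 and 2, so two of the three relations link adjacent words.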

Word frequency distribution in Japanese text*

Journal of Quantitative Linguistics ◽

10.1080/09296179408590019 ◽

1994 ◽

Vol 1 (3) ◽

pp. 212-223 ◽

Cited By ~ 3

Author(s):

Koichi Ejiri ◽

Niklaus Staeheli ◽

Shiori Ooaku

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Japanese Text ◽

Word Frequency Distribution
