average information content
Recently Published Documents

Total documents: 13 (five years: 2)
H-index: 1 (five years: 0)

Author(s):  
Solomon Kozlov

In this article, we define Set Shaping Theory, whose goal is the study of bijective functions that transform a set of strings into a set of equal size made up of longer strings. Many functions meet this condition, but since the aim of this theory is data transmission, we analyze the function that minimizes the average information content. The results show how this type of function can be useful in data compression.
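As a rough illustration of the quantity being minimized, the sketch below measures the per-symbol average information content of a set of strings before and after a simple length-increasing bijection. The helper `average_information_content` and the prepend-a-zero transform are assumptions introduced here for illustration; they are not the optimizing function studied in the article.

```python
import math
from collections import Counter
from itertools import product

def average_information_content(strings):
    """Per-symbol empirical entropy (bits), pooled over the whole set of strings."""
    counts = Counter(ch for s in strings for ch in s)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Original set: all binary strings of length 3 (8 strings).
original = ["".join(p) for p in product("01", repeat=3)]

# Toy bijection onto an equal-size set of longer strings: prepend a fixed symbol.
# This only illustrates that such a transform changes the symbol statistics.
transformed = ["0" + s for s in original]

print(len(set(transformed)) == len(original))               # bijection onto an equal-size set
print(round(average_information_content(original), 3))      # 1.0 bit/symbol (uniform 0/1)
print(round(average_information_content(transformed), 3))   # ~0.954: distribution skewed toward "0"
```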


2021
Author(s):
Stephan Meylan
Tom Griffiths

Language research has come to rely heavily on large-scale, web-based datasets. These datasets can present significant methodological challenges, requiring researchers to make a number of decisions about how they are collected, represented, and analyzed. These decisions often concern long-standing challenges in corpus-based language research, including determining what counts as a word, deciding which words should be analyzed, and matching sets of words across languages. We illustrate these challenges by revisiting "Word lengths are optimized for efficient communication" (Piantadosi, Tily, & Gibson, 2011), which found that word lengths in 11 languages are more strongly correlated with their average predictability (or average information content) than with their frequency. Using what we argue to be best practices for large-scale corpus analyses, we find significantly attenuated support for this result, and demonstrate that a stronger relationship obtains between word frequency and length for a majority of the languages in the sample. We consider the implications of the results for language research more broadly and provide several recommendations to researchers regarding best practices.
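The quantity at issue, a word's average information content, is its mean surprisal over the contexts in which it occurs. A minimal sketch, assuming a toy corpus and a bigram context model rather than the paper's large web corpora and analysis pipeline:

```python
import math
from collections import Counter, defaultdict

# Toy corpus; the paper's analyses use large web corpora in 11 languages.
corpus = "the cat sat on the mat the dog sat on the rug".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev, word):
    """-log2 P(word | prev), with a unigram fallback for unseen bigrams."""
    if bigrams[(prev, word)]:
        return -math.log2(bigrams[(prev, word)] / unigrams[prev])
    return -math.log2(unigrams[word] / len(corpus))

# Average information content of each word: mean surprisal over its observed contexts.
info = defaultdict(list)
for prev, word in zip(corpus, corpus[1:]):
    info[word].append(surprisal(prev, word))

# Print length, frequency, and average information content per word; these are
# the variables whose correlations the paper compares.
for word in sorted(unigrams):
    aic = sum(info[word]) / len(info[word]) if info[word] else float("nan")
    print(word, len(word), unigrams[word], round(aic, 2))
```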


Author(s):
Majid Zarie
Jafar Khalilpour
Farhad Sadeghi Almaloo

The purpose of this paper is to design and implement the electronic sector of a satellite camera on the basis of systematic calculations, and to present the general and detailed block diagrams. The main parts of the designed camera consist of four units, named “optics, detector, processors, and memory”, that can work in one of three modes: real-time, storage, and storage-and-send. The simulation and practical results are fully verified by the received images. Given the conditions of satellite imaging, in most cases there is a need to improve image quality, and contrast enhancement is a particularly important feature of satellite images. A powerful contrast enhancement algorithm based on histogram equalization is therefore proposed, called “entropy-based triple dynamic clipped histogram equalization” (ETDCHE). One of the strengths of this paper is the use of images with widely varying brightness: by producing clear images and preserving maximum detail, the method overcomes adverse lighting conditions and yields natural enhancement of the output images. Performance assessment of the proposed method in terms of average information content shows its considerable superiority over previously presented histogram-equalization-based methods.
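ETDCHE itself is not reproduced here. As a minimal sketch of how average information content (image entropy) serves as the assessment metric, the following compares the entropy of a synthetic low-contrast frame before and after plain global histogram equalization; the synthetic image and helper names are assumptions, not the paper's data or code.

```python
import numpy as np

def image_entropy(img):
    """Average information content (bits/pixel) from the grey-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def histogram_equalization(img):
    """Plain global histogram equalization (a baseline, not ETDCHE)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)          # grey-level mapping table
    return lut[img]

# Synthetic low-contrast frame standing in for a raw satellite image.
rng = np.random.default_rng(0)
raw = rng.integers(100, 140, size=(256, 256)).astype(np.uint8)
enhanced = histogram_equalization(raw)

# Compare average information content before and after enhancement,
# the metric used in the paper's performance assessment.
print(round(image_entropy(raw), 3), round(image_entropy(enhanced), 3))
```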


2020
Vol 8 (5)
pp. 14-19
Author(s):
Robert Mayer

The article discusses the problem of evaluating the differential didactic complexity (DDC) of educational texts, which characterizes the difficulty of their perception and assimilation by pupils. It is shown that DDC is determined by: 1) the density of semantic information, which depends on the degree of abstraction of the terms used and their presence in the pupil’s thesaurus; 2) the level of complexity of mathematical, chemical and other formulas; 3) the structural complexity of the text, which depends on the average length of its constituent words and sentences. Multiplying the DDC of a text by its volume gives the integral didactic complexity of the text. To evaluate a textbook’s DDC, an expert randomly selects one-page fragments of text, identifies the key concepts, “measures” their average information content, and determines the share of formulas and their average complexity. A classification of concepts by degree of abstraction is used, which takes into account whether a particular word occurs in the thesaurus of a preschooler, a fifth-grader, a ninth-grader or a school graduate. The structural complexity of the text, which depends on the average length of words and sentences, is also taken into account. Analysis of textbooks for school graduates has shown that the disciplines most difficult to understand are biology, physics, chemistry and mathematics. Evaluation of computer science textbooks for the 3rd, 5th, 9th and 11th grades found that their semantic information density and differential semantic complexity increase monotonically from 5.3 to 8.1 and from 5.7 to 10.4, respectively.
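A toy sketch of the structural component only (the article's exact weighting scheme and concept classification are not reproduced): it computes average word and sentence length for a text fragment. The function name and sample text are illustrative assumptions.

```python
import re

def structural_complexity(text):
    """Average word length (characters) and average sentence length (words):
    the structural component of DDC described in the article. The weighting
    that combines these into the DDC score is not reproduced here."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    return avg_word_len, avg_sent_len

sample = ("Entropy measures the average information content of a source. "
          "Longer, term-dense sentences are harder for pupils to assimilate.")
print(structural_complexity(sample))
```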


2020
Vol 36 (Supplement_1)
pp. i210-i218
Author(s):
Alex Warwick Vesztrocy
Christophe Dessimoz

Abstract
Motivation: With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is ever more important. In most species, this characterization relies almost entirely on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated, experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations.
Results: This article introduces a new, OWA-compliant benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments.
Availability and implementation: All data, as well as the code used for the analysis, are available from https://lab.dessimoz.org/20_not.
Supplementary information: Supplementary data are available at Bioinformatics online.
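The information content of an annotation term is conventionally scored as the negative log of its annotation frequency, so rarer (more specific) terms carry more information. A minimal sketch under that standard definition, using a hypothetical toy annotation set rather than the benchmark's actual data or scoring code:

```python
import math
from collections import Counter

# Toy annotation corpus: protein -> set of GO terms (illustrative only).
annotations = {
    "P1": {"GO:0003674", "GO:0005215"},
    "P2": {"GO:0003674", "GO:0016491"},
    "P3": {"GO:0003674", "GO:0005215", "GO:0022857"},
}

term_counts = Counter(t for terms in annotations.values() for t in terms)
n_proteins = len(annotations)

def information_content(term):
    """IC(t) = -log2 P(t), with P(t) the fraction of proteins annotated with t."""
    return -math.log2(term_counts[term] / n_proteins)

def average_ic(terms):
    """Average information content of a set of annotations."""
    return sum(information_content(t) for t in terms) / len(terms)

for term in sorted(term_counts):
    print(term, round(information_content(term), 2))
print(round(average_ic(annotations["P3"]), 2))
```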


2019
Author(s):
Andrew B Wedel
Adam Ussishkin
Adam King

We report a statistical test of a long-standing hypothesis in the literature: that phonological neutralization rules are more common at the ends of lexical domains than at the beginnings (Houlihan 1975 et seq.). We collected descriptive grammars for an areally and genetically diverse set of fifty languages, identified all active phonological rules that target the edge of a lexical domain (root, stem, word, phrase or utterance), and further coded each rule for whether it was phonemically neutralizing, that is, able to create surface homophony. We find that such neutralizing rules are strongly and significantly less common at the beginnings of lexical domains relative to the ends, and that this pattern is strikingly consistent across all languages in the dataset. We show that this pattern is not an artifact of a tendency for syllable codas to be targets of phonological neutralization, nor is it associated with a suffixing or prefixing preference. Consistent with previous accounts, we argue that this pattern may ultimately be grounded in the greater average information content of phonological categories early in the word, which is itself a consequence of incremental processing in lexical access.
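As a toy sketch of that last point, the following computes per-position entropy (average information content) over a small constructed lexicon in which word-initial segments distinguish all items while the rhyme is shared. The lexicon and measure are illustrative assumptions, not the study's data or analysis.

```python
import math
from collections import Counter, defaultdict

# Toy rhyming lexicon, constructed so the effect is easy to see; the study
# itself codes phonological rules from grammars of fifty languages.
lexicon = ["ban", "can", "fan", "man", "pan", "tan"]

# Tally segments by position, then compute per-position entropy, i.e. the
# average information content carried by the segment in that position.
by_pos = defaultdict(Counter)
for word in lexicon:
    for i, seg in enumerate(word):
        by_pos[i][seg] += 1

for pos in sorted(by_pos):
    counts = by_pos[pos]
    total = sum(counts.values())
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    print(f"position {pos}: {entropy:.2f} bits")

# In this constructed example the initial segment distinguishes all six words
# (~2.58 bits), while the shared rhyme carries no information (0 bits), so
# neutralizing early material would be costlier for lexical access.
```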


2019
Vol 20 (2)
pp. 54-66
Author(s):
Michael Richter
Giuseppe G. A. Celano

Abstract The topic of this paper is the interaction of aspectual verb coding, information content and verb length, following Shannon’s source coding theorem on the relation between the coding and the length of a message. We hypothesize that, based on this interaction, the lengths of aspectual verb forms can be predicted from both their aspectual coding and their information content. The point of departure is the assumption that each verb has a default aspectual value and that this value can be estimated from frequency, which, according to Zipf’s law, is negatively correlated with length. Employing a linear mixed-effects model fitted with a random effect for LEMMA, the effects of the predictors DEFAULT (the default aspect value of the verb), the Zipfian predictor FREQUENCY and the entropy-based predictor AVERAGE INFORMATION CONTENT are compared against average aspectual verb form lengths. The data resources are 18 UD treebanks. The predictors show significantly differing impacts on verb lengths across the test set of languages and, in addition, the hypothesis of coding asymmetry does not hold for all the languages in focus.
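A minimal sketch of such a model, assuming Python/statsmodels rather than whatever software the authors used, with hypothetical column names and simulated data standing in for the treebank measurements:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated verb tokens standing in for UD treebank data; the column names
# (lemma, frequency, avg_info, default_aspect, length) are assumptions,
# not the study's actual variables.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "lemma": rng.choice([f"verb{i}" for i in range(30)], size=n),
    "frequency": rng.integers(1, 1000, size=n),
    "avg_info": rng.normal(8.0, 2.0, size=n),
    "default_aspect": rng.choice(["perfective", "imperfective"], size=n),
})
# Zipf-style relation plus noise: longer forms for rarer, higher-information verbs.
df["length"] = (10 - np.log10(df["frequency"]) + 0.3 * df["avg_info"]
                + rng.normal(0, 1, size=n)).round().clip(lower=2)

# Linear mixed-effects model with a random intercept for LEMMA, as described
# in the abstract; fixed effects for frequency, average information content
# and default aspect.
model = smf.mixedlm("length ~ np.log10(frequency) + avg_info + default_aspect",
                    data=df, groups=df["lemma"])
print(model.fit().summary())
```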

