average information content
Recently Published Documents

Total documents: 13 (five years: 2)
H-index: 1 (five years: 0)

Author(s):  
Solomon Kozlov

In this article, we define Set Shaping Theory, whose goal is the study of bijective functions that transform a set of strings into a set of equal size made up of longer strings. Many functions meet this condition, but since the aim of this theory is data transmission, we analyze the function that minimizes the average information content. The results show how this type of function can be useful in data compression.
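As a rough illustration of the quantity being minimized, the sketch below measures the per-symbol average information content of a set of strings before and after a simple length-increasing bijection. The helper `average_information_content` and the prepend-a-zero transform are assumptions introduced here for illustration; they are not the optimizing function studied in the article.

```python
import math
from collections import Counter
from itertools import product

def average_information_content(strings):
    """Per-symbol empirical entropy (bits), pooled over the whole set of strings."""
    counts = Counter(ch for s in strings for ch in s)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Original set: all binary strings of length 3 (8 strings).
original = ["".join(p) for p in product("01", repeat=3)]

# Toy bijection onto an equal-size set of longer strings: prepend a fixed symbol.
# This only illustrates that such a transform changes the symbol statistics.
transformed = ["0" + s for s in original]

print(len(set(transformed)) == len(original))               # bijection onto an equal-size set
print(round(average_information_content(original), 3))      # 1.0 bit/symbol (uniform 0/1)
print(round(average_information_content(transformed), 3))   # ~0.954: distribution skewed toward "0"
```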


2021
Author(s):
Stephan Meylan
Tom Griffiths

Language research has come to rely heavily on large-scale, web-based datasets. These datasets can present significant methodological challenges, requiring researchers to make a number of decisions about how they are collected, represented, and analyzed. These decisions often concern long-standing challenges in corpus-based language research, including determining what counts as a word, deciding which words should be analyzed, and matching sets of words across languages. We illustrate these challenges by revisiting "Word lengths are optimized for efficient communication" (Piantadosi, Tily, & Gibson, 2011), which found that word lengths in 11 languages are more strongly correlated with their average predictability (or average information content) than with their frequency. Using what we argue to be best practices for large-scale corpus analyses, we find significantly attenuated support for this result, and demonstrate that a stronger relationship obtains between word frequency and length for a majority of the languages in the sample. We consider the implications of the results for language research more broadly and provide several recommendations to researchers regarding best practices.
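The quantity at issue, a word's average information content, is its mean surprisal over the contexts in which it occurs. A minimal sketch, assuming a toy corpus and a bigram context model rather than the paper's large web corpora and analysis pipeline:

```python
import math
from collections import Counter, defaultdict

# Toy corpus; the paper's analyses use large web corpora in 11 languages.
corpus = "the cat sat on the mat the dog sat on the rug".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev, word):
    """-log2 P(word | prev), with a unigram fallback for unseen bigrams."""
    if bigrams[(prev, word)]:
        return -math.log2(bigrams[(prev, word)] / unigrams[prev])
    return -math.log2(unigrams[word] / len(corpus))

# Average information content of each word: mean surprisal over its observed contexts.
info = defaultdict(list)
for prev, word in zip(corpus, corpus[1:]):
    info[word].append(surprisal(prev, word))

# Print length, frequency, and average information content per word; these are
# the variables whose correlations the paper compares.
for word in sorted(unigrams):
    aic = sum(info[word]) / len(info[word]) if info[word] else float("nan")
    print(word, len(word), unigrams[word], round(aic, 2))
```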


Author(s):
Majid Zarie
Jafar Khalilpour
Farhad Sadeghi Almaloo

The purpose of this paper is to design and implement the electronic sector of a satellite camera on the basis of systematic calculations, and to present the general and detailed block diagrams. The main parts of the designed camera consist of four units, named “optics, detector, processors, and memory”, that can work in one of three modes: real-time, storage, and storage-and-send. The simulation and practical results are fully verified by the received images. Given the conditions of satellite imaging, in most cases there is a need to improve image quality, and contrast enhancement is a particularly important feature of satellite images. A powerful contrast enhancement algorithm based on histogram equalization is therefore proposed, called “entropy-based triple dynamic clipped histogram equalization” (ETDCHE). One of the strengths of this paper is the use of images with widely varying brightness: by producing clear images and preserving maximum detail, the method overcomes adverse lighting conditions and yields natural enhancement of the output images. Performance assessment of the proposed method in terms of average information content shows its considerable superiority over previously presented histogram-equalization-based methods.
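ETDCHE itself is not reproduced here. As a minimal sketch of how average information content (image entropy) serves as the assessment metric, the following compares the entropy of a synthetic low-contrast frame before and after plain global histogram equalization; the synthetic image and helper names are assumptions, not the paper's data or code.

```python
import numpy as np

def image_entropy(img):
    """Average information content (bits/pixel) from the grey-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def histogram_equalization(img):
    """Plain global histogram equalization (a baseline, not ETDCHE)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)          # grey-level mapping table
    return lut[img]

# Synthetic low-contrast frame standing in for a raw satellite image.
rng = np.random.default_rng(0)
raw = rng.integers(100, 140, size=(256, 256)).astype(np.uint8)
enhanced = histogram_equalization(raw)

# Compare average information content before and after enhancement,
# the metric used in the paper's performance assessment.
print(round(image_entropy(raw), 3), round(image_entropy(enhanced), 3))
```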


2020
Vol 8 (5)
pp. 14-19
Author(s):
Robert Mayer

The article discusses the problem of evaluating the differential didactic complexity (DDC) of educational texts, which characterizes the difficulty of their perception and assimilation by pupils. It is shown that DDC is determined by: 1) the density of semantic information, which depends on the degree of abstraction of the terms used and their presence in the pupil’s thesaurus; 2) the level of complexity of mathematical, chemical and other formulas; 3) the structural complexity of the text, which depends on the average length of its constituent words and sentences. Multiplying the DDC of a text by its volume gives the integral didactic complexity of the text. To evaluate a textbook’s DDC, an expert randomly selects one-page fragments of text, identifies the key concepts, “measures” their average information content, and determines the share of formulas and their average complexity. A classification of concepts by degree of abstraction is used, which takes into account whether a particular word occurs in the thesaurus of a preschooler, a fifth-grader, a ninth-grader or a school graduate. The structural complexity of the text, which depends on the average length of words and sentences, is also taken into account. Analysis of textbooks for school graduates has shown that the disciplines most difficult to understand are biology, physics, chemistry and mathematics. Evaluation of computer science textbooks for the 3rd, 5th, 9th and 11th grades found that their semantic information density and differential semantic complexity increase monotonically from 5.3 to 8.1 and from 5.7 to 10.4, respectively.
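A toy sketch of the structural component only (the article's exact weighting scheme and concept classification are not reproduced): it computes average word and sentence length for a text fragment. The function name and sample text are illustrative assumptions.

```python
import re

def structural_complexity(text):
    """Average word length (characters) and average sentence length (words):
    the structural component of DDC described in the article. The weighting
    that combines these into the DDC score is not reproduced here."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    return avg_word_len, avg_sent_len

sample = ("Entropy measures the average information content of a source. "
          "Longer, term-dense sentences are harder for pupils to assimilate.")
print(structural_complexity(sample))
```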


2020
Vol 36 (Supplement_1)
pp. i210-i218
Author(s):
Alex Warwick Vesztrocy
Christophe Dessimoz

Abstract
Motivation: With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is ever more important. In most species, this characterization relies almost entirely on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated, experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations.
Results: This article introduces a new, OWA-compliant benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments.
Availability and implementation: All data, as well as the code used for the analysis, are available from https://lab.dessimoz.org/20_not.
Supplementary information: Supplementary data are available at Bioinformatics online.
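The information content of an annotation term is conventionally scored as the negative log of its annotation frequency, so rarer (more specific) terms carry more information. A minimal sketch under that standard definition, using a hypothetical toy annotation set rather than the benchmark's actual data or scoring code:

```python
import math
from collections import Counter

# Toy annotation corpus: protein -> set of GO terms (illustrative only).
annotations = {
    "P1": {"GO:0003674", "GO:0005215"},
    "P2": {"GO:0003674", "GO:0016491"},
    "P3": {"GO:0003674", "GO:0005215", "GO:0022857"},
}

term_counts = Counter(t for terms in annotations.values() for t in terms)
n_proteins = len(annotations)

def information_content(term):
    """IC(t) = -log2 P(t), with P(t) the fraction of proteins annotated with t."""
    return -math.log2(term_counts[term] / n_proteins)

def average_ic(terms):
    """Average information content of a set of annotations."""
    return sum(information_content(t) for t in terms) / len(terms)

for term in sorted(term_counts):
    print(term, round(information_content(term), 2))
print(round(average_ic(annotations["P3"]), 2))
```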


2019
Author(s):
Andrew B Wedel
Adam Ussishkin
Adam King

We report a statistical test of a long-standing hypothesis in the literature: that phonological neutralization rules are more common at the ends of lexical domains than at the beginnings (Houlihan 1975 et seq.). We collected descriptive grammars for an areally and genetically diverse set of fifty languages, identified all active phonological rules that target the edge of a lexical domain (root, stem, word, phrase or utterance), and further coded each rule for whether it was phonemically neutralizing, that is, able to create surface homophony. We find that such neutralizing rules are strongly and significantly less common at the beginnings of lexical domains relative to the ends, and that this pattern is strikingly consistent across all languages in the dataset. We show that this pattern is not an artifact of a tendency for syllable codas to be targets of phonological neutralization, nor is it associated with a suffixing or prefixing preference. Consistent with previous accounts, we argue that this pattern may ultimately be grounded in the greater average information content of phonological categories early in the word, which is itself a consequence of incremental processing in lexical access.
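As a toy sketch of that last point, the following computes per-position entropy (average information content) over a small constructed lexicon in which word-initial segments distinguish all items while the rhyme is shared. The lexicon and measure are illustrative assumptions, not the study's data or analysis.

```python
import math
from collections import Counter, defaultdict

# Toy rhyming lexicon, constructed so the effect is easy to see; the study
# itself codes phonological rules from grammars of fifty languages.
lexicon = ["ban", "can", "fan", "man", "pan", "tan"]

# Tally segments by position, then compute per-position entropy, i.e. the
# average information content carried by the segment in that position.
by_pos = defaultdict(Counter)
for word in lexicon:
    for i, seg in enumerate(word):
        by_pos[i][seg] += 1

for pos in sorted(by_pos):
    counts = by_pos[pos]
    total = sum(counts.values())
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    print(f"position {pos}: {entropy:.2f} bits")

# In this constructed example the initial segment distinguishes all six words
# (~2.58 bits), while the shared rhyme carries no information (0 bits), so
# neutralizing early material would be costlier for lexical access.
```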


2019
Vol 20 (2)
pp. 54-66
Author(s):
Michael Richter
Giuseppe G. A. Celano

Abstract The topic of this paper is the interaction of aspectual verb coding, information content and verb length, following Shannon’s source coding theorem on the relation between the coding and the length of a message. We hypothesize that, based on this interaction, the lengths of aspectual verb forms can be predicted from both their aspectual coding and their information content. The point of departure is the assumption that each verb has a default aspectual value and that this value can be estimated from frequency, which, according to Zipf’s law, is negatively correlated with length. Employing a linear mixed-effects model fitted with a random effect for LEMMA, the effects of the predictors DEFAULT (the default aspect value of the verb), the Zipfian predictor FREQUENCY and the entropy-based predictor AVERAGE INFORMATION CONTENT are compared against average aspectual verb form lengths. The data resources are 18 UD treebanks. The predictors show significantly differing impacts on verb lengths across the test set of languages and, in addition, the hypothesis of coding asymmetry does not hold for all the languages in focus.
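A minimal sketch of such a model, assuming Python/statsmodels rather than whatever software the authors used, with hypothetical column names and simulated data standing in for the treebank measurements:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated verb tokens standing in for UD treebank data; the column names
# (lemma, frequency, avg_info, default_aspect, length) are assumptions,
# not the study's actual variables.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "lemma": rng.choice([f"verb{i}" for i in range(30)], size=n),
    "frequency": rng.integers(1, 1000, size=n),
    "avg_info": rng.normal(8.0, 2.0, size=n),
    "default_aspect": rng.choice(["perfective", "imperfective"], size=n),
})
# Zipf-style relation plus noise: longer forms for rarer, higher-information verbs.
df["length"] = (10 - np.log10(df["frequency"]) + 0.3 * df["avg_info"]
                + rng.normal(0, 1, size=n)).round().clip(lower=2)

# Linear mixed-effects model with a random intercept for LEMMA, as described
# in the abstract; fixed effects for frequency, average information content
# and default aspect.
model = smf.mixedlm("length ~ np.log10(frequency) + avg_info + default_aspect",
                    data=df, groups=df["lemma"])
print(model.fit().summary())
```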

