Probability Mass Exclusions and the Directed Components of Mutual Information

Entropy ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. 826 ◽  
Author(s):  
Conor Finn ◽  
Joseph Lizier

Information is often described as a reduction of uncertainty associated with a restriction of possible choices. Despite appearing in Hartley’s foundational work on information theory, there is a surprising lack of a formal treatment of this interpretation in terms of exclusions. This paper addresses the gap by providing an explicit characterisation of information in terms of probability mass exclusions. It then demonstrates that different exclusions can yield the same amount of information and discusses the insight this provides about how information is shared amongst random variables—lack of progress in this area is a key barrier preventing us from understanding how information is distributed in complex systems. The paper closes by deriving a decomposition of the mutual information which can distinguish between differing exclusions; this provides surprising insight into the nature of directed information.
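
As a minimal numerical illustration of the exclusions view described above (the distributions are illustrative, not taken from the paper): two different posteriors can exclude different probability mass from a uniform prior and yet convey exactly the same amount of pointwise information about an outcome.

```python
import numpy as np

# Pointwise information i(x; y) = log2 [ p(x|y) / p(x) ].
# Two hypothetical posteriors ("exclusions") over a four-symbol alphabet
# that rule out different probability mass but convey the same amount of
# information about the outcome x = 0.
p_x = np.array([0.25, 0.25, 0.25, 0.25])         # uniform prior over x

p_x_given_yA = np.array([0.5, 0.5, 0.0, 0.0])    # y_A excludes symbols 2 and 3
p_x_given_yB = np.array([0.5, 0.0, 0.25, 0.25])  # y_B excludes 1, shaves 2 and 3

for name, post in (("y_A", p_x_given_yA), ("y_B", p_x_given_yB)):
    i = np.log2(post[0] / p_x[0])                # information about x = 0
    print(f"{name}: i(x=0; y) = {i:.3f} bits")
# Both print 1.000 bits: different exclusions, identical information.
```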

Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 952 ◽
Author(s):  
David Sigtermans

Based on the conceptual basis of information theory, we propose a novel mutual information measure—‘path-based mutual information’. This information measure results from representing a set of random variables as a probabilistic graphical model. The edges in this graph are modeled as discrete memoryless communication channels; that is, the underlying data are assumed to be ergodic and stationary, and the Markov condition is assumed to hold. The associated multilinear stochastic maps (tensors) transform source probability mass functions into destination probability mass functions. This allows for an exact expression of the resulting tensor of a cascade of discrete memoryless communication channels in terms of the tensors of the constituent communication channels in the paths. The resulting path-based information measure gives rise to intuitive, non-negative, and additive path-based information components—redundant, unique, and synergistic information—as proposed by Williams and Beer. The path-based redundancy satisfies the axioms postulated by Williams and Beer, the identity axiom postulated by Harder, and the left monotonicity axiom postulated by Bertschinger. The ordering relations between redundancies of different joint collections of sources, as captured in the redundancy lattices of Williams and Beer, follow from the data processing inequality. Although negative information components can arise, we speculate that these result either from unobserved variables or from adding sources that are statistically independent of all other sources to a system containing only non-negative information components. This path-based approach illustrates that information theory provides the concepts and measures for a partial information decomposition.
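
The cascade construction is easy to make concrete. In the sketch below (a toy binary example under assumed alphabets, not the author's code), two discrete memoryless channels are represented as row-stochastic matrices, the tensor of the cascade is exactly their matrix product, and the data processing inequality is visible in the resulting mutual information.

```python
import numpy as np

# Two discrete memoryless channels as row-stochastic matrices; the cascade
# X -> Y -> Z has exactly the matrix product as its transition tensor.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # p(y|x), rows sum to 1
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])   # p(z|y)
C = A @ B                    # p(z|x): exact tensor of the cascade

def mutual_information(p_x, channel):
    """I(X; Z) in bits for source pmf p_x through a row-stochastic channel."""
    joint = p_x[:, None] * channel             # p(x, z)
    p_z = joint.sum(axis=0)
    mask = joint > 0
    return np.sum(joint[mask] *
                  np.log2(joint[mask] / (p_x[:, None] * p_z[None, :])[mask]))

p_x = np.array([0.5, 0.5])
print(mutual_information(p_x, A))  # I(X; Y)
print(mutual_information(p_x, C))  # I(X; Z) <= I(X; Y): data processing inequality
```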


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 533 ◽
Author(s):  
Milan S. Derpich ◽  
Jan Østergaard

We present novel data-processing inequalities relating the mutual information and the directed information in systems with feedback. The internal deterministic blocks within such systems are restricted only to be causal mappings; they may be non-linear and time-varying and, being randomized by their own external random inputs, can yield any stochastic mapping. These randomized blocks can, for example, represent source encoders, decoders, or even communication channels. Moreover, the involved signals can be arbitrarily distributed. Our first main result relates mutual and directed information and can be interpreted as a law of conservation of information flow. Our second main result is a pair of data-processing inequalities (one the conditional version of the other) between nested pairs of random sequences entirely within the closed loop. Our third main result introduces and characterizes the notion of in-the-loop (ITL) transmission rate for channel coding scenarios in which the messages are internal to the loop. Interestingly, in this case the conventional notions of transmission rate, associated with the entropy of the messages, and of channel capacity, based on maximizing the mutual information between the messages and the output, turn out to be inadequate. Instead, as we show, the ITL transmission rate is the unique notion of rate for which a channel code attains zero error probability if and only if the ITL rate does not exceed the corresponding directed information rate from messages to decoded messages. We apply our data-processing inequalities to show that the supremum of achievable (in the usual channel coding sense) ITL transmission rates is upper bounded by the supremum of the directed information rate across the communication channel. Moreover, we present an example in which this upper bound is attained. Finally, we further illustrate the applicability of our results by discussing how they make possible the generalization of two fundamental inequalities known in the networked control literature.
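
For readers unfamiliar with directed information, the following toy sketch (all parameters hypothetical, not from the paper) evaluates Massey's directed information I(X^n → Y^n) = Σ_i I(X^i; Y_i | Y^{i−1}) by brute-force enumeration for short binary sequences passed through a memoryless noisy channel.

```python
import numpy as np
from itertools import product

# Toy setup: iid uniform binary inputs X_i through a binary symmetric
# channel, Y_i = X_i ^ N_i with crossover probability eps.
n, eps = 2, 0.1

def p_joint(x, y):
    """p(x^n, y^n) for iid uniform X and a memoryless BSC(eps)."""
    p = 1.0
    for xi, yi in zip(x, y):
        p *= 0.5 * (1 - eps if xi == yi else eps)
    return p

seqs = list(product([0, 1], repeat=n))

def H(marginal_of):
    """Entropy in bits of the pmf obtained by marginalising p_joint."""
    pmf = {}
    for x, y in product(seqs, seqs):
        k = marginal_of(x, y)
        pmf[k] = pmf.get(k, 0.0) + p_joint(x, y)
    return -sum(p * np.log2(p) for p in pmf.values() if p > 0)

DI = 0.0
for i in range(1, n + 1):
    # I(X^i; Y_i | Y^{i-1}) = H(X^i, Y^{i-1}) + H(Y^i) - H(X^i, Y^i) - H(Y^{i-1})
    DI += (H(lambda x, y: (x[:i], y[:i-1])) + H(lambda x, y: y[:i])
           - H(lambda x, y: (x[:i], y[:i])) - H(lambda x, y: y[:i-1]))
print(f"I(X^{n} -> Y^{n}) = {DI:.3f} bits")  # = n * (1 - h(eps)) for this channel
```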


BJGP Open ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. bjgpopen20X101003 ◽  
Author(s):  
Marc De Meulemeester ◽  
Elsa Mateus ◽  
Hilda Wieberneit-Tolman ◽  
Neil Betteridge ◽  
Lucy Ireland ◽  
...  

Background: Although commonly diagnosed, gout often remains a poorly managed disease. This is partially due to a lack of awareness of the long-term effects of gout among patients and healthcare professionals.
Aim: To understand patients' unmet needs and provide insight into achieving better treatment.
Design & setting: A quantitative online questionnaire completed by 1100 people with gout from 14 countries within Europe.
Method: Patients were recruited to complete an online survey via healthcare professional (HCP) referral, patient associations, or market research panels. Patients were included if they had been diagnosed with gout by a physician. Prior to commencement, patients were made aware that this study was sponsored by Grünenthal. The responses collected were collated and analysed.
Results: Patients had an average of 2.9 gout flares within a 12-month period. Although 79% of patients were satisfied with treatment, inadequate gout control was also reported by 71% of patients. Furthermore, 84% experienced moderate-to-severe pain with their most recent flare. Of those who acknowledged treatment dissatisfaction, only 24% discussed other options with their GP. Most patients reported irregular follow-up and serum uric acid (sUA) monitoring. In addition, loss of belief that more could be done was a key barrier for patients.
Conclusion: Patients reported severe pain and social burden, coupled with low treatment expectations and a lack of awareness of the target sUA. Education around knowing and reaching the sUA target is needed so that patients can receive, and GPs can deliver, higher quality management.


This chapter presents a higher-order-logic formalization of the main concepts of information theory (Cover & Thomas, 1991), such as the Shannon entropy and mutual information, building on formalizations of the foundational theories of measure, Lebesgue integration, and probability. The main results of the chapter include formalizations of the Radon-Nikodym derivative and the Kullback-Leibler (KL) divergence (Coble, 2010). The latter provides a unified framework from which most of the commonly used measures of information can be defined. The chapter provides general definitions that are valid for both the discrete and continuous cases, and then proves the corresponding reduced expressions where the measures considered are absolutely continuous over finite spaces.
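
In standard (non-formalized) notation, the unifying role of the KL divergence mentioned above amounts to the following textbook identities (following Cover & Thomas; this is a summary, not the chapter's higher-order-logic syntax):

```latex
% KL divergence between pmfs P and Q on a common alphabet:
D(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
% Mutual information as a divergence between joint and product of marginals:
I(X;Y) = D\!\left(p_{X,Y} \,\middle\|\, p_X \, p_Y\right)
% Entropy over a finite alphabet as a divergence from the uniform distribution:
H(X) = \log |\mathcal{X}| - D\!\left(p_X \,\middle\|\, \mathrm{Unif}(\mathcal{X})\right)
```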


2010 ◽  
Vol 30 (1) ◽  
pp. 7-16 ◽  
Author(s):  
Pierre-Olivier Amblard ◽  
Olivier J. J. Michel

2005 ◽  
Vol 62 (9) ◽  
pp. 3368-3381 ◽  
Author(s):  
Timothy DelSole

This paper presents a framework for quantifying predictability based on the behavior of imperfect forecasts. The critical quantity in this framework is not the forecast distribution, as used in many other predictability studies, but the conditional distribution of the state given the forecasts, called the regression forecast distribution. The average predictability of the regression forecast distribution is given by a quantity called the mutual information. Standard inequalities in information theory show that this quantity is bounded above by the average predictability of the true system and by the average predictability of the forecast system. These bounds clarify the role of potential predictability, about which many incorrect statements can be found in the literature. Mutual information has further attractive properties: it is invariant with respect to nonlinear transformations of the data, cannot be improved by manipulating the forecast, and reduces to familiar measures of correlation skill when the forecast and verification are jointly normally distributed. The concept of potential predictable components is shown to define a lower-dimensional space that captures the full predictability of the regression forecast without loss of generality. The predictability of stationary, Gaussian, Markov systems is examined in detail. Some simple numerical examples suggest that imperfect forecasts are not always useful for jointly normally distributed systems, since greater predictability often can be obtained directly from observations. Rather, the usefulness of imperfect forecasts appears to lie in the fact that they can identify potential predictable components and capture nonstationary and/or nonlinear behavior, which are difficult to capture by low-dimensional, empirical models estimated from short historical records.
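
The Gaussian reduction mentioned in the abstract can be checked numerically. A brief sketch (synthetic data, illustrative parameters): for jointly normal forecast and verification with correlation ρ, the mutual information is −(1/2) log(1 − ρ²) nats, and it is unchanged by a monotone nonlinear transform of the forecast, while the linear correlation is not.

```python
import numpy as np

# Jointly normal state x and forecast f with correlation rho:
# I(x; f) = -0.5 * log(1 - rho^2) nats, a function of correlation skill alone.
rng = np.random.default_rng(0)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]
x, f = rng.multivariate_normal([0, 0], cov, size=200_000).T

rho_hat = np.corrcoef(x, f)[0, 1]
mi_from_corr = -0.5 * np.log(1 - rho_hat**2)
print(f"estimated rho = {rho_hat:.3f}, implied MI = {mi_from_corr:.3f} nats")
print(f"exact MI at rho = {rho}: {-0.5 * np.log(1 - rho**2):.3f} nats")

# Invariance: a monotone nonlinear transform of the forecast leaves MI
# unchanged, though it changes the (linear) correlation.
g = np.tanh(2 * f)
print(f"corr(x, tanh(2f)) = {np.corrcoef(x, g)[0, 1]:.3f}  (MI unchanged)")
```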


2021 ◽  
Vol 8 (3) ◽  
Author(s):  
John Vandermeer ◽  
Zachary Hajian-Forooshani ◽  
Nicholas Medina ◽  
Ivette Perfecto

Ecological systems, as is often noted, are complex. Equally notable is the generalization that complex systems tend to be oscillatory, whether Huygens' simple patterns of pendulum entrainment or the twisted chaotic orbits of Lorenz's convection rolls. The analytics of oscillators may thus provide insight into the structure of ecological systems. One of the most popular analytical tools for such study is the Kuramoto model of coupled oscillators. We apply this model as a stylized vision of the dynamics of a well-studied system of pests and their enemies, to ask whether the system's actual natural history is reflected in the dynamics of a qualitatively instantiated Kuramoto model. Emerging from the model is a series of synchrony groups generally corresponding to subnetworks of the natural system, with an overlying chimeric structure, depending on the strength of the inter-oscillator coupling. We conclude that the Kuramoto model presents a novel window through which interesting questions about the structure of ecological systems may emerge.
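
A qualitative instantiation of the Kuramoto model is short to write down. The sketch below (an all-to-all toy network with made-up parameters, not the paper's pest-enemy network) integrates the phase equations and reports the usual order parameter, which measures synchrony.

```python
import numpy as np

# Kuramoto model: dtheta_i/dt = omega_i + (K/N) * sum_j A_ij sin(theta_j - theta_i).
# All parameters below are illustrative stand-ins.
rng = np.random.default_rng(1)
N, K, dt, steps = 20, 1.5, 0.01, 20_000
omega = rng.normal(0.0, 0.5, N)        # natural frequencies
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
A = np.ones((N, N))                    # all-to-all coupling; swap in a subnetwork

for _ in range(steps):                 # forward Euler integration
    diffs = theta[None, :] - theta[:, None]
    theta += dt * (omega + (K / N) * (A * np.sin(diffs)).sum(axis=1))

# Order parameter r in [0, 1]: r ~ 1 means the population is synchronized;
# chimera-like states show high r in some subnetworks and low r in others.
r = abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```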


2019 ◽  
Author(s):  
Cong Ma ◽  
Carl Kingsford

Mutual information is widely used to characterize dependence between biological signals, such as co-expression between genes or co-evolution between amino acids. However, measurement error of the biological signals is rarely considered when estimating mutual information, even though such error is widespread and in some cases non-negligible. As a result, the distribution of the signals is blurred, and the mutual information may be biased when estimated from the blurred measurements. We derive a corrected estimator for mutual information that accounts for the distribution of measurement error. Our corrected estimator is based on correcting the probability mass function (PMF) or the probability density function (PDF, via kernel density estimation). We prove that the corrected estimator is asymptotically unbiased in the (semi-)discrete case when the distribution of measurement error is known, and we show that it reduces the estimation bias in the continuous case under certain assumptions. On simulated data, our corrected estimator leads to more accurate estimates of mutual information when the sample size is not the limiting factor for estimating the PMF or PDF accurately. We compare the uncorrected and corrected estimators on gene expression data from TCGA breast cancer samples and show a difference in both the values and the ranking of estimated mutual information between the two estimators.
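
In the discrete case the correction idea can be sketched as a linear deblurring step before the plug-in estimate. The example below is a hedged reconstruction under an assumed known error kernel, not the authors' implementation.

```python
import numpy as np

# If each observed symbol is the true symbol passed through a known error
# kernel E (E[i, j] = p(observe i | truth j)), the observed pmf is E @ p_true,
# so the true pmf can be recovered by solving the linear system before the
# plug-in mutual information estimate.
E = np.array([[0.9, 0.1],
              [0.1, 0.9]])             # known measurement-error kernel

def mi_bits(joint):
    px, py = joint.sum(1), joint.sum(0)
    m = joint > 0
    return np.sum(joint[m] * np.log2(joint[m] / np.outer(px, py)[m]))

joint_true = np.array([[0.4, 0.1],
                       [0.1, 0.4]])    # illustrative true joint pmf
joint_obs = E @ joint_true @ E.T       # independent error on both variables
joint_corr = np.linalg.solve(E, np.linalg.solve(E, joint_obs.T).T)

print(f"true      MI: {mi_bits(joint_true):.3f} bits")
print(f"blurred   MI: {mi_bits(joint_obs):.3f} bits   (biased low)")
print(f"corrected MI: {mi_bits(joint_corr):.3f} bits")
```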


Author(s):  
Ian Convy ◽  
William Huggins ◽  
Haoran Liao ◽  
K Birgitta Whaley

Tensor networks have emerged as promising tools for machine learning, inspired by their widespread use as variational ansatze in quantum many-body physics. It is well known that the success of a given tensor network ansatz depends in part on how well it can reproduce the underlying entanglement structure of the target state, with different network designs favoring different scaling patterns. We demonstrate here how a related correlation analysis can be applied to tensor network machine learning, and explore whether classical data possess correlation scaling patterns similar to those found in quantum states, which might indicate the best network to use for a given dataset. We utilize mutual information as a measure of correlations in classical data, and show that it can serve as a lower bound on the entanglement needed for a probabilistic tensor network classifier. We then develop a logistic regression algorithm to estimate the mutual information between bipartitions of data features, and verify its accuracy on a set of Gaussian distributions designed to mimic different correlation patterns. Using this algorithm, we characterize the scaling patterns in the MNIST and Tiny Images datasets, and find clear evidence of boundary-law scaling in the latter. This quantum-inspired classical analysis offers insight into the design of tensor networks that are best suited for specific learning tasks.
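
A standard way to realize a logistic-regression mutual information estimator is the density-ratio trick: train a classifier to distinguish joint samples from samples with the dependence shuffled away, then average the log-odds over the joint samples. The sketch below uses this construction (consistent with the abstract, but not necessarily the authors' exact algorithm) on synthetic Gaussian data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# With balanced classes, the optimal classifier's log-odds equals
# log[p_joint / (p_A * p_B)], whose mean over joint samples is I(A; B) in nats.
rng = np.random.default_rng(2)
n, rho = 50_000, 0.8
cov = [[1.0, rho], [rho, 1.0]]
a, b = rng.multivariate_normal([0, 0], cov, size=n).T

joint = np.column_stack([a, b])
prod = np.column_stack([a, rng.permutation(b)])  # marginals kept, dependence broken
X = np.vstack([joint, prod])
y = np.r_[np.ones(n), np.zeros(n)]

# Quadratic features suffice for the Gaussian case; use richer features in general.
feats = lambda Z: np.column_stack([Z, Z**2, Z[:, 0:1] * Z[:, 1:2]])
clf = LogisticRegression(max_iter=1000).fit(feats(X), y)
log_odds = clf.decision_function(feats(joint))
print(f"estimated MI: {log_odds.mean():.3f} nats")
print(f"true MI:      {-0.5 * np.log(1 - rho**2):.3f} nats")
```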

