Conditional Rényi Divergences and Horse Betting

Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 316 ◽  
Author(s):  
Cédric Bleuler ◽  
Amos Lapidoth ◽  
Christoph Pfister

Motivated by a horse betting problem, a new conditional Rényi divergence is introduced. It is compared with the conditional Rényi divergences that appear in the definitions of the dependence measures by Csiszár and Sibson, and the properties of all three are studied with emphasis on their behavior under data processing. Just as Csiszár’s and Sibson’s conditional divergences lead to their respective dependence measures, the new conditional divergence leads to the Lapidoth–Pfister mutual information. Moreover, the new conditional divergence is also related to the Arimoto–Rényi conditional entropy and to Arimoto’s measure of dependence. In the second part of the paper, the horse betting problem is analyzed where, instead of Kelly’s expected log-wealth criterion, a more general family of power-mean utility functions is considered. The key role in the analysis is played by the Rényi divergence, and in the setting where the gambler has access to side information, that role is taken over by the new conditional Rényi divergence. The setting with side information also provides another operational meaning to the Lapidoth–Pfister mutual information. Finally, a universal strategy for independent and identically distributed races is presented that asymptotically maximizes the gambler’s utility without knowledge of the winning probabilities or of the parameter of the utility function.
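
For orientation, the order-α Rényi divergence at the heart of this analysis is defined, for distributions P and Q on a finite alphabet, as below; as α → 1 it recovers the Kullback–Leibler divergence underlying Kelly’s criterion. (This is the standard definition, stated for reference rather than quoted from the paper.)

```latex
D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_{x} P(x)^{\alpha}\, Q(x)^{1-\alpha},
\qquad
\lim_{\alpha \to 1} D_\alpha(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} .
```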

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 778 ◽  
Author(s):  
Amos Lapidoth ◽  
Christoph Pfister

Two families of dependence measures between random variables are introduced. They are based on the Rényi divergence of order α and the relative α-entropy, respectively, and both dependence measures reduce to Shannon’s mutual information when their order α is one. The first measure shares many properties with the mutual information, including the data-processing inequality, and can be related to the optimal error exponents in composite hypothesis testing. The second measure does not satisfy the data-processing inequality, but appears naturally in the context of distributed task encoding.
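
As a concrete handle on the order-α machinery shared by both families, here is a minimal numerical sketch of the Rényi divergence for finite alphabets (the function name and example distributions are illustrative, not from the paper):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) between finite distributions;
    converges to the Kullback-Leibler divergence as alpha -> 1."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        mask = p > 0  # KL limit: sum of p * log(p / q)
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

p = [0.6, 0.3, 0.1]
q = [0.4, 0.4, 0.2]
for a in (0.5, 1.0, 2.0):
    print(f"D_{a}(p||q) = {renyi_divergence(p, q, a):.4f}")
```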


Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 67 ◽  
Author(s):  
Xiyu Shi ◽  
Varuna De-Silva ◽  
Yusuf Aslan ◽  
Erhan Ekmekcioglu ◽  
Ahmet Kondoz

Deep learning has proven to be an important element of modern data-processing technology and has found application in many areas, such as multimodal sensor data processing and understanding, data generation, and anomaly detection. While the use of deep learning is booming in many real-world tasks, the internal processes by which it arrives at its results remain opaque. Understanding the data-processing pathways within a deep neural network is important for transparency and better resource utilisation. In this paper, a method based on information-theoretic measures is used to reveal the typical learning patterns of convolutional neural networks, which are commonly used for image-processing tasks. For this purpose, the training samples, true labels, and estimated labels are treated as random variables, and the mutual information and conditional entropy between these variables are studied. The paper shows that adding convolutional layers improves learning only up to a point: beyond the necessary depth, additional convolutional layers yield no further improvement. The number of convolutional layers needed to reach a desired level of learning can be determined with the help of information-theoretic quantities, including entropy and mutual information, computed among the inputs to the network. The kernel size of the convolutional layers affects only the learning speed of the network. The study also shows that the placement of the dropout layer has no significant effect on learning at lower dropout rates, whereas at higher dropout rates the dropout layer is best placed immediately after the last convolutional layer.
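
The label-level quantities can be made concrete with a confusion matrix. The sketch below uses hypothetical counts and standard plug-in estimators (the paper's analysis also involves the training samples themselves, which this omits):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(joint):
    """I(T; Y) in bits from a joint distribution, here a normalized
    confusion matrix of true labels T (rows) vs. estimates Y (cols)."""
    p_t, p_y = joint.sum(axis=1), joint.sum(axis=0)
    return entropy_bits(p_t) + entropy_bits(p_y) - entropy_bits(joint.ravel())

# Hypothetical confusion counts from a trained classifier.
counts = np.array([[90.0, 5.0, 5.0],
                   [4.0, 88.0, 8.0],
                   [6.0, 7.0, 87.0]])
joint = counts / counts.sum()
mi = mutual_information_bits(joint)
print("I(T;Y) =", round(mi, 3), "bits")
print("H(T|Y) =", round(entropy_bits(joint.sum(axis=1)) - mi, 3), "bits")
```

As training improves the network, I(T;Y) rises toward H(T) and the residual uncertainty H(T|Y) falls, which is the kind of learning signature tracked above.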


1978 ◽  
Vol 17 (01) ◽  
pp. 36-40 ◽  
Author(s):  
J.-P. Durbec ◽  
Jaqueline Cornée ◽  
P. Berthezene

The practice of systematic examinations in hospitals and the increasing development of automatic data processing permit the storage of a great deal of information about a large number of patients belonging to different diagnosis groups. To predict or to characterize these diagnosis groups, some descriptors are particularly useful while others carry no information. A data-screening procedure based on the properties of mutual information and of the log cross-product ratios in contingency tables is developed. The most useful descriptors are selected, and for each one the groups it characterizes are specified. This approach has been applied to a set of binary (presence/absence) radiological variables. Four diagnosis groups are concerned: cancer of the pancreas, chronic calcifying pancreatitis, non-calcifying pancreatitis, and probable pancreatitis. Only twenty of the three hundred and forty initial radiological variables are selected; the presence of each corresponding sign is associated with one or more diagnosis groups.
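
Both screening statistics are simple to compute from a contingency table of counts. The sketch below uses hypothetical counts and a standard 0.5 continuity correction (which the paper may not use), for one binary sign against two diagnosis groups:

```python
import numpy as np

def mutual_information_bits(table):
    """Mutual information (bits) between the row variable (sign
    present/absent) and the column variable (diagnosis group)."""
    joint = table / table.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))

def log_cross_product_ratio(table2x2):
    """Log cross-product (odds) ratio of a 2x2 table,
    with a 0.5 continuity correction for zero cells."""
    a, b, c, d = (table2x2 + 0.5).ravel()
    return float(np.log((a * d) / (b * c)))

# Hypothetical counts: rows = sign present/absent, cols = two groups.
table = np.array([[30.0, 5.0],
                  [10.0, 55.0]])
print("MI  =", round(mutual_information_bits(table), 3), "bits")
print("lnR =", round(log_cross_product_ratio(table), 3))
```

A descriptor with near-zero mutual information across all groups carries no diagnostic information and would be screened out.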


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 533 ◽  
Author(s):  
Milan S. Derpich ◽  
Jan Østergaard

We present novel data-processing inequalities relating the mutual information and the directed information in systems with feedback. The internal blocks within such systems are restricted only to be causal mappings; they may be non-linear and time-varying and, when randomized by their own external random inputs, can realize any stochastic mapping. These randomized blocks can, for example, represent source encoders, decoders, or even communication channels. Moreover, the involved signals can be arbitrarily distributed. Our first main result relates mutual and directed information and can be interpreted as a law of conservation of information flow. Our second main result is a pair of data-processing inequalities (one the conditional version of the other) between nested pairs of random sequences entirely within the closed loop. Our third main result introduces and characterizes the notion of in-the-loop (ITL) transmission rate for channel-coding scenarios in which the messages are internal to the loop. Interestingly, in this case the conventional notion of transmission rate associated with the entropy of the messages, and the conventional notion of channel capacity based on maximizing the mutual information between the messages and the output, turn out to be inadequate. Instead, as we show, the ITL transmission rate is the unique notion of rate for which a channel code attains zero error probability if and only if the ITL rate does not exceed the corresponding directed-information rate from messages to decoded messages. We apply our data-processing inequalities to show that the supremum of achievable (in the usual channel-coding sense) ITL transmission rates is upper bounded by the supremum of the directed-information rate across the communication channel. Moreover, we present an example in which this upper bound is attained. Finally, we further illustrate the applicability of our results by discussing how they enable the generalization of two fundamental inequalities known in the networked-control literature.
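
For reference, Massey's directed information, the quantity these inequalities relate to the ordinary mutual information, is (in standard notation, not quoted from the paper):

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),
\qquad \text{versus} \qquad
I(X^n; Y^n) = \sum_{i=1}^{n} I(X^n; Y_i \mid Y^{i-1}),
```

so the directed information credits each output Y_i only with the past and present of X, rather than the whole sequence, which is what makes it the appropriate bookkeeping device in feedback loops.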


Author(s):  
Aron Larsson ◽  
Jim Johansson ◽  
Love Ekenberg ◽  
Mats Danielson

We present a decision-tree evaluation method for analyzing multi-attribute decisions under risk, where the available information is numerically imprecise. The approach extends the use of additive and multiplicative utility functions to support the evaluation of imprecise statements, relaxing the requirement for precise estimates of the decision parameters. Information is modeled as convex sets of utility and probability measures restricted by closed intervals. Evaluation is carried out relative to a set of rules generalizing the concept of admissibility and is handled computationally through the optimization of aggregated utility functions. The pros and cons of the two approaches, and the tradeoffs involved in selecting a utility function, are discussed.
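
To make the interval model concrete, here is a minimal sketch, assuming a single chance node with closed-interval probability bounds (the paper evaluates full decision trees against generalized admissibility rules, which this toy computation does not capture):

```python
import numpy as np
from scipy.optimize import linprog

def expected_utility_bounds(utilities, p_lo, p_hi):
    """Min and max expected utility over all probability vectors p
    with p_lo <= p <= p_hi and sum(p) == 1, i.e., over a convex set
    of probability measures restricted by closed intervals.
    Assumes the intervals admit at least one valid probability vector."""
    u = np.asarray(utilities, dtype=float)
    bounds = list(zip(p_lo, p_hi))
    a_eq, b_eq = np.ones((1, len(u))), [1.0]
    lo = linprog(u, A_eq=a_eq, b_eq=b_eq, bounds=bounds)   # minimize E[u]
    hi = linprog(-u, A_eq=a_eq, b_eq=b_eq, bounds=bounds)  # maximize E[u]
    return lo.fun, -hi.fun

# Hypothetical chance node: three outcomes with imprecise probabilities.
print(expected_utility_bounds([1.0, 0.4, 0.0],
                              p_lo=[0.2, 0.3, 0.1],
                              p_hi=[0.5, 0.6, 0.4]))
```

An option whose maximum expected utility falls below another option's minimum can then be discarded, which is the flavor of pairwise comparison that generalized admissibility rules formalize.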


2021 ◽  
Author(s):  
Philipe M. Bujold ◽  
Simone Ferrari-Toniolo ◽  
Leo Chi U Seak ◽  
Wolfram Schultz

Decisions can be risky or riskless, depending on the outcomes of the choice. Expected Utility Theory describes risky choices as a utility maximization process: we choose the option with the highest subjective value (utility), which we compute by considering both the option’s value and its associated risk. According to the random utility maximization framework, riskless choices could also be based on a utility measure. Neuronal mechanisms of utility-based choice may thus be common to both risky and riskless choices. This assumption would require the existence of a utility function that accounts for both risky and riskless decisions. Here, we investigated whether the choice behavior of macaque monkeys in riskless and risky decisions could be described by a common underlying utility function. We found that the utility functions elicited in the two choice scenarios differed from each other, even after taking into account the contribution of subjective probability weighting. Our results suggest that distinct utility representations exist for riskless and risky choices, which could reflect distinct neuronal representations of the utility quantities or distinct brain mechanisms for risky and riskless choices. These different utility functions should be taken into account in neuronal investigations of utility-based choice.
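
In this framework, the subjective value of a risky option is typically modeled with a utility function and a probability-weighting function; one standard parametrization (illustrative, not necessarily the one fitted in the paper) is:

```latex
V(\text{option}) = \sum_{i} w(p_i)\, u(x_i),
\qquad
u(x) = x^{\rho},
\qquad
w(p) = e^{-(-\ln p)^{\gamma}} \;\; (\text{Prelec}),
```

where ρ captures risk attitude through the curvature of u, and γ captures the distortion of stated probabilities; the paper's finding is that the utility curvature recovered from riskless choices differs from that recovered from risky ones even after fitting such a weighting function.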


Author(s):  
Manabu Kimura ◽  
Masashi Sugiyama

Recently, statistical dependence measures such as mutual information and kernelized covariance have been successfully applied to clustering. In this paper, we follow this line of research and propose a novel dependence-maximization clustering method based on least-squares mutual information, which is an estimator of a squared-loss variant of mutual information. A notable advantage of the proposed method over existing approaches is that hyperparameters such as kernel parameters and regularization parameters can be objectively optimized based on cross-validation. Thus, subjective manual tuning of hyperparameters is not necessary in the proposed method, which is a highly useful property in unsupervised clustering scenarios. Through experiments, we illustrate the usefulness of the proposed approach.
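
The squared-loss variant of mutual information that least-squares mutual information estimates is, in its standard form (displayed here for reference, not quoted from the paper):

```latex
\mathrm{SMI}(X;Y) = \frac{1}{2} \iint p(x)\, p(y)
\left( \frac{p(x,y)}{p(x)\, p(y)} - 1 \right)^{2} \mathrm{d}x\, \mathrm{d}y,
```

which is zero if and only if X and Y are independent; LSMI estimates it by directly fitting the density ratio p(x,y) / (p(x) p(y)), avoiding separate density estimation.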


Author(s):  
Huimin Zhao

Identifying matching attributes across heterogeneous data sources is a critical and time-consuming step in integrating the data sources. In this paper, the author proposes a method for matching the most frequently encountered types of attributes across overlapping heterogeneous data sources. The author uses mutual information as a unified measure of dependence for the various types of attributes. An example demonstrates the applicability of the proposed method, which can serve as a basis for practical attribute-matching tools.
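
For categorical attributes, the core computation can be sketched as follows (hypothetical records and function name; the paper's method covers further attribute types and the surrounding matching workflow):

```python
import numpy as np
from collections import Counter

def mutual_information_nats(xs, ys):
    """Empirical mutual information (nats) between two categorical
    columns observed on the same set of overlapping records."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return float(sum((c / n) * np.log((c / n) / ((px[x] / n) * (py[y] / n)))
                     for (x, y), c in pxy.items()))

# Hypothetical overlapping records from two sources; a high score
# suggests the two columns encode the same underlying attribute.
src_a = ["gold", "silver", "gold", "bronze", "silver"]
src_b = ["Au", "Ag", "Au", "Cu", "Ag"]
print(mutual_information_nats(src_a, src_b))
```

Candidate attribute pairs can then be ranked by this score, with high-dependence pairs proposed as matches.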

