Conditional Rényi Divergences and Horse Betting

Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 316 ◽  
Author(s):  
Cédric Bleuler ◽  
Amos Lapidoth ◽  
Christoph Pfister

Motivated by a horse betting problem, a new conditional Rényi divergence is introduced. It is compared with the conditional Rényi divergences that appear in the definitions of the dependence measures by Csiszár and Sibson, and the properties of all three are studied with emphasis on their behavior under data processing. Just as Csiszár’s and Sibson’s conditional divergences lead to their respective dependence measures, the new conditional divergence leads to the Lapidoth–Pfister mutual information. Moreover, the new conditional divergence is also related to the Arimoto–Rényi conditional entropy and to Arimoto’s measure of dependence. In the second part of the paper, the horse betting problem is analyzed where, instead of Kelly’s expected log-wealth criterion, a more general family of power-mean utility functions is considered. The key role in the analysis is played by the Rényi divergence, and in the setting where the gambler has access to side information, that role is taken over by the new conditional Rényi divergence. The setting with side information also provides another operational meaning to the Lapidoth–Pfister mutual information. Finally, a universal strategy for independent and identically distributed races is presented that asymptotically maximizes the gambler’s utility without knowledge of the winning probabilities or of the parameter of the utility function.
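
For orientation, the order-α Rényi divergence at the heart of this analysis is defined, for distributions P and Q on a finite alphabet, as below; as α → 1 it recovers the Kullback–Leibler divergence underlying Kelly’s criterion. (This is the standard definition, stated for reference rather than quoted from the paper.)

```latex
D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_{x} P(x)^{\alpha}\, Q(x)^{1-\alpha},
\qquad
\lim_{\alpha \to 1} D_\alpha(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} .
```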

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 778 ◽  
Author(s):  
Amos Lapidoth ◽  
Christoph Pfister

Two families of dependence measures between random variables are introduced. They are based on the Rényi divergence of order α and the relative α-entropy, respectively, and both dependence measures reduce to Shannon’s mutual information when their order α is one. The first measure shares many properties with the mutual information, including the data-processing inequality, and can be related to the optimal error exponents in composite hypothesis testing. The second measure does not satisfy the data-processing inequality, but appears naturally in the context of distributed task encoding.
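
As a concrete handle on the order-α machinery shared by both families, here is a minimal numerical sketch of the Rényi divergence for finite alphabets (the function name and example distributions are illustrative, not from the paper):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) between finite distributions;
    converges to the Kullback-Leibler divergence as alpha -> 1."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        mask = p > 0  # KL limit: sum of p * log(p / q)
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

p = [0.6, 0.3, 0.1]
q = [0.4, 0.4, 0.2]
for a in (0.5, 1.0, 2.0):
    print(f"D_{a}(p||q) = {renyi_divergence(p, q, a):.4f}")
```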


Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 67 ◽  
Author(s):  
Xiyu Shi ◽  
Varuna De-Silva ◽  
Yusuf Aslan ◽  
Erhan Ekmekcioglu ◽  
Ahmet Kondoz

Deep learning has proven to be an important element of modern data-processing technology and has found application in many areas, such as multimodal sensor data processing and understanding, data generation, and anomaly detection. While the use of deep learning is booming in many real-world tasks, the internal processes by which it arrives at its results remain opaque. Understanding the data-processing pathways within a deep neural network is important for transparency and better resource utilisation. In this paper, a method based on information-theoretic measures is used to reveal the typical learning patterns of convolutional neural networks, which are commonly used for image-processing tasks. For this purpose, the training samples, true labels, and estimated labels are treated as random variables, and the mutual information and conditional entropy between these variables are studied. The paper shows that adding convolutional layers improves learning only up to a point: beyond the necessary depth, additional convolutional layers yield no further improvement. The number of convolutional layers needed to reach a desired level of learning can be determined with the help of information-theoretic quantities, including entropy and mutual information, computed among the inputs to the network. The kernel size of the convolutional layers affects only the learning speed of the network. The study also shows that the placement of the dropout layer has no significant effect on learning at lower dropout rates, whereas at higher dropout rates the dropout layer is best placed immediately after the last convolutional layer.
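
The label-level quantities can be made concrete with a confusion matrix. The sketch below uses hypothetical counts and standard plug-in estimators (the paper's analysis also involves the training samples themselves, which this omits):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(joint):
    """I(T; Y) in bits from a joint distribution, here a normalized
    confusion matrix of true labels T (rows) vs. estimates Y (cols)."""
    p_t, p_y = joint.sum(axis=1), joint.sum(axis=0)
    return entropy_bits(p_t) + entropy_bits(p_y) - entropy_bits(joint.ravel())

# Hypothetical confusion counts from a trained classifier.
counts = np.array([[90.0, 5.0, 5.0],
                   [4.0, 88.0, 8.0],
                   [6.0, 7.0, 87.0]])
joint = counts / counts.sum()
mi = mutual_information_bits(joint)
print("I(T;Y) =", round(mi, 3), "bits")
print("H(T|Y) =", round(entropy_bits(joint.sum(axis=1)) - mi, 3), "bits")
```

As training improves the network, I(T;Y) rises toward H(T) and the residual uncertainty H(T|Y) falls, which is the kind of learning signature tracked above.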


1978 ◽  
Vol 17 (01) ◽  
pp. 36-40 ◽  
Author(s):  
J.-P. Durbec ◽  
Jaqueline Cornée ◽  
P. Berthezene

The practice of systematic examinations in hospitals and the increasing development of automatic data processing permit the storage of a great deal of information about a large number of patients belonging to different diagnosis groups. To predict or to characterize these diagnosis groups, some descriptors are particularly useful while others carry no information. A data-screening procedure based on the properties of mutual information and of the log cross-product ratios in contingency tables is developed. The most useful descriptors are selected, and for each one the groups it characterizes are specified. This approach has been applied to a set of binary (presence/absence) radiological variables. Four diagnosis groups are concerned: cancer of the pancreas, chronic calcifying pancreatitis, non-calcifying pancreatitis, and probable pancreatitis. Only twenty of the three hundred and forty initial radiological variables are selected; the presence of each corresponding sign is associated with one or more diagnosis groups.
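
Both screening statistics are simple to compute from a contingency table of counts. The sketch below uses hypothetical counts and a standard 0.5 continuity correction (which the paper may not use), for one binary sign against two diagnosis groups:

```python
import numpy as np

def mutual_information_bits(table):
    """Mutual information (bits) between the row variable (sign
    present/absent) and the column variable (diagnosis group)."""
    joint = table / table.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))

def log_cross_product_ratio(table2x2):
    """Log cross-product (odds) ratio of a 2x2 table,
    with a 0.5 continuity correction for zero cells."""
    a, b, c, d = (table2x2 + 0.5).ravel()
    return float(np.log((a * d) / (b * c)))

# Hypothetical counts: rows = sign present/absent, cols = two groups.
table = np.array([[30.0, 5.0],
                  [10.0, 55.0]])
print("MI  =", round(mutual_information_bits(table), 3), "bits")
print("lnR =", round(log_cross_product_ratio(table), 3))
```

A descriptor with near-zero mutual information across all groups carries no diagnostic information and would be screened out.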


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 533 ◽  
Author(s):  
Milan S. Derpich ◽  
Jan Østergaard

We present novel data-processing inequalities relating the mutual information and the directed information in systems with feedback. The internal blocks within such systems are restricted only to be causal mappings; they may be non-linear and time-varying and, when randomized by their own external random inputs, can realize any stochastic mapping. These randomized blocks can, for example, represent source encoders, decoders, or even communication channels. Moreover, the involved signals can be arbitrarily distributed. Our first main result relates mutual and directed information and can be interpreted as a law of conservation of information flow. Our second main result is a pair of data-processing inequalities (one the conditional version of the other) between nested pairs of random sequences entirely within the closed loop. Our third main result introduces and characterizes the notion of in-the-loop (ITL) transmission rate for channel-coding scenarios in which the messages are internal to the loop. Interestingly, in this case the conventional notion of transmission rate associated with the entropy of the messages, and the conventional notion of channel capacity based on maximizing the mutual information between the messages and the output, turn out to be inadequate. Instead, as we show, the ITL transmission rate is the unique notion of rate for which a channel code attains zero error probability if and only if the ITL rate does not exceed the corresponding directed-information rate from messages to decoded messages. We apply our data-processing inequalities to show that the supremum of achievable (in the usual channel-coding sense) ITL transmission rates is upper bounded by the supremum of the directed-information rate across the communication channel. Moreover, we present an example in which this upper bound is attained. Finally, we further illustrate the applicability of our results by discussing how they enable the generalization of two fundamental inequalities known in the networked-control literature.
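
For reference, Massey's directed information, the quantity these inequalities relate to the ordinary mutual information, is (in standard notation, not quoted from the paper):

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),
\qquad \text{versus} \qquad
I(X^n; Y^n) = \sum_{i=1}^{n} I(X^n; Y_i \mid Y^{i-1}),
```

so the directed information credits each output Y_i only with the past and present of X, rather than the whole sequence, which is what makes it the appropriate bookkeeping device in feedback loops.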


Author(s):  
Aron Larsson ◽  
Jim Johansson ◽  
Love Ekenberg ◽  
Mats Danielson

We present a decision-tree evaluation method for analyzing multi-attribute decisions under risk, where the available information is numerically imprecise. The approach extends the use of additive and multiplicative utility functions to support the evaluation of imprecise statements, relaxing the requirement for precise estimates of the decision parameters. Information is modeled as convex sets of utility and probability measures restricted by closed intervals. Evaluation is carried out relative to a set of rules generalizing the concept of admissibility and is handled computationally through the optimization of aggregated utility functions. The pros and cons of the two approaches, and the tradeoffs involved in selecting a utility function, are discussed.
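
To make the interval model concrete, here is a minimal sketch, assuming a single chance node with closed-interval probability bounds (the paper evaluates full decision trees against generalized admissibility rules, which this toy computation does not capture):

```python
import numpy as np
from scipy.optimize import linprog

def expected_utility_bounds(utilities, p_lo, p_hi):
    """Min and max expected utility over all probability vectors p
    with p_lo <= p <= p_hi and sum(p) == 1, i.e., over a convex set
    of probability measures restricted by closed intervals.
    Assumes the intervals admit at least one valid probability vector."""
    u = np.asarray(utilities, dtype=float)
    bounds = list(zip(p_lo, p_hi))
    a_eq, b_eq = np.ones((1, len(u))), [1.0]
    lo = linprog(u, A_eq=a_eq, b_eq=b_eq, bounds=bounds)   # minimize E[u]
    hi = linprog(-u, A_eq=a_eq, b_eq=b_eq, bounds=bounds)  # maximize E[u]
    return lo.fun, -hi.fun

# Hypothetical chance node: three outcomes with imprecise probabilities.
print(expected_utility_bounds([1.0, 0.4, 0.0],
                              p_lo=[0.2, 0.3, 0.1],
                              p_hi=[0.5, 0.6, 0.4]))
```

An option whose maximum expected utility falls below another option's minimum can then be discarded, which is the flavor of pairwise comparison that generalized admissibility rules formalize.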


2021 ◽  
Author(s):  
Philipe M. Bujold ◽  
Simone Ferrari-Toniolo ◽  
Leo Chi U Seak ◽  
Wolfram Schultz

Decisions can be risky or riskless, depending on the outcomes of the choice. Expected Utility Theory describes risky choices as a utility maximization process: we choose the option with the highest subjective value (utility), which we compute by considering both the option’s value and its associated risk. According to the random utility maximization framework, riskless choices could also be based on a utility measure. Neuronal mechanisms of utility-based choice may thus be common to both risky and riskless choices. This assumption would require the existence of a utility function that accounts for both risky and riskless decisions. Here, we investigated whether the choice behavior of macaque monkeys in riskless and risky decisions could be described by a common underlying utility function. We found that the utility functions elicited in the two choice scenarios differed from each other, even after taking into account the contribution of subjective probability weighting. Our results suggest that distinct utility representations exist for riskless and risky choices, which could reflect distinct neuronal representations of the utility quantities or distinct brain mechanisms for risky and riskless choices. These different utility functions should be taken into account in neuronal investigations of utility-based choice.
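
In this framework, the subjective value of a risky option is typically modeled with a utility function and a probability-weighting function; one standard parametrization (illustrative, not necessarily the one fitted in the paper) is:

```latex
V(\text{option}) = \sum_{i} w(p_i)\, u(x_i),
\qquad
u(x) = x^{\rho},
\qquad
w(p) = e^{-(-\ln p)^{\gamma}} \;\; (\text{Prelec}),
```

where ρ captures risk attitude through the curvature of u, and γ captures the distortion of stated probabilities; the paper's finding is that the utility curvature recovered from riskless choices differs from that recovered from risky ones even after fitting such a weighting function.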


Author(s):  
Manabu Kimura ◽  
Masashi Sugiyama

Recently, statistical dependence measures such as mutual information and kernelized covariance have been successfully applied to clustering. In this paper, we follow this line of research and propose a novel dependence-maximization clustering method based on least-squares mutual information, which is an estimator of a squared-loss variant of mutual information. A notable advantage of the proposed method over existing approaches is that hyperparameters such as kernel parameters and regularization parameters can be objectively optimized based on cross-validation. Thus, subjective manual tuning of hyperparameters is not necessary in the proposed method, which is a highly useful property in unsupervised clustering scenarios. Through experiments, we illustrate the usefulness of the proposed approach.
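
The squared-loss variant of mutual information that least-squares mutual information estimates is, in its standard form (displayed here for reference, not quoted from the paper):

```latex
\mathrm{SMI}(X;Y) = \frac{1}{2} \iint p(x)\, p(y)
\left( \frac{p(x,y)}{p(x)\, p(y)} - 1 \right)^{2} \mathrm{d}x\, \mathrm{d}y,
```

which is zero if and only if X and Y are independent; LSMI estimates it by directly fitting the density ratio p(x,y) / (p(x) p(y)), avoiding separate density estimation.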


Author(s):  
Huimin Zhao

Identifying matching attributes across heterogeneous data sources is a critical and time-consuming step in integrating the data sources. In this paper, the author proposes a method for matching the most frequently encountered types of attributes across overlapping heterogeneous data sources. The author uses mutual information as a unified measure of dependence for the various types of attributes. An example demonstrates the applicability of the proposed method, which can serve as a basis for practical attribute-matching tools.
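
For categorical attributes, the core computation can be sketched as follows (hypothetical records and function name; the paper's method covers further attribute types and the surrounding matching workflow):

```python
import numpy as np
from collections import Counter

def mutual_information_nats(xs, ys):
    """Empirical mutual information (nats) between two categorical
    columns observed on the same set of overlapping records."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return float(sum((c / n) * np.log((c / n) / ((px[x] / n) * (py[y] / n)))
                     for (x, y), c in pxy.items()))

# Hypothetical overlapping records from two sources; a high score
# suggests the two columns encode the same underlying attribute.
src_a = ["gold", "silver", "gold", "bronze", "silver"]
src_b = ["Au", "Ag", "Au", "Cu", "Ag"]
print(mutual_information_nats(src_a, src_b))
```

Candidate attribute pairs can then be ranked by this score, with high-dependence pairs proposed as matches.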

