A Comparison of Variational Bounds for the Information Bottleneck Functional

Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1229 ◽  
Author(s):  
Bernhard C. Geiger ◽  
Ian S. Fischer

In this short note, we relate the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functional, respectively. Although the two functionals were shown to be equivalent, it was empirically observed that optimizing bounds on the CEB functional achieves better generalization performance and adversarial robustness than optimizing those on the IB functional. This work tries to shed light on this issue by showing that, in the most general setting, no ordering can be established between these variational bounds, while such an ordering can be enforced by restricting the feasible sets over which the optimizations take place. The absence of such an ordering in the general setup suggests that the variational bound on the CEB functional is either more amenable to optimization or a relevant cost function for optimization in its own right, i.e., without justification from the IB or CEB functionals.
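For reference, the two functionals being bounded can be written in one common parameterization (the exact conventions of Alemi et al. (2017) and Fischer (2020) may differ):

```latex
% Both optimizations are over encoders p(z|x). Under the Markov chain
% Y -- X -- Z we have I(X;Z|Y) = I(X;Z) - I(Y;Z), which yields the
% equivalence of the two functionals mentioned in the abstract.
\mathrm{IB}:\quad  \min_{p(z\mid x)}\; I(X;Z) - \beta\, I(Y;Z)
\mathrm{CEB}:\quad \min_{p(z\mid x)}\; I(X;Z\mid Y) - \gamma\, I(Y;Z)
% Substituting the identity above, CEB with parameter gamma coincides
% with IB at beta = 1 + gamma (up to reparameterization).
```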

2021 ◽  
pp. 1-3
Author(s):  
Lawrence J. Bliquez

This short note attempts to shed light on some of the surgical procedures referred to in Martial's epigram 10.56 by consulting pertinent Graeco-Roman medical texts. A fuller understanding of one such intervention (treatment of an infected/inflamed uvula) supports Martial's text as transmitted.


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 492
Author(s):  
Gustavo Estrela ◽  
Marco Dimas Gubitoso ◽  
Carlos Eduardo Ferreira ◽  
Junior Barrera ◽  
Marcelo S. Reis

In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs along chains of that lattice describe U-shaped curves. Minimizing such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the time UCS required in computational assays grew exponentially with the number of features. Here, we report that this scalability issue arises because the U-curve problem is NP-hard. We then introduce Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard algorithms for feature selection.
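The U-curve property that UCS and PUCS exploit can be illustrated with a toy example; the cost function below is a hypothetical one chosen only to make chain costs U-shaped, and the snippet shows neither algorithm, just why a chain walk may stop at the first cost increase:

```python
# Toy illustration of the U-curve property (not the UCS/PUCS algorithms).
# Assumption: a hypothetical cost penalizing both too few features
# (estimation error) and too many (overfitting), so chain costs are U-shaped.
from itertools import combinations

FEATURES = frozenset(range(5))

def cost(subset):
    """Hypothetical U-shaped cost: an error term decreasing in |subset|
    plus a complexity term increasing in |subset|."""
    k = len(subset)
    return (len(FEATURES) - k) ** 2 + 2 * k  # convex in k -> U-shaped chains

def exhaustive_minimum():
    """Baseline: scan the whole Boolean lattice (2^n subsets)."""
    return min(
        (frozenset(c) for k in range(len(FEATURES) + 1)
         for c in combinations(FEATURES, k)),
        key=cost,
    )

def chain_minimum(chain):
    """Walk one chain bottom-up and stop at the first cost increase;
    the U-curve property guarantees this local stop is the chain minimum."""
    best = chain[0]
    for node in chain[1:]:
        if cost(node) > cost(best):
            break  # along a U-shaped chain, costs can only rise from here
        best = node
    return best

if __name__ == "__main__":
    # One maximal chain: {} < {0} < {0,1} < ... < {0,...,4}
    chain = [frozenset(range(k)) for k in range(len(FEATURES) + 1)]
    print("chain minimum: ", sorted(chain_minimum(chain)))
    print("global minimum:", sorted(exhaustive_minimum()))
```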


2020 ◽  
Vol 39 (3) ◽  
pp. 4183-4196
Author(s):  
Fu-Ning Lin ◽  
Guang-Ji Yu ◽  
Guang-Ming Xue ◽  
Jiang-Feng Han

A crisp antimatroid is a combinatorial abstraction of convexity. It can also be incorporated into greedy algorithms in order to seek optimal solutions. Nevertheless, this significant classical structure has inherent limitations in addressing fuzzy optimization problems and in abstracting fuzzy convexities. This paper introduces the concept of an L-fuzzifying antimatroid associated with an L-fuzzifying family of feasible sets. Several relevant fundamental properties are obtained. We also propose the concept of L-fuzzifying rank functions for L-fuzzifying antimatroids and then investigate their axiomatic characterizations. Finally, we shed light upon the bijective correspondence between an L-fuzzifying antimatroid and its L-fuzzifying rank function.
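For orientation, the crisp structure being fuzzified can be stated as follows (the standard feasible-set definition of an antimatroid; the paper's L-fuzzifying axioms generalize these conditions):

```latex
% A (crisp) antimatroid on a finite ground set E is a family
% \mathcal{F} \subseteq 2^{E} of feasible sets such that:
%   (A1) \emptyset \in \mathcal{F};
%   (A2) \mathcal{F} is closed under union:
%        A, B \in \mathcal{F} \implies A \cup B \in \mathcal{F};
%   (A3) accessibility: every nonempty A \in \mathcal{F} contains
%        some a \in A with A \setminus \{a\} \in \mathcal{F}.
```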


2021 ◽  
Vol 20 ◽  
pp. 170-177
Author(s):  
Wang Jianhong

In this short note, a data-driven model predictive control scheme is studied to design the optimal control sequence. Data-driven here means that the actual output value in the cost function for model predictive control is identified from input-output observed data, in the case of unknown-but-bounded noise forming a martingale difference sequence. After substituting the identified actual output into the cost function, the total cost function of model predictive control is reformulated in another standard form, so that dynamic programming can be applied directly. Since dynamic programming is mostly used in optimization theory, a dynamic programming algorithm is proposed to construct the optimal control sequence, thereby extending its advantages to control theory. Furthermore, a stability analysis for data-driven model predictive control is also given based on the dynamic programming strategy. Overall, the goal of this short note is to bridge dynamic programming, system identification, and model predictive control. Finally, a simulation example is used to demonstrate the efficiency of the proposed theory.
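For illustration, the dynamic-programming step that such a scheme relies on can be sketched for a linear model with quadratic cost; the backward Riccati recursion below is a standard construction under assumed matrices A, B, Q, R, and it omits the note's data-driven identification of the output:

```python
# Minimal sketch: finite-horizon dynamic programming (backward Riccati
# recursion) for a linear model with quadratic cost, as used inside an
# MPC loop. A, B, Q, R are illustrative assumptions, not from the paper.
import numpy as np

def dp_gains(A, B, Q, R, horizon):
    """Backward pass: value function x' P_t x, feedback u_t = -K_t x_t."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains ordered t = 0 .. horizon-1

if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])  # double integrator (assumed)
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[0.1]])
    Ks = dp_gains(A, B, Q, R, horizon=10)
    x = np.array([[1.0], [0.0]])
    for K in Ks:                             # forward rollout
        u = -K @ x
        x = A @ x + B @ u
    print("terminal state:", x.ravel())
```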


Entropy ◽  
2020 ◽  
Vol 22 (1) ◽  
pp. 100 ◽  
Author(s):  
Giulio Franzese ◽  
Monica Visintin

We describe a classifier made of an ensemble of decision trees, designed using information theory concepts. In contrast to algorithms such as C4.5 or ID3, each tree is built from the leaves instead of the root. Each tree is made of nodes trained independently of the others, to minimize a local cost function (information bottleneck). The trained tree outputs the estimated probabilities of the classes given the input datum, and the outputs of many trees are combined to decide the class. We show that the system is able to provide results comparable to those of the tree classifier in terms of accuracy, while it shows many advantages in terms of modularity, reduced complexity, and memory requirements.
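One way to picture training a single node by a local information criterion is sketched below; the snippet scores candidate thresholds by the mutual information I(Z;Y) between the node's binary output and the labels, a simplified stand-in for the local information-bottleneck cost described here, not the authors' exact objective:

```python
# Sketch: selecting one node's split by a local information criterion.
# Picks the threshold whose binary output Z = [x > t] retains the most
# information I(Z;Y) about the labels. Illustrative only; the paper's
# local cost also includes a compression term.
import numpy as np

def mutual_information(z, y):
    """I(Z;Y) in nats for discrete arrays z, y, from empirical counts."""
    mi = 0.0
    for zv in np.unique(z):
        for yv in np.unique(y):
            p_zy = np.mean((z == zv) & (y == yv))
            if p_zy > 0:
                p_z, p_y = np.mean(z == zv), np.mean(y == yv)
                mi += p_zy * np.log(p_zy / (p_z * p_y))
    return mi

def best_threshold(x, y):
    """Choose the cut on feature x maximizing I(Z;Y), with Z = [x > t]."""
    candidates = np.unique(x)[:-1]
    scores = [mutual_information(x > t, y) for t in candidates]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 500)
    x = y + rng.normal(0, 0.5, 500)  # feature correlated with the class
    print("selected threshold:", best_threshold(x, y))
```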


Mathematics ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. 2203
Author(s):  
Ioannis S. Triantafyllou

In the present article, we introduce the m-consecutive-k-out-of-n:F structures with a single change point. The aforementioned system consists of n independent components, of which the first n1 units are identically distributed with common reliability p1, while the remaining ones share a different functioning probability p2. The general setup of the proposed reliability structures is presented in detail, while an explicit expression for determining the number of its path sets of a given size is derived. Additionally, closed formulae for the reliability function and mean time to failure of the aforementioned models are also provided. For illustration purposes, several numerical results and comparisons are presented in order to shed light on the performance of the proposed structure.
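A Monte Carlo cross-check of such a structure's reliability can be sketched as follows; the failure criterion (at least m non-overlapping runs of k consecutive failures) follows the standard definition of these systems, and the numerical values are assumptions for illustration rather than cases from the paper:

```python
# Monte Carlo sketch of the reliability of an m-consecutive-k-out-of-n:F
# structure with a single change point: components 1..n1 work with
# probability p1, components n1+1..n with probability p2. The paper gives
# closed formulae, so this simulation is only an illustrative cross-check.
import numpy as np

def system_works(states, m, k):
    """states[i] == 1 means component i works. Count non-overlapping
    failure runs of length k greedily, left to right."""
    runs, streak = 0, 0
    for s in states:
        streak = streak + 1 if s == 0 else 0
        if streak == k:
            runs += 1
            streak = 0  # non-overlapping: restart the count
    return runs < m  # the system fails iff it has at least m such runs

def reliability(n, n1, p1, p2, m, k, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    ok = 0
    for _ in range(trials):
        states = np.concatenate([
            rng.random(n1) < p1,      # first n1 components
            rng.random(n - n1) < p2,  # remaining components
        ]).astype(int)
        ok += system_works(states, m, k)
    return ok / trials

if __name__ == "__main__":
    # Example values (assumed, not from the paper):
    print(reliability(n=10, n1=4, p1=0.9, p2=0.8, m=2, k=2))
```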


2021 ◽  
Vol 13 (3) ◽  
pp. 18035-18038
Author(s):  
Naren Sreenivasan ◽  
Joshua Barton

Fifty years after the first report of freshwater medusae (Limnocnida indica) from the Cauvery River in Krishnarajasagar Reservoir, there has been only one other published report of its occurrence in the Cauvery Basin, at Hemavathi Reservoir, Kodagu District. Recent interest in freshwater photography has revealed three more locations in the Cauvery Basin where medusae are found. Medusae are often observed at these locations but are erroneously identified as an invasive species. According to the published literature, that label applies to Craspedacusta sowerbii, a cosmopolitan species with only three confirmed reports from India, all from artificial structures such as ponds and aquaria. The native Limnocnida and the exotic Craspedacusta can be distinguished from each other visually and with respect to temporal variation in the occurrence of their free-swimming medusae. This short note is intended to shed light on the status, distribution, and field identification of L. indica, a species endemic to the Western Ghats of India.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 999 ◽  
Author(s):  
Ian Fischer

Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.
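As a concrete point of reference, a variational CEB training loss can be sketched as follows; this is a minimal sketch with assumed Gaussian forward and backward encoders and hypothetical networks (enc_mu, back_mu, dec), not the paper's architecture or hyperparameters:

```python
# Minimal sketch of a variational CEB loss. Assumptions: a unit-variance
# Gaussian forward encoder e(z|x), a Gaussian class-conditional backward
# encoder b(z|y), and a softmax decoder c(y|z). All networks and sizes
# below are illustrative stand-ins.
import torch
import torch.nn.functional as F
from torch.distributions import Normal

Z_DIM, N_CLASSES, X_DIM = 8, 10, 32

enc_mu = torch.nn.Linear(X_DIM, Z_DIM)          # mean of e(z|x)
back_mu = torch.nn.Embedding(N_CLASSES, Z_DIM)  # mean of b(z|y)
dec = torch.nn.Linear(Z_DIM, N_CLASSES)         # logits of c(y|z)

def ceb_loss(x, y, gamma=0.1):
    e = Normal(enc_mu(x), 1.0)
    z = e.rsample()                              # reparameterized sample
    b = Normal(back_mu(y), 1.0)
    # Variational upper bound on the residual information I(X;Z|Y):
    residual = (e.log_prob(z) - b.log_prob(z)).sum(-1).mean()
    # Variational lower bound on I(Y;Z) via the decoder log-likelihood:
    ce = F.cross_entropy(dec(z), y)
    return gamma * residual + ce

x = torch.randn(16, X_DIM)
y = torch.randint(0, N_CLASSES, (16,))
loss = ceb_loss(x, y)
loss.backward()
print(float(loss))
```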


2017 ◽  
Vol 29 (6) ◽  
pp. 1611-1630 ◽  
Author(s):  
DJ Strouse ◽  
David J. Schwab

Lossy compression and clustering fundamentally involve a decision about which features are relevant and which are not. The information bottleneck method (IB) by Tishby, Pereira, and Bialek (1999) formalized this notion as an information-theoretic optimization problem and proposed an optimal trade-off between throwing away as many bits as possible and selectively keeping those that are most important. In the IB, compression is measured by mutual information. Here, we introduce an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck (DIB) and which, we argue, better captures this notion of compression. As suggested by its name, the solution to the DIB problem turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder, or soft clustering, that is optimal under the IB. We compare the IB and DIB on synthetic data, showing that the IB and DIB perform similarly in terms of the IB cost function, but that the DIB significantly outperforms the IB in terms of the DIB cost function. We also empirically find that the DIB offers a considerable gain in computational efficiency over the IB, over a range of convergence parameters. Our derivation of the DIB also suggests a method for continuously interpolating between the soft clustering of the IB and the hard clustering of the DIB.
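The relation between the two objectives, and the interpolation mentioned at the end, can be summarized as follows (standard IB-literature notation; the paper's own symbols may differ):

```latex
% Both objectives optimize over encoders q(t|x); beta > 0 trades
% compression against relevance:
\mathrm{IB}:\quad  \min_{q(t\mid x)} \; I(X;T) - \beta\, I(T;Y)
\mathrm{DIB}:\quad \min_{q(t\mid x)} \; H(T)   - \beta\, I(T;Y)
% Since I(X;T) = H(T) - H(T|X), the one-parameter family
%   \min_{q(t\mid x)} \; H(T) - \alpha\, H(T\mid X) - \beta\, I(T;Y)
% recovers IB at alpha = 1 and DIB at alpha = 0, giving the continuous
% interpolation between soft and hard clustering.
```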

