The Effect of Evidence Transfer on Latent Feature Relevance for Clustering

Informatics ◽  
2019 ◽  
Vol 6 (2) ◽  
pp. 17
Author(s):  
Athanasios Davvetas ◽  
Iraklis A. Klampanos ◽  
Spiros Skiadopoulos ◽  
Vangelis Karkaletsis

Evidence transfer for clustering is a deep learning method that manipulates the latent representations of an autoencoder according to external categorical evidence, with the effect of improving the clustering outcome. The method is designed to be robust when introduced to low-quality evidence, while improving clustering accuracy when the corresponding evidence is relevant. We interpret the effects of evidence transfer on the latent representation of an autoencoder by comparing our method to the information bottleneck method. The information bottleneck is an optimisation problem that seeks the best trade-off between maximising the mutual information between data representations and a task outcome, while at the same time effectively compressing the original data source. We posit that the evidence transfer method has essentially the same objective with regard to the latent representations produced by an autoencoder. We verify our hypothesis using information-theoretic metrics from feature selection in order to perform an empirical analysis of the information that is carried through the bottleneck of the latent space. We use the relevance metric to compare the overall mutual information between the latent representations and the ground-truth labels before and after their incremental manipulation, as well as to study the effects of evidence transfer on the significance of each latent feature.
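
The relevance metric described above can be approximated with off-the-shelf estimators. A minimal sketch using scikit-learn's nearest-neighbour mutual information estimator, assuming the latent codes and ground-truth labels are available as NumPy arrays (the toy data and the manipulation of feature 0 are illustrative, not the paper's procedure):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def latent_relevance(z, y):
    """Estimate mutual information between each latent feature and labels.

    z : (n_samples, n_latent) latent codes from the autoencoder
    y : (n_samples,) ground-truth categorical labels
    Returns per-feature MI estimates (nats) and their sum as an
    overall relevance score.
    """
    per_feature = mutual_info_classif(z, y, random_state=0)
    return per_feature, per_feature.sum()

# Toy usage: compare relevance before and after a latent manipulation.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=500)
z_before = rng.normal(size=(500, 8))   # uninformative latents
z_after = z_before.copy()
z_after[:, 0] += y                     # feature 0 now encodes the labels

_, rel_before = latent_relevance(z_before, y)
_, rel_after = latent_relevance(z_after, y)
print(f"relevance before: {rel_before:.3f}, after: {rel_after:.3f}")
```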

Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1325
Author(s):  
Shahab Asoodeh ◽  
Flavio P. Calmon

Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems that have found applications in machine learning, the design of privacy algorithms, capacity problems (e.g., Mrs. Gerber’s Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound on the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed “bottleneck problems”, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto’s mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. While the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of the lower convex or upper concave envelopes of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
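
For reference, the two optimizations are commonly stated as dual constrained trade-offs over the Markov chain Y - X - T (standard formulations; the notation is ours, not necessarily the paper's):

```latex
% Information bottleneck: preserve information about Y under a compression budget
\mathsf{IB}(r) \;=\; \sup_{P_{T|X}\,:\; I(X;T)\,\le\, r} \; I(Y;T)

% Privacy funnel: release a useful T while leaking as little as possible about Y
\mathsf{PF}(r) \;=\; \inf_{P_{T|X}\,:\; I(X;T)\,\ge\, r} \; I(Y;T)
```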


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 943
Author(s):  
Slava Voloshynovskiy ◽  
Olga Taran ◽  
Mouad Kondah ◽  
Taras Holotyak ◽  
Danilo Rezende

In this paper, we consider an information bottleneck (IB) framework for semi-supervised classification with several families of priors on the latent space representation. We apply a variational decomposition of the mutual information terms of the IB. Using this decomposition, we analyse several regularizers and demonstrate in practice the impact of the different components of the variational model on classification accuracy. We propose a new formulation of semi-supervised IB with hand-crafted and learnable priors and link it to previous methods such as the semi-supervised versions of the VAE (M1 + M2), AAE, and CatGAN. We show that the resulting model allows a better understanding of the role of various previously proposed regularizers in the semi-supervised classification task in light of the IB framework. The proposed semi-supervised IB model with hand-crafted and learnable priors is experimentally validated on MNIST under different amounts of labeled data.
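
A common variational relaxation of this kind of objective (the generic shape of a variational IB bound, shown for orientation; the paper's exact decomposition and regularizers may differ) replaces the two mutual information terms with tractable bounds:

```latex
\mathcal{L} \;=\;
\underbrace{\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q_\phi(z\mid x)}\big[\log q_\theta(y \mid z)\big]}_{\text{lower bound on } I(Z;Y)}
\;-\;
\beta\,\underbrace{\mathbb{E}_{p(x)}\, D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\|\, r(z)\big)}_{\text{upper bound on } I(Z;X)}
```

Here $r(z)$ is the prior on the latent space, which is exactly the slot that hand-crafted versus learnable priors occupy.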


2017 ◽  
Vol 29 (6) ◽  
pp. 1611-1630
Author(s):  
DJ Strouse ◽  
David J. Schwab

Lossy compression and clustering fundamentally involve a decision about which features are relevant and which are not. The information bottleneck method (IB) of Tishby, Pereira, and Bialek (1999) formalized this notion as an information-theoretic optimization problem and proposed an optimal trade-off between throwing away as many bits as possible and selectively keeping those that are most important. In the IB, compression is measured by mutual information. Here, we introduce an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck (DIB) and which, we argue, better captures this notion of compression. As suggested by its name, the solution to the DIB problem turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder, or soft clustering, that is optimal under the IB. We compare the IB and DIB on synthetic data, showing that the IB and DIB perform similarly in terms of the IB cost function, but that the DIB significantly outperforms the IB in terms of the DIB cost function. We also empirically find that the DIB offers a considerable gain in computational efficiency over the IB across a range of convergence parameters. Our derivation of the DIB also suggests a method for continuously interpolating between the soft clustering of the IB and the hard clustering of the DIB.
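
Using $I(X;T) = H(T) - H(T|X)$, the two cost functions, and the interpolating family the derivation suggests, can be written as:

```latex
\mathcal{L}_{\mathrm{IB}}  \;=\; I(X;T) - \beta\, I(T;Y) \;=\; H(T) - H(T|X) - \beta\, I(T;Y)

\mathcal{L}_{\mathrm{DIB}} \;=\; H(T) - \beta\, I(T;Y)

\mathcal{L}_{\alpha}       \;=\; H(T) - \alpha\, H(T|X) - \beta\, I(T;Y),
  \qquad \alpha \in [0, 1]
```

so that $\alpha = 1$ recovers the IB, while $\alpha \to 0$ yields the DIB and its deterministic encoder.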


2020 ◽  
Vol 501 (1) ◽  
pp. 994-1001
Author(s):  
Suman Sarkar ◽  
Biswajit Pandey ◽  
Snehasish Bhattacharjee

We use an information-theoretic framework to analyse data from the Galaxy Zoo 2 project and study whether there are any statistically significant correlations between the presence of bars in spiral galaxies and their environment. We measure the mutual information between the barredness of galaxies and their environments in a volume-limited sample (Mr ≤ −21) and compare it with the same quantity in data sets where (i) the bar/unbar classifications are randomized and (ii) the spatial distribution of galaxies is shuffled on different length scales. We assess the statistical significance of the differences in the mutual information using a t-test and find that neither randomization of the morphological classifications nor shuffling of the spatial distribution alters the mutual information in a statistically significant way. The non-zero mutual information between barredness and environment arises from the finite and discrete nature of the data set and can be entirely explained by mock Poisson distributions. We also separately compare the cumulative distribution functions of the barred and unbarred galaxies as a function of their local density. Using a Kolmogorov–Smirnov test, we find that the null hypothesis cannot be rejected even at the 75 per cent confidence level. Our analysis indicates that environments do not play a significant role in the formation of a bar, which is largely determined by the internal processes of the host galaxy.
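
The randomization-based significance test is straightforward to sketch. A minimal version, assuming barredness is a binary array and local density has been binned into discrete environment classes (toy independent data here, standing in for the Galaxy Zoo 2 sample):

```python
import numpy as np
from scipy.stats import ks_2samp, ttest_1samp
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(42)

# Toy stand-ins: binary bar/unbar flags and binned local densities.
barred = rng.integers(0, 2, size=2000)
density_bin = rng.integers(0, 10, size=2000)

# Observed mutual information between barredness and environment.
mi_obs = mutual_info_score(barred, density_bin)

# Null distribution: randomize the morphological classifications.
mi_null = np.array([
    mutual_info_score(rng.permutation(barred), density_bin)
    for _ in range(200)
])
t_stat, p_val = ttest_1samp(mi_null, mi_obs)
print(f"MI observed: {mi_obs:.4f}, null mean: {mi_null.mean():.4f}, p = {p_val:.3f}")

# KS test between local-density distributions of barred vs unbarred galaxies.
ks_stat, ks_p = ks_2samp(density_bin[barred == 1], density_bin[barred == 0])
print(f"KS statistic: {ks_stat:.4f}, p = {ks_p:.3f}")
```

Even with independent labels, the plug-in MI estimate is non-zero for a finite discrete sample, which is exactly the effect the mock comparisons are designed to control for.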


2021 ◽  
Author(s):  
Tham Vo

In the abstractive summarization task, most proposed models adopt a deep recurrent neural network (RNN)-based encoder-decoder architecture to learn to generate a meaningful summary for a given input document. However, recent RNN-based models tend to over-capture high-frequency and repetitive phrases in long documents during training, which leads to trivial and generic summaries. Moreover, the lack of thorough analysis of the sequential and long-range dependency relationships between words in different contexts while learning the textual representation also makes the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we propose a novel semantic-enhanced generative adversarial network (GAN)-based approach for the abstractive text summarization task, called SGAN4AbSum. We use an adversarial training strategy in which the generator and discriminator are trained simultaneously, the former to generate summaries and the latter to distinguish generated summaries from the ground-truth ones. The input to the generator is a joint rich-semantic and global-structural latent representation of the training documents, obtained by combining BERT and graph convolutional network (GCN) textual embedding mechanisms. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed SGAN4AbSum, which achieves competitive ROUGE scores compared with state-of-the-art abstractive text summarization baselines.
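
The adversarial training loop has the usual GAN shape. A toy PyTorch sketch of the alternating updates, with tiny linear networks standing in for the paper's BERT + GCN generator and working in a continuous embedding space (all names and dimensions here are illustrative assumptions, purely to show the loss structure):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
EMB = 64  # toy embedding size standing in for BERT + GCN document features

# Toy generator: maps a document embedding to a summary embedding.
G = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, EMB))
# Toy discriminator: scores (document, summary) embedding pairs.
D = nn.Sequential(nn.Linear(2 * EMB, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    doc = torch.randn(32, EMB)                    # document representations
    real_sum = doc + 0.1 * torch.randn(32, EMB)   # stand-in gold summaries

    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    fake_sum = G(doc).detach()
    d_loss = bce(D(torch.cat([doc, real_sum], dim=1)), torch.ones(32, 1)) + \
             bce(D(torch.cat([doc, fake_sum], dim=1)), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator.
    g_loss = bce(D(torch.cat([doc, G(doc)], dim=1)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

A real text GAN additionally has to deal with discrete token sampling, typically via policy gradients or a Gumbel-softmax relaxation; the continuous-embedding toy above sidesteps that to keep the adversarial structure visible.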


2021 ◽  
Vol 2021 (9) ◽  
Author(s):  
Alex May

We prove a theorem showing that the existence of “private” curves in the bulk of AdS implies two regions of the dual CFT share strong correlations. A private curve is a causal curve which avoids the entanglement wedge of a specified boundary region $\mathcal{U}$. The implied correlation is measured by the conditional mutual information $I(\mathcal{V}_1 : \mathcal{V}_2 \mid \mathcal{U})$, which is $O(1/G_N)$ when a private causal curve exists. The regions $\mathcal{V}_1$ and $\mathcal{V}_2$ are specified by the endpoints of the causal curve and the placement of the region $\mathcal{U}$. This gives a causal perspective on the conditional mutual information in AdS/CFT, analogous to the causal perspective on the mutual information given by earlier work on the connected wedge theorem. We give an information theoretic argument for our theorem, along with a bulk geometric proof. In the geometric perspective, the theorem follows from the maximin formula and entanglement wedge nesting. In the information theoretic approach, the theorem follows from resource requirements for sending private messages over a public quantum channel.
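
For reference, the conditional mutual information appearing here is the standard combination of von Neumann entropies:

```latex
I(\mathcal{V}_1 : \mathcal{V}_2 \mid \mathcal{U})
  \;=\; S(\mathcal{V}_1\,\mathcal{U}) + S(\mathcal{V}_2\,\mathcal{U})
        - S(\mathcal{V}_1\,\mathcal{V}_2\,\mathcal{U}) - S(\mathcal{U})
```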


Author(s):  
Greg Ver Steeg

Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning, where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains, including human behavior, biology, and language.
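
Total correlation itself is simple to estimate for small discrete systems. A minimal sketch computing TC(X) = Σᵢ H(Xᵢ) − H(X₁, …, Xₙ) from empirical counts (plug-in estimator on toy data; CorEx proper optimizes latent representations rather than just measuring TC):

```python
import numpy as np
from scipy.stats import entropy

def total_correlation(samples):
    """Empirical total correlation (in nats) of discrete columns.

    samples : (n_samples, n_vars) integer array.
    TC = sum_i H(X_i) - H(X_1, ..., X_n); zero iff columns are independent.
    """
    n, _ = samples.shape
    marginal_h = sum(
        entropy(np.unique(col, return_counts=True)[1] / n)
        for col in samples.T
    )
    _, joint_counts = np.unique(samples, axis=0, return_counts=True)
    joint_h = entropy(joint_counts / n)
    return marginal_h - joint_h

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(10000, 1))
independent = rng.integers(0, 2, size=(10000, 3))
redundant = np.hstack([x, x, x])  # fully dependent columns
print(total_correlation(independent))  # ~ 0
print(total_correlation(redundant))    # ~ 2 * ln(2)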


2021 ◽  
Vol 12 ◽  
Author(s):  
Richard Futrell

I present a computational-level model of semantic interference effects in online word production within a rate–distortion framework. I consider a bounded-rational agent trying to produce words. The agent's action policy is determined by maximizing accuracy in production subject to computational constraints. These computational constraints are formalized using mutual information. I show that semantic similarity-based interference among words falls out naturally from this setup, and I present a series of simulations showing that the model captures some of the key empirical patterns observed in Stroop and Picture–Word Interference paradigms, including comparisons to human data from previous experiments.
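
The general shape of such a policy is familiar from rate-distortion theory: the optimal channel under a mutual information constraint is a softmax over utilities, computable by Blahut-Arimoto-style iteration. A minimal sketch with generic utilities (not the paper's fitted model; the interference pattern in the toy example is only qualitative):

```python
import numpy as np

def bounded_rational_policy(utility, beta, n_iters=200):
    """Optimal stochastic policy p(a|s) maximizing E[U] - (1/beta) I(S;A).

    utility : (n_states, n_actions) utilities U(s, a)
    beta    : inverse trade-off; larger beta = more informative policy
    Iterates the Blahut-Arimoto fixed point p(a|s) ~ p(a) exp(beta U(s,a)).
    """
    n_states, n_actions = utility.shape
    p_s = np.full(n_states, 1.0 / n_states)    # uniform state prior
    p_a = np.full(n_actions, 1.0 / n_actions)  # marginal action policy
    for _ in range(n_iters):
        policy = p_a * np.exp(beta * utility)  # unnormalized p(a|s)
        policy /= policy.sum(axis=1, keepdims=True)
        p_a = p_s @ policy                     # update action marginal
    return policy

# Two semantically similar "words" (rows 0 and 1) vs an unrelated one:
# the constrained policy leaks probability mass onto the similar competitor.
U = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.round(bounded_rational_policy(U, beta=3.0), 3))
```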


Author(s):  
Yang Xu ◽  
Ronghao Zheng ◽  
Meiqin Liu ◽  
Senlin Zhang

2020 ◽  
Vol 17 (162) ◽  
pp. 20190623
Author(s):  
Artemy Kolchinsky ◽  
Bernat Corominas-Murtra

In many real-world systems, information can be transmitted in two qualitatively different ways: by copying or by transformation. Copying occurs when messages are transmitted without modification, e.g. when an offspring receives an unaltered copy of a gene from its parent. Transformation occurs when messages are modified systematically during transmission, e.g. when mutational biases occur during genetic replication. Standard information-theoretic measures do not distinguish these two modes of information transfer, although they may reflect different mechanisms and have different functional consequences. Starting from a few simple axioms, we derive a decomposition of mutual information into the information transmitted by copying versus the information transmitted by transformation. We begin with a decomposition that applies when the source and destination of the channel have the same set of messages and a notion of message identity exists. We then generalize our decomposition to other kinds of channels, which can involve different source and destination sets and broader notions of similarity. In addition, we show that copy information can be interpreted as the minimal work needed by a physical copying process, which is relevant for understanding the physics of replication. We use the proposed decomposition to explore a model of amino acid substitution rates. Our results apply to any system in which the fidelity of copying, rather than simple predictability, is of critical relevance.
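
For the same-alphabet case, one natural numerical reading of the decomposition takes copy information to be the non-negative diagonal (y = x) part of the pointwise mutual information, with transformation information as the remainder. The formula below is our reading of the abstract, not the paper's verbatim definition; see the paper for the exact axiomatic construction:

```python
import numpy as np

def copy_vs_transformation(joint):
    """Split I(X;Y) into copy and transformation parts for a square channel.

    joint : (n, n) joint distribution p(x, y) over a shared message alphabet,
            so that "message identity" (y == x) is well defined.
    Assumes I_copy = sum_x p(x) * max(0, pmi(x, x)) and
    I_trans = I(X;Y) - I_copy (an assumption for illustration).
    """
    joint = joint / joint.sum()
    p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / np.outer(p_x, p_y))  # pointwise mutual information
    mi = (joint[mask] * pmi[mask]).sum()
    i_copy = sum(p_x[i] * max(0.0, pmi[i, i])
                 for i in range(len(p_x)) if joint[i, i] > 0)
    return i_copy, mi - i_copy

# Noisy copy channel: symbol transmitted intact with probability 0.9,
# otherwise replaced uniformly at random by a different symbol.
n = 4
channel = 0.9 * np.eye(n) + (0.1 / (n - 1)) * (1 - np.eye(n))
i_copy, i_trans = copy_vs_transformation(channel / n)  # uniform source
print(f"copy: {i_copy:.3f} nats, transformation: {i_trans:.3f} nats")
```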

