Session MA4b: Information theory and statistical learning

Towards a Unified Theory of Learning and Information

Entropy ◽

10.3390/e22040438 ◽

2020 ◽

Vol 22 (4) ◽

pp. 438

Author(s):

Ibrahim Alabdulmohsin

Keyword(s):

Information Theory ◽

Statistical Learning ◽

Learning Theory ◽

Communication Systems ◽

Differential Privacy ◽

Statistical Learning Theory ◽

Information Leakage ◽

Risk Minimization ◽

Open Problems ◽

Learning Capacity

In this paper, we introduce the notion of “learning capacity” for algorithms that learn from data, which is analogous to the Shannon channel capacity for communication systems. We show how “learning capacity” bridges the gap between statistical learning theory and information theory, and we use it to derive generalization bounds for finite hypothesis spaces, differential privacy, and countable domains, among others. Moreover, we prove that under the Axiom of Choice, the existence of an empirical risk minimization (ERM) rule that has a vanishing learning capacity is equivalent to the assertion that the hypothesis space has a finite Vapnik–Chervonenkis (VC) dimension, thus establishing an equivalence relation between two of the most fundamental concepts in statistical learning theory and information theory. In addition, we show how the learning capacity of an algorithm provides important qualitative results, such as the relation between generalization and algorithmic stability, information leakage, and data processing. Finally, we conclude by listing some open problems and suggesting future directions of research.

Information Theory and Statistical Learning

10.1007/978-0-387-84816-7 ◽

2009 ◽

Cited By ~ 5

Keyword(s):

Information Theory ◽

Statistical Learning

The Stochastic Complexity of Spin Models: Are Pairwise Models Really Simple?

Entropy ◽

10.3390/e20100739 ◽

2018 ◽

Vol 20 (10) ◽

pp. 739 ◽

Cited By ~ 2

Author(s):

Alberto Beretta ◽

Claudia Battistin ◽

Clélia de Mulatier ◽

Iacopo Mastromatteo ◽

Matteo Marsili

Keyword(s):

Information Theory ◽

Statistical Learning ◽

Arbitrary Order ◽

Equivalence Classes ◽

Computationally Efficient ◽

Spin Models ◽

Stochastic Complexity ◽

Pairwise Models ◽

Fully Connected ◽

Simple Models

Models can be simple for different reasons: because they yield a simple and computationally efficient interpretation of a generic dataset (e.g., in terms of pairwise dependencies), as in statistical learning, or because they capture the laws of a specific phenomenon, as in physics, leading to non-trivial falsifiable predictions. In information theory, the simplicity of a model is quantified by the stochastic complexity, which measures the number of bits needed to encode its parameters. In order to understand what simple models look like, we study the stochastic complexity of spin models with interactions of arbitrary order. We show that bijections within the space of possible interactions preserve the stochastic complexity, which allows us to partition the space of all models into equivalence classes. We thus find that the simplicity of a model is not determined by the order of the interactions, but rather by their mutual arrangements. Models where statistical dependencies are localized on non-overlapping groups of few variables are simple, affording predictions on independencies that are easy to falsify. In contrast, fully connected pairwise models, which are often used in statistical learning, appear to be highly complex because of their extended set of interactions, and they are hard to falsify.

Approximation of Information Divergences for Statistical Learning with Applications

Mathematica Slovaca ◽

10.1515/ms-2017-0177 ◽

2018 ◽

Vol 68 (5) ◽

pp. 1149-1172

Author(s):

Milan Stehlík ◽

Ján Somorčík ◽

Luboš Střelec ◽

Jaromír Antoch

Keyword(s):

Information Theory ◽

Statistical Learning ◽

Fourier Transformation ◽

Exponential Family ◽

Real Data ◽

Statistical Information ◽

Saddle Point Approximation ◽

Optimal Tests ◽

Exponential Family Of Distributions ◽

Family Of Distributions

Abstract. In this paper we give a partial response to one of the most important statistical questions, namely, what optimal statistical decisions are and how they are related to (statistical) information theory. We exemplify the necessity of understanding the structure of information divergences and their approximations, which may in particular be understood through deconvolution. Deconvolution of information divergences is illustrated in the exponential family of distributions, leading to the optimal tests in the Bahadur sense. We provide a new approximation of I-divergences using the Fourier transformation, saddle point approximation, and uniform convergence of the Euler polygons. Uniform approximation of deconvoluted parts of I-divergences is also discussed. Our approach is illustrated on a real data example.

What Determines Visual Statistical Learning Performance? Insights From Information Theory

Cognitive Science ◽

10.1111/cogs.12803 ◽

2019 ◽

Vol 43 (12) ◽

Cited By ~ 1

Author(s):

Noam Siegelman ◽

Louisa Bogaerts ◽

Ram Frost

Keyword(s):

Information Theory ◽

Statistical Learning ◽

Learning Performance ◽

Visual Statistical Learning

Quantum Information Theory

10.1017/9781316809976 ◽

2016 ◽

Cited By ~ 120

Author(s):

Mark M. Wilde

Keyword(s):

Information Theory ◽

Quantum Information ◽

Quantum Information Theory

Network Information Theory

10.1017/cbo9781139030687 ◽

2011 ◽

Cited By ~ 1195

Author(s):

Abbas El Gamal ◽

Young-Han Kim

Keyword(s):

Information Theory ◽

Network Information Theory ◽

Network Information

Information Theory and Coding by Example

10.1017/cbo9781139028448 ◽

2009 ◽

Cited By ~ 7

Author(s):

Mark Kelbert ◽

Yuri Suhov

Keyword(s):

Information Theory

Rapid Serial Auditory Presentation

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000295 ◽

2015 ◽

Vol 62 (5) ◽

pp. 346-351 ◽

Cited By ~ 15

Author(s):

Ana Franco ◽

Julia Eberlen ◽

Arnaud Destrebecqz ◽

Axel Cleeremans ◽

Julie Bertels

Keyword(s):

Statistical Learning ◽

Reaction Times ◽

Visual Presentation ◽

Detection Task ◽

Speech Segmentation ◽

Auditory Presentation ◽

Indirect Measure ◽

Speech Stream ◽

Adult Participants ◽

Presentation Procedure

Abstract. The Rapid Serial Visual Presentation procedure is a method widely used in visual perception research. In this paper we propose an adaptation of this method which can be used with auditory material and enables assessment of statistical learning in speech segmentation. Adult participants were exposed to an artificial speech stream composed of statistically defined trisyllabic nonsense words. They were subsequently instructed to perform a detection task in a Rapid Serial Auditory Presentation (RSAP) stream in which they had to detect a syllable in a short speech stream. Results showed that reaction times varied as a function of the statistical predictability of the syllable: second and third syllables of each word were responded to faster than first syllables. This result suggests that the RSAP procedure provides a reliable and sensitive indirect measure of auditory statistical learning.

Constructing and Deconstructing Concepts

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000337 ◽

2016 ◽

Vol 63 (5) ◽

pp. 249-262 ◽

Cited By ~ 2

Author(s):

Charles A. Doan ◽

Ronaldo Vigo

Keyword(s):

Information Theory ◽

Model Performance ◽

Unsupervised Classification ◽

Family Resemblance ◽

Category Membership ◽

Formal Models ◽

Human Beings ◽

One Dimensional ◽

Relational Patterns ◽

Optimal Classification

Abstract. Several empirical investigations have explored whether observers prefer to sort sets of multidimensional stimuli into groups by employing one-dimensional or family-resemblance strategies. Although one-dimensional sorting strategies have been the prevalent finding for these unsupervised classification paradigms, several researchers have provided evidence that the choice of strategy may depend on the particular demands of the task. To account for this disparity, we propose that observers extract relational patterns from stimulus sets that facilitate the development of optimal classification strategies for relegating category membership. We conducted a novel constrained categorization experiment to empirically test this hypothesis by instructing participants to either add or remove objects from presented categorical stimuli. We employed generalized representational information theory (GRIT; Vigo, 2011b, 2013a, 2014) and its associated formal models to predict and explain how human beings chose to modify these categorical stimuli. Additionally, we compared model performance to predictions made by a leading prototypicality measure in the literature.
