Session MA4b: Information theory and statistical learning

Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 438
Author(s):  
Ibrahim Alabdulmohsin

In this paper, we introduce the notion of “learning capacity” for algorithms that learn from data, which is analogous to the Shannon channel capacity for communication systems. We show how “learning capacity” bridges the gap between statistical learning theory and information theory, and we use it to derive generalization bounds for finite hypothesis spaces, differential privacy, and countable domains, among others. Moreover, we prove that under the Axiom of Choice, the existence of an empirical risk minimization (ERM) rule that has a vanishing learning capacity is equivalent to the assertion that the hypothesis space has a finite Vapnik–Chervonenkis (VC) dimension, thus establishing an equivalence relation between two of the most fundamental concepts in statistical learning theory and information theory. In addition, we show how the learning capacity of an algorithm provides important qualitative results, such as on the relation between generalization and algorithmic stability, information leakage, and data processing. Finally, we conclude by listing some open problems and suggesting future directions of research.
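
For orientation, the classical generalization bound for a finite hypothesis space can be written as below. This is the textbook Hoeffding-plus-union-bound result, not the paper's learning-capacity bound; the symbols ($\mathcal{H}$ for the hypothesis space, $m$ for the sample size, $R$ and $\hat{R}_S$ for true and empirical risk, $\delta$ for the failure probability) are assumptions introduced here, with the loss assumed to take values in $[0, 1]$ and the sample $S$ drawn i.i.d.

```latex
% Two-sided Hoeffding inequality for one fixed hypothesis h:
%   P( |R(h) - \hat{R}_S(h)| > \epsilon ) \le 2 \exp(-2 m \epsilon^2)
% A union bound over the finite set \mathcal{H}, with the right-hand side
% set equal to \delta, gives: with probability at least 1 - \delta,
\[
  \sup_{h \in \mathcal{H}} \bigl| R(h) - \hat{R}_S(h) \bigr|
  \;\le\;
  \sqrt{\frac{\ln |\mathcal{H}| + \ln (2/\delta)}{2m}} .
\]
```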


Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 739 ◽  
Author(s):  
Alberto Beretta ◽  
Claudia Battistin ◽  
Clélia de Mulatier ◽  
Iacopo Mastromatteo ◽  
Matteo Marsili

Models can be simple for different reasons: because they yield a simple and computationally efficient interpretation of a generic dataset (e.g., in terms of pairwise dependencies), as in statistical learning, or because they capture the laws of a specific phenomenon, as in physics, leading to non-trivial falsifiable predictions. In information theory, the simplicity of a model is quantified by the stochastic complexity, which measures the number of bits needed to encode its parameters. In order to understand what simple models look like, we study the stochastic complexity of spin models with interactions of arbitrary order. We show that bijections within the space of possible interactions preserve the stochastic complexity, which allows us to partition the space of all models into equivalence classes. We thus find that the simplicity of a model is not determined by the order of the interactions, but rather by their mutual arrangements. Models where statistical dependencies are localized on non-overlapping groups of few variables are simple, affording predictions on independencies that are easy to falsify. In contrast, fully connected pairwise models, which are often used in statistical learning, appear to be highly complex because of their extended set of interactions, and they are hard to falsify.


2018 ◽  
Vol 68 (5) ◽  
pp. 1149-1172
Author(s):  
Milan Stehlík ◽  
Ján Somorčík ◽  
Luboš Střelec ◽  
Jaromír Antoch

Abstract. In this paper we give a partial response to one of the most important statistical questions, namely, what optimal statistical decisions are and how they are related to (statistical) information theory. We exemplify the necessity of understanding the structure of information divergences and their approximations, which may in particular be understood through deconvolution. Deconvolution of information divergences is illustrated in the exponential family of distributions, leading to the optimal tests in the Bahadur sense. We provide a new approximation of I-divergences using the Fourier transformation, saddle point approximation, and uniform convergence of the Euler polygons. Uniform approximation of deconvoluted parts of I-divergences is also discussed. Our approach is illustrated on a real data example.


Author(s):  
Ana Franco ◽  
Julia Eberlen ◽  
Arnaud Destrebecqz ◽  
Axel Cleeremans ◽  
Julie Bertels

Abstract. The Rapid Serial Visual Presentation procedure is a method widely used in visual perception research. In this paper we propose an adaptation of this method which can be used with auditory material and enables assessment of statistical learning in speech segmentation. Adult participants were exposed to an artificial speech stream composed of statistically defined trisyllabic nonsense words. They were subsequently instructed to perform a detection task in a Rapid Serial Auditory Presentation (RSAP) stream in which they had to detect a syllable in a short speech stream. Results showed that reaction times varied as a function of the statistical predictability of the syllable: second and third syllables of each word were responded to faster than first syllables. This result suggests that the RSAP procedure provides a reliable and sensitive indirect measure of auditory statistical learning.
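
The notion of "statistical predictability" in the abstract above is commonly operationalized as the transitional probability between adjacent syllables. The sketch below is a minimal illustration under stated assumptions: the four nonsense words, the stream length, and the helper names `make_stream` and `transitional_probabilities` are hypothetical and are not taken from the study.

```python
import random
from collections import Counter

# Hypothetical trisyllabic nonsense words; NOT the stimuli used in the study.
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu"), ("tu", "pi", "ro")]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) for every adjacent pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


stream = make_stream(300)
tp = transitional_probabilities(stream)

# Pairs that fall inside a word (syllable 1 -> 2 and syllable 2 -> 3).
within_pairs = {(w[i], w[i + 1]) for w in WORDS for i in range(2)}
within = [p for pair, p in tp.items() if pair in within_pairs]
across = [p for pair, p in tp.items() if pair not in within_pairs]

print("within-word TPs:", sorted(set(within)))
print("max across-word TP:", round(max(across), 2))
```

Because each syllable in this toy lexicon occurs in exactly one word position, within-word transitions are fully predictable, while transitions across word boundaries are spread over the other words. This mirrors why second and third syllables would be more predictable than first syllables in such a stream.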


Author(s):  
Charles A. Doan ◽  
Ronaldo Vigo

Abstract. Several empirical investigations have explored whether observers prefer to sort sets of multidimensional stimuli into groups by employing one-dimensional or family-resemblance strategies. Although one-dimensional sorting strategies have been the prevalent finding for these unsupervised classification paradigms, several researchers have provided evidence that the choice of strategy may depend on the particular demands of the task. To account for this disparity, we propose that observers extract relational patterns from stimulus sets that facilitate the development of optimal classification strategies for relegating category membership. We conducted a novel constrained categorization experiment to empirically test this hypothesis by instructing participants to either add or remove objects from presented categorical stimuli. We employed generalized representational information theory (GRIT; Vigo, 2011b, 2013a, 2014) and its associated formal models to predict and explain how human beings chose to modify these categorical stimuli. Additionally, we compared model performance to predictions made by a leading prototypicality measure in the literature.

