Gaussian Mean Field Regularizes by Limiting Learned Information

Entropy ◽ 2019 ◽ Vol 21 (8) ◽ pp. 758
Author(s): Julius Kunze ◽ Louis Kirsch ◽ Hippolyt Ritter ◽ David Barber

Variational inference with a factorized Gaussian posterior estimate is a widely used approach for learning parameters and hidden variables. Empirically, this approach has a regularizing effect that is poorly understood. In this work, we show how mean field inference improves generalization by limiting mutual information between learned parameters and the data through noise. We quantify a maximum capacity when the posterior variance is either fixed or learned and connect it to generalization error, even when the KL-divergence in the objective is scaled by a constant. Our experiments suggest that bounding information between parameters and data effectively regularizes neural networks on both supervised and unsupervised tasks.
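As a concrete sketch of the objective this abstract analyzes, the snippet below shows a scaled-KL mean-field loss: a factorized Gaussian posterior sampled via the reparameterization trick, with the KL term multiplied by a constant beta. This is a minimal illustration assuming a standard normal prior; names like `beta` and `log_lik_fn` are placeholders, not the authors' code.

```python
import numpy as np

def gaussian_kl(mu, log_var, prior_var=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_var) ), summed over parameters."""
    var = np.exp(log_var)
    return 0.5 * np.sum(var / prior_var + mu**2 / prior_var
                        - 1.0 - log_var + np.log(prior_var))

def sample_weights(mu, log_var, rng):
    """Reparameterized sample w = mu + sigma * eps; the injected noise is
    what limits mutual information between parameters and data."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def neg_elbo(mu, log_var, log_lik_fn, beta, rng):
    """Negative ELBO with the KL scaled by a constant beta, matching the
    scaled-KL setting the abstract mentions."""
    w = sample_weights(mu, log_var, rng)
    return -log_lik_fn(w) + beta * gaussian_kl(mu, log_var)
```

Minimizing this over (mu, log_var) with any gradient method gives the usual mean-field training loop; beta = 1 recovers the standard ELBO.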

2021 ◽ Vol 12 (1)
Author(s): Abdulkadir Canatar ◽ Blake Bordelon ◽ Cengiz Pehlevan

A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
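The abstract's analytical theory is beyond a snippet, but the setting it describes is easy to probe empirically: sweep the training set size for kernel ridge regression on noisy data and record test error. The sketch below is an assumed setup (RBF kernel, sine target, illustrative noise level and ridge), not the authors' derivation, and a simple sweep like this need not reproduce the multi-peaked curves the theory predicts.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale**2))

def krr_test_error(X_tr, y_tr, X_te, y_te, ridge=1e-6):
    """Kernel ridge regression: f(x) = k(x, X) (K + ridge*I)^-1 y."""
    K = rbf_kernel(X_tr, X_tr)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_tr)), y_tr)
    pred = rbf_kernel(X_te, X_tr) @ alpha
    return np.mean((pred - y_te)**2)

rng = np.random.default_rng(0)
target = lambda X: np.sin(3 * X[:, 0])
X_te = rng.uniform(-1, 1, (500, 1))
y_te = target(X_te)
for n in [10, 30, 100, 300]:          # learning-curve sweep
    X_tr = rng.uniform(-1, 1, (n, 1))
    y_tr = target(X_tr) + 0.3 * rng.standard_normal(n)  # noisy labels
    print(n, krr_test_error(X_tr, y_tr, X_te, y_te))
```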


Author(s): David Barber

Finding clusters of well-connected nodes in a graph is a problem common to many domains, including social networks, the Internet and bioinformatics. From a computational viewpoint, finding these clusters or graph communities is a difficult problem. We use a clique matrix decomposition based on a statistical description that encourages clusters to be well connected and few in number. The formal intractability of inferring the clusters is addressed using a variational approximation inspired by mean-field theories in statistical mechanics. Clique matrices also play a natural role in parametrizing positive definite matrices under zero constraints on elements of the matrix. We show that clique matrices can parametrize all positive definite matrices restricted according to a decomposable graph and form a structured factor analysis approximation in the non-decomposable case. Extensions to conjugate Bayesian covariance priors and more general non-Gaussian independence models are briefly discussed.
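The two roles of a clique matrix described above can be illustrated on a toy graph: its off-diagonal support reproduces the adjacency structure, and summing PSD blocks confined to each clique yields a positive definite matrix whose zero pattern respects the graph. This is a hedged sketch of the idea on an assumed 4-node example, not the paper's variational decomposition algorithm.

```python
import numpy as np

# Toy graph on 4 nodes with edges (0,1), (0,2), (1,2), (2,3).
# Cliques: {0,1,2} and {2,3}. Clique matrix Z is nodes x cliques.
Z = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])

# Off-diagonal support of Z Z^T recovers the adjacency structure.
A = (Z @ Z.T > 0).astype(int)
np.fill_diagonal(A, 0)
print(A)

# A positive definite matrix with zeros exactly where the graph has no
# edge: sum random PSD blocks over cliques, plus diagonal jitter.
rng = np.random.default_rng(0)
S = np.zeros((4, 4))
for c in range(Z.shape[1]):
    members = np.flatnonzero(Z[:, c])
    v = rng.standard_normal((len(members), len(members)))
    S[np.ix_(members, members)] += v @ v.T  # PSD block on one clique
S += 1e-3 * np.eye(4)                       # strict positive definiteness

print(np.linalg.eigvalsh(S).min() > 0)      # True: positive definite
print((np.abs(S) > 1e-12).astype(int))      # zero pattern follows the graph
```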


2013 ◽ Vol 25 (7) ◽ pp. 1768-1806
Author(s): N. Alex Cayco-Gajic ◽ Eric Shea-Brown

Recent experimental and computational evidence suggests that several dynamical properties may characterize the operating point of functioning neural networks: critical branching, neutral stability, and production of a wide range of firing patterns. We seek the simplest setting in which these properties emerge, clarifying their origin and relationship in random, feedforward networks of McCulloch-Pitts neurons. Two key parameters are the thresholds at which neurons fire spikes and the overall level of feedforward connectivity. When neurons have low thresholds, we show that there is always a connectivity for which the properties in question all occur, that is, these networks preserve overall firing rates from layer to layer and produce broad distributions of activity in each layer. This fails to occur, however, when neurons have high thresholds. A key tool in explaining this difference is the eigenstructure of the resulting mean-field Markov chain, as this reveals which activity modes will be preserved from layer to layer. We extend our analysis from purely excitatory networks to more complex models that include inhibition and local noise, and find that both of these features extend the parameter ranges over which networks produce the properties of interest.
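The mean-field Markov chain invoked here can be written down directly: track only the number k of active neurons in a layer, let each next-layer neuron receive each active spike with connection probability c, and fire when at least theta inputs arrive. The sketch below builds that transition matrix and inspects its eigenvalues; parameter values (N, c, theta) are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy.stats import binom

N, c, theta = 50, 0.1, 2   # layer size, connection prob, firing threshold

# A next-layer neuron fires if >= theta of the k active neurons connect
# to it; connections are independent with probability c.
p_fire = np.array([binom.sf(theta - 1, k, c) for k in range(N + 1)])

# Transition matrix T[k, k']: probability of k' next-layer spikes given
# k current spikes (all N neurons fire independently with prob p_fire[k]).
T = np.array([binom.pmf(np.arange(N + 1), N, p) for p in p_fire])

# The eigenstructure reveals which activity modes survive layer to layer.
eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
print(eigvals[:3])  # leading eigenvalue 1 (absorbing silent state k = 0),
                    # then the decay rates of the remaining modes
```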


2013 ◽ Vol 2013 ◽ pp. 1-10
Author(s): Benjamin W. Y. Lo ◽ R. Loch Macdonald ◽ Andrew Baker ◽ Mitchell A. H. Levine

Objective. A novel clinical prediction approach combining Bayesian neural networks with fuzzy logic inferences is developed and applied to derive prognostic decision rules in aneurysmal subarachnoid hemorrhage (aSAH). Methods. The approach was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients). Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odds ratios (ORs). Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors included normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age, and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships using a multilayer perceptron model with 11 hidden variables in one layer. Fuzzy logic decision rules (centroid defuzzification technique) denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates the clinician's reasoning, and compensates for uncertainty in prognostication.
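The centroid (center-of-gravity) defuzzification step mentioned in the Results is a standard fuzzy-logic operation, sketched below on a hypothetical output variable; the membership function, its range, and the comparison against the 2.5 cut-off are illustrative assumptions, not the study's actual rules or data.

```python
import numpy as np

def centroid_defuzzify(x, mu):
    """Center-of-gravity defuzzification on a uniform grid:
    crisp value = sum(x * mu(x)) / sum(mu(x))."""
    return float((x * mu).sum() / mu.sum())

# Hypothetical 'poor-prognosis cluster' output scored on [0, 5] with a
# triangular membership function peaking at 3 (values illustrative only).
x = np.linspace(0.0, 5.0, 501)
mu = np.clip(1.0 - np.abs(x - 3.0) / 1.5, 0.0, 1.0)
score = centroid_defuzzify(x, mu)
print(score, score > 2.5)  # compared against the 2.5 cut-off noted above
```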


1997 ◽ Vol 9 (1) ◽ pp. 1-42
Author(s): Sepp Hochreiter ◽ Jürgen Schmidhuber

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function: a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a “good” weight prior; instead, we place a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity, and it automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backpropagation, weight decay, and “optimal brain surgeon”/“optimal brain damage.”
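The notion of flatness at the core of this abstract can be probed crudely without the paper's machinery: sample random weight perturbations inside a ball and average the resulting loss increase, which is small at flat minima and large at sharp ones. This is a hedged sketch of the concept, not the authors' MDL-based flat minimum search; all names and parameter values are illustrative.

```python
import numpy as np

def flatness_score(loss_fn, w, radius=0.05, n_probes=20, rng=None):
    """Average loss increase under random perturbations within a ball of
    the given radius; flat minima (large regions of near-constant error)
    score low, sharp minima score high."""
    rng = rng or np.random.default_rng(0)
    base = loss_fn(w)
    deltas = []
    for _ in range(n_probes):
        d = rng.standard_normal(w.shape)
        d *= radius / np.linalg.norm(d)   # project onto the sphere
        deltas.append(loss_fn(w + d) - base)
    return float(np.mean(deltas))

# Two toy 1-D landscapes, both minimized at w = 0, differing in curvature.
sharp = lambda w: 50.0 * float(w @ w)
flat = lambda w: 0.1 * float(w @ w)
w0 = np.zeros(10)
print(flatness_score(sharp, w0), flatness_score(flat, w0))
```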

