Statistical mechanics approach to early stopping and weight decay

1998, Vol 58 (1), pp. 833-844. Author(s): Siegfried Bös
2009, Vol 36 (10), pp. 4810-4818. Author(s): Richard M. Zur, Yulei Jiang, Lorenzo L. Pesce, Karen Drukker

Entropy, 2021, Vol 23 (12), p. 1629. Author(s): Ali Unlu, Laurence Aitchison

We developed Variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate: it is the log-likelihood plus a weight-decay term plus a squared-gradient regularizer. Variational Laplace gave better test performance and expected calibration error than maximum a posteriori inference and standard sampling-based variational inference, despite using the same variational approximate posterior. Finally, we emphasize the care needed when benchmarking standard VI, as there is a risk of stopping training before the variance parameters have converged; we show that such premature stopping can be avoided by increasing the learning rate for the variance parameters.
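The three-term structure of the objective (log-likelihood + weight decay + squared-gradient regularizer) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy 1D linear-Gaussian model where the gradients are available in closed form, and the function name `vl_objective` and its parameters are hypothetical.

```python
def vl_objective(w, xs, ys, sigma2_post, prior_var=1.0, noise_var=1.0):
    """Sketch of a Variational-Laplace-style objective for 1D linear
    regression y ~ N(w*x, noise_var) with prior w ~ N(0, prior_var).

    sigma2_post is the variance of the approximate posterior over w;
    setting it to 0 recovers a plain MAP objective (hypothetical setup).
    """
    # Gaussian log-likelihood evaluated at the posterior mean weight
    log_lik = -0.5 * sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / noise_var
    # weight-decay term induced by the Gaussian prior over w
    weight_decay = -0.5 * w * w / prior_var
    # per-datapoint gradients of the log-likelihood with respect to w
    grads = [(y - w * x) * x / noise_var for x, y in zip(xs, ys)]
    # squared-gradient regularizer, scaled by the posterior variance
    sq_grad = -0.5 * sigma2_post * sum(g * g for g in grads)
    return log_lik + weight_decay + sq_grad
```

With `sigma2_post = 0` the objective reduces to log-likelihood plus weight decay (MAP); a positive posterior variance subtracts the squared-gradient penalty, so the Variational Laplace objective is never larger than the corresponding MAP objective at the same mean weight.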


1992, Vol 2 (5), pp. 1215-1236. Author(s): Jonathan V. Selinger, Robijn F. Bruinsma
