Global optimization for neural network training

Computer ◽  
1996 ◽  
Vol 29 (3) ◽  
pp. 45-54 ◽  
Author(s):  
Yi Shang ◽  
B.W. Wah
Author(s):  
D. Geraldine Bessie Amali ◽  
Dinakaran M.

Artificial Neural Networks have earned popularity in recent years because of their ability to approximate nonlinear functions. Training a neural network involves minimizing the mean square error between the target and network output. The error surface is nonconvex and highly multimodal. Finding the minimum of a multimodal function is a NP complete problem and cannot be solved completely. Thus application of heuristic global optimization algorithms that computes a good global minimum to neural network training is of interest. This paper reviews the various heuristic global optimization algorithms used for training feedforward neural networks and recurrent neural networks. The training algorithms are compared in terms of the learning rate, convergence speed and accuracy of the output produced by the neural network. The paper concludes by suggesting directions for novel ANN training algorithms based on recent advances in global optimization.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 711
Author(s):  
Mina Basirat ◽  
Bernhard C. Geiger ◽  
Peter M. Roth

Information plane analysis, describing the mutual information between the input and a hidden layer and between a hidden layer and the target over time, has recently been proposed to analyze the training of neural networks. Since the activations of a hidden layer are typically continuous-valued, this mutual information cannot be computed analytically and must thus be estimated, resulting in apparently inconsistent or even contradicting results in the literature. The goal of this paper is to demonstrate how information plane analysis can still be a valuable tool for analyzing neural network training. To this end, we complement the prevailing binning estimator for mutual information with a geometric interpretation. With this geometric interpretation in mind, we evaluate the impact of regularization and interpret phenomena such as underfitting and overfitting. In addition, we investigate neural network learning in the presence of noisy data and noisy labels.


Sign in / Sign up

Export Citation Format

Share Document