Analysis of gradient descent learning algorithms for multilayer feedforward neural networks

1991 ◽  
Vol 38 (8) ◽  
pp. 883-894 ◽  
Author(s):  
H. Guo ◽  
S.B. Gelfand

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2761
Author(s):  
Vaios Ampelakiotis ◽  
Isidoros Perikos ◽  
Ioannis Hatzilygeroudis ◽  
George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 pixels each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows users to create new, user-oriented training sets and parameter settings, and thus new NN models.
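The abstract above singles out stochastic gradient descent with the Adam update rule as the best-performing trainer. As a minimal sketch (not the authors' code, and with hypothetical parameter defaults taken from the original Adam paper), the per-parameter Adam update can be written as:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of scalar parameters.

    theta, grad, m, v are parallel lists; t is the 1-based step count.
    Returns updated (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment (mean) estimate
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)              # bias correction
        v_hat = vi / (1 - beta2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# Toy run: minimise f(x) = x^2 (gradient 2x) starting from x = 1.0
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
```

In a real HCR training loop the gradient list would come from backpropagation over a mini-batch of the 28 × 28 images rather than from a toy objective.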


2018 ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Abstract: Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent-based learning algorithms we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and for biologically plausible neural networks thought to describe human and animal learning.
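The specific figure 15.87% quoted above is the standard normal tail probability Φ(−1), which is where the paper's Gaussian-noise derivation places the optimum (an assumption stated here, not spelled out in the abstract). It can be checked with nothing beyond the standard library:

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Optimal training error rate reported as ~15.87%, i.e. Phi(-1):
optimal_error = std_normal_cdf(-1.0)
optimal_accuracy = 1.0 - optimal_error

print(f"optimal error rate  = {optimal_error:.4%}")    # about 15.87%
print(f"optimal accuracy    = {optimal_accuracy:.4%}") # about 84.13%, i.e. ~85%
```

Note the exact complement is 84.13%, which the abstract rounds to "about 85%".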


Risks ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 58
Author(s):  
Banghee So ◽  
Jean-Philippe Boucher ◽  
Emiliano A. Valdez

This article describes the techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset contains 100,000 policies that include observations of each driver's claims experience, together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models for assessing risks in usage-based insurance. It follows a three-stage process using machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. The second stage simulates values for the number of claims as multiple binary classifications using feedforward neural networks. The third stage simulates values for the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summarizations produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work to be valuable.
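The first stage above applies an extended SMOTE algorithm. As a point of reference, a minimal sketch of the plain-vanilla SMOTE interpolation step (not the authors' extension; all names here are hypothetical) looks like this:

```python
import math
import random

def smote_samples(minority, n_new, k=3, seed=0):
    """Generate synthetic points by interpolating each chosen sample
    toward one of its k nearest same-class neighbours (basic SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base (excluding itself) by Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy feature space: four 2-D points in the unit square
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_samples(minority, n_new=10)
```

Because each synthetic point lies on a segment between two real points, the generated portfolio stays inside the convex hull of the observed feature space.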


Author(s):  
ROELOF K. BROUWER

Neural networks are good at representing functions or data transformations. However, just as in the biological brain, the mathematical description of the data transformation is hidden. In the case of the human brain, the transformation, in terms of rules, may be extracted by interviewing the person; in the case of an artificial neural network, other approaches have to be utilized. In the case described here, a second neural network that represents the transformation in terms of fuzzy rules is trained using gradient descent. The parameters that are learned are the parameters of the fuzzy sets and the connection weights in [0,1] between the outputs of the membership function units and the final output units. There is an output unit for each rule and consequent membership function. The fuzzy output set with the highest membership value is taken to be the output fuzzy set. The extracted rules are of the form: if x0 is Small or x0 is Medium, and x1 is Large or x1 is Medium, then y is Large, where x0 and x1 are inputs and y is the output. The cost measure consists of several terms indicating how close the actual output is to a target output, how close the weights are to 0 and 1, and how close the vector of output membership values is to a 1-of-n vector. The cost measure is a linear combination of these individual terms; by changing the constant multipliers, the relative importance of the cost terms can be changed and studied. The method has been tried on randomly generated feedforward neural networks and also on data produced by functions with specific properties. The fuzzy network is trained using data produced by the feedforward neural network or the known function. This method can also be used to extract rules, such as control rules implicitly used by a human, if input and output data are gathered from the human.
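The abstract describes membership-function units plus a cost measure built as a linear combination of three terms. A minimal sketch of those two ingredients (the function names, triangular membership shape, and penalty forms are illustrative assumptions, not the author's implementation) might look like this:

```python
def tri_mf(x, a, b, c):
    """Triangular membership function peaking at b with support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def composite_cost(outputs, targets, weights,
                   lam_fit=1.0, lam_w=0.1, lam_1ofn=0.1):
    """Linear combination of the three cost terms from the abstract:
    fit to the target, closeness of weights to {0, 1}, and closeness of
    the output membership vector to a 1-of-n vector. The lam_* multipliers
    set the relative importance of each term."""
    fit = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    w_bin = sum((w * (1.0 - w)) ** 2 for w in weights)   # zero when w is 0 or 1
    one_of_n = (sum(outputs) - max(outputs)) ** 2        # zero when one unit dominates
    return lam_fit * fit + lam_w * w_bin + lam_1ofn * one_of_n

# Small / Medium / Large fuzzy sets on [0, 1], evaluated at x = 0.3
memberships = [tri_mf(0.3, 0.0, 0.0, 0.5),   # Small
               tri_mf(0.3, 0.0, 0.5, 1.0),   # Medium
               tri_mf(0.3, 0.5, 1.0, 1.0)]   # Large
```

Gradient descent would then adjust the (a, b, c) parameters and the weights so that the composite cost falls, pushing weights toward crisp 0/1 values and outputs toward a single winning fuzzy set.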


1993 ◽  
Vol 115 (1) ◽  
pp. 38-43 ◽  
Author(s):  
H. S. M. Beigi ◽  
C. J. Li

Previous studies have suggested that, for moderately sized neural networks, classical Quasi-Newton methods yield the best convergence properties among state-of-the-art learning algorithms [1]. This paper describes a set of even better learning algorithms based on a class of Quasi-Newton optimization techniques called Self-Scaling Variable Metric (SSVM) methods. One characteristic of SSVM methods is that they provide search directions that are invariant under scaling of the objective function. Using an XOR benchmark and an encoder benchmark, simulations with the SSVM algorithms for training general feedforward neural networks were carried out to study their performance. It is shown that the SSVM method reduces the number of iterations required for convergence to 40 to 60 percent of that required by classical Quasi-Newton methods, which, in general, converge two to three orders of magnitude faster than steepest descent techniques.
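The scale-invariance mentioned above comes from multiplying the inverse-Hessian estimate by a data-driven factor before the usual Quasi-Newton update. A minimal sketch of one common form of this idea, the Oren–Luenberger scaling factor applied to a BFGS inverse update on a toy quadratic (an illustrative assumption, not the paper's exact algorithm), is:

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ssvm_bfgs_update(H, s, y):
    """Self-scaling BFGS update of the inverse-Hessian estimate H.

    s is the parameter step, y the gradient difference. The factor
    gamma = (s.y)/(y.Hy) rescales H first, which is what makes the
    resulting search directions invariant to rescaling the objective."""
    n = len(s)
    gamma = dot(s, y) / dot(y, matvec(H, y))       # self-scaling factor
    Hs = [[gamma * H[i][j] for j in range(n)] for i in range(n)]
    Hy = matvec(Hs, y)
    sy = dot(s, y)
    yHy = dot(y, Hy)
    # Standard BFGS inverse update applied to the scaled matrix
    return [[Hs[i][j]
             + ((sy + yHy) / sy ** 2) * s[i] * s[j]
             - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
             for j in range(n)] for i in range(n)]

# Toy objective f(x) = 0.5 * (x0^2 + 10 * x1^2), gradient g(x) = (x0, 10*x1)
def grad(x):
    return [x[0], 10.0 * x[1]]

x, H = [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
g = grad(x)
for _ in range(20):
    if dot(g, g) < 1e-18:                          # converged
        break
    d = [-v for v in matvec(H, g)]                 # quasi-Newton direction
    Ad = [d[0], 10.0 * d[1]]                       # exact line search on a quadratic
    alpha = -dot(g, d) / dot(d, Ad)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = grad(x_new)
    s = [a - b for a, b in zip(x_new, x)]
    y = [a - b for a, b in zip(g_new, g)]
    H = ssvm_bfgs_update(H, s, y)
    x, g = x_new, g_new
```

In a learning context x would be the network's weight vector and the gradients would come from backpropagation; the scaled update keeps the search directions well-conditioned regardless of how the error function is scaled.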

