Analysis of gradient descent learning algorithms for multilayer feedforward neural networks

1991 ◽  
Vol 38 (8) ◽  
pp. 883-894 ◽  
Author(s):  
H. Guo ◽  
S.B. Gelfand

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2761
Author(s):  
Vaios Ampelakiotis ◽  
Isidoros Perikos ◽  
Ioannis Hatzilygeroudis ◽  
George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 pixels each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows users to create new, user-oriented training sets and parameter settings, and thus new NN models.
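The abstract above singles out stochastic gradient descent with the Adam update rule as the best-performing trainer. As a minimal sketch (not the authors' code, and with hypothetical parameter defaults taken from the original Adam paper), the per-parameter Adam update can be written as:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of scalar parameters.

    theta, grad, m, v are parallel lists; t is the 1-based step count.
    Returns updated (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment (mean) estimate
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)              # bias correction
        v_hat = vi / (1 - beta2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# Toy run: minimise f(x) = x^2 (gradient 2x) starting from x = 1.0
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
```

In a real HCR training loop the gradient list would come from backpropagation over a mini-batch of the 28 × 28 images rather than from a toy objective.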


2018 ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Abstract: Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent-based learning algorithms we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and for biologically plausible neural networks thought to describe human and animal learning.
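The specific figure 15.87% quoted above is the standard normal tail probability Φ(−1), which is where the paper's Gaussian-noise derivation places the optimum (an assumption stated here, not spelled out in the abstract). It can be checked with nothing beyond the standard library:

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Optimal training error rate reported as ~15.87%, i.e. Phi(-1):
optimal_error = std_normal_cdf(-1.0)
optimal_accuracy = 1.0 - optimal_error

print(f"optimal error rate  = {optimal_error:.4%}")    # about 15.87%
print(f"optimal accuracy    = {optimal_accuracy:.4%}") # about 84.13%, i.e. ~85%
```

Note the exact complement is 84.13%, which the abstract rounds to "about 85%".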


Risks ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 58
Author(s):  
Banghee So ◽  
Jean-Philippe Boucher ◽  
Emiliano A. Valdez

This article describes the techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset contains 100,000 policies that include observations of each driver's claims experience, together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models for assessing risks in usage-based insurance. It follows a three-stage process using machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. The second stage simulates values for the number of claims as multiple binary classifications using feedforward neural networks. The third stage simulates values for the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summarizations produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work to be valuable.
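The first stage above applies an extended SMOTE algorithm. As a point of reference, a minimal sketch of the plain-vanilla SMOTE interpolation step (not the authors' extension; all names here are hypothetical) looks like this:

```python
import math
import random

def smote_samples(minority, n_new, k=3, seed=0):
    """Generate synthetic points by interpolating each chosen sample
    toward one of its k nearest same-class neighbours (basic SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base (excluding itself) by Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy feature space: four 2-D points in the unit square
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_samples(minority, n_new=10)
```

Because each synthetic point lies on a segment between two real points, the generated portfolio stays inside the convex hull of the observed feature space.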


Author(s):  
ROELOF K. BROUWER

Neural networks are good at representing functions or data transformations. However, just as in the biological brain, the mathematical description of the data transformation is hidden. In the case of the human brain, the transformation, in terms of rules, may be extracted by interviewing the person; in the case of an artificial neural network, other approaches have to be utilized. In the case described here, a second neural network that represents the transformation in terms of fuzzy rules is trained using gradient descent. The parameters that are learned are the parameters of the fuzzy sets and the connection weights in [0,1] between the outputs of the membership function units and the final output units. There is an output unit for each rule and consequent membership function. The fuzzy output set with the highest membership value is taken to be the output fuzzy set. The extracted rules are of the form: if x0 is Small or x0 is Medium, and x1 is Large or x1 is Medium, then y is Large, where x0 and x1 are inputs and y is the output. The cost measure consists of several terms indicating how close the actual output is to a target output, how close the weights are to 0 and 1, and how close the vector of output membership values is to a 1-of-n vector. The cost measure is a linear combination of these individual terms; by changing the constant multipliers, the relative importance of the cost terms can be changed and studied. The method has been tried on randomly generated feedforward neural networks and also on data produced by functions with specific properties. The fuzzy network is trained using data produced by the feedforward neural network or the known function. This method can also be used to extract rules, such as control rules implicitly used by a human, if input and output data are gathered from the human.
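The abstract describes membership-function units plus a cost measure built as a linear combination of three terms. A minimal sketch of those two ingredients (the function names, triangular membership shape, and penalty forms are illustrative assumptions, not the author's implementation) might look like this:

```python
def tri_mf(x, a, b, c):
    """Triangular membership function peaking at b with support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def composite_cost(outputs, targets, weights,
                   lam_fit=1.0, lam_w=0.1, lam_1ofn=0.1):
    """Linear combination of the three cost terms from the abstract:
    fit to the target, closeness of weights to {0, 1}, and closeness of
    the output membership vector to a 1-of-n vector. The lam_* multipliers
    set the relative importance of each term."""
    fit = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    w_bin = sum((w * (1.0 - w)) ** 2 for w in weights)   # zero when w is 0 or 1
    one_of_n = (sum(outputs) - max(outputs)) ** 2        # zero when one unit dominates
    return lam_fit * fit + lam_w * w_bin + lam_1ofn * one_of_n

# Small / Medium / Large fuzzy sets on [0, 1], evaluated at x = 0.3
memberships = [tri_mf(0.3, 0.0, 0.0, 0.5),   # Small
               tri_mf(0.3, 0.0, 0.5, 1.0),   # Medium
               tri_mf(0.3, 0.5, 1.0, 1.0)]   # Large
```

Gradient descent would then adjust the (a, b, c) parameters and the weights so that the composite cost falls, pushing weights toward crisp 0/1 values and outputs toward a single winning fuzzy set.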


1993 ◽  
Vol 115 (1) ◽  
pp. 38-43 ◽  
Author(s):  
H. S. M. Beigi ◽  
C. J. Li

Previous studies have suggested that, for moderately sized neural networks, classical Quasi-Newton methods yield the best convergence properties among state-of-the-art learning algorithms [1]. This paper describes a set of even better learning algorithms based on a class of Quasi-Newton optimization techniques called Self-Scaling Variable Metric (SSVM) methods. One characteristic of SSVM methods is that they provide search directions that are invariant under scaling of the objective function. Using an XOR benchmark and an encoder benchmark, simulations with the SSVM algorithms for training general feedforward neural networks were carried out to study their performance. It is shown that the SSVM method reduces the number of iterations required for convergence to 40 to 60 percent of that required by classical Quasi-Newton methods, which, in general, converge two to three orders of magnitude faster than steepest descent techniques.
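The scale-invariance mentioned above comes from multiplying the inverse-Hessian estimate by a data-driven factor before the usual Quasi-Newton update. A minimal sketch of one common form of this idea, the Oren–Luenberger scaling factor applied to a BFGS inverse update on a toy quadratic (an illustrative assumption, not the paper's exact algorithm), is:

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ssvm_bfgs_update(H, s, y):
    """Self-scaling BFGS update of the inverse-Hessian estimate H.

    s is the parameter step, y the gradient difference. The factor
    gamma = (s.y)/(y.Hy) rescales H first, which is what makes the
    resulting search directions invariant to rescaling the objective."""
    n = len(s)
    gamma = dot(s, y) / dot(y, matvec(H, y))       # self-scaling factor
    Hs = [[gamma * H[i][j] for j in range(n)] for i in range(n)]
    Hy = matvec(Hs, y)
    sy = dot(s, y)
    yHy = dot(y, Hy)
    # Standard BFGS inverse update applied to the scaled matrix
    return [[Hs[i][j]
             + ((sy + yHy) / sy ** 2) * s[i] * s[j]
             - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
             for j in range(n)] for i in range(n)]

# Toy objective f(x) = 0.5 * (x0^2 + 10 * x1^2), gradient g(x) = (x0, 10*x1)
def grad(x):
    return [x[0], 10.0 * x[1]]

x, H = [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
g = grad(x)
for _ in range(20):
    if dot(g, g) < 1e-18:                          # converged
        break
    d = [-v for v in matvec(H, g)]                 # quasi-Newton direction
    Ad = [d[0], 10.0 * d[1]]                       # exact line search on a quadratic
    alpha = -dot(g, d) / dot(d, Ad)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = grad(x_new)
    s = [a - b for a, b in zip(x_new, x)]
    y = [a - b for a, b in zip(g_new, g)]
    H = ssvm_bfgs_update(H, s, y)
    x, g = x_new, g_new
```

In a learning context x would be the network's weight vector and the gradients would come from backpropagation; the scaled update keeps the search directions well-conditioned regardless of how the error function is scaled.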

