An Enhanced Optimization Scheme Based on Gradient Descent Methods for Machine Learning

Dokkyun Yi; Sangmin Ji; Sunyoung Bu

doi:10.3390/sym11070942

An Enhanced Optimization Scheme Based on Gradient Descent Methods for Machine Learning

Symmetry ◽

10.3390/sym11070942 ◽

2019 ◽

Vol 11 (7) ◽

pp. 942 ◽

Cited By ~ 3

Author(s):

Dokkyun Yi ◽

Sangmin Ji ◽

Sunyoung Bu

Keyword(s):

Machine Learning ◽

Cost Function ◽

Gradient Descent ◽

Local Minimum ◽

Global Minimum ◽

Convergence Condition ◽

First Derivative ◽

Estimation Scheme ◽

The Cost ◽

Learning Data

A The learning process of machine learning consists of finding values of unknown weights in a cost function by minimizing the cost function based on learning data. However, since the cost function is not convex, it is conundrum to find the minimum value of the cost function. The existing methods used to find the minimum values usually use the first derivative of the cost function. When even the local minimum (but not a global minimum) is reached, since the first derivative of the cost function becomes zero, the methods give the local minimum values, so that the desired global minimum cannot be found. To overcome this problem, in this paper we modified one of the existing schemes—the adaptive momentum estimation scheme—by adding a new term, so that it can prevent the new optimizer from staying at local minimum. The convergence condition for the proposed scheme and the convergence value are also analyzed, and further explained through several numerical experiments whose cost function is non-convex.

Download Full-text

An Effective Optimization Method for Machine Learning Based on ADAM

Applied Sciences ◽

10.3390/app10031073 ◽

2020 ◽

Vol 10 (3) ◽

pp. 1073 ◽

Cited By ~ 4

Author(s):

Dokkyun Yi ◽

Jaehyun Ahn ◽

Sangmin Ji

Keyword(s):

Machine Learning ◽

Cost Function ◽

Gradient Descent ◽

Local Minimum ◽

Optimization Method ◽

Activation Function ◽

Numerical Comparison ◽

First Derivative ◽

Artificial Neural Network Ann ◽

The Cost

A machine is taught by finding the minimum value of the cost function which is induced by learning data. Unfortunately, as the amount of learning increases, the non-liner activation function in the artificial neural network (ANN), the complexity of the artificial intelligence structures, and the cost function’s non-convex complexity all increase. We know that a non-convex function has local minimums, and that the first derivative of the cost function is zero at a local minimum. Therefore, the methods based on a gradient descent optimization do not undergo further change when they fall to a local minimum because they are based on the first derivative of the cost function. This paper introduces a novel optimization method to make machine learning more efficient. In other words, we construct an effective optimization method for non-convex cost function. The proposed method solves the problem of falling into a local minimum by adding the cost function in the parameter update rule of the ADAM method. We prove the convergence of the sequences generated from the proposed method and the superiority of the proposed method by numerical comparison with gradient descent (GD, ADAM, and AdaMax).

Download Full-text

An Adaptive Optimization Method Based on Learning Rate Schedule for Neural Networks

Applied Sciences ◽

10.3390/app11020850 ◽

2021 ◽

Vol 11 (2) ◽

pp. 850

Author(s):

Dokkyun Yi ◽

Sangmin Ji ◽

Jieun Park

Keyword(s):

Artificial Intelligence ◽

Cost Function ◽

Numerical Experiments ◽

Global Minimum ◽

Optimization Method ◽

Learning Method ◽

Adaptive Optimization ◽

The Cost ◽

Proof Of Convergence ◽

Learning Data

Artificial intelligence (AI) is achieved by optimizing the cost function constructed from learning data. Changing the parameters in the cost function is an AI learning process (or AI learning for convenience). If AI learning is well performed, then the value of the cost function is the global minimum. In order to obtain the well-learned AI learning, the parameter should be no change in the value of the cost function at the global minimum. One useful optimization method is the momentum method; however, the momentum method has difficulty stopping the parameter when the value of the cost function satisfies the global minimum (non-stop problem). The proposed method is based on the momentum method. In order to solve the non-stop problem of the momentum method, we use the value of the cost function to our method. Therefore, as the learning method processes, the mechanism in our method reduces the amount of change in the parameter by the effect of the value of the cost function. We verified the method through proof of convergence and numerical experiments with existing methods to ensure that the learning works well.

Download Full-text

Quasi-static ensemble variational data assimilation: a theoretical and numerical study with the iterative ensemble Kalman smoother

Nonlinear Processes in Geophysics ◽

10.5194/npg-25-315-2018 ◽

2018 ◽

Vol 25 (2) ◽

pp. 315-334 ◽

Cited By ~ 2

Author(s):

Anthony Fillion ◽

Marc Bocquet ◽

Serge Gratton

Keyword(s):

Data Assimilation ◽

Cost Function ◽

Global Minimum ◽

Numerical Study ◽

Variational Data Assimilation ◽

Kalman Smoother ◽

Local Extrema ◽

Starting Point ◽

The Cost ◽

Temporal Extent

Abstract. The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss–Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.

Download Full-text

Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization

Statistics and Computing ◽

10.1007/s11222-020-09964-4 ◽

2020 ◽

Vol 30 (6) ◽

pp. 1645-1663

Author(s):

Ömer Deniz Akyildiz ◽

Dan Crisan ◽

Joaquín Míguez

Keyword(s):

Monte Carlo ◽

Cost Function ◽

Global Minimum ◽

Sequential Monte Carlo ◽

Convergence Rates ◽

Optimization Problems ◽

Search Space ◽

Gradient Based ◽

Multiple Minima ◽

The Cost

Abstract We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.

Download Full-text

A Genetic Algorithm for Solving Single Level Lot–Sizing Problems

Jurnal Teknologi ◽

10.11113/jt.v38.499 ◽

2012 ◽

Author(s):

Nasaruddin Zenon ◽

Ab. Rahman Ahmad ◽

Rosmah Ali

Keyword(s):

Genetic Algorithm ◽

Dynamic Programming ◽

Key Words ◽

Local Minimum ◽

Global Minimum ◽

Lot Sizing ◽

Initial Population ◽

Single Level ◽

Lot Sizing Problem ◽

The Cost

Masalah pensaizan lot satu aras timbul apabila suatu syarikat pengeluar ingin menjanakan perancangan pengeluaran terperinci bagi produk berpandukan suatu perancangan agregat. Walaupun masalah ini telah dikaji dengan meluas, hanya pendekatan pengaturcaraan dinamik dapat menjamin penyelesaian yang minimum secara global. Maka heuristik-heuristik stokastik yang mampu melepasi minimum tempatan adalah diperlukan. Kajian ini mencadangkan kaedah algoritma genetik untuk menyelesaikan masalah-masalah pensaizan lot satu aras, serta membincangkan beberapa contoh aplikasi kaedah tersebut. Dalam pelaksanaan kaedah ini, heuristik penjanaan populasi pensaizan lot yang dapat menjanakan populasi awal digunakan untuk menyediakan kromosom. Kromosom ini digunakan sebagai input untuk algoritma genetik dengan operator-operator yang khusus bagi masalah pensaizan lot. Gabungan heuristik penjanaan populasi dengan algoritma genetik menghasilkan penumpuan yang lebih pantas dalam proses mendapatkan skim pensaizan lot yang optimum disebabkan oleh ketersauran populasi awal yang digunakan. Kata kunci: ALgorithm Genetik; Pensaizan lot The single level lot-sizing problem arises whenever a manufacturing company wishes to translate an aggregate plan for production of an end item into a detailed planning of its production. Although the cost driven problem is widely studied in the literature, only laborious dynamic programming approaches are known to guarantee global minimum. Thus, stochastically-based heuristics that have the mechanism to escape from local minimum are needed. In this paper a genetic algorithm for solving single level lot-sizing problems is proposed and the results of applying the algorithm to example problems are discussed. In our implementation, a lot-sizing population-generating heuristic is used to feed chromosomes to a genetic algorithm with operators specially designed for lot-sizing problems. The combination of the population-generating heuristic with genetic algorithm results in a faster convergence in finding the optimal lot-sizing scheme due to the guaranteed feasibility of the initial population. Key words: Genetic Algorithm; Lot-sizing

Download Full-text

Fitting Parametric Vortices to Aliased Doppler Velocities Scanned from Hurricanes

Monthly Weather Review ◽

10.1175/mwr-d-12-00362.1 ◽

2014 ◽

Vol 142 (1) ◽

pp. 94-106 ◽

Cited By ~ 5

Author(s):

Qin Xu ◽

Yuan Jiang ◽

Liping Liu

Keyword(s):

Cost Function ◽

Global Minimum ◽

Least Squares Method ◽

Tangential Velocity ◽

Radial Distance ◽

Initial Guess ◽

Radial Velocities ◽

Vortex Center ◽

Robust Least Squares ◽

The Cost

Abstract An alias-robust least squares method that produces less errors than established methods is developed to produce reference radial velocities for automatically correcting raw aliased Doppler velocities scanned from hurricanes. This method estimates the maximum tangential velocity VM and its radial distance RM from the hurricane vortex center by fitting a parametric vortex model directly to raw aliased velocities at and around each selected vertical level. In this method, aliasing-caused zigzag discontinuities in the relationship between the observed and true radial velocities are formulated into the cost function by applying an alias operator to the entire analysis-minus-observation term to ensure the cost function to be smooth and concave around the global minimum. Simulated radar velocity observations are used to examine the cost function geometry around the global minimum in the space of control parameters (VM, RM). The results show that the global minimum point can estimate the true (VM, RM) approximately if the hurricane vortex center location is approximately known and the hurricane core and vicinity areas are adequately covered by the radar scans, and the global minimum can be found accurately by an efficient descent algorithm as long as the initial guess is in the concave vicinity of the global minimum. The method is used with elaborated refinements for automated dealiasing, and this utility is highlighted by an example applied to severely aliased radial velocities scanned from a hurricane.

Download Full-text

CONSTRAINED HEBBIAN LEARNING: GRADIENT DESCENT TO GLOBAL MINIMA IN AN n-DIMENSIONAL LANDSCAPE

International Journal of Neural Systems ◽

10.1142/s0129065791000042 ◽

1991 ◽

Vol 02 (01n02) ◽

pp. 35-46

Author(s):

Yves Chauvin

Keyword(s):

Cost Function ◽

Principal Components ◽

Gradient Descent ◽

Global Minimum ◽

Hebbian Learning ◽

Saddle Points ◽

Local Maximum ◽

Principal Component ◽

Learning Trajectory ◽

Computing Unit

This behavior of a constrained linear computing unit is analysed during “Hebbian” learning by gradient descent of a cost function corresponding to the sum of a variance maximization and a weight normalization term. The n-dimensional landscape of this cost function is shown to be composed of one local maximum and of n saddle points plus one global minimum aligned with the principal components of the input patterns. Furthermore, the landscape can be described in terms of hyperspheres, hypercrests, and hypervalleys associated with each of these principal components. Using this description, it is shown that the learning trajectory will converge to the global minimum of the landscape corresponding to the main principal component of the input patterns, provided some conditions on the starting weights and on the learning rate of the descent procedure. Extensions and implications of the algorithm are discussed.

Download Full-text

CHARACTERIZING ONE-LAYER ASSOCIATIVE NEURAL NETWORKS WITH OPTIMAL NOISE-REDUCTION ABILITY

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001492000497 ◽

1992 ◽

Vol 06 (05) ◽

pp. 1009-1025 ◽

Cited By ~ 1

Author(s):

TAO WANG ◽

XIAOLIANG XING ◽

XINHUA ZHUANG

Keyword(s):

Neural Network ◽

Neural Networks ◽

Cost Function ◽

Noise Reduction ◽

Gradient Descent ◽

Storage Capacity ◽

Learning Algorithm ◽

Optimal Learning ◽

The Neural Network ◽

The Cost

In this paper, we describe an optimal learning algorithm for designing one-layer neural networks by means of global minimization. Taking the properties of a well-defined neural network into account, we derive a cost function to measure the goodness of the network quantitatively. The connection weights are determined by the gradient descent rule to minimize the cost function. The optimal learning algorithm is formed as either the unconstraint-based or the constraint-based minimization problem. It ensures the realization of each desired associative mapping with the best noise reduction ability in the sense of optimization. We also investigate the storage capacity of the neural network, the degree of noise reduction for a desired associative mapping, and the convergence of the learning algorithm in an analytic way. Finally, a large number of computer experimental results are presented.

Download Full-text

On Generating Optimal Signal Probabilities for Random Tests: A Genetic Approach

VLSI Design ◽

10.1155/1996/75798 ◽

1996 ◽

Vol 4 (3) ◽

pp. 207-215 ◽

Cited By ~ 1

Author(s):

M. Srinivas ◽

L. M. Patnaik

Keyword(s):

Genetic Algorithms ◽

Cost Function ◽

Gradient Descent ◽

Random Search ◽

Search Space ◽

Optimization Techniques ◽

Gradient Descent Methods ◽

Test Vectors ◽

The Cost ◽

Optimal Signal

Genetic Algorithms are robust search and optimization techniques. A Genetic Algorithm based approach for determining the optimal input distributions for generating random test vectors is proposed in the paper. A cost function based on the COP testability measure for determining the efficacy of the input distributions is discussed. A brief overview of Genetic Algorithms (GAs) and the specific details of our implementation are described. Experimental results based on ISCAS-85 benchmark circuits are presented. The performance of our GAbased approach is compared with previous results. While the GA generates more efficient input distributions than the previous methods which are based on gradient descent search, the overheads of the GA in computing the input distributions are larger.To account for the relatively quick convergence of the gradient descent methods, we analyze the landscape of the COP-based cost function. We prove that the cost function is unimodal in the search space. This feature makes the cost function amenable to optimization by gradient-descent techniques as compared to random search methods such as Genetic Algorithms.

Download Full-text

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/292 ◽

2020 ◽

Author(s):

Tuan Hoang ◽

Thanh-Toan Do ◽

Tam V. Nguyen ◽

Ngai-Man Cheung

Keyword(s):

Neural Networks ◽

Cost Function ◽

Image Classification ◽

Convolutional Neural Networks ◽

Gradient Descent ◽

Deep Neural Networks ◽

State Of The Art ◽

Deep Convolutional Neural Networks ◽

Novel Method ◽

The Cost

This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights. However, this approach would result in some mismatch: the gradient descent updates full-precision weights, but it does not update the quantized weights. To address this issue, we propose a novel method that enables direct updating of quantized weights with learnable quantization levels to minimize the cost function using gradient descent. Second, to obtain low bit-width activations, existing works consider all channels equally. However, the activation quantizers could be biased toward a few channels with high-variance. To address this issue, we propose a method to take into account the quantization errors of individual channels. With this approach, we can learn activation quantizers that minimize the quantization errors in the majority of channels. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the image classification task, using AlexNet, ResNet and MobileNetV2 architectures on CIFAR-100 and ImageNet datasets.

Download Full-text