Multilevel Stochastic Gradient Methods for Nested Composition Optimization

2019 · Vol 29 (1) · pp. 616–659
Author(s): Shuoguang Yang, Mengdi Wang, Ethan X. Fang

2020 · Vol 8
Author(s): Hoonyoung Jeong, Alexander Y. Sun, Jonghyeon Jeon, Baehyun Min, Daein Jeong

2014 · Vol 34 (3) · pp. 373–393
Author(s): Nataša Krejić, Nataša Krklec Jerinkić

2020 · Vol 363 · pp. 112909
Author(s): André Gustavo Carlon, Ben Mansour Dia, Luis Espath, Rafael Holdorf Lopez, Raúl Tempone

Author(s): Derek Driggs, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb

Abstract: Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov’s acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on “negative momentum”, a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show for the first time that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions n, scale with the mean-squared error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum.
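
To make the role of the gradient estimator concrete, the following is a minimal Python sketch of the SVRG estimator that the negative-momentum approaches mentioned above are tied to. It is illustrative only and does not reproduce the paper's accelerated framework; the helper grad_i(w, i), returning the gradient of the i-th component function at w, is an assumed name.

    # Minimal sketch of one SVRG epoch (illustrative, not the authors' code).
    import numpy as np

    def svrg_epoch(w, grad_i, n, step, inner_iters, rng=None):
        """One outer SVRG epoch: take a snapshot, compute the full gradient,
        then run variance-reduced inner updates."""
        rng = rng or np.random.default_rng()
        w_snap = w.copy()
        # Full gradient at the snapshot point (computed once per epoch).
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner_iters):
            i = rng.integers(n)
            # Variance-reduced estimator: unbiased, and its variance shrinks
            # as the iterate w approaches the snapshot w_snap.
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w = w - step * g
        return w

The accelerated framework in the paper wraps estimators of this kind (SAGA, SVRG, SARAH, SARGE); only the estimator's mean-squared error and bias enter the resulting rates.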


Author(s): Beitong Zhou, Jun Liu, Weigao Sun, Ruijuan Chen, Claire Tomlin, ...

We propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term pbSGD. The proposed pbSGD method simply raises the stochastic gradient to a certain power elementwise during iterations and introduces only one additional parameter, namely, the power exponent (when it equals 1, pbSGD reduces to SGD). We further propose pbSGD with momentum, which we term pbSGDM. The main results of this paper present comprehensive experiments on popular deep learning models and benchmark datasets. Empirical results show that the proposed pbSGD and pbSGDM achieve faster initial training than adaptive gradient methods, generalization ability comparable to that of SGD, and improved robustness to hyper-parameter selection and vanishing gradients. pbSGD is essentially a gradient modifier via a nonlinear transformation. As such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization, such as learning rate schedules. Finally, we present a convergence rate analysis for both the pbSGD and pbSGDM methods. The theoretical convergence rates match the best known rates for SGD and SGDM on nonconvex functions.
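
As a hedged illustration of the update rule described above, the sketch below takes the elementwise power in the sign-preserving form sign(g) * |g|**gamma, which reduces to plain SGD (or SGDM, in the momentum case) when gamma equals 1. The names gamma, lr, and beta are illustrative choices, not taken from the paper.

    # Sketch of pbSGD / pbSGDM updates under the assumptions stated above.
    import numpy as np

    def pbsgd_step(w, g, lr, gamma):
        """One pbSGD step: apply the elementwise power to the stochastic gradient g."""
        return w - lr * np.sign(g) * np.abs(g) ** gamma

    def pbsgdm_step(w, g, v, lr, gamma, beta=0.9):
        """One pbSGDM step: heavy-ball momentum applied to the powered gradient."""
        v = beta * v + np.sign(g) * np.abs(g) ** gamma
        return w - lr * v, v

Because the transformation acts only on the gradient itself, it can be combined with an existing learning rate schedule without changing the rest of the training loop.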

