Approximate Hessian
Recently Published Documents

TOTAL DOCUMENTS: 24 (five years: 2)
H-INDEX: 5 (five years: 0)

Mathematics, 2021, Vol. 9 (15), pp. 1775
Author(s): Árpád Bűrmen, Tadej Tuma, Jernej Olenšek

Recently, a derivative-free optimization algorithm was proposed that utilizes a minimum Frobenius norm (MFN) Hessian update for estimating second-derivative information, which in turn is used to accelerate the search. The proposed update formula relies only on computed function values and is a closed-form expression for a special case of a more general approach first published by Powell. This paper analyzes the convergence of the proposed update formula under the assumption that the points from R^n where the function value is known are random. The analysis assumes that the N+2 points used by the update formula are obtained by adding N+1 vectors to a central point. The vectors are obtained by transforming a prototype set of N+1 vectors with a random orthogonal matrix drawn from the Haar measure. The prototype set must positively span an N ≤ n dimensional subspace. Because the update is random by nature, we can estimate a lower bound on the expected improvement of the approximate Hessian. This lower bound was derived for a special case of the proposed update by Leventhal and Lewis. We generalize their result and show that the amount of improvement depends strongly on N as well as on the choice of the vectors in the prototype set. The obtained result is then used to analyze the performance of the update for various commonly used prototype sets. One result of this analysis is that a regular n-simplex is a bad choice for a prototype set because it does not guarantee any improvement of the approximate Hessian.
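As a concrete illustration, the sketch below (names and prototype choice are ours) samples a Haar-distributed orthogonal matrix by QR-factoring a Gaussian matrix with a sign correction, then forms the N+2 points from a prototype set with N = n:

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an orthogonal matrix from the Haar measure on O(n):
    QR-factor a Gaussian matrix and fix column signs via diag(R)."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

def update_point_set(x0, prototype, rng):
    """Form the N+2 points used by the update: the central point x0
    plus x0 shifted by each prototype vector, rotated by a random
    Haar-distributed orthogonal matrix."""
    Q = haar_orthogonal(x0.size, rng)
    return np.vstack([x0, x0 + prototype @ Q.T])

rng = np.random.default_rng(0)
n = 5
# Prototype set: n + 1 vectors that positively span R^n (here N = n):
# the coordinate vectors plus the negated, normalized sum.
prototype = np.vstack([np.eye(n), -np.ones(n) / np.sqrt(n)])
points = update_point_set(np.zeros(n), prototype, rng)
print(points.shape)  # (n + 2, n): central point plus n + 1 neighbors
```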


Geophysics, 2020, Vol. 85 (4), pp. R325-R337
Author(s): Yuzhu Liu, Zheng Wu, Hao Kang, Jizhong Yang

The truncated Newton method uses information contained in the exact Hessian in full-waveform inversion (FWI). The exact Hessian physically contains information regarding doubly scattered waves, especially prismatic events. These waves are mainly caused by scattering at steeply dipping structures, such as salt flanks and vertical or nearly vertical faults. We systematically investigate the properties and applications of the exact Hessian. We begin by giving the formulas for computing each term in the exact Hessian and numerically analyzing their characteristics. We show that the second term in the exact Hessian may be comparable in magnitude to the first term. In particular, when there are apparent doubly scattered waves in the observed data, the influence of the second term may be dominant and it cannot be neglected. Next, we adopt a migration/demigration approach to compute the Gauss-Newton descent direction and the Newton descent direction using the approximate Hessian and the exact Hessian, respectively. In addition, we determine from both the forward and the inverse perspectives that the second term in the exact Hessian not only contributes to the use of doubly scattered waves, but also compensates for the use of singly scattered waves in FWI. Finally, we use three numerical examples to demonstrate that by considering the second term in the exact Hessian, the role of prismatic waves in the observed data can be effectively revealed and steeply dipping structures can be reconstructed with higher accuracy.
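The underlying computational pattern is a matrix-free truncated Newton step in which CG consumes only Hessian-vector products; in FWI these products are applied via migration/demigration, while the sketch below substitutes toy SPD operators:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def truncated_newton_direction(grad, hess_vec, n, inner_iters=20):
    """Compute a descent direction by approximately solving H dm = -g
    with CG; hess_vec applies the Hessian to a vector, so H is never
    formed explicitly."""
    H = LinearOperator((n, n), matvec=hess_vec)
    dm, _ = cg(H, -grad, maxiter=inner_iters)
    return dm

# Toy SPD stand-ins for the two operators (not a wave-equation solver):
# the Gauss-Newton operator uses only the first term; the "exact" one
# adds a low-rank surrogate for the second term.
rng = np.random.default_rng(1)
n = 50
J = rng.standard_normal((n, n))
U = 0.5 * rng.standard_normal((n, 5))
gn_vec = lambda v: J.T @ (J @ v) + v
exact_vec = lambda v: gn_vec(v) + U @ (U.T @ v)
g = rng.standard_normal(n)

dm_gauss_newton = truncated_newton_direction(g, gn_vec, n)
dm_newton = truncated_newton_direction(g, exact_vec, n)
```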


2020, Vol. 34 (04), pp. 4723-4730
Author(s): Xiang Li, Shusen Wang, Zhihua Zhang

Subsampled Newton methods approximate Hessian matrices through subsampling techniques to reduce the per-iteration cost. Previous results require Ω(d) samples to approximate Hessians, where d is the dimension of data points, making them less practical for high-dimensional data. The situation deteriorates when d is comparable to the number of data points n, since approximating the Hessian then requires taking nearly the whole dataset into account, rendering subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods for strongly convex empirical risk minimization with high-dimensional data. Specifically, we provably require only Θ̃(d_eff^γ) samples to approximate the Hessian matrices, where d_eff^γ is the γ-ridge effective dimension (the sum of the γ-ridge leverage scores) and can be much smaller than d as long as nγ ≫ 1. Our theory covers three types of Newton methods: subsampled Newton, distributed Newton, and proximal Newton.
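A minimal sketch of a leverage-sampled Newton step for γ-regularized logistic regression, assuming exact ridge leverage scores for clarity (the paper's fast approximations are not reproduced here); all function names are ours:

```python
import numpy as np

def ridge_leverage_scores(X, gamma):
    """Exact gamma-ridge leverage scores (O(n d^2) here for clarity;
    in practice they would be approximated by sketching)."""
    n, d = X.shape
    G = np.linalg.inv(X.T @ X / n + gamma * np.eye(d))
    return np.einsum('ij,jk,ik->i', X, G, X) / n

def subsampled_newton_step(w, X, y, gamma, s, rng):
    """One Newton step for gamma-regularized logistic loss with the
    Hessian estimated from s rows sampled by ridge leverage scores."""
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / n + gamma * w            # full gradient
    probs = ridge_leverage_scores(X, gamma)
    probs /= probs.sum()
    idx = rng.choice(n, size=s, p=probs)
    weights = p[idx] * (1 - p[idx]) / (n * s * probs[idx])  # unbiased
    H = X[idx].T @ (weights[:, None] * X[idx]) + gamma * np.eye(d)
    return w - np.linalg.solve(H, grad)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = (X @ np.ones(20) > 0).astype(float)
w = np.zeros(20)
for _ in range(5):
    w = subsampled_newton_step(w, X, y, gamma=0.1, s=100, rng=rng)
```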


Symmetry, 2020, Vol. 12 (2), pp. 208
Author(s): Xinyi Wang, Xianfeng Ding, Quan Qu

In this paper, a new filter nonmonotone adaptive trust-region method with fixed step length for unconstrained optimization is proposed. The trust-region radius adopts a new adaptive strategy to avoid additional computational costs at each iteration. A new nonmonotone trust-region ratio is introduced. When a trial step is not successful, a multidimensional filter is employed to increase the possibility of the trial step being accepted. If the trial step is still not accepted by the filter set, a new iterate is found along the trial step, with the step length computed by a fixed formula. The symmetric positive definite approximation of the Hessian matrix is updated using the MBFGS method. The global convergence and superlinear convergence of the proposed algorithm are proven under classical assumptions. The efficiency of the algorithm is demonstrated by numerical results.
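For context, a common MBFGS variant (Li-Fukushima style, which we assume is the family of update meant here) shifts y_k so that the curvature condition always holds, keeping the updated matrix symmetric positive definite; a minimal sketch:

```python
import numpy as np

def mbfgs_update(B, s, y, g_norm):
    """Modified BFGS update (Li-Fukushima style): shift y so that
    y_mod @ s > 0 always holds, so the BFGS formula preserves the
    symmetric positive definiteness of B."""
    t = 1.0 + max(0.0, -(s @ y) / (g_norm * (s @ s)))
    y_mod = y + t * g_norm * s    # guarantees y_mod @ s >= g_norm * (s @ s)
    Bs = B @ s
    return (B - np.outer(Bs, Bs) / (s @ Bs)
              + np.outer(y_mod, y_mod) / (s @ y_mod))
```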


2019
Author(s): Eric Hermes, Khachik Sargsyan, Habib Najm, Judit Zádor

Identification and refinement of first-order saddle point (FOSP) structures on the potential energy surface (PES) of chemical systems is a computational bottleneck in the characterization of reaction pathways. Leading FOSP refinement strategies require calculation of the full Hessian matrix, which is not feasible for larger systems such as those encountered in heterogeneous catalysis. For these systems, the standard approach to FOSP refinement involves iterative diagonalization of the Hessian, but this comes at the cost of longer refinement trajectories due to the lack of accurate curvature information. We present a method for incorporating information obtained by an iterative diagonalization algorithm into the construction of an approximate Hessian matrix that accelerates FOSP refinement. We measure the performance of our method on two established FOSP refinement benchmarks and find a 50% reduction on average in the number of gradient evaluations required to converge to a FOSP for one benchmark, and a 25% reduction on average for the second.
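The central idea admits a minimal sketch: fold a lowest eigenpair estimate (v, lam) from the iterative diagonalization into an approximate Hessian B via a symmetric rank-one correction that enforces the observed curvature along v. This is our illustration of the idea, not the paper's exact construction; all names are ours.

```python
import numpy as np

def fold_in_mode(B, v, lam):
    """Correct an approximate Hessian B with a lowest eigenpair
    estimate (v, lam) from iterative diagonalization: a symmetric
    rank-one update enforcing v @ H @ v == lam along the mode."""
    v = v / np.linalg.norm(v)
    return B + (lam - v @ B @ v) * np.outer(v, v)

# e.g. start from a scaled identity and inject a negative mode, as
# needed when walking toward a first-order saddle point.
B0 = 70.0 * np.eye(3)              # crude initial curvature guess
v = np.array([1.0, 0.0, 0.0])
H = fold_in_mode(B0, v, lam=-2.5)  # curvature along v is now -2.5
```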


Author(s): Sheng-Wei Chen, Chun-Nan Chou, Edward Y. Chang

For training fully connected neural networks (FCNNs), we propose a practical approximate second-order method comprising: 1) an approximation of the Hessian matrix, and 2) a conjugate gradient (CG) based solver. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
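A sketch of this solver's key trick, assuming the coefficient matrix is a single Kronecker product of SPD factors: CG only needs matrix-vector products, which the identity (A kron B) vec(V) = vec(B V A^T) supplies without ever forming the Kronecker product:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def kron_cg_solve(A, B, b):
    """Solve (A kron B) x = b with CG, applying the matrix through
    (A kron B) vec(V) = vec(B V A^T); the Kronecker product itself
    is never materialized."""
    p, q = A.shape[0], B.shape[0]
    def matvec(x):
        V = x.reshape(q, p, order='F')        # undo column-major vec
        return (B @ V @ A.T).ravel(order='F')
    op = LinearOperator((p * q, p * q), matvec=matvec)
    x, _ = cg(op, b)
    return x

# SPD factors give an SPD Kronecker product; check the residual.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((3, 3))
B = B @ B.T + 3 * np.eye(3)
b = rng.standard_normal(12)
x = kron_cg_solve(A, B, b)
print(np.linalg.norm(np.kron(A, B) @ x - b))  # small residual
```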


Author(s): Yasutoshi Ida, Yasuhiro Fujiwara, Sotetsu Iwamura

Adaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training because it uses first-order gradients to approximate Hessian-based preconditioning. However, since the first-order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is effective handling of this noise by preconditioning based on the covariance matrix of the gradients. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.
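A diagonal sketch of the idea follows; it is our reconstruction under stated assumptions, not the paper's exact update rule. A running mean centers the second moment, so the preconditioner tracks gradient noise (variance) rather than raw gradient magnitude as in RMSProp.

```python
import numpy as np

def covariance_preconditioned_step(w, grad, state, lr=1e-3,
                                   beta=0.99, eps=1e-8):
    """One update with diagonal covariance preconditioning: unlike
    RMSProp, the second moment is *centered* by a running mean, so
    the preconditioner reflects gradient noise, not magnitude."""
    m, c = state
    m = beta * m + (1 - beta) * grad               # running mean
    c = beta * c + (1 - beta) * (grad - m) ** 2    # running variance
    w_new = w - lr * grad / (np.sqrt(c) + eps)
    return w_new, (m, c)

# Usage: initialize state with zeros shaped like the parameters.
w = np.zeros(10)
state = (np.zeros(10), np.zeros(10))
g = np.random.default_rng(0).standard_normal(10)
w, state = covariance_preconditioned_step(w, g, state)
```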

