Polynomial Representation of the Gaussian Process

Author(s): Jesper Kristensen, Isaac Asher, Liping Wang

Gaussian process (GP) regression is a well-established probabilistic metamodeling and data-analysis tool. The posterior distribution of the GP parameters can be estimated using, e.g., Markov chain Monte Carlo (MCMC). The ability to make predictions is a key aspect of using such surrogate models. To make a GP prediction, both the MCMC chain and the training data are required. For some applications, GP predictions can require too much computational time and/or memory, especially with many training data points. This motivates the present work to represent the GP in an equivalent polynomial (or other global functional) form, called a portable GP. The portable GP inherits many benefits of the GP, including feature ranking via Sobol indices, robust fitting to nonlinear and high-dimensional data, and accurate uncertainty estimates. The framework expands the GP in a high-dimensional model representation (HDMR). Each HDMR basis function is fitted with a polynomial, and the fitted terms are summed to form the portable GP. A ranking of which basis functions to use in the fitting process is provided automatically via Sobol indices. The uncertainty from the fitting process can be propagated to the final polynomial estimate of the GP. In applications where speed and accuracy are paramount, spline fits to the basis functions give very good results. Finally, the portable GP provides an alternative set of assumptions about extrapolation behavior, which may be more appropriate than the assumptions inherent in GPs.
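The core construction can be sketched with a toy stand-in for the GP posterior mean: estimate the first-order HDMR component of one input by averaging over the others, then fit that component with a low-degree polynomial. The surrogate function, grid, and sample counts below are illustrative assumptions, not the paper's actual setup.

```python
import random

# Hypothetical stand-in for a fitted GP posterior mean.
def surrogate(x1, x2):
    return x1 ** 2 + 0.5 * x2

random.seed(0)
N = 20000

# f0: overall mean of the surrogate over uniform inputs on [0, 1]^2.
samples = [(random.random(), random.random()) for _ in range(N)]
f0 = sum(surrogate(a, b) for a, b in samples) / N

# First-order HDMR component f1(x1) = E[f | x1] - f0, estimated by
# averaging over x2 at a few fixed x1 values.
grid = [i / 10 for i in range(11)]
f1 = [sum(surrogate(g, random.random()) for _ in range(2000)) / 2000 - f0
      for g in grid]

# Fit a degree-2 polynomial to f1 by least squares (normal equations),
# giving the "portable" polynomial form of this component.
def polyfit2(xs, ys):
    # Build and solve the 3x3 normal equations for c0 + c1*x + c2*x^2.
    A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    for col in range(3):                      # Gaussian elimination
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            m = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    c = [0.0] * 3
    for r in (2, 1, 0):                       # back substitution
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, 3))) / A[r][r]
    return c

coeffs = polyfit2(grid, f1)
print(coeffs)  # roughly [-1/3, 0, 1], since f1(x1) = x1^2 - 1/3 here
```

Summing the fitted components (plus f0) then replaces the GP at prediction time, with no need for the MCMC chain or training data.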

2017, Vol. 34 (6), pp. 1807-1828
Author(s): Enying Li, Fan Ye, Hu Wang

Purpose – The purpose of this study is to overcome the inaccurate estimation of the standard deviation derived from the expected improvement (EI) criterion. Compared with other popular methods, a quantitative model assessment and analysis tool, termed high-dimensional model representation (HDMR), is suggested for integration with an EI-assisted sampling strategy. Design/methodology/approach – To predict the standard deviation directly, Kriging is employed. Furthermore, to compensate for the underestimation of error in the Kriging predictor, a Pareto frontier (PF)-EI (PFEI) criterion is also suggested. Compared with other surrogate-assisted optimization methods, the distinctive characteristic of HDMR is that it discloses the correlations among component functions. If only low-correlation terms are considered, the number of function evaluations for HDMR grows only polynomially with the number of input variables and correlative terms. Findings – To validate the suggested method, various nonlinear and high-dimensional mathematical functions are tested. The results show that the suggested method has potential for solving complicated real engineering problems. Originality/value – In this study, the authors integrate the strengths of PFEI and HDMR to improve optimization performance.
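For reference, the EI criterion this abstract builds on has a standard closed form in terms of the Kriging predictive mean μ and standard deviation σ. A minimal sketch for minimization, with hand-picked illustrative numbers:

```python
import math

def expected_improvement(mu, sigma, f_min):
    """EI for minimization: EI = (f_min - mu) * Phi(z) + sigma * phi(z),
    with z = (f_min - mu) / sigma; Phi/phi are the standard normal CDF/PDF."""
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_min - mu) * Phi + sigma * phi

# High predicted uncertainty raises EI even when the mean looks unpromising;
# with sigma = 0 and mu above f_min, EI collapses to zero.
print(expected_improvement(mu=1.0, sigma=0.5, f_min=0.8))
print(expected_improvement(mu=1.0, sigma=0.0, f_min=0.8))  # 0.0
```

The underestimation issue mentioned above arises because the σ plugged into this formula comes from the Kriging model itself and can be too small, which is what the PFEI criterion compensates for.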


Author(s): Ankur Srivastava, Arun K. Subramaniyan, Liping Wang

Methods for efficient variance-based global sensitivity analysis of complex high-dimensional problems are presented and compared. Variance decomposition methods rank inputs according to Sobol indices that can be computationally expensive to evaluate. Main and interaction effect Sobol indices can be computed analytically in the Kennedy and O'Hagan framework with Gaussian processes. These methods use the high-dimensional model representation concept for variance decomposition that presents a unique model representation when inputs are uncorrelated. However, when the inputs are correlated, multiple model representations may be possible, leading to ambiguous sensitivity ranking with Sobol indices. In this work, we present the effect of input correlation on sensitivity analysis and discuss the methods presented by Li and Rabitz in the context of Kennedy and O'Hagan's framework with Gaussian processes. Results are demonstrated on simulated and real problems for correlated and uncorrelated inputs and demonstrate the utility of variance decomposition methods for sensitivity analysis.
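A common way to estimate first-order Sobol indices is the "pick-and-freeze" Monte Carlo estimator; the self-contained sketch below uses an additive toy model with known analytic indices (0.2 and 0.8). It illustrates variance decomposition generically, not the analytic GP-based computation the abstract describes.

```python
import random

random.seed(42)
N = 100_000

def f(x):
    # Toy model: Var = 1/12 + 4/12, so S1 = 0.2 and S2 = 0.8 analytically.
    return x[0] + 2.0 * x[1]

# Two independent sample matrices A and B, uniform on [0, 1]^2.
A = [[random.random(), random.random()] for _ in range(N)]
B = [[random.random(), random.random()] for _ in range(N)]

fA = [f(x) for x in A]
mean = sum(fA) / N
var = sum(v * v for v in fA) / N - mean * mean

def first_order(i):
    # "Pick-and-freeze": AB_i takes column i from A and the rest from B.
    total = 0.0
    for a, b in zip(A, B):
        ab = list(b)
        ab[i] = a[i]
        total += f(a) * f(ab)
    return (total / N - mean * mean) / var

S1, S2 = first_order(0), first_order(1)
print(S1, S2)  # close to the analytic 0.2 and 0.8
```

Note that this estimator, like the HDMR decomposition itself, assumes independent inputs; with correlated inputs the ranking ambiguity discussed above appears.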


2015, Vol. 32 (3), pp. 643-667
Author(s): Zhiyuan Huang, Haobo Qiu, Ming Zhao, Xiwen Cai, Liang Gao

Purpose – Popular regression methodologies cannot produce accurate metamodels for high-dimensional practical problems, since the computational time increases exponentially as the number of dimensions rises. The purpose of this paper is to use a support vector regression with high-dimensional model representation (SVR-HDMR) model to obtain accurate metamodels for high-dimensional problems with few sampling points. Design/methodology/approach – High-dimensional model representation (HDMR) is a general set of quantitative model assessment and analysis tools for improving the efficiency of deducing high-dimensional input-output system behavior. The support vector regression (SVR) method can approximate the underlying functions with a small subset of sample points. The dividing rectangles (DIRECT) algorithm is a deterministic sampling method. Findings – This paper proposes a new form of HDMR that integrates SVR, termed SVR-HDMR, and adopts an intelligent sampling strategy, the DIRECT method, to improve the efficiency of SVR-HDMR. Originality/value – Compared with other metamodeling techniques, the accuracy and efficiency of SVR-HDMR are significantly improved. SVR-HDMR helps engineers understand the essence of the underlying problems visually.
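The HDMR structure underlying SVR-HDMR can be illustrated without the SVR machinery: a first-order cut-HDMR expansion around a fixed cut point, which is exact for additive functions. The black-box function below is a made-up example, not one from the paper.

```python
# First-order cut-HDMR: f(x) ≈ f0 + sum_i [f(x with only x_i varied) - f0],
# expanded around a fixed "cut" point c. Higher-order terms would capture
# variable interactions; for an additive function the first order is exact.

def f(x):
    # Hypothetical black-box (additive, so first-order HDMR is exact).
    return x[0] ** 2 + 3.0 * x[1] - x[2]

cut = [0.5, 0.5, 0.5]
f0 = f(cut)

def hdmr1(x):
    total = f0
    for i in range(len(x)):
        xi = list(cut)
        xi[i] = x[i]             # vary one coordinate at a time
        total += f(xi) - f0      # first-order component f_i(x_i)
    return total

x = [0.2, 0.9, 0.1]
print(f(x), hdmr1(x))  # identical for this additive test function
```

In SVR-HDMR each component f_i is approximated by an SVR model fitted to a few samples along that coordinate, with DIRECT choosing the sample locations.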


2013, Vol. 136 (1)
Author(s): Kambiz Haji Hajikolaei, G. Gary Wang

In engineering design, spending an excessive amount of time on physical experiments or expensive simulations makes the design costly and lengthy. This issue is exacerbated when the design problem has a large number of inputs, i.e., is of high dimension. High-dimensional model representation (HDMR) is one powerful method for approximating high-dimensional, expensive, black-box (HEB) problems. One existing HDMR implementation, random-sampling HDMR (RS-HDMR), can build an HDMR model from random sample points as a linear combination of basis functions. The most critical issue in RS-HDMR is that calculating the coefficients of the basis functions involves integrals that are approximated by Monte Carlo summations, which are error-prone with limited samples, especially with nonuniform sampling. In this paper, a new approach based on principal component analysis (PCA), called PCA-HDMR, is proposed for finding the coefficients that provide the best linear combination of the bases with minimum error and without using any integrals. Several benchmark problems of different dimensionalities and one engineering problem are modeled using the method, and the results are compared with RS-HDMR results. In all problems, with both uniform and nonuniform sampling, PCA-HDMR built more accurate models than RS-HDMR for a given set of sample points.
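The Monte Carlo error source that PCA-HDMR sidesteps can be seen directly. The sketch below estimates one RS-HDMR coefficient, the inner product of a toy component function with an orthonormal shifted-Legendre basis function, by Monte Carlo summation at increasing sample sizes; the toy function is an illustrative assumption.

```python
import math
import random

# Orthonormal (shifted Legendre) basis function on [0, 1], of the kind
# used for RS-HDMR component functions.
def phi1(x):
    return math.sqrt(3.0) * (2.0 * x - 1.0)

def f(x):
    return x  # toy component function

# Exact coefficient: the integral of f * phi1 over [0, 1] equals sqrt(3)/6.
exact = math.sqrt(3.0) / 6.0

# RS-HDMR approximates this integral by a Monte Carlo summation over the
# available samples; the error shrinks only slowly with sample size.
random.seed(1)
for n in (50, 500, 50_000):
    xs = [random.random() for _ in range(n)]
    est = sum(f(x) * phi1(x) for x in xs) / n
    print(n, round(est, 4), "abs error:", round(abs(est - exact), 4))
```

PCA-HDMR instead finds the best linear combination of the bases directly from the samples, avoiding these noisy integral estimates altogether.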



Author(s): Kambiz Haji Hajikolaei, G. Gary Wang

High-dimensional model representation (HDMR) is a tool for generating an approximation of an input-output model for a multivariate function. It can be used to model a black-box function for metamodel-based optimization. Recently, the authors' team developed a radial basis function based HDMR (RBF-HDMR) model that can efficiently model a high-dimensional black-box function and, moreover, uncover the inner variable structure of the black-box function. This approach, however, requires a completely new, albeit optimized, set of sample points, as dictated by the methodology, while in engineering design practice one often has much existing sample data. How to utilize the existing data to efficiently construct an HDMR model is the focus of this paper. We first examine random-sampling HDMR (RS-HDMR), which uses orthonormal basis functions as HDMR component functions, so that existing sample points can be used to calculate the coefficients of the basis functions. One important issue with RS-HDMR is that, in theory, the basis functions are obtained from continuous integrations related to the orthonormality conditions; in practice, however, the integrations are approximated by Monte Carlo summation, and thus the basis functions may not satisfy the orthonormality conditions. In this paper, we propose new, adaptive orthonormal basis functions, defined with respect to a given set of sample points, for RS-HDMR approximation. RS-HDMR models are built for different test functions using the standard and the new adaptive basis functions for different numbers of sample points. The relative errors of both models are calculated and compared. The results show that the models built using the new basis functions are more accurate.
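The adaptive-basis idea can be sketched as Gram-Schmidt orthonormalization of monomials under the empirical inner product defined by the sample points, so that orthonormality holds exactly on the samples even when they are drawn nonuniformly. The details below are an illustration of the principle, not the authors' exact construction.

```python
import random

# Build basis functions orthonormal under the *empirical* inner product
# <g, h> = (1/N) sum_j g(x_j) h(x_j), so orthonormality holds exactly on
# the given samples rather than only for the continuous integral.

random.seed(0)
xs = [random.random() ** 2 for _ in range(200)]   # deliberately nonuniform
N = len(xs)

def dot(g, h):
    return sum(g(x) * h(x) for x in xs) / N

monomials = [lambda x: 1.0, lambda x: x, lambda x: x * x]
basis = []
for m in monomials:
    # Subtract projections onto the basis built so far, then normalize.
    coeffs = [dot(m, b) for b in basis]
    def g(x, m=m, coeffs=coeffs, prev=list(basis)):
        return m(x) - sum(c * b(x) for c, b in zip(coeffs, prev))
    norm = dot(g, g) ** 0.5
    basis.append(lambda x, g=g, norm=norm: g(x) / norm)

# Check discrete orthonormality: the Gram matrix is the identity on the samples.
gram = [[round(dot(a, b), 6) for b in basis] for a in basis]
print(gram)
```

With standard (continuous-integral) orthonormal bases, the same Gram matrix computed on nonuniform samples would deviate from the identity, which is exactly the error source the adaptive bases remove.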



2020
Author(s): Marc Philipp Bahlke, Natnael Mogos, Jonny Proppe, Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets with potential applications in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied before. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates ("error bars") along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger experimentally relevant complexes. Exchange spin coupling is similarly easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
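For context, here is a minimal Gaussian process regression sketch in plain Python, showing the predictive mean and the uncertainty ("error bar") that grows away from the training data. The kernel, length scale, and data are arbitrary illustrative choices, not the descriptors or hyperparameters used in the study.

```python
import math

def rbf(a, b, ell=0.5):
    # Squared-exponential (RBF) kernel on scalar inputs.
    return math.exp(-(a - b) ** 2 / (2.0 * ell ** 2))

X = [0.0, 0.5, 1.0]       # toy 1-D training inputs
y = [0.0, 0.8, 1.0]       # toy training targets
noise = 1e-6              # small jitter for numerical stability

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

K = [[rbf(a, b) + (noise if i == j else 0.0)
      for j, b in enumerate(X)] for i, a in enumerate(X)]
alpha = solve(K, y)

def predict(x_star):
    # Predictive mean k*^T K^-1 y and variance k** - k*^T K^-1 k*.
    k_star = [rbf(x_star, xi) for xi in X]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)

print(predict(0.5))   # mean near 0.8, tiny variance at a training point
print(predict(2.0))   # variance grows far from the data
```

The growing variance far from the data is the built-in uncertainty estimate the abstract highlights as a reason for choosing GPs over plain regression.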


Symmetry, 2021, Vol. 13 (4), pp. 645
Author(s): Muhammad Farooq, Sehrish Sarfraz, Christophe Chesneau, Mahmood Ul Hassan, Muhammad Ali Raza, et al.

Expectiles have gained considerable attention in recent years due to their wide applications in many areas. In this study, a k-nearest neighbours approach combined with the asymmetric least squares loss function, called ex-kNN, is proposed for computing expectiles. First, the effect of various distance measures on ex-kNN, in terms of test error and computational time, is evaluated. It is found that the Canberra, Lorentzian, and Soergel distance measures lead to the lowest test error, whereas the Euclidean, Canberra, and average of (L1, L∞) measures lead to a low computational cost. Second, the performance of ex-kNN is compared with the existing packages er-boost and ex-svm for computing expectiles on nine real-life examples. Depending on the nature of the data, ex-kNN showed two to ten times better performance than er-boost and comparable performance with ex-svm regarding test error. Computationally, ex-kNN is found to be two to five times faster than ex-svm and much faster than er-boost, particularly in the case of high-dimensional data.
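A minimal sketch of the ex-kNN idea, assuming the standard iterative asymmetric-least-squares computation of an expectile over the k nearest neighbours; the data, parameters, and use of Euclidean distance are made up for illustration.

```python
# ex-kNN sketch: predict the tau-expectile at a query point from the target
# values of its k nearest neighbours, via asymmetric least squares.

def expectile(values, tau, iters=100):
    # The tau-expectile minimizes sum_i w_i * (y_i - m)^2 with asymmetric
    # weights w_i = tau if y_i > m else 1 - tau; iterate to a fixed point.
    m = sum(values) / len(values)          # start from the mean (tau = 0.5)
    for _ in range(iters):
        w = [tau if v > m else 1.0 - tau for v in values]
        m_new = sum(wi * vi for wi, vi in zip(w, values)) / sum(w)
        if abs(m_new - m) < 1e-12:
            break
        m = m_new
    return m

def ex_knn(train_x, train_y, x_query, k, tau):
    # Euclidean distance here; the study also evaluates Canberra,
    # Lorentzian, Soergel, and other measures.
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(zip(train_x, train_y),
                     key=lambda p: dist(p[0], x_query))[:k]
    return expectile([y for _, y in nearest], tau)

train_x = [[0.0], [0.1], [0.2], [0.9], [1.0]]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0]
print(ex_knn(train_x, train_y, [0.05], k=3, tau=0.5))  # mean of {1,2,3}: 2.0
print(ex_knn(train_x, train_y, [0.05], k=3, tau=0.9))  # pulled above the mean
```

At tau = 0.5 the expectile reduces to the ordinary kNN mean; larger tau weights upward deviations more heavily, tilting the prediction toward the upper tail of the neighbourhood.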

