Discrete natural neighbour interpolation with uncertainty using cross-validation error-distance fields

2020 ◽  
Vol 6 ◽  
pp. e282
Author(s):  
Thomas R. Etherington

Interpolation techniques convert point data of a geographic phenomenon into a continuous field estimate of that phenomenon, and have become a fundamental geocomputational technique for spatial and geographical analysts. Natural neighbour interpolation is one interpolation method with several useful properties: it is an exact interpolator, it creates a smooth surface free of any discontinuities, it is local, spatially adaptive, and parameter free, it requires no statistical assumptions, and it can be applied to small datasets. However, as with any interpolation method, there will be uncertainty in how well the interpolated field values reflect the actual phenomenon values. Using rates of error based on natural neighbour distances, calculated for the data points via cross-validation, a cross-validation error-distance field can be produced that associates uncertainty with the interpolation. Virtual geography experiments demonstrate that, given an appropriate number of data points and sufficient spatial autocorrelation in the phenomenon being interpolated, the natural neighbour interpolation and cross-validation error-distance fields provide reliable estimates of value and error within the convex hull of the data points. While this method does not replace the need for analysts to use sound judgement in their interpolations, for researchers for whom natural neighbour interpolation is the best option it provides a way to assess the uncertainty associated with the result.
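To make the mechanics concrete, here is a minimal Python sketch of the cross-validation step and a simplified error-distance field. Natural neighbour interpolation is not available in SciPy, so linear `griddata` interpolation stands in for it; the synthetic data, the grid, and the error-spreading rule are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)
xy = rng.uniform(0, 100, size=(50, 2))              # hypothetical sample locations
z = np.sin(xy[:, 0] / 15) + np.cos(xy[:, 1] / 15)   # hypothetical phenomenon values

# Leave-one-out cross-validation: re-estimate each data point from the others.
cv_err = np.full(len(z), np.nan)
for i in range(len(z)):
    mask = np.arange(len(z)) != i
    est = griddata(xy[mask], z[mask], xy[i][None, :], method="linear")
    cv_err[i] = abs(est[0] - z[i])                  # NaN for points on the convex hull

# Turn each point's CV error into an error rate per unit distance using its
# nearest-neighbour distance, then scale by distance to the nearest data point
# (a simplified reading of the paper's error-distance field).
tree = cKDTree(xy)
nn_dist, _ = tree.query(xy, k=2)                    # column 1 is the true nearest neighbour
rate = cv_err / nn_dist[:, 1]

gx, gy = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))
grid = np.column_stack([gx.ravel(), gy.ravel()])
dist, idx = tree.query(grid)
error_field = (rate[idx] * dist).reshape(gx.shape)
value_field = griddata(xy, z, grid, method="linear").reshape(gx.shape)
```

As in the paper, both fields are only defined within the convex hull of the data points; the linear stand-in returns NaN outside it.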

Author(s):  
Felipe A. C. Viana ◽  
Raphael T. Haftka

Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. Previous work has shown that fitting multiple surrogates and picking one based on cross-validation errors (PRESS in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss whether to use the best PRESS solution or a weighted surrogate when a single surrogate is needed. We propose minimizing the integrated square error as a way to compute the weights of the weighted-average surrogate. We find that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. We also find that the cross-validation error vectors provide an excellent estimate of the RMS errors when the number of data points is high. Hence the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions. However, it appears that the potential gains from using weighted surrogates diminish substantially in high dimensions.
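PRESS is simply the sum of squared leave-one-out cross-validation residuals. A minimal sketch, assuming scikit-learn models as stand-in surrogates and a synthetic response; the inverse-PRESS weighting shown is one simple heuristic, whereas the paper derives weights by minimizing the integrated square error.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))             # hypothetical design points
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2           # hypothetical expensive response

surrogates = {
    "poly2": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "gp": GaussianProcessRegressor(),
    "svr": SVR(C=10.0),
}

# PRESS: sum of squared leave-one-out cross-validation residuals per surrogate.
press = {}
for name, model in surrogates.items():
    resid = y - cross_val_predict(model, X, y, cv=LeaveOneOut())
    press[name] = float(np.sum(resid ** 2))

best = min(press, key=press.get)                 # surrogate with the lowest PRESS

# Weighted-average surrogate: inverse-PRESS weights (a simple heuristic; the
# paper instead derives weights by minimizing the integrated square error).
w = np.array([1.0 / press[name] for name in surrogates])
w /= w.sum()
```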


2008 ◽  
Vol 9 (6) ◽  
pp. 1523-1534 ◽  
Author(s):  
Jinyoung Rhee ◽  
Gregory J. Carbone ◽  
James Hussey

Abstract This paper investigates the influence of spatial interpolation and aggregation of data to depict drought at different spatial units relevant to, and often required for, drought management. Four methods for drought index mapping were explored, comparing two spatial operation methods (simple unweighted average versus spatial interpolation plus aggregation) and two calculation procedures (spatial operations performed before or after the calculation of drought index values). Deterministic interpolation methods, including Thiessen polygons, inverse distance weighting, and thin-plate splines, as well as the stochastic, geostatistical method of ordinary kriging, were compared for the two methods that use interpolation. The inverse distance weighting method was chosen based on its cross-validation error. After obtaining drought index values for different spatial units using each method in turn, differences in the empirical binned frequency distributions were tested between the methods and spatial units. The two methods using interpolation and aggregation introduced fewer errors in cross-validation than the two simple unweighted average methods. Whereas the method performing spatial interpolation and aggregation before calculating drought index values generally provided consistent drought information across spatial units, the method performing spatial interpolation and aggregation after calculating drought index values reduced errors related to the calculations from precipitation data.
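The selection step, choosing an interpolator by leave-one-out cross-validation error, can be sketched compactly. The following Python example implements inverse distance weighting from scratch and uses LOO RMSE to pick its power parameter; the station coordinates and precipitation values are fabricated for illustration, and the power choice stands in for the paper's comparison among four interpolation methods.

```python
import numpy as np

def idw(xy_obs, z_obs, xy_query, power=2.0):
    """Inverse distance weighted interpolation (exact at observation points)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_obs[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    w /= w.sum(axis=1, keepdims=True)
    return w @ z_obs

def loo_rmse(xy_obs, z_obs, power):
    """Leave-one-out cross-validation RMSE for a given IDW power."""
    errs = []
    for i in range(len(z_obs)):
        pred = idw(np.delete(xy_obs, i, 0), np.delete(z_obs, i), xy_obs[i:i + 1], power)
        errs.append(pred[0] - z_obs[i])
    return np.sqrt(np.mean(np.square(errs)))

# Hypothetical precipitation stations; pick the IDW power with the lowest LOO RMSE.
rng = np.random.default_rng(1)
stations = rng.uniform(0, 200, size=(30, 2))
precip = 50 + 10 * np.sin(stations[:, 0] / 40) + rng.normal(0, 1, 30)
best_power = min([1.0, 2.0, 3.0], key=lambda p: loo_rmse(stations, precip, p))
```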


2020 ◽  
Author(s):  
Alessandro Fassò ◽  
Michael Sommer ◽  
Christoph von Rohden

Abstract. This paper is motivated by the fact that, although temperature readings made by Vaisala RS41 radiosondes at GRUAN sites (http://www.gruan.org) are given at 1 s resolution, for various reasons missing data are spread along the atmospheric profile. This problem is common in radiosonde data and other profile data, so (linear) interpolation is often used to fill the gaps in published data products. In this perspective, the present paper considers interpolation uncertainty. A statistical approach is introduced that gives some understanding of the consequences of substituting missing data with interpolated values. In particular, a general framework for computing interpolation uncertainty based on a Gaussian process (GP) set-up is developed. Using the GP characteristics, a simple formula for the linear interpolation standard error is given. Moreover, GP interpolation is proposed as an alternative interpolation method that comes with its own standard error. For the Vaisala RS41, the two approaches are shown to give similar interpolation performance in an extensive cross-validation exercise based on the block-bootstrap technique. Statistical results on interpolation uncertainty at various GRUAN sites and for various gap lengths are provided. Since both approaches underestimate the cross-validation interpolation uncertainty, a bootstrap-based correction formula is proposed. Using the root mean square error, it is found that, for short gaps with an average length of 5 s, the average uncertainty is smaller than 0.10 K. For larger gaps, it increases up to 0.35 K for an average gap length of 30 s, and up to 0.58 K for a gap of 60 s.
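The linear interpolation standard error under a GP follows directly from the covariance of the interpolation error: with linear weights w1 and w2, Var[Z(t) − w1 Z(t1) − w2 Z(t2)] expands into a handful of covariance terms. A minimal Python sketch, assuming a zero-mean stationary GP; the exponential covariance and its parameters are illustrative, not values fitted to GRUAN profiles.

```python
import numpy as np

def linear_interp_std(t, t1, t2, cov):
    """Standard error of linear interpolation at t1 < t < t2 for a zero-mean
    stationary Gaussian process with covariance function cov(lag)."""
    w1 = (t2 - t) / (t2 - t1)                  # linear interpolation weights
    w2 = (t - t1) / (t2 - t1)
    var = (cov(0.0) * (1.0 + w1 ** 2 + w2 ** 2)
           + 2.0 * w1 * w2 * cov(t2 - t1)
           - 2.0 * w1 * cov(t - t1)
           - 2.0 * w2 * cov(t2 - t))
    return np.sqrt(max(var, 0.0))

# Illustrative exponential covariance (variance in K^2, lag in seconds).
cov = lambda h: 0.25 * np.exp(-abs(h) / 60.0)
std_mid = linear_interp_std(15.0, 0.0, 30.0, cov)   # SE at the centre of a 30 s gap
```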


1996 ◽  
Vol 8 (7) ◽  
pp. 1391-1420 ◽  
Author(s):  
David H. Wolpert

This is the second of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper, and in particular on the associated a priori distinctions that do exist between learning algorithms. It is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss) there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choosing the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” is not relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly, they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
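The two selection rules contrasted here are easy to state in code: cross-validation picks the learning algorithm with the smallest estimated error, anti-cross-validation the one with the largest. A minimal sketch using scikit-learn, with a synthetic classification task and two arbitrary candidate algorithms chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)   # stand-in data
algorithms = {
    "tree": DecisionTreeClassifier(random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}

# Mean zero-one cross-validation error for each learning algorithm.
cv_error = {name: 1.0 - cross_val_score(alg, X, y, cv=5).mean()
            for name, alg in algorithms.items()}

chosen_by_cv = min(cv_error, key=cv_error.get)        # cross-validation
chosen_by_anti_cv = max(cv_error, key=cv_error.get)   # anti-cross-validation
```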


Author(s):  
Ginés Rubio ◽  
Héctor Pomares ◽  
Ignacio Rojas ◽  
Luis Javier Herrera ◽  
Alberto Guillén

Author(s):  
Dohyun Park ◽  
Yongbin Lee ◽  
Dong-Hoon Choi

Many meta-models have been developed to approximate true responses, and they are often used for optimization in place of computer simulations that carry high computational cost. However, designers do not know in advance which meta-model is best, because the accuracy of each meta-model differs from problem to problem. To address this difficulty, research on ensembles of meta-models, which combine stand-alone meta-models, has recently been pursued with the expectation of improving prediction accuracy. In this study, we propose a method for selecting the weight factors of an ensemble of meta-models based on v-nearest neighbors’ cross-validation error (CV). The four stand-alone meta-models employed in this study are polynomial regression, Kriging, radial basis function, and support vector regression. Each method is applied to five 1-D mathematical examples and ten 2-D mathematical examples, and the prediction accuracy of each stand-alone meta-model and of the existing ensembles of meta-models is compared. The ensemble of meta-models is more accurate than the worst of the four stand-alone meta-models in all 30 test examples, and it is the most accurate method in 5 of the test cases. Although it is less accurate than the best stand-alone meta-model, it attains almost the same RMSE values (a ratio of less than 1.1) as the best stand-alone model in 16 of the 30 test cases. From these results, we conclude that the proposed method is effective and robust.
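A weighted-average ensemble of the four surrogate families can be sketched with scikit-learn stand-ins (a Gaussian process for Kriging, RBF kernel ridge for the radial basis function model). The inverse-CV-error weights below are a simple placeholder for the paper's v-nearest-neighbour weight selection, and the data are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor   # Kriging analogue
from sklearn.kernel_ridge import KernelRidge                    # RBF meta-model analogue
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR                                     # support vector regression

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(60, 2))
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1])                          # hypothetical true response

metamodels = [
    make_pipeline(PolynomialFeatures(2), LinearRegression()),   # polynomial regression
    GaussianProcessRegressor(),
    KernelRidge(kernel="rbf"),
    SVR(),
]

# Cross-validation RMSE per meta-model, then inverse-error weights (a simple
# stand-in for the paper's v-nearest-neighbour weight selection).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse = np.array([np.sqrt(np.mean((y - cross_val_predict(m, X, y, cv=cv)) ** 2))
                 for m in metamodels])
w = (1.0 / rmse) / (1.0 / rmse).sum()

# Ensemble prediction: weighted average of the fitted meta-models.
X_new = rng.uniform(-2, 2, size=(5, 2))
y_hat = sum(wi * m.fit(X, y).predict(X_new) for wi, m in zip(w, metamodels))
```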


Author(s):  
Reza Alizadeh ◽  
Liangyue Jia ◽  
Anand Balu Nellippallil ◽  
Guoxin Wang ◽  
Jia Hao ◽  
...  

Abstract. In engineering design, surrogate models are often used in place of costly computer simulations. Typically, a single surrogate model is selected based on previous experience. We observe, based on an analysis of the published literature, that fitting an ensemble of surrogates (EoS) based on cross-validation errors is more accurate but requires more computational time. In this paper, we propose a method to build an EoS that is both accurate and less computationally expensive. In the proposed method, the EoS is a weighted-average surrogate of response surface models, kriging, and radial basis functions, with weights based on overall cross-validation error. We demonstrate that the created EoS is more accurate than individual surrogates even when fewer data points are used, and is therefore computationally efficient, with relatively insensitive predictions. We demonstrate the use of an EoS using hot rod rolling as an example. Finally, we include a rule-based template that can be used for other problems with similar requirements regarding, for example, computational time, required accuracy, and the size of the data.
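The rule-based template itself is not reproduced in the abstract, but its shape can be suggested with a toy decision function; every rule and threshold below is an illustrative assumption, not the paper's actual template.

```python
def choose_modeling_strategy(n_points, simulations_affordable, accuracy_critical):
    """Toy rule-based template for picking a surrogate strategy.

    The structure echoes the kind of template the paper describes, but the
    rules and thresholds here are illustrative assumptions, not the paper's.
    """
    if n_points < 20:
        return "ensemble of surrogates (EoS)"   # EoS stayed accurate with few points
    if accuracy_critical and simulations_affordable:
        return "ensemble of surrogates (EoS)"
    return "single surrogate selected by cross-validation error"

print(choose_modeling_strategy(n_points=15, simulations_affordable=False,
                               accuracy_critical=True))
```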

