Error Metrics and the Sequential Refinement of Kriging Metamodels

2015 · Vol 137 (1)
Author(s): David A. Romero, Veronica E. Marin, Cristina H. Amon

Metamodels, or surrogate models, have been proposed in the literature to reduce the resources (time/cost) invested in the design and optimization of engineering systems whose behavior is modeled using complex computer codes, in an area commonly known as simulation-based design optimization. Following the seminal paper of Sacks et al. (1989, “Design and Analysis of Computer Experiments,” Stat. Sci., 4(4), pp. 409–435), researchers have developed the field of design and analysis of computer experiments (DACE), focusing on different aspects of the problem such as experimental design, approximation methods, model fitting, model validation, and metamodeling-based optimization methods. Among these, model validation remains a key issue, as the reliability and trustworthiness of the results depend greatly on the quality of approximation of the metamodel. Typically, model validation involves calculating prediction errors of the metamodel using a data set different from the one used to build the model. Due to the high cost associated with computer experiments with simulation codes, validation approaches that do not require additional data points (samples) are preferable. However, it is documented that methods based on resampling, e.g., cross validation (CV), can exhibit oscillatory behavior during sequential/adaptive sampling and model refinement, thus making it difficult to quantify the approximation capabilities of the metamodels and/or to define rational stopping criteria for the metamodel refinement process. In this work, we present the results of a simulation experiment conducted to study the evolution of several error metrics during sequential model refinement, to estimate prediction errors, and to define proper stopping criteria without requiring additional samples beyond those used to build the metamodels. Our results show that it is possible to accurately estimate the predictive performance of Kriging metamodels without additional samples, and that leave-one-out CV errors perform poorly in this context. Based on our findings, we propose guidelines for choosing the sample size of computer experiments that use a sequential/adaptive model refinement paradigm. We also propose a stopping criterion for sequential model refinement that does not require additional samples.
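
As a concrete illustration of the leave-one-out CV error metric studied here, the following sketch computes a LOO-CV RMSE for a Kriging metamodel, using scikit-learn's GaussianProcessRegressor as the Kriging implementation; the test function, kernel, and sample size are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (not the authors' code): estimating the prediction error of a
# Kriging/GP metamodel with leave-one-out cross validation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from sklearn.model_selection import LeaveOneOut

def expensive_simulation(x):
    """Stand-in for a costly computer code (hypothetical test function)."""
    return np.sin(3.0 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(15, 1))      # design of experiments
y = expensive_simulation(X).ravel()

kernel = C(1.0) * RBF(length_scale=1.0)
loo_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X[train_idx], y[train_idx])
    pred = gp.predict(X[test_idx])
    loo_errors.append((pred[0] - y[test_idx][0]) ** 2)

print("LOO-CV RMSE estimate:", np.sqrt(np.mean(loo_errors)))
```

Note that the paper finds such LOO-CV estimates to behave poorly during sequential refinement; the sketch only shows how the raw metric is computed.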

Author(s): Andrei M. Bandalouski, Natalja G. Egorova, Mikhail Y. Kovalyov, Erwin Pesch, S. Armagan Tarim

In this paper we present a novel approach to the dynamic pricing problem for hotel businesses. It includes disaggregation of the demand into several categories, forecasting, elastic demand simulation, and a mathematical programming model with a concave quadratic objective function and linear constraints for dynamic price optimization. The approach is computationally efficient and easy to implement. In computer experiments with a hotel data set, the hotel revenue is increased by about 6% on average in comparison with the actual revenue gained in a past period, where a fixed-price policy was employed, subject to the assumption that the demand can deviate from the suggested elastic model. The approach and the developed software can be a useful tool for small hotels recovering from the economic consequences of the COVID-19 pandemic.
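
As a sketch of the optimization step only: assuming a linear price-response curve d_i(p) = a_i - b_i·p_i per demand category, revenue becomes a concave quadratic function of prices under linear constraints. The coefficients and the single capacity constraint below are hypothetical, and the paper's demand disaggregation and forecasting steps are not modeled here.

```python
# Minimal sketch of concave quadratic revenue maximization with linear
# constraints (hypothetical parameters, not the paper's model).
import numpy as np
from scipy.optimize import minimize

a = np.array([40.0, 25.0, 15.0])   # base demand per category (assumed)
b = np.array([0.30, 0.20, 0.10])   # price sensitivities (assumed)
capacity = 60.0                    # rooms available per night (assumed)

def neg_revenue(p):
    """Negative of the concave quadratic revenue sum_i p_i * (a_i - b_i * p_i)."""
    return -np.sum(p * (a - b * p))

# Linear constraint: expected total demand must not exceed capacity.
cons = {"type": "ineq", "fun": lambda p: capacity - np.sum(a - b * p)}
bounds = [(30.0, 200.0)] * 3       # admissible price range per category

res = minimize(neg_revenue, x0=np.full(3, 50.0), method="SLSQP",
               bounds=bounds, constraints=[cons])
print("optimal prices:", np.round(res.x, 2), "revenue:", round(-res.fun, 2))
```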


Geophysics · 2014 · Vol 79 (1) · pp. IM1-IM9
Author(s): Nathan Leon Foks, Richard Krahenbuhl, Yaoguo Li

Compressive inversion uses computational algorithms that decrease the time and storage needs of a traditional inverse problem. Most compression approaches focus on the model domain; very few, other than traditional downsampling, focus on the data domain for potential-field applications. To further the compression in the data domain, a direct and practical approach to the adaptive downsampling of potential-field data for large inversion problems has been developed. The approach is formulated to significantly reduce the quantity of data in relatively smooth or quiet regions of the data set, while preserving the signal anomalies that contain the relevant target information. Two major benefits arise from this form of compressive inversion. First, because the approach compresses the problem in the data domain, it can be applied immediately without the addition of, or modification to, existing inversion software. Second, because most industry software uses some form of model or sensitivity compression, the addition of this adaptive data sampling creates a complete compressive inversion methodology whereby the reduction of computational cost is achieved simultaneously in the model and data domains. We applied the method to a synthetic magnetic data set and two large field magnetic data sets; however, the method is also applicable to other data types. Our results showed that the relevant model information is maintained after inversion despite using only 1%–5% of the data.
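
A minimal sketch of the idea (not the published algorithm): adaptively downsample a synthetic 1-D magnetic profile, keeping every observation where the local gradient is large and only a coarse subset over quiet regions. The anomaly, noise level, thresholds, and smoothing window are all illustrative assumptions.

```python
# Sketch of adaptive data-domain downsampling: dense near anomalies,
# sparse over smooth/quiet regions.
import numpy as np

x = np.linspace(0.0, 10.0, 2001)                 # observation locations (km)
rng = np.random.default_rng(1)
data = 50.0 * np.exp(-((x - 4.0) / 0.3) ** 2)    # synthetic anomaly (nT)
data += rng.normal(0.0, 0.05, x.size)            # quiet-region noise

# Measure local variability on a lightly smoothed profile so noise
# spikes do not masquerade as signal.
smooth = np.convolve(data, np.ones(25) / 25.0, mode="same")
grad = np.abs(np.gradient(smooth, x))

keep = grad > 5.0                                # dense sampling near the anomaly
keep[::50] = True                                # coarse grid in quiet regions
x_ds, d_ds = x[keep], data[keep]
print(f"kept {keep.sum()} of {x.size} points ({100.0 * keep.mean():.1f}%)")
```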


Author(s): David A. Romero, Cristina H. Amon, Susan Finger

In order to reduce the time and resources devoted to design-space exploration during simulation-based design and optimization, the use of surrogate models, or metamodels, has been proposed in the literature. Key to the success of metamodeling efforts are the experimental design techniques used to generate the combinations of input variables at which the computer experiments are conducted. Several adaptive sampling techniques have been proposed to tailor the experimental designs to the specific application at hand, using the already-acquired data to guide further exploration of the input space, instead of using a fixed sampling scheme defined a priori. Though mixed results have been reported, it has been argued that adaptive sampling techniques can be more efficient, yielding better surrogate models with fewer sampling points. In this paper, we address the problem of adaptive sampling for single and multi-response metamodels, with a focus on Multi-stage Multi-response Bayesian Surrogate Models (MMBSM). We compare distance-optimal Latin hypercube sampling, an entropy-based criterion, and the maximum cross-validation variance criterion, originally proposed for one-dimensional output spaces and implemented in this paper for multi-dimensional output spaces. Our results indicate that, both for single and multi-response surrogate models, the entropy-based adaptive sampling approach leads to models that are more robust to the initial experimental design and at least as accurate (or better) when compared with other sampling techniques using the same number of sampling points.
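
For reference, a distance-optimal ("maximin") Latin hypercube design can be approximated by generating many random LHS candidates and keeping the one that maximizes the minimum pairwise distance between points. The sketch below does this with SciPy's QMC sampler; it is illustrative, not the authors' implementation.

```python
# Maximin Latin hypercube design by random search over LHS candidates.
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import pdist

def maximin_lhs(n_points, n_dims, n_candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        sampler = qmc.LatinHypercube(d=n_dims, seed=rng)
        cand = sampler.random(n_points)
        score = pdist(cand).min()        # minimum pairwise distance
        if score > best_score:
            best, best_score = cand, score
    return best

design = maximin_lhs(n_points=20, n_dims=2)
print(design.shape, "min pairwise distance:", pdist(design).min())
```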


2020
Author(s): Robson Borges de Lima, Cinthia Pereira de Oliveira, Rinaldo Luiz Caraciolo Ferreira, José Antônio Aleixo da Silva, Emanuel Araújo Silva, ...

Background: Dry tropical forests in arid lands cover large areas in Brazil, yet few studies report total biomass stocks, demonstrate the importance of height measurements, or apply and compare local and pan-tropical biomass prediction models for the trees and shrubs found in that environment. Here, we use a biomass data set of 500 trees and shrubs covering 15 species harvested under a management plan in the state of Pernambuco, Brazil. We seek to develop local models and compare them with the equations traditionally applied to dry forests, showing the importance of tree height measurements. Because biomass is non-linearly related to the independent tree variables, we fitted the models by nonlinear least squares and adopted a cross-validation procedure. Model selection was based on likelihood measures (AIC), total explained variation (R²), and prediction error (RSE, RMSE, and bias). Results: In summary, our above-ground biomass data set is best represented by the Schumacher-Hall equation exp[3.5336 + 1.9126 × log(D) + 1.2438 × log(Ht)], which shows that height measurements are essential for accurate biomass estimation. The large prediction errors observed when testing pan-tropical models on our data demonstrate the importance of developing new local models and indicate that careful consideration is needed if generic “pantropical” models without height measurements are to be applied in dry forests in Brazil. Conclusions: Local equations can thus be used for carbon accounting in REDD+ and in sustainability-incentive projects that promote the development of dry forests and the assessment of their ecosystem services.
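
To make the fitting procedure concrete, the sketch below fits the Schumacher-Hall form AGB = exp(b0 + b1·log(D) + b2·log(Ht)) by nonlinear least squares on synthetic data; the generated diameters, heights, noise, and recovered coefficients are placeholders, not the study's 500-tree data set or fitted values.

```python
# Nonlinear least-squares fit of the Schumacher-Hall biomass equation
# on synthetic tree data (illustrative only).
import numpy as np
from scipy.optimize import curve_fit

def schumacher_hall(X, b0, b1, b2):
    D, Ht = X
    return np.exp(b0 + b1 * np.log(D) + b2 * np.log(Ht))

rng = np.random.default_rng(42)
D = rng.uniform(3.0, 30.0, 200)            # diameter (cm), assumed range
Ht = rng.uniform(2.0, 10.0, 200)           # total height (m), assumed range
agb = schumacher_hall((D, Ht), -3.0, 2.0, 1.0) * rng.lognormal(0.0, 0.15, 200)

params, _ = curve_fit(schumacher_hall, (D, Ht), agb, p0=(-2.0, 1.5, 1.0))
pred = schumacher_hall((D, Ht), *params)
rmse = np.sqrt(np.mean((pred - agb) ** 2))
print("b0, b1, b2 =", np.round(params, 4), "RMSE =", round(rmse, 3))
```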


2008 · Vol 71 (2) · pp. 279-285
Author(s): M. J. Stasiewicz, B. P. Marks, A. Orta-Ramirez, D. M. Smith

Traditional models for predicting the thermal inactivation rate of bacteria are state dependent, considering only the current state of the product. In this study, the potential for previous sublethal thermal history to increase the thermotolerance of Salmonella in ground turkey was determined, a path-dependent model for thermal inactivation was developed, and the path-dependent predictions were tested against independent data. Weibull-Arrhenius parameters for Salmonella inactivation in ground turkey thigh were determined via isothermal tests at 55, 58, 61, and 63°C. Two sets of nonisothermal heating tests also were conducted. The first included five linear heating rates (0.4, 0.9, 1.7, 3.5, and 7.0 K/min) and three holding temperatures (55, 58, and 61°C); the second also included sublethal holding periods at 40, 45, and 50°C. When the standard Weibull-Arrhenius model was applied to the nonisothermal validation data sets, the root mean squared error of prediction was 2.5 log CFU/g, with fail-dangerous residuals as large as 4.7 log CFU/g when applied to the complete nonisothermal data set. However, by using a modified path-dependent model for inactivation, the prediction errors for independent data were reduced by 56%. Under actual thermal processing conditions, use of the path-dependent model would reduce error in thermal lethality predictions for slowly cooked products.
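
As an illustration of the state-dependent (baseline) model only, the sketch below integrates Weibull-Arrhenius inactivation along a ramp-and-hold temperature path using Peleg's differential form. All parameter values are assumed for illustration and are not the paper's fitted values, and the path-dependent correction for sublethal history is deliberately absent.

```python
# State-dependent Weibull-Arrhenius prediction under a nonisothermal path.
import numpy as np

EA_R = 60000.0 / 8.314          # activation energy / gas constant (K), assumed
T_REF = 331.15                  # reference temperature: 58 degC in kelvin
B_REF, N_SHAPE = 0.25, 1.5      # Weibull rate/shape at T_REF (assumed)

def b_of_T(T_kelvin):
    """Arrhenius temperature dependence of the Weibull rate parameter."""
    return B_REF * np.exp(-EA_R * (1.0 / T_kelvin - 1.0 / T_REF))

def log_reduction(times, temps):
    """Euler integration of d(-log10 S)/dt along a time/temperature path
    (Peleg's differential form of the Weibull model)."""
    s = 1e-9                     # -log10 survival ratio, start near zero
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        b = b_of_T(temps[i])
        rate = b * N_SHAPE * (s / b) ** ((N_SHAPE - 1.0) / N_SHAPE)
        s += rate * dt
    return s

t = np.linspace(0.0, 15.0, 3001)             # minutes
T = 313.15 + np.minimum(1.7 * t, 18.0)       # 40 degC, ramp 1.7 K/min, hold 58 degC
print(f"predicted lethality: {log_reduction(t, T):.2f} log10 reductions")
```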


2012 · Vol 36 (4) · pp. 81-94
Author(s): Emmanouil Benetos, Simon Dixon

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. The proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, the method can effectively be used for multiple-instrument automatic transcription. In addition, its shift-invariant aspect can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note ranges. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature across several error metrics.
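
As background for the factorization step, the sketch below implements plain (non-shift-invariant) PLCA of a magnitude spectrogram via EM updates. The paper's model adds shift invariance, multiple templates per pitch and instrument, and HMM note tracking, none of which appear in this baseline.

```python
# Baseline PLCA spectrogram factorization: V(f,t) ~ sum_z P(z) P(f|z) P(t|z).
import numpy as np

def plca(V, n_z=8, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf = rng.random((F, n_z)); Pf /= Pf.sum(axis=0)       # P(f|z)
    Pt = rng.random((T, n_z)); Pt /= Pt.sum(axis=0)       # P(t|z)
    Pz = np.full(n_z, 1.0 / n_z)                          # P(z)
    for _ in range(n_iter):
        # E-step: posterior P(z|f,t), shape (F, T, n_z)
        joint = Pf[:, None, :] * Pt[None, :, :] * Pz[None, None, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        # M-step: reweight by observed energy and renormalize
        W = V[:, :, None] * post
        Pf = W.sum(axis=1); Pf /= np.maximum(Pf.sum(axis=0), 1e-12)
        Pt = W.sum(axis=0); Pt /= np.maximum(Pt.sum(axis=0), 1e-12)
        Pz = W.sum(axis=(0, 1)); Pz /= Pz.sum()
    return Pf, Pt, Pz

V = np.random.default_rng(1).random((64, 40))             # stand-in spectrogram
Pf, Pt, Pz = plca(V, n_z=4)
print(Pf.shape, Pt.shape, np.round(Pz, 3))
```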


1998 · Vol 52 (10) · pp. 1339-1347
Author(s): Mark R. Riley, Mark A. Arnold, David W. Murhammer

A novel method is introduced for developing calibration models for the spectroscopic measurement of chemical concentrations in an aqueous environment. To demonstrate this matrix-enhanced calibration procedure, we developed calibration models to quantitate glucose and glutamine concentrations in an insect cell culture medium that is a complex mixture of more than 20 components, with three components that manifest significant concentration changes. Accurate calibration models were generated for glucose and glutamine by using a calibration data set composed of 60 samples containing the analytes dissolved in an aqueous buffer along with as few as two samples of the analytes dissolved in culture medium. Standard errors of prediction were 1.0 mM for glucose and 0.35 mM for glutamine. The matrix-enhanced method was also applied to culture medium samples collected during the course of a second bioreactor run. Addition of three culture medium samples to a buffer calibration reduced glucose prediction errors from 3.8 mM to 1.0 mM; addition of two culture medium samples reduced glutamine prediction errors from 1.6 mM to 0.76 mM. Results from this study suggest that spectroscopic calibration models can be developed from a relatively simple set of samples, provided that some of the samples account for variations in the sample matrix.
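
A minimal sketch of the matrix-augmentation idea, under the assumption of a PLS regression calibration (a common choice for such spectroscopic models; the paper's actual regression method may differ): a buffer-based calibration set is augmented with two culture-medium samples before fitting, then evaluated on medium samples from a new run. All spectra here are synthetic toys.

```python
# Matrix-enhanced calibration sketch with PLS regression (synthetic spectra).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_buffer, n_medium, n_wavelengths = 60, 2, 300

def spectrum(conc, matrix_offset):
    """Toy Beer's-law spectra: analyte band + matrix background + noise."""
    band = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 20.0) ** 2)
    return (conc[:, None] * band + matrix_offset
            + rng.normal(0.0, 0.01, (conc.size, n_wavelengths)))

c_buf = rng.uniform(0.0, 20.0, n_buffer)          # analyte (mM) in buffer
c_med = rng.uniform(0.0, 20.0, n_medium)          # analyte (mM) in medium
X = np.vstack([spectrum(c_buf, 0.0), spectrum(c_med, 0.05)])
y = np.concatenate([c_buf, c_med])

pls = PLSRegression(n_components=5).fit(X, y)     # augmented calibration

c_test = rng.uniform(0.0, 20.0, 20)
X_test = spectrum(c_test, 0.05)                   # medium samples, new run
sep = np.sqrt(np.mean((pls.predict(X_test).ravel() - c_test) ** 2))
print(f"standard error of prediction: {sep:.2f} mM")
```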

