Error Metrics and the Sequential Refinement of Kriging Metamodels

2015 · Vol 137 (1)
Author(s): David A. Romero, Veronica E. Marin, Cristina H. Amon

Metamodels, or surrogate models, have been proposed in the literature to reduce the resources (time/cost) invested in the design and optimization of engineering systems whose behavior is modeled using complex computer codes, in an area commonly known as simulation-based design optimization. Following the seminal paper of Sacks et al. (1989, “Design and Analysis of Computer Experiments,” Stat. Sci., 4(4), pp. 409–435), researchers have developed the field of design and analysis of computer experiments (DACE), focusing on different aspects of the problem such as experimental design, approximation methods, model fitting, model validation, and metamodeling-based optimization methods. Among these, model validation remains a key issue, as the reliability and trustworthiness of the results depend greatly on the quality of approximation of the metamodel. Typically, model validation involves calculating prediction errors of the metamodel using a data set different from the one used to build the model. Due to the high cost associated with computer experiments with simulation codes, validation approaches that do not require additional data points (samples) are preferable. However, it is documented that methods based on resampling, e.g., cross validation (CV), can exhibit oscillatory behavior during sequential/adaptive sampling and model refinement, thus making it difficult to quantify the approximation capabilities of the metamodels and/or to define rational stopping criteria for the metamodel refinement process. In this work, we present the results of a simulation experiment conducted to study the evolution of several error metrics during sequential model refinement, to estimate prediction errors, and to define proper stopping criteria without requiring additional samples beyond those used to build the metamodels. Our results show that it is possible to accurately estimate the predictive performance of Kriging metamodels without additional samples, and that leave-one-out CV errors perform poorly in this context. Based on our findings, we propose guidelines for choosing the sample size of computer experiments that use a sequential/adaptive model refinement paradigm. We also propose a stopping criterion for sequential model refinement that does not require additional samples.
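
As a concrete illustration of the leave-one-out CV error metric studied here, the following sketch computes a LOO-CV RMSE for a Kriging metamodel, using scikit-learn's GaussianProcessRegressor as the Kriging implementation; the test function, kernel, and sample size are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (not the authors' code): estimating the prediction error of a
# Kriging/GP metamodel with leave-one-out cross validation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from sklearn.model_selection import LeaveOneOut

def expensive_simulation(x):
    """Stand-in for a costly computer code (hypothetical test function)."""
    return np.sin(3.0 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(15, 1))      # design of experiments
y = expensive_simulation(X).ravel()

kernel = C(1.0) * RBF(length_scale=1.0)
loo_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X[train_idx], y[train_idx])
    pred = gp.predict(X[test_idx])
    loo_errors.append((pred[0] - y[test_idx][0]) ** 2)

print("LOO-CV RMSE estimate:", np.sqrt(np.mean(loo_errors)))
```

Note that the paper finds such LOO-CV estimates to behave poorly during sequential refinement; the sketch only shows how the raw metric is computed.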

Author(s): Andrei M. Bandalouski, Natalja G. Egorova, Mikhail Y. Kovalyov, Erwin Pesch, S. Armagan Tarim

In this paper we present a novel approach to the dynamic pricing problem for hotel businesses. It includes disaggregation of the demand into several categories, forecasting, elastic demand simulation, and a mathematical programming model with a concave quadratic objective function and linear constraints for dynamic price optimization. The approach is computationally efficient and easy to implement. In computer experiments with a hotel data set, the hotel revenue is increased by about 6% on average in comparison with the actual revenue gained in a past period, where a fixed-price policy was employed, subject to the assumption that the demand can deviate from the suggested elastic model. The approach and the developed software can be a useful tool for small hotels recovering from the economic consequences of the COVID-19 pandemic.
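
As a sketch of the optimization step only: assuming a linear price-response curve d_i(p) = a_i - b_i·p_i per demand category, revenue becomes a concave quadratic function of prices under linear constraints. The coefficients and the single capacity constraint below are hypothetical, and the paper's demand disaggregation and forecasting steps are not modeled here.

```python
# Minimal sketch of concave quadratic revenue maximization with linear
# constraints (hypothetical parameters, not the paper's model).
import numpy as np
from scipy.optimize import minimize

a = np.array([40.0, 25.0, 15.0])   # base demand per category (assumed)
b = np.array([0.30, 0.20, 0.10])   # price sensitivities (assumed)
capacity = 60.0                    # rooms available per night (assumed)

def neg_revenue(p):
    """Negative of the concave quadratic revenue sum_i p_i * (a_i - b_i * p_i)."""
    return -np.sum(p * (a - b * p))

# Linear constraint: expected total demand must not exceed capacity.
cons = {"type": "ineq", "fun": lambda p: capacity - np.sum(a - b * p)}
bounds = [(30.0, 200.0)] * 3       # admissible price range per category

res = minimize(neg_revenue, x0=np.full(3, 50.0), method="SLSQP",
               bounds=bounds, constraints=[cons])
print("optimal prices:", np.round(res.x, 2), "revenue:", round(-res.fun, 2))
```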


Geophysics · 2014 · Vol 79 (1) · pp. IM1-IM9
Author(s): Nathan Leon Foks, Richard Krahenbuhl, Yaoguo Li

Compressive inversion uses computational algorithms that decrease the time and storage needs of a traditional inverse problem. Most compression approaches focus on the model domain; very few, other than traditional downsampling, focus on the data domain for potential-field applications. To further the compression in the data domain, a direct and practical approach to the adaptive downsampling of potential-field data for large inversion problems has been developed. The approach is formulated to significantly reduce the quantity of data in relatively smooth or quiet regions of the data set, while preserving the signal anomalies that contain the relevant target information. Two major benefits arise from this form of compressive inversion. First, because the approach compresses the problem in the data domain, it can be applied immediately without the addition of, or modification to, existing inversion software. Second, because most industry software uses some form of model or sensitivity compression, the addition of this adaptive data sampling creates a complete compressive inversion methodology whereby the reduction of computational cost is achieved simultaneously in the model and data domains. We applied the method to a synthetic magnetic data set and two large field magnetic data sets; however, the method is also applicable to other data types. Our results showed that the relevant model information is maintained after inversion despite using only 1%–5% of the data.
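
A minimal sketch of the idea (not the published algorithm): adaptively downsample a synthetic 1-D magnetic profile, keeping every observation where the local gradient is large and only a coarse subset over quiet regions. The anomaly, noise level, thresholds, and smoothing window are all illustrative assumptions.

```python
# Sketch of adaptive data-domain downsampling: dense near anomalies,
# sparse over smooth/quiet regions.
import numpy as np

x = np.linspace(0.0, 10.0, 2001)                 # observation locations (km)
rng = np.random.default_rng(1)
data = 50.0 * np.exp(-((x - 4.0) / 0.3) ** 2)    # synthetic anomaly (nT)
data += rng.normal(0.0, 0.05, x.size)            # quiet-region noise

# Measure local variability on a lightly smoothed profile so noise
# spikes do not masquerade as signal.
smooth = np.convolve(data, np.ones(25) / 25.0, mode="same")
grad = np.abs(np.gradient(smooth, x))

keep = grad > 5.0                                # dense sampling near the anomaly
keep[::50] = True                                # coarse grid in quiet regions
x_ds, d_ds = x[keep], data[keep]
print(f"kept {keep.sum()} of {x.size} points ({100.0 * keep.mean():.1f}%)")
```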


Author(s): David A. Romero, Cristina H. Amon, Susan Finger

In order to reduce the time and resources devoted to design-space exploration during simulation-based design and optimization, the use of surrogate models, or metamodels, has been proposed in the literature. Key to the success of metamodeling efforts are the experimental design techniques used to generate the combinations of input variables at which the computer experiments are conducted. Several adaptive sampling techniques have been proposed to tailor the experimental designs to the specific application at hand, using the already-acquired data to guide further exploration of the input space, instead of using a fixed sampling scheme defined a priori. Though mixed results have been reported, it has been argued that adaptive sampling techniques can be more efficient, yielding better surrogate models with fewer sampling points. In this paper, we address the problem of adaptive sampling for single and multi-response metamodels, with a focus on Multi-stage Multi-response Bayesian Surrogate Models (MMBSM). We compare distance-optimal Latin hypercube sampling, an entropy-based criterion, and the maximum cross-validation variance criterion, originally proposed for one-dimensional output spaces and implemented in this paper for multi-dimensional output spaces. Our results indicate that, both for single and multi-response surrogate models, the entropy-based adaptive sampling approach leads to models that are more robust to the initial experimental design and at least as accurate (or better) when compared with other sampling techniques using the same number of sampling points.
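
For reference, a distance-optimal ("maximin") Latin hypercube design can be approximated by generating many random LHS candidates and keeping the one that maximizes the minimum pairwise distance between points. The sketch below does this with SciPy's QMC sampler; it is illustrative, not the authors' implementation.

```python
# Maximin Latin hypercube design by random search over LHS candidates.
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import pdist

def maximin_lhs(n_points, n_dims, n_candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        sampler = qmc.LatinHypercube(d=n_dims, seed=rng)
        cand = sampler.random(n_points)
        score = pdist(cand).min()        # minimum pairwise distance
        if score > best_score:
            best, best_score = cand, score
    return best

design = maximin_lhs(n_points=20, n_dims=2)
print(design.shape, "min pairwise distance:", pdist(design).min())
```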


2020
Author(s): Robson Borges de Lima, Cinthia Pereira de Oliveira, Rinaldo Luiz Caraciolo Ferreira, José Antônio Aleixo da Silva, Emanuel Araújo Silva, ...

Background: Dry tropical forests in arid lands cover large areas in Brazil, yet few studies report total biomass stocks, demonstrate the importance of height measurements, or apply and compare local and pan-tropical biomass prediction models for the trees and shrubs found in that environment. Here, we use a biomass data set of 500 trees and shrubs covering 15 species harvested under a management plan in the state of Pernambuco, Brazil. We seek to develop local models and compare them with the equations traditionally applied to dry forests, showing the importance of tree height measurements. Because biomass is non-linearly related to the independent tree variables, we fitted the models by nonlinear least squares and adopted a cross-validation procedure. Model selection was based on likelihood measures (AIC), total explained variation (R²), and prediction error (RSE, RMSE, and bias). Results: In summary, our above-ground biomass data set is best represented by the Schumacher-Hall equation exp[3.5336 + 1.9126 × log(D) + 1.2438 × log(Ht)], which shows that height measurements are essential for accurate biomass estimation. The large prediction errors observed when testing pan-tropical models on our data demonstrate the importance of developing new local models and indicate that careful consideration is needed if generic “pantropical” models without height measurements are to be applied in dry forests in Brazil. Conclusions: Local equations can thus be used for carbon accounting in REDD+ and in sustainability-incentive projects that promote the development of dry forests and the assessment of their ecosystem services.
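
To make the fitting procedure concrete, the sketch below fits the Schumacher-Hall form AGB = exp(b0 + b1·log(D) + b2·log(Ht)) by nonlinear least squares on synthetic data; the generated diameters, heights, noise, and recovered coefficients are placeholders, not the study's 500-tree data set or fitted values.

```python
# Nonlinear least-squares fit of the Schumacher-Hall biomass equation
# on synthetic tree data (illustrative only).
import numpy as np
from scipy.optimize import curve_fit

def schumacher_hall(X, b0, b1, b2):
    D, Ht = X
    return np.exp(b0 + b1 * np.log(D) + b2 * np.log(Ht))

rng = np.random.default_rng(42)
D = rng.uniform(3.0, 30.0, 200)            # diameter (cm), assumed range
Ht = rng.uniform(2.0, 10.0, 200)           # total height (m), assumed range
agb = schumacher_hall((D, Ht), -3.0, 2.0, 1.0) * rng.lognormal(0.0, 0.15, 200)

params, _ = curve_fit(schumacher_hall, (D, Ht), agb, p0=(-2.0, 1.5, 1.0))
pred = schumacher_hall((D, Ht), *params)
rmse = np.sqrt(np.mean((pred - agb) ** 2))
print("b0, b1, b2 =", np.round(params, 4), "RMSE =", round(rmse, 3))
```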


2008 · Vol 71 (2) · pp. 279-285
Author(s): M. J. Stasiewicz, B. P. Marks, A. Orta-Ramirez, D. M. Smith

Traditional models for predicting the thermal inactivation rate of bacteria are state dependent, considering only the current state of the product. In this study, the potential for previous sublethal thermal history to increase the thermotolerance of Salmonella in ground turkey was determined, a path-dependent model for thermal inactivation was developed, and the path-dependent predictions were tested against independent data. Weibull-Arrhenius parameters for Salmonella inactivation in ground turkey thigh were determined via isothermal tests at 55, 58, 61, and 63°C. Two sets of nonisothermal heating tests also were conducted. The first included five linear heating rates (0.4, 0.9, 1.7, 3.5, and 7.0 K/min) and three holding temperatures (55, 58, and 61°C); the second also included sublethal holding periods at 40, 45, and 50°C. When the standard Weibull-Arrhenius model was applied to the nonisothermal validation data sets, the root mean squared error of prediction was 2.5 log CFU/g, with fail-dangerous residuals as large as 4.7 log CFU/g when applied to the complete nonisothermal data set. However, by using a modified path-dependent model for inactivation, the prediction errors for independent data were reduced by 56%. Under actual thermal processing conditions, use of the path-dependent model would reduce error in thermal lethality predictions for slowly cooked products.
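
As an illustration of the state-dependent (baseline) model only, the sketch below integrates Weibull-Arrhenius inactivation along a ramp-and-hold temperature path using Peleg's differential form. All parameter values are assumed for illustration and are not the paper's fitted values, and the path-dependent correction for sublethal history is deliberately absent.

```python
# State-dependent Weibull-Arrhenius prediction under a nonisothermal path.
import numpy as np

EA_R = 60000.0 / 8.314          # activation energy / gas constant (K), assumed
T_REF = 331.15                  # reference temperature: 58 degC in kelvin
B_REF, N_SHAPE = 0.25, 1.5      # Weibull rate/shape at T_REF (assumed)

def b_of_T(T_kelvin):
    """Arrhenius temperature dependence of the Weibull rate parameter."""
    return B_REF * np.exp(-EA_R * (1.0 / T_kelvin - 1.0 / T_REF))

def log_reduction(times, temps):
    """Euler integration of d(-log10 S)/dt along a time/temperature path
    (Peleg's differential form of the Weibull model)."""
    s = 1e-9                     # -log10 survival ratio, start near zero
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        b = b_of_T(temps[i])
        rate = b * N_SHAPE * (s / b) ** ((N_SHAPE - 1.0) / N_SHAPE)
        s += rate * dt
    return s

t = np.linspace(0.0, 15.0, 3001)             # minutes
T = 313.15 + np.minimum(1.7 * t, 18.0)       # 40 degC, ramp 1.7 K/min, hold 58 degC
print(f"predicted lethality: {log_reduction(t, T):.2f} log10 reductions")
```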


2012 · Vol 36 (4) · pp. 81-94
Author(s): Emmanouil Benetos, Simon Dixon

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. The proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, the method can effectively be used for multiple-instrument automatic transcription. In addition, its shift-invariant aspect can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note ranges. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature across several error metrics.
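
As background for the factorization step, the sketch below implements plain (non-shift-invariant) PLCA of a magnitude spectrogram via EM updates. The paper's model adds shift invariance, multiple templates per pitch and instrument, and HMM note tracking, none of which appear in this baseline.

```python
# Baseline PLCA spectrogram factorization: V(f,t) ~ sum_z P(z) P(f|z) P(t|z).
import numpy as np

def plca(V, n_z=8, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf = rng.random((F, n_z)); Pf /= Pf.sum(axis=0)       # P(f|z)
    Pt = rng.random((T, n_z)); Pt /= Pt.sum(axis=0)       # P(t|z)
    Pz = np.full(n_z, 1.0 / n_z)                          # P(z)
    for _ in range(n_iter):
        # E-step: posterior P(z|f,t), shape (F, T, n_z)
        joint = Pf[:, None, :] * Pt[None, :, :] * Pz[None, None, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        # M-step: reweight by observed energy and renormalize
        W = V[:, :, None] * post
        Pf = W.sum(axis=1); Pf /= np.maximum(Pf.sum(axis=0), 1e-12)
        Pt = W.sum(axis=0); Pt /= np.maximum(Pt.sum(axis=0), 1e-12)
        Pz = W.sum(axis=(0, 1)); Pz /= Pz.sum()
    return Pf, Pt, Pz

V = np.random.default_rng(1).random((64, 40))             # stand-in spectrogram
Pf, Pt, Pz = plca(V, n_z=4)
print(Pf.shape, Pt.shape, np.round(Pz, 3))
```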


1998 · Vol 52 (10) · pp. 1339-1347
Author(s): Mark R. Riley, Mark A. Arnold, David W. Murhammer

A novel method is introduced for developing calibration models for the spectroscopic measurement of chemical concentrations in an aqueous environment. To demonstrate this matrix-enhanced calibration procedure, we developed calibration models to quantitate glucose and glutamine concentrations in an insect cell culture medium that is a complex mixture of more than 20 components, with three components that manifest significant concentration changes. Accurate calibration models were generated for glucose and glutamine by using a calibration data set composed of 60 samples containing the analytes dissolved in an aqueous buffer along with as few as two samples of the analytes dissolved in culture medium. Standard errors of prediction were 1.0 mM for glucose and 0.35 mM for glutamine. The matrix-enhanced method was also applied to culture medium samples collected during the course of a second bioreactor run. Addition of three culture medium samples to a buffer calibration reduced glucose prediction errors from 3.8 mM to 1.0 mM; addition of two culture medium samples reduced glutamine prediction errors from 1.6 mM to 0.76 mM. Results from this study suggest that spectroscopic calibration models can be developed from a relatively simple set of samples, provided that some of the samples account for variations in the sample matrix.
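
A minimal sketch of the matrix-augmentation idea, under the assumption of a PLS regression calibration (a common choice for such spectroscopic models; the paper's actual regression method may differ): a buffer-based calibration set is augmented with two culture-medium samples before fitting, then evaluated on medium samples from a new run. All spectra here are synthetic toys.

```python
# Matrix-enhanced calibration sketch with PLS regression (synthetic spectra).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_buffer, n_medium, n_wavelengths = 60, 2, 300

def spectrum(conc, matrix_offset):
    """Toy Beer's-law spectra: analyte band + matrix background + noise."""
    band = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 20.0) ** 2)
    return (conc[:, None] * band + matrix_offset
            + rng.normal(0.0, 0.01, (conc.size, n_wavelengths)))

c_buf = rng.uniform(0.0, 20.0, n_buffer)          # analyte (mM) in buffer
c_med = rng.uniform(0.0, 20.0, n_medium)          # analyte (mM) in medium
X = np.vstack([spectrum(c_buf, 0.0), spectrum(c_med, 0.05)])
y = np.concatenate([c_buf, c_med])

pls = PLSRegression(n_components=5).fit(X, y)     # augmented calibration

c_test = rng.uniform(0.0, 20.0, 20)
X_test = spectrum(c_test, 0.05)                   # medium samples, new run
sep = np.sqrt(np.mean((pls.predict(X_test).ravel() - c_test) ** 2))
print(f"standard error of prediction: {sep:.2f} mM")
```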

