A Design-Driven Validation Approach Using Bayesian Prediction Models

2007 ◽  
Vol 130 (2) ◽  
Author(s):  
Wei Chen ◽  
Ying Xiong ◽  
Kwok-Leung Tsui ◽  
Shuchun Wang

In most existing work, model validation is viewed as verifying model accuracy, measured by the agreement between computational and experimental results. Because resources are limited, accuracy can be assessed at only a small number of test points. From the design perspective, however, a good model is one that can discriminate (with good resolution) between competing design candidates under uncertainty. In this work, a design-driven validation approach is presented. By combining data from both physical experiments and the computer model, a Bayesian approach is employed to develop a prediction model as a replacement for the original computer model for the purpose of design. Based on the uncertainty quantification of the Bayesian prediction and, subsequently, that of a design objective, decision validation metrics are developed to assess the confidence of using the Bayesian prediction model in making a specific design choice. We demonstrate that the Bayesian approach provides a flexible framework for drawing inferences for predictions in the intended, but possibly untested, design domain. The applicability of the proposed decision validation metrics is examined for designs with either a discrete or a continuous set of design alternatives. The approach is demonstrated through an illustrative example of a robust engine piston design.
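
As a rough illustration of the modelling step described above, the sketch below fits a Gaussian process to the discrepancy between sparse physical experiments and a cheap computer model, then forms a bias-corrected prediction over the design domain. It assumes scikit-learn; the simulator, data, and kernel choices are illustrative, not the paper's.

```python
# Sketch: bias-corrected prediction combining a cheap computer model with
# sparse physical experiments, in the spirit of Bayesian model calibration.
# The simulator, data, and kernels below are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def computer_model(x):
    # stand-in for an expensive simulation code
    return np.sin(2.0 * x) + 0.5 * x

rng = np.random.default_rng(0)
x_phys = rng.uniform(0.0, 3.0, size=8).reshape(-1, 1)   # few physical tests
y_phys = np.sin(2.0 * x_phys) + 0.6 * x_phys + rng.normal(0, 0.05, x_phys.shape)

# Model the discrepancy delta(x) = y_experiment - y_simulation with a GP,
# so the posterior mean and variance quantify prediction uncertainty.
delta = (y_phys - computer_model(x_phys)).ravel()
gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-3),
                              normalize_y=True).fit(x_phys, delta)

x_new = np.linspace(0.0, 3.0, 50).reshape(-1, 1)        # intended design domain
d_mean, d_std = gp.predict(x_new, return_std=True)
y_pred = computer_model(x_new).ravel() + d_mean          # bias-corrected prediction
```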

Author(s):  
Wei Chen ◽  
Ying Xiong ◽  
Kwok-Leung Tsui ◽  
Shuchun Wang

Even though model-based simulations are widely used in engineering design, it remains a challenge to validate models and to assess the risks and uncertainties associated with using predictive models for design decision making. In most existing work, model validation is viewed as verifying model accuracy, measured by the agreement between computational and experimental results. From the design perspective, however, a good model is one that can discriminate (with good resolution) between design candidates. In this work, a Bayesian approach is presented to assess the uncertainty in model prediction by combining data from both physical experiments and the computer model. Based on this uncertainty quantification, design-oriented model validation metrics are developed to guide designers toward high confidence in using predictive models when making a specific design decision. We demonstrate that the Bayesian approach provides a flexible framework for drawing inferences for predictions in the intended, but possibly untested, design domain, where the design settings of the physical experiments and the computer model may or may not overlap. The implications of the proposed validation metrics are studied, and their potential roles in a model validation procedure are highlighted.
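
A design-oriented validation metric of the kind described can be reduced, under simplifying assumptions, to the probability that one design candidate outperforms another given the predictive distributions of the objective. The sketch below assumes independent normal predictive distributions and an illustrative 0.95 confidence threshold; it is not the paper's exact metric.

```python
# Sketch of a decision validation metric under a Bayesian prediction model:
# given posterior predictive means and standard deviations of an objective at
# two design candidates, estimate P(design A outperforms design B).
# Independent normal predictive distributions and all numbers are assumptions.
import numpy as np
from scipy import stats

mu_a, sd_a = 4.1, 0.30          # predictive mean/std of objective at design A
mu_b, sd_b = 4.5, 0.25          # ... at design B (smaller objective is better)

# the difference d = f(A) - f(B) is normal; confidence that A beats B is P(d < 0)
p_a_beats_b = stats.norm.cdf(0.0, loc=mu_a - mu_b,
                             scale=np.sqrt(sd_a**2 + sd_b**2))
decision_valid = p_a_beats_b >= 0.95     # threshold is an illustrative choice
```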


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244752
Author(s):  
Haifang Ni ◽  
Irene Klugkist ◽  
Saskia van der Drift ◽  
Ruurd Jorritsma ◽  
Gerrit Hooijer ◽  
...  

Random effects regression models are routinely used for clustered data in etiological and intervention research. In prediction models, however, the random effects are either neglected or conventionally substituted with zero for new clusters after model development. In this study, we applied a Bayesian prediction modelling method to the subclinical ketosis data previously collected by Van der Drift et al. (2012). Using a dataset of 118 randomly selected Dutch dairy farms participating in a regular milk recording system, the authors proposed a prediction model with milk measures as well as available test-day information as predictors for the diagnosis of subclinical ketosis in dairy cows. While their original model included random effects to correct for the clustering, the random effect term was removed from their final prediction model. With the Bayesian prediction modelling approach, we first used non-informative priors for the random effects for model development as well as for prediction, and evaluated this approach by comparing it to the original frequentist model. In addition, herd-level expert opinion was elicited from a bovine health specialist using three different scales of precision and incorporated into the prediction as informative priors for the random effects, resulting in three more Bayesian prediction models. Results showed that the Bayesian approach could naturally take the clustering structure of the data into account by keeping the random effects in the prediction model, and that expert opinion could be explicitly combined with individual-level data for prediction. In this dataset, however, incorporating the elicited expert opinion yielded little improvement at either the individual or the herd level. When the prediction models were applied to the 118 herds, at the individual cow level the original frequentist approach gave a sensitivity of 82.4% and a specificity of 83.8% at the optimal cutoff, while the three Bayesian models with elicited expert opinion gave sensitivities ranging from 78.7% to 84.6% and specificities ranging from 75.0% to 83.6%. At the herd level, 30 of the 118 within-herd prevalences were correctly predicted by the original frequentist approach, and 31 to 44 herds were correctly predicted by the three Bayesian models with elicited expert opinion. Further investigation into the elicited expert opinion and the distributional assumptions for the random effects is carried out and discussed.
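
The sketch below shows one way such a Bayesian random-effects prediction model could be set up, with an expert-elicited informative prior on the herd effects. It assumes PyMC; the predictor layout, prior values, and function names are illustrative, not the authors' implementation.

```python
# Sketch of a Bayesian random-effects logistic prediction model with an
# informative (expert-elicited) prior on the herd effects. Assumes PyMC;
# predictor names, prior values, and data arrays are illustrative.
import pymc as pm

# milk_x: (n_cows, n_preds) test-day predictors; herd_idx: herd of each cow;
# y: 1 if the cow is subclinically ketotic; expert_mu/expert_sd encode the
# elicited herd-level opinion (a non-informative choice would widen expert_sd)
def build_model(milk_x, herd_idx, y, n_herds, expert_mu=0.0, expert_sd=1.0):
    with pm.Model() as model:
        intercept = pm.Normal("intercept", 0.0, 5.0)
        beta = pm.Normal("beta", 0.0, 2.0, shape=milk_x.shape[1])
        # herd random effects are kept in the model (not set to zero) so that
        # predictions for cows in a herd borrow strength from their herd-mates
        u = pm.Normal("u", mu=expert_mu, sigma=expert_sd, shape=n_herds)
        logit_p = intercept + pm.math.dot(milk_x, beta) + u[herd_idx]
        pm.Bernoulli("obs", logit_p=logit_p, observed=y)
    return model

# with build_model(...):  idata = pm.sample()  # posterior used for prediction
```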


Author(s):  
Xuyuan Liu ◽  
Kwok-Leung Tsui ◽  
Wei Chen

Statistical analysis of functional responses based on functional data from both computer and physical experiments has gained increasing attention due to the dynamic nature of many engineering systems. However, the complexity and sheer volume of functional data make it difficult to apply traditional methodologies. The objective of the present study is twofold: (1) prediction of functional responses based on functional data, and (2) prediction of the bias function for validation of a computer model that predicts functional responses. In this paper, we first develop a functional regression model with linear basis functions to analyze functional data. Then, combining data from both computer and physical experiments, we use this functional regression model to predict the bias function, which is crucial for validating a computer model. The proposed method, following the classical nonparametric regression framework, uses a single-step procedure that is easily implemented and computationally efficient. Through an application example of motor engine analysis to predict acceleration performance and gear shift events, we demonstrate our approach and compare it with the Gaussian process modeling approach.
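
A minimal sketch of the single-step idea, assuming only numpy: represent the bias between physical and simulated functional responses in a linear basis and fit the coefficients by least squares. The polynomial basis and the signals are illustrative; the paper's basis choice may differ.

```python
# Sketch: functional bias estimation with linear basis functions, fitted in a
# single least-squares step. Basis and data are illustrative assumptions.
import numpy as np

def basis(t, k=6):
    # simple polynomial basis phi_j(t) = t^j; the paper's basis may differ
    return np.vander(t, N=k, increasing=True)

t = np.linspace(0.0, 1.0, 200)                  # common time grid
Phi = basis(t)                                  # (200, k) design matrix

# y_sim, y_phys: functional responses on the grid from the computer model and
# the physical experiment (illustrative signals with a deliberate bias)
y_sim = np.sin(2 * np.pi * t)
y_phys = y_sim + 0.3 * t + 0.05 * np.random.default_rng(0).normal(size=t.size)

# single-step least-squares fit of the basis coefficients of the bias function
coef, *_ = np.linalg.lstsq(Phi, y_phys - y_sim, rcond=None)
bias_hat = Phi @ coef                           # smooth estimate of the bias curve
```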


2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are in strong demand, as they can prioritize the more active compound derivatives and reduce the trial-and-error process. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, comprising the SA, SA_BR, SR, and SR_BR models. The second is based on SAR transferability and realized by matched molecular series analysis, comprising the Single MMS pair, Full MMS series, and Multi single MMS pairs models. Moreover, we defined the applicability domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), while the baseline model (SA) produced the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R2 = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate bioactivity prediction model was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
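
The consensus step amounts to averaging the individual models' predictions and scoring them against measured activities. A minimal sketch, assuming scikit-learn metrics and illustrative prediction arrays:

```python
# Sketch of the consensus step: average the individual models' predicted
# activities and score against measured values. The prediction arrays are
# illustrative stand-ins for the seven models' outputs.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def consensus_score(y_true, model_preds):
    # model_preds: dict of model name -> array of predicted activities
    y_cons = np.mean(np.vstack(list(model_preds.values())), axis=0)
    rmse = mean_squared_error(y_true, y_cons) ** 0.5
    return {"R2": r2_score(y_true, y_cons),
            "MAE": mean_absolute_error(y_true, y_cons),
            "RMSE": rmse}
```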


2001 ◽  
Vol 10 (2) ◽  
pp. 241 ◽  
Author(s):  
Jon B. Marsden-Smedley ◽  
Wendy R. Catchpole

An experimental program was carried out in Tasmanian buttongrass moorlands to develop fire behaviour prediction models for improving fire management. This paper describes the results of the fuel moisture modelling section of the project. A range of previously developed fuel moisture prediction models are examined and three empirical dead fuel moisture prediction models are developed. McArthur’s grassland fuel moisture model gave predictions as good as those of a linear regression model using humidity and dew-point temperature. The regression model was preferred as a prediction model because it is inherently more robust. A prediction model based on hazard sticks was found to have strong seasonal effects, which need further investigation before hazard sticks can be used operationally.
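
The preferred model form is a linear regression of dead fuel moisture on relative humidity and dew-point temperature. A minimal sketch, assuming numpy and illustrative observations rather than the Tasmanian field data:

```python
# Sketch of the preferred empirical model form: dead fuel moisture regressed
# linearly on relative humidity and dew-point temperature. All observations
# below are illustrative, not the study's measurements.
import numpy as np

rh = np.array([35., 50., 65., 80., 90.])        # relative humidity (%)
dew = np.array([2., 6., 9., 12., 14.])          # dew-point temperature (C)
fmc = np.array([8., 11., 14., 19., 24.])        # measured fuel moisture (%)

X = np.column_stack([np.ones_like(rh), rh, dew])
b, *_ = np.linalg.lstsq(X, fmc, rcond=None)     # [intercept, b_rh, b_dew]

def predict(rh_new, dew_new):
    return b[0] + b[1] * rh_new + b[2] * dew_new
```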


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 285
Author(s):  
Kwok Tai Chui ◽  
Brij B. Gupta ◽  
Pandian Vasant

Understanding the remaining useful life (RUL) of equipment is crucial for optimal predictive maintenance (PdM), which addresses the equipment downtime of run-to-failure maintenance and the unnecessary checks of preventive maintenance. Both feature extraction and the prediction algorithm play crucial roles in the performance of RUL prediction models. A benchmark dataset, the Turbofan Engine Degradation Simulation Dataset, was selected for performance analysis and evaluation. The proposed combination of complete ensemble empirical mode decomposition and wavelet packet transform for feature extraction reduced the average root-mean-square error (RMSE) by 5.14–27.15% compared with six existing approaches. As for the prediction algorithm, an RUL estimate may imply that equipment needs repair or replacement within either a short or a long period of time, and incorporating this characteristic can enhance the performance of the RUL prediction model. In this paper, we propose an RUL prediction algorithm that combines a recurrent neural network (RNN) and long short-term memory (LSTM): the former is better suited to short-term prediction, whereas the latter performs better in long-term prediction. The weights combining the RNN and LSTM were tuned by the non-dominated sorting genetic algorithm II (NSGA-II). The method achieved an average RMSE of 17.2, improving on the baseline stand-alone RNN and stand-alone LSTM models by 6.07–14.72%; compared with existing works, the improvement is 12.95–39.32%.
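
The fusion step can be sketched as a convex combination of the two networks' RUL estimates scored by RMSE. The paper tunes the weights with NSGA-II; the sketch below substitutes a plain grid search for brevity, and the prediction arrays are illustrative.

```python
# Sketch of the fusion step: weight the RNN and LSTM RUL estimates and pick
# the weight minimizing RMSE. The paper uses NSGA-II; a grid search stands in
# here, and the prediction arrays are illustrative assumptions.
import numpy as np

def fuse(rnn_pred, lstm_pred, w):
    # RNN favoured for short horizons, LSTM for long; w trades them off
    return w * rnn_pred + (1.0 - w) * lstm_pred

def best_weight(rnn_pred, lstm_pred, y_true, grid=np.linspace(0, 1, 101)):
    rmse = lambda y_hat: float(np.sqrt(np.mean((y_true - y_hat) ** 2)))
    return min(grid, key=lambda w: rmse(fuse(rnn_pred, lstm_pred, w)))
```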


2021 ◽  
Vol 14 (7) ◽  
pp. 333
Author(s):  
Shilpa H. Shetty ◽  
Theresa Nithila Vincent

The study aimed to investigate the role of non-financial measures in predicting corporate financial distress in the Indian industrial sector. The proportion of independent directors on the board and the proportion of the promoters’ share in the ownership structure of the business were the non-financial measures analysed, along with ten financial measures. The sample consisted of 82 companies that had filed for bankruptcy under the Insolvency and Bankruptcy Code (IBC) and an equal number of matching financially sound companies, for a total sample size of 164 companies. Data for the five years immediately preceding the bankruptcy filing were collected for the sample companies. The data of 120 companies, drawn evenly from the two groups, were used for developing the model, and the remaining data were used for validating it. Two binary logistic regression models were developed: M1, formulated with both financial and non-financial variables, and M2, with only financial variables as predictors. The diagnostic ability of the models was tested with the aid of the receiver operating characteristic (ROC) curve, the area under the curve (AUC), sensitivity, specificity and annual accuracy. The results show that inclusion of the two non-financial variables improved the efficacy of the financial distress prediction model. This study makes a unique attempt to provide empirical evidence on the role played by non-financial variables in improving the efficiency of corporate distress prediction models.
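
The comparison of M1 and M2 can be sketched as fitting two logistic regressions and comparing their AUCs. The sketch assumes scikit-learn; the variable names and data splits are illustrative placeholders.

```python
# Sketch of the two-model comparison: logistic regression with financial
# ratios only (M2) versus financial plus governance variables (M1), scored by
# AUC. Column names and data splits below are illustrative assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def fit_and_auc(X_train, y_train, X_test, y_test):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# X_fin: ten financial ratios; X_all adds the independent-director and
# promoter-share proportions. A higher AUC for X_all mirrors the finding.
# auc_m2 = fit_and_auc(X_fin_train, y_train, X_fin_test, y_test)
# auc_m1 = fit_and_auc(X_all_train, y_train, X_all_test, y_test)
```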


2018 ◽  
Vol 8 (4) ◽  
pp. 1-23 ◽  
Author(s):  
Deepa Godara ◽  
Amit Choudhary ◽  
Rakesh Kumar Singh

Software is at the heart of modern technology, and to keep pace with new technology, changes in software are inevitable. This article examines the association between changes and object-oriented metrics using different versions of open source software. Change prediction models can estimate the probability that a class will change early in the software life cycle, which allows better effort allocation, more rigorous testing, and easier maintenance of any software. Researchers have previously used techniques such as statistical methods to predict change-prone classes. In this article, new metrics such as execution time, frequency, run-time information, popularity, and class dependency are proposed to help predict change-prone classes. To evaluate the performance of the prediction model, the authors used sensitivity, specificity, and the ROC curve; higher AUC values indicate that the prediction model gives significantly more accurate results. The proposed metrics contribute to the accurate prediction of change-prone classes.
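
Evaluation of such a change-proneness classifier reduces to sensitivity, specificity, and AUC computed from its scores. A minimal sketch, assuming scikit-learn and an illustrative threshold:

```python
# Sketch of the evaluation described: sensitivity, specificity, and AUC for a
# change-proneness classifier's scores. Score/label arrays and the threshold
# are illustrative assumptions.
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, scores, threshold=0.5):
    y_pred = (scores >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "auc": roc_auc_score(y_true, scores)}
```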


Author(s):  
Guizhou Hu ◽  
Martin M. Root

Background: No methodology is currently available for combining, in a multivariate fashion, individual risk factor information derived from different longitudinal studies of a chronic disease. This paper introduces such a methodology, named Synthesis Analysis, which is essentially a multivariate meta-analytic technique. Design: The construction and validation of statistical models using available data sets. Methods and results: Two analyses are presented. (1) With the same data, Synthesis Analysis produced a prediction model similar to that of the conventional regression approach when using the same risk variables, and produced better prediction models when additional risk variables were added. (2) A four-variable empirical logistic model for death from coronary heart disease was developed with data from the Framingham Heart Study. A synthesized prediction model with five new variables added to this empirical model was developed using Synthesis Analysis and literature information. This model was then compared with the four-variable empirical model using the first National Health and Nutrition Examination Survey (NHANES I) Epidemiologic Follow-up Study data set. The synthesized model had significantly improved predictive power (χ² = 43.8, P < 0.00001). Conclusions: Synthesis Analysis provides a new means of developing complex disease prediction models from the medical literature.
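
The reported improvement in predictive power corresponds to a likelihood-ratio chi-square test between nested logistic models. A minimal sketch, assuming statsmodels; the data and columns are illustrative:

```python
# Sketch of the reported model comparison: a likelihood-ratio chi-square test
# between the four-variable empirical model and the synthesized model with
# extra predictors. Data layout and column counts are illustrative.
import statsmodels.api as sm
from scipy import stats

def lr_test(y, X_reduced, X_full):
    ll_r = sm.Logit(y, sm.add_constant(X_reduced)).fit(disp=0).llf
    ll_f = sm.Logit(y, sm.add_constant(X_full)).fit(disp=0).llf
    chi2 = 2.0 * (ll_f - ll_r)                  # e.g. 43.8 in the paper
    df = X_full.shape[1] - X_reduced.shape[1]   # number of added variables
    return chi2, stats.chi2.sf(chi2, df)        # test statistic, p-value
```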


2021 ◽  
Vol 156 (A4) ◽  
Author(s):  
N Hifi ◽  
N Barltrop

This paper applies a newly developed methodology to calibrate a corrosion model within a structural reliability analysis. The methodology combines data from experience (measurements and expert judgment) with prediction models to adjust the structural reliability models. Two corrosion models published in the literature are used to demonstrate the calibration technique: one predicts future degradation and the other represents recorded inspection data. The results of the calibration process are presented and discussed.
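
Calibration of this kind can be sketched, in its simplest form, as a Bayesian update of a corrosion-rate parameter: the prediction model supplies the prior and inspection measurements supply the likelihood. The conjugate normal-normal sketch below assumes numpy; all rates and variances are illustrative.

```python
# Sketch of calibrating a corrosion model against inspection data: a normal
# prior from the prediction model is updated with measured corrosion rates.
# All rates, variances, and measurements are illustrative assumptions.
import numpy as np

prior_mu, prior_var = 0.10, 0.03**2            # predicted corrosion rate (mm/yr)
meas = np.array([0.14, 0.11, 0.13, 0.12])      # inspection-derived rates (mm/yr)
meas_var = 0.02**2                             # measurement noise variance

# conjugate normal-normal update of the rate, precision-weighted
post_prec = 1.0 / prior_var + meas.size / meas_var
post_mu = (prior_mu / prior_var + meas.sum() / meas_var) / post_prec
post_var = 1.0 / post_prec                     # calibrated rate: post_mu +/- sqrt(post_var)
```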

