New Metrics for Validation of Data-Driven Random Process Models in Uncertainty Quantification

Author(s):  
Hongyi Xu ◽  
Zhen Jiang ◽  
Daniel W. Apley ◽  
Wei Chen

Data-driven random process models have become increasingly important for uncertainty quantification (UQ) in science and engineering applications, owing to their ability to capture both the marginal distributions and the correlations of high-dimensional responses. However, the choice of a random process model is neither unique nor straightforward. To quantitatively validate the accuracy of random process UQ models, new metrics are needed to measure how well they capture the statistical information of high-dimensional data collected from simulations or experimental tests. In this work, two goodness-of-fit (GOF) metrics, namely a statistical moment-based metric (SMM) and an M-margin U-pooling metric (MUPM), are proposed for comparing different stochastic models, accounting for their capabilities of capturing the marginal distributions and the correlations in spatial/temporal domains. The effectiveness of the two proposed metrics is demonstrated by comparing the accuracies of four random process models (Gaussian process (GP), Gaussian copula, Hermite polynomial chaos expansion (PCE), and Karhunen–Loève (K–L) expansion) in multiple numerical examples and an engineering example of stochastic analysis of microstructural material properties. Beyond the new metrics, this paper provides insights into the pros and cons of various data-driven random process models in UQ.
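The abstract does not give the SMM's exact definition, so the following is only a minimal illustrative sketch of the idea behind a moment-based validation metric: compare the first four sample moments of model-generated realizations against those of the data, and prefer the model with the smaller aggregate discrepancy. The aggregation rule and moment choice here are assumptions, not the paper's formula.

```python
import math
import random

def sample_moments(xs):
    """First four sample moments: mean, std, skewness, kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n
    return [mean, std, skew, kurt]

def moment_metric(data, model_samples):
    """Aggregate absolute moment discrepancy (illustrative stand-in
    for the paper's statistical moment-based metric)."""
    md, mm = sample_moments(data), sample_moments(model_samples)
    return sum(abs(a - b) for a, b in zip(md, mm))

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
good_model = [random.gauss(0.0, 1.0) for _ in range(5000)]   # matching model
bad_model = [random.expovariate(1.0) for _ in range(5000)]   # mismatched model
```

A smaller metric value indicates a model whose realizations better reproduce the marginal statistics of the data; here the Gaussian surrogate scores far better than the exponential one.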

Author(s):  
Mouhib Alnoukari ◽  
Asim El Sheikh

The Knowledge Discovery (KD) process model was first discussed in 1989. Different models have been suggested since, starting with the process model of Fayyad et al. (1996). The common factor of all data-driven discovery processes is that knowledge is the final outcome. In this chapter, the authors analyze most of the KD process models suggested in the literature, with a detailed discussion of those that have innovative life-cycle steps, and propose a categorization of the existing KD models. The chapter closely analyzes the strengths and weaknesses of the leading KD process models, together with their supporting commercial systems, reported applications, and matrix characteristics.


2012 ◽  
Vol 2012 ◽  
pp. 1-21 ◽  
Author(s):  
Shen Yin ◽  
Xuebo Yang ◽  
Hamid Reza Karimi

This paper presents an approach for the data-driven design of fault diagnosis systems. The proposed fault diagnosis scheme consists of an adaptive residual generator and a bank of isolation observers, whose parameters are identified directly from the process data without identifying a complete process model. To deal with normal variations in the process, the parameters of the residual generator are updated online by a standard adaptive technique to achieve reliable fault detection performance. After a fault is successfully detected, the isolation scheme is activated, in which each isolation observer serves as an indicator of the occurrence of a particular type of fault in the process. The thresholds can be determined analytically or by estimating the probability density function of the related variables. To illustrate the performance of the proposed fault diagnosis approach, a laboratory-scale three-tank system is used. The results show that the proposed data-driven scheme is effective for applications whose analytical process models are unavailable. In particular, for large-scale plants, whose physical models are generally difficult to establish, the proposed approach may offer an effective alternative for process monitoring.
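The residual-based detection logic can be conveyed with a minimal sketch: compare measurements against a nominal prediction and declare a fault when the residual leaves a threshold band. The k-sigma band estimated from a fault-free window is an assumed simplification; the paper's thresholds come from analytical design or density estimation, and its residual generator is itself identified from data.

```python
import math

def detect_fault(measurements, predictions, window=50, k=3.0):
    """Return the first sample index at which the residual leaves a
    k-sigma band estimated from an initial fault-free window, or None."""
    residuals = [m - p for m, p in zip(measurements, predictions)]
    base = residuals[:window]
    mean = sum(base) / window
    std = (sum((r - mean) ** 2 for r in base) / window) ** 0.5
    for t, r in enumerate(residuals):
        if abs(r - mean) > k * std:
            return t
    return None

# deterministic toy data: small oscillation, additive fault from t = 120
pred = [0.0] * 200
meas = [0.05 * math.sin(t) for t in range(200)]
for t in range(120, 200):
    meas[t] += 1.0
```

On this toy trace the detector flags the injected step fault at the sample where it first appears, while the fault-free prefix raises no alarm.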


2021 ◽  
Author(s):  
Yanwen Xu ◽  
Pingfeng Wang

Abstract The Gaussian process (GP) model has become one of the most popular surrogate models and exhibits superior performance in many engineering design applications. However, the standard GP model cannot handle high-dimensional applications well. The root of the problem is the GP model's similarity measure, which relies on the Euclidean distance; this distance becomes uninformative in high-dimensional cases, causing accuracy and efficiency issues. Few studies have explored this issue. In this study, we therefore propose an enhanced squared exponential kernel using the Manhattan distance, which is more effective at preserving the meaningfulness of proximity measures and is preferable for the GP model in high-dimensional cases. Experiments show that the proposed approach achieves superior performance on high-dimensional problems. Based on the analysis and experimental results on similarity metrics, this paper provides a guide to choosing the similarity measures that yield the most accurate and efficient results for the Kriging model with respect to different sample sizes and dimension levels.
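A sketch of the two kernel variants makes the distance swap concrete. The exact functional form of the paper's enhanced kernel is not given in the abstract, so the Manhattan variant below (city-block distance substituted into the same exponential form) is an assumption for illustration only.

```python
import math

def se_euclidean(x, y, ls=1.0):
    """Standard squared exponential kernel with Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * ls ** 2))

def se_manhattan(x, y, ls=1.0):
    """Hypothetical variant: the Manhattan (city-block) distance
    substituted into the same exponential form."""
    d = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-d ** 2 / (2.0 * ls ** 2))

# in 1-D the two coincide; in high dimensions they diverge sharply,
# because the L1 distance grows faster than the L2 distance
x = [0.1] * 100
y = [0.2] * 100
```

For the 100-dimensional pair above, the Euclidean kernel value stays moderate while the Manhattan variant decays steeply, illustrating how the choice of distance reshapes the proximity structure the GP sees.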


2021 ◽  
Vol 13 (2) ◽  
pp. 558-570
Author(s):  
Jiajia Wang ◽  
Ryan J. Harrigan ◽  
Frederic P. Schoenberg

Coccidioidomycosis is an infectious disease of humans and other mammals that has seen a recent increase in occurrence in the southwestern United States, particularly in California. A rise in cases and in risk to public health can serve as the impetus to apply newly developed methods that can quickly and accurately predict future caseloads. Recursive and Hawkes point process models with various triggering functions were fit to the data, and their goodness of fit was evaluated and compared. Although the point process models were largely similar in their fit to the data, the recursive point process model offered a slightly superior fit. We explored forecasting the spread of coccidioidomycosis in California from December 2002 to December 2017 using this recursive model; separating the data into training and testing portions, we achieved a root mean squared error of just 3.62 cases per week.
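The fitted triggering functions are not reproduced in the abstract; as a hedged illustration of the model class, the conditional intensity of a Hawkes process with an exponential triggering kernel can be evaluated as below. All parameter values are hypothetical, and the authors' recursive model additionally lets the productivity vary with the conditional intensity.

```python
import math

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a Hawkes process with exponential
    triggering: lambda(t) = mu + sum_i alpha*beta*exp(-beta*(t - t_i))
    over past events t_i < t. Parameter values are illustrative."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in events if ti < t)

events = [1.0, 2.5, 2.7]  # toy event times (e.g., reported case times)
```

Before any event the intensity equals the background rate mu; each event adds a decaying excitation, so the intensity shortly after a burst of events exceeds the intensity during a quiet stretch.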


2016 ◽  
Vol 37 (1) ◽  
Author(s):  
Gintautas Jakimauskas ◽  
Marijus Radavičius ◽  
Jurgis Sušinskas

A simple, data-driven, and computationally efficient procedure for testing the independence of high-dimensional random vectors is proposed. The procedure is based on interpreting goodness-of-fit testing as a classification problem, a special sequential partition procedure, elements of sequential testing, resampling, and randomization. Monte Carlo simulations are carried out to assess the performance of the procedure.
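A much simpler relative of the authors' procedure conveys the resampling-and-randomization idea: randomly permute the pairing between the two samples to build a null distribution for a dependence statistic. The correlation statistic and permutation scheme below are standard textbook choices, not the sequential classification procedure the paper proposes.

```python
import random

def corr_stat(xs, ys):
    """Absolute sample correlation between paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = sum((x - mx) ** 2 for x in xs) ** 0.5
    dy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(num / (dx * dy))

def permutation_p_value(xs, ys, n_perm=500, seed=0):
    """Randomize the pairing to approximate the null distribution;
    the p-value is the fraction of permuted statistics that are at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = corr_stat(xs, ys)
    ys = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if corr_stat(xs, ys) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
dependent = [x + 0.5 * rng.gauss(0, 1) for x in xs]     # linked to xs
independent = [rng.gauss(0, 1) for _ in range(200)]     # unrelated to xs
```

The dependent pairing yields a tiny p-value (independence rejected), while the independent pairing does not.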


2011 ◽  
Vol 23 (11) ◽  
pp. 2731-2745 ◽  
Author(s):  
Sridevi V. Sarma ◽  
David P. Nguyen ◽  
Gabriela Czanner ◽  
Sylvia Wirth ◽  
Matthew A. Wilson ◽  
...  

Characterizing neural spiking activity as a function of intrinsic and extrinsic factors is important in neuroscience. Point process models are valuable for capturing such information; however, the process of fully applying these models is not always obvious. A complete model application has four broad steps: specification of the model, estimation of model parameters given observed data, verification of the model using goodness of fit, and characterization of the model using confidence bounds. Of these steps, only the first three have been applied widely in the literature, suggesting the need to dedicate a discussion to how the time-rescaling theorem, in combination with parametric bootstrap sampling, can be generally used to compute confidence bounds of point process models. In our first example, we use a generalized linear model of spiking propensity to demonstrate that confidence bounds derived from bootstrap simulations are consistent with those computed from closed-form analytic solutions. In our second example, we consider an adaptive point process model of hippocampal place field plasticity for which no analytical confidence bounds can be derived. We demonstrate how to simulate bootstrap samples from adaptive point process models, how to use these samples to generate confidence bounds, and how to statistically test the hypothesis that neural representations at two time points are significantly different. These examples have been designed as useful guides for performing scientific inference based on point process models.
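The time-rescaling step the authors build on can be sketched directly: integrate the conditional intensity over each interspike interval; if the model is correct, the rescaled intervals are i.i.d. Exponential(1), and the transformed values 1 - exp(-tau) should be uniform on (0, 1), which a KS distance can check. The homogeneous Poisson example and numerical integration scheme below are illustrative simplifications.

```python
import math
import random

def rescale(spikes, intensity):
    """Time-rescaling: tau_i is the integral of the intensity over the
    i-th interspike interval (midpoint Riemann sum, step ~1 ms)."""
    taus, prev = [], 0.0
    for t in spikes:
        n = max(1, int((t - prev) / 1e-3))
        step = (t - prev) / n
        taus.append(step * sum(intensity(prev + (k + 0.5) * step)
                               for k in range(n)))
        prev = t
    return taus

def ks_uniform(taus):
    """KS distance of z_i = 1 - exp(-tau_i) from Uniform(0, 1)."""
    zs = sorted(1.0 - math.exp(-tau) for tau in taus)
    n = len(zs)
    return max(max(z - k / n, (k + 1) / n - z) for k, z in enumerate(zs))

# homogeneous Poisson spikes at rate 2, so intensity(t) = 2 is correct
rng = random.Random(0)
spikes, t = [], 0.0
for _ in range(500):
    t += rng.expovariate(2.0)
    spikes.append(t)
```

The correct intensity yields a small KS distance, while a deliberately wrong rate inflates it, which is exactly the goodness-of-fit signal the time-rescaling theorem provides; bootstrap samples for confidence bounds are then generated by simulating new spike trains from the fitted model.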


2011 ◽  
Vol 90-93 ◽  
pp. 1503-1510
Author(s):  
Fu Jun Liu ◽  
Yu Hua Zhu ◽  
Xiao Hui Ma

In this paper, a modified random process model of earthquake ground motion, based on the model proposed by JinPing Ou, is presented. All model parameters except the factor S0 are determined by the least squares method using the power spectral densities of 361 earthquake records; a method for determining the parameter S0 is then proposed. The good performance of the proposed model in representing earthquake ground motion on firm ground is demonstrated by comparing it with other random process models.
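Ou's modified spectrum is not reproduced in the abstract; as a hedged reference point, the classical Kanai-Tajimi power spectral density, the usual starting point for such filtered-white-noise ground-motion models, can be written as below. The site parameters chosen are typical textbook values for firm ground, not the paper's fitted values.

```python
def kanai_tajimi_psd(omega, s0=1.0, omega_g=15.0, zeta_g=0.6):
    """Kanai-Tajimi PSD of ground acceleration: omega_g and zeta_g
    characterize the site filter, s0 scales the white-noise intensity.
    (Illustrative parameter values; Ou's model adds further filtering.)"""
    w2, wg2 = omega ** 2, omega_g ** 2
    num = wg2 ** 2 + 4.0 * zeta_g ** 2 * wg2 * w2
    den = (wg2 - w2) ** 2 + 4.0 * zeta_g ** 2 * wg2 * w2
    return s0 * num / den
```

The spectrum equals s0 at zero frequency, is amplified near the site frequency omega_g, and rolls off at high frequencies, which is the shape least squares fitting against recorded power spectral densities must match.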


SPIEL ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. 121-145
Author(s):  
Larissa Leonhard ◽  
Anne Bartsch ◽  
Frank M. Schneider

This article presents an extended dual-process model of entertainment effects on political information processing and engagement. We suggest that entertainment consumption can either be driven by hedonic, escapist motivations that are associated with a superficial mode of information processing, or by eudaimonic, truth-seeking motivations that prompt more elaborate forms of information processing. This framework offers substantial extensions to existing dual-process models of entertainment by conceptualizing the effects of entertainment on active and reflective forms of information seeking, knowledge acquisition and political participation.


2019 ◽  
Vol 2019 ◽  
pp. 1-15 ◽  
Author(s):  
T. Mesbahzadeh ◽  
M. M. Miglietta ◽  
M. Mirakbari ◽  
F. Soleimani Sardoo ◽  
M. Abdolhoseini

Precipitation and temperature are very important climatic parameters, as their changes may affect living conditions. Predicting temporal trends of precipitation and temperature is therefore very useful for societal and urban planning. In this research, in order to study future trends in precipitation and temperature, we applied the scenarios of the fifth assessment report of the IPCC. The results suggest that both parameters will increase in the study area (Iran) in the future. Since the two climatic parameters are interdependent, analyzing them independently would introduce errors in the interpretation of model simulations. Therefore, in this study, copula theory was used for joint modeling of precipitation and temperature under climate change scenarios. With the joint distribution, we can determine the interdependence structure of precipitation and temperature under current and future climate change conditions, which can assist in the risk assessment of extreme hydrological and meteorological events. Based on the results of the goodness-of-fit test, the Frank copula was selected for modeling the recorded and constructed data under the RCP2.6 scenario, and the Gaussian copula was used for joint modeling of the constructed data under the RCP4.5 and RCP8.5 scenarios.
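The Gaussian-copula step used for the RCP4.5/RCP8.5 data can be sketched as follows: draw correlated standard normals and map each through the normal CDF to obtain uniform margins with the desired dependence; the uniforms are then fed through the inverse CDFs of whatever marginal distributions were fitted to precipitation and temperature. The correlation value below is hypothetical, not the study's fitted parameter.

```python
import math
import random

def gaussian_copula_pairs(n, rho, seed=0):
    """Draw n (u, v) pairs with Uniform(0,1) margins coupled by a
    Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # N(0,1) CDF
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))
    return pairs

pairs = gaussian_copula_pairs(2000, 0.7)
```

Each margin is uniform regardless of rho, while the pairing carries the dependence, which is what lets the copula separate the interdependence structure from the marginal climate distributions.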

