Exploration of a large-scale reconstructed structure on GaN(0001) surface by Bayesian optimization

2022 ◽  
Vol 120 (2) ◽  
pp. 021602
Author(s):  
A. Kusaba ◽  
Y. Kangawa ◽  
T. Kuboyama ◽  
A. Oshiyama
2019 ◽  
Vol 12 (5) ◽  
pp. 1791-1807 ◽  
Author(s):  
Dan Lu ◽  
Daniel Ricciuto

Abstract. Improving predictive understanding of Earth system variability and change requires data–model integration. Efficient data–model integration for complex models requires surrogate modeling to reduce model evaluation time. However, building a surrogate of a large-scale Earth system model (ESM) with many output variables is computationally intensive because it involves a large number of expensive ESM simulations. In this effort, we propose an efficient surrogate method capable of using a few ESM runs to build an accurate and fast-to-evaluate surrogate system of model outputs over large spatial and temporal domains. We first use singular value decomposition to reduce the output dimensions and then use Bayesian optimization techniques to generate an accurate neural network surrogate model based on limited ESM simulation samples. Our machine-learning-based surrogate methods can build and evaluate a large surrogate system of many variables quickly. Thus, whenever the quantities of interest change, such as a different objective function, a new site, and a longer simulation time, we can simply extract the information of interest from the surrogate system without rebuilding new surrogates, which significantly reduces computational efforts. We apply the proposed method to a regional ecosystem model to approximate the relationship between eight model parameters and 42 660 carbon flux outputs. Results indicate that using only 20 model simulations, we can build an accurate surrogate system of the 42 660 variables, wherein the consistency between the surrogate prediction and actual model simulation is 0.93 and the mean squared error is 0.02. This highly accurate and fast-to-evaluate surrogate system will greatly enhance the computational efficiency of data–model integration to improve predictions and advance our understanding of the Earth system.
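
The two-step pipeline described in the abstract (SVD to compress the outputs, then a cheap regressor on the reduced coefficients) can be sketched in a few lines. The dimensions and data below are toy stand-ins, and plain linear least squares stands in for the Bayesian-optimized neural network surrogate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's setting: n_samples expensive model runs,
# each producing a long output vector (here 500 instead of 42 660).
n_samples, n_params, n_outputs, k = 20, 8, 500, 5
X = rng.uniform(size=(n_samples, n_params))           # parameter samples
W = rng.normal(size=(n_params, n_outputs))
Y = np.tanh(X @ W)                                    # mock ESM outputs

# Step 1: SVD reduces the output dimension from n_outputs to k coefficients.
U, s, Vt = np.linalg.svd(Y - Y.mean(0), full_matrices=False)
coeffs = U[:, :k] * s[:k]                             # (n_samples, k) reduced outputs

# Step 2: fit one cheap regressor per reduced coefficient (linear least
# squares here stands in for the Bayesian-optimized neural network).
A = np.hstack([X, np.ones((n_samples, 1))])
beta, *_ = np.linalg.lstsq(A, coeffs, rcond=None)

# Predicting all n_outputs for a new parameter set is just k regressions
# followed by a back-projection through the singular vectors.
x_new = rng.uniform(size=(1, n_params))
c_pred = np.hstack([x_new, np.ones((1, 1))]) @ beta
y_pred = c_pred @ Vt[:k] + Y.mean(0)                  # full-length output vector
print(y_pred.shape)
```

Because the surrogate predicts the full output vector, any later quantity of interest can be extracted from `y_pred` without rebuilding the surrogate, which is the point the abstract emphasizes.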


2021 ◽  
Vol 25 (12) ◽  
pp. 6283-6307
Author(s):  
Sara Modanesi ◽  
Christian Massari ◽  
Alexander Gruber ◽  
Hans Lievens ◽  
Angelica Tarpanelli ◽  
...  

Abstract. Worldwide, the amount of water used for agricultural purposes is rising, and the quantification of irrigation is becoming a crucial topic. Because of the limited availability of in situ observations, an increasing number of studies are focusing on the synergistic use of models and satellite data to detect and quantify irrigation. The parameterization of irrigation in large-scale land surface models (LSMs) is improving, but it is still hampered by the lack of information about dynamic crop rotations, the extent of irrigated areas, and the mostly unknown timing and amount of irrigation. On the other hand, remote sensing observations offer an opportunity to fill this gap as they are directly affected by, and hence potentially able to detect, irrigation. Therefore, combining LSMs and satellite information through data assimilation can offer the optimal way to quantify the water used for irrigation. This work represents the first and necessary step towards building a reliable LSM data assimilation system which, in future analysis, will investigate the potential of high-resolution radar backscatter observations from Sentinel-1 to improve irrigation quantification. Specifically, the aim of this study is to couple the Noah-MP LSM running within the NASA Land Information System (LIS) with a backscatter observation operator for simulating unbiased backscatter predictions over irrigated lands. In this context, we first tested how well modelled surface soil moisture (SSM) and vegetation estimates, with or without irrigation simulation, are able to capture the signal of aggregated 1 km Sentinel-1 backscatter observations over the Po Valley, an important agricultural area in northern Italy. Next, Sentinel-1 backscatter observations, together with simulated SSM and leaf area index (LAI), were used to optimize a Water Cloud Model (WCM), which will represent the observation operator in future data assimilation experiments.
The WCM was calibrated both with and without an irrigation scheme in Noah-MP, considering two different cost functions. Results demonstrate that using an irrigation scheme provides a better calibration of the WCM, even if the simulated irrigation estimates are inaccurate. Bayesian optimization is shown to yield the best unbiased calibrated system, with minimal chance of error cross-correlations between the model and observations. Our time series analysis further confirms that Sentinel-1 is able to track the impact of human activities on the water cycle, highlighting its potential to improve irrigation, soil moisture, and vegetation estimates via future data assimilation.
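
For reference, the Water Cloud Model used as the observation operator has a compact closed form. The sketch below follows the standard Attema-Ulaby formulation with LAI as the vegetation descriptor; the parameter values A, B, C, D are illustrative placeholders, not the calibrated values from this study:

```python
import numpy as np

def wcm_backscatter(ssm, lai, theta, A, B, C, D):
    """Water Cloud Model: total backscatter as vegetation scattering plus
    soil scattering attenuated by the canopy.
    ssm: surface soil moisture [m3/m3], lai: leaf area index,
    theta: incidence angle [rad]; A, B, C, D are the parameters that
    would be calibrated against Sentinel-1 observations."""
    cos_t = np.cos(theta)
    t2 = np.exp(-2.0 * B * lai / cos_t)            # two-way canopy attenuation
    sigma_veg = A * lai * cos_t * (1.0 - t2)       # vegetation term (linear units)
    sigma_soil = 10.0 ** ((C + D * ssm) / 10.0)    # bare-soil term, linear in SSM in dB
    return 10.0 * np.log10(sigma_veg + t2 * sigma_soil)   # back to dB

# Illustrative call at a Sentinel-1-like incidence angle.
sigma_db = wcm_backscatter(ssm=0.25, lai=2.0, theta=np.deg2rad(37.0),
                           A=0.08, B=0.1, C=-20.0, D=30.0)
print(round(float(sigma_db), 2))
```

Calibrating the WCM then amounts to choosing A, B, C, D so that these predictions match the observed backscatter under a chosen cost function, which is where the Bayesian optimization mentioned above enters.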


Noise Mapping ◽  
2019 ◽  
Vol 6 (1) ◽  
pp. 52-71 ◽  
Author(s):  
Deepank Verma ◽  
Arnab Jana ◽  
Krithi Ramamritham

Abstract Deep learning (DL) methods have provided several breakthroughs in conventional data analysis techniques, especially with image and audio datasets. Rapid assessment and large-scale quantification of environmental attributes have become possible through such models. This study focuses on the creation of Artificial Neural Network (ANN) and Recurrent Neural Network (RNN) based models to classify sound sources from manually collected sound clips in local streets. A subset of the openly available AudioSet data is used to train and evaluate the models against the common sound classes present in urban streets. Audio data were collected at random locations in the selected study area of 0.2 sq. km. The audio clips are further classified according to the extent of anthropogenic (mainly traffic), natural, and human-based sounds present at particular locations. Rather than manually tuning model hyperparameters, the study uses Bayesian optimization to obtain the hyperparameter values of the neural network models. The optimized models produce overall accuracies of 89 percent and 60 percent on the evaluation set for the three-class and fifteen-class models, respectively. The model detections are mapped across the study area using the Inverse Distance Weighted (IDW) spatial interpolation method.
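
Bayesian hyperparameter optimization of this kind can be sketched as a small Gaussian-process loop. The one-dimensional "validation error" function and the lower-confidence-bound acquisition below are illustrative stand-ins: in the actual study, each evaluation would mean training and validating an ANN/RNN model once:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for validation error as a function of one
# hyperparameter (here: log10 learning rate).
def val_error(log_lr):
    return (log_lr + 3.0) ** 2 * 0.1 + 0.05 * np.sin(5.0 * log_lr)

def rbf(a, b, ls=0.7):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

# Start from a few random evaluations, then repeatedly fit a GP to the
# observations and evaluate where the lower confidence bound is smallest.
X = rng.uniform(-5.0, -1.0, size=3)
y = val_error(X)
grid = np.linspace(-5.0, -1.0, 200)
for _ in range(10):
    K_inv = np.linalg.inv(rbf(X, X) + 1e-6 * np.eye(len(X)))
    Ks = rbf(grid, X)
    mu = Ks @ K_inv @ (y - y.mean()) + y.mean()
    var = np.clip(1.0 - np.sum((Ks @ K_inv) * Ks, axis=1), 0.0, None)
    lcb = mu - 2.0 * np.sqrt(var)          # acquisition: lower confidence bound
    x_next = grid[int(np.argmin(lcb))]
    X = np.append(X, x_next)
    y = np.append(y, val_error(x_next))

best = float(X[np.argmin(y)])
print(round(best, 2))
```

The same loop extends to several hyperparameters at once (layer sizes, dropout, learning rate) by making `X` multi-dimensional; the appeal over manual tuning is that each expensive training run is spent where the surrogate is most promising or most uncertain.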


2017 ◽  
Author(s):  
Safoora Yousefi ◽  
Fatemeh Amrollahi ◽  
Mohamed Amgad ◽  
Coco Dong ◽  
Joshua E. Lewis ◽  
...  

Abstract. Translating the vast data generated by genomic platforms into accurate predictions of clinical outcomes is a fundamental challenge in genomic medicine. Many prediction methods face limitations in learning from the high-dimensional profiles generated by these platforms and rely on experts to hand-select a small number of features for training prediction models. In this paper, we demonstrate how deep learning and Bayesian optimization methods that have been remarkably successful in general high-dimensional prediction tasks can be adapted to the problem of predicting cancer outcomes. We perform an extensive comparison of Bayesian-optimized deep survival models and other state-of-the-art machine learning methods for survival analysis, and describe a framework for interpreting deep survival models using a risk backpropagation technique. Finally, we illustrate that deep survival models can successfully transfer information across diseases to improve prognostic accuracy. We provide an open-source software implementation of this framework, called SurvivalNet, that enables automatic training, evaluation, and interpretation of deep survival models.
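
Deep survival models of this kind are typically trained against the Cox partial likelihood, where the network outputs a scalar risk score per patient. A minimal numpy version of that loss (illustrative only, not SurvivalNet's actual implementation) looks like:

```python
import numpy as np

def neg_log_partial_likelihood(risk, time, event):
    """Cox negative log partial likelihood. `risk` is the model's scalar
    output per patient (higher risk should mean earlier events), `time`
    is follow-up time, and `event` is 1 for an observed event, 0 for
    censoring. Assumes no tied event times for simplicity."""
    order = np.argsort(-time)                  # sort by descending survival time
    risk, event = risk[order], event[order]
    # Log of the risk-set sum: patients still at risk at each event time
    # form a prefix of the descending-time ordering.
    log_risk_set = np.logaddexp.accumulate(risk)
    # Sum only over patients whose event was observed (not censored).
    return -np.sum((risk - log_risk_set)[event == 1])

# Tiny hypothetical cohort: four patients, one censored.
risk = np.array([0.2, 1.5, -0.3, 0.8])
time = np.array([5.0, 2.0, 8.0, 3.0])
event = np.array([1, 1, 0, 1])
loss = neg_log_partial_likelihood(risk, time, event)
print(round(float(loss), 3))
```

In a deep survival model, gradients of this loss flow back through the network; the risk backpropagation mentioned in the abstract similarly propagates the risk score back to the input features to interpret the trained model.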



Author(s):  
Arpan Biswas ◽  
Claudio Fuentes ◽  
Christopher Hoyle

Abstract Bayesian optimization (BO) is a low-cost global optimization tool for expensive black-box objective functions, in which we learn from previously evaluated designs, update a posterior surrogate Gaussian process model, and select new designs for future evaluation using an acquisition function. This research focuses on developing a BO model with multiple black-box objective functions. In the standard multi-objective optimization problem, the weighted Tchebycheff method is efficiently used to find points on both convex and non-convex Pareto frontiers. This approach requires knowledge of the utopia values before optimization starts. However, in the BO framework, since the functions are expensive to evaluate, it is very costly to obtain the utopia values as a priori knowledge. Therefore, in this paper, we develop a Multi-Objective Bayesian Optimization (MO-BO) framework in which we calibrate Multiple Linear Regression (MLR) models to estimate the utopia value for each objective as a function of the design input variables; the models are updated iteratively with sampled training data from the proposed multi-objective BO. The iteratively estimated mean utopia values are used to formulate the weighted Tchebycheff multi-objective acquisition function. The proposed approach is applied to the optimization of a thin tube design under constant temperature and pressure loading, with objectives such as minimizing the risk of creep-fatigue failure and the design cost, along with risk-based and manufacturing constraints. Finally, the accuracy of the model with and without MLR-based calibration is assessed against the true Pareto solutions. The results show potential broader impacts and future research directions for further improving the proposed MO-BO model, as well as potential extensions to large-scale design problems.
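
The weighted Tchebycheff scalarization at the core of the proposed acquisition function is simple to state: minimize the worst weighted deviation from the utopia point. The sketch below uses two toy objectives and an exactly known utopia point, whereas the paper's contribution is precisely to estimate unknown utopia values with MLR models:

```python
import numpy as np

def weighted_tchebycheff(f_vals, weights, utopia):
    """Weighted Tchebycheff scalarization: the largest weighted deviation
    from the utopia point. Minimizing it over the design variable (for
    varying weights) can reach points on both convex and non-convex
    Pareto frontiers."""
    return np.max(weights * np.abs(np.asarray(f_vals) - np.asarray(utopia)))

# Two toy objectives of a 1-D design variable (hypothetical stand-ins
# for creep-fatigue risk and design cost).
f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2

utopia = np.array([0.0, 0.0])    # per-objective minima, known exactly here;
                                 # the paper estimates them iteratively
weights = np.array([0.5, 0.5])

# Grid search stands in for the BO acquisition-maximization step.
xs = np.linspace(-2.0, 2.0, 401)
g = [weighted_tchebycheff([f1(x), f2(x)], weights, utopia) for x in xs]
x_star = float(xs[int(np.argmin(g))])
print(round(x_star, 2))   # equal weights give the compromise design x = 0.0
```

Sweeping the weight vector traces out different Pareto-optimal compromises, which is why the method handles non-convex frontiers that a weighted sum would miss.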


2014 ◽  
Vol 575 ◽  
pp. 201-205
Author(s):  
Bin Liu ◽  
Chun Lin Ji

We present an automated computation system for the large-scale design of metamaterials (MTMs). A computer model emulation (CME) technique is used to generate a forward mapping from an MTM particle's geometric dimensions to the corresponding electromagnetic (EM) response. The design problem then becomes a reverse engineering process that aims to find optimal values of the geometric dimensions of the MTM particles. The core of the CME process is a statistical functional regression module using a Gaussian process mixture (GPM) model. The reverse engineering process is implemented with a Bayesian optimization technique. Experimental results demonstrate that the proposed approach can facilitate the rapid design of MTMs.
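
The emulate-then-invert workflow can be sketched with a plain Gaussian process regression standing in for the paper's GPM emulator; the mock "EM solver", design range, and target response below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical forward map from one geometric dimension (e.g. a ring gap)
# to one scalar EM response; a real EM solver would sit here.
def simulate(g):
    return np.sin(3.0 * g) + 0.5 * g

G = rng.uniform(0.0, 2.0, size=15)       # sampled particle geometries
R = simulate(G)                          # "expensive" training responses

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

K_inv = np.linalg.inv(rbf(G, G) + 1e-6 * np.eye(len(G)))

def emulate(g):
    """Cheap GP prediction that replaces the EM simulation."""
    return rbf(np.atleast_1d(g), G) @ K_inv @ R

# Reverse engineering: search the design range for the geometry whose
# emulated response best matches a target response value. (Grid search
# stands in for the Bayesian optimization step.)
target = 0.8
grid = np.linspace(0.0, 2.0, 1000)
g_opt = float(grid[np.argmin((emulate(grid) - target) ** 2)])
print(round(g_opt, 3))
```

Because each `emulate` call is orders of magnitude cheaper than a full EM simulation, the inverse search over candidate geometries becomes tractable, which is the rapid-design claim of the abstract.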


2018 ◽  
Vol 12 (04) ◽  
pp. 1841009
Author(s):  
Yuta Mitsuhashi ◽  
Gaku Hashimoto ◽  
Hiroshi Okuda ◽  
Fujio Uchiyama

In recent years, a new demand has emerged for the evaluation of earthquake fault displacements, driven by the need to assess the soundness of underground structures. Fault displacements are caused by the rupturing of earthquake source faults and are investigated using methods such as the finite difference method and the finite element method (FEM). We conducted dynamic rupture simulations of the Kamishiro Fault earthquake using a nonlinear FEM, focusing on the time histories of fault displacement and response displacement, and demonstrated an ability to reproduce observed values to a certain extent. In these simulations, we created models of homogeneous faults, representing the ground with solid elements and the fault planes with joint elements. Although we were able to roughly reproduce the displacement time histories, obstacles to more precise simulations remain. In this research, we investigated the modelling of strong motion generation areas (SMGAs). We conducted a search analysis using Bayesian optimization, with the SMGA distribution within the faults as parameters, and estimated the optimal parameters for reproducing the displacement time histories. In addition, we compared our results with SMGA estimates derived from different methods and demonstrated that the distributions qualitatively matched. We also evaluated the stochasticity of the response displacement considering the randomness of the fault parameters. To conduct the simulations, we introduced the joint elements of Goodman et al., implemented in the FEM code FrontISTR, which makes it possible to analyze large-scale models.

