A Data-Driven Surrogate Approach for the Temporal Stability Forecasting of Vegetation Covered Dikes

Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 107
Author(s):  
Elahe Jamalinia ◽  
Faraz S. Tehrani ◽  
Susan C. Steele-Dunne ◽  
Philip J. Vardon

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
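The RF-surrogate workflow described above can be sketched in a few lines. Everything below is an illustrative stand-in: the feature names, the FoS response, and the data are invented, not the paper's synthetic data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical daily surface-observable features (illustrative names, not
# the paper's actual feature set): precipitation, evapotranspiration,
# leaf area index, surface soil moisture.
n_days = 3000
X = rng.uniform(0.0, 1.0, size=(n_days, 4))
# Hypothetical FoS response: stability drops with wetting, rises with drying.
y = (1.5 - 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] - 0.2 * X[:, 3]
     + rng.normal(0.0, 0.02, n_days))

# Chronological split, mimicking the paper's setup: the surrogate is only
# trusted for conditions similar to its training period.
split = int(0.8 * n_days)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(f"R^2 on held-out days: {rf.score(X_test, y_test):.3f}")
```

As in the abstract, a chronological split matters: a model like this degrades once the evaluation period contains regimes (e.g., surface cracking) absent from training.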

SPE Journal ◽  
2021 ◽  
pp. 1-25
Author(s):  
Chang Gao ◽  
Juliana Y. Leung

Summary The steam-assisted gravity drainage (SAGD) recovery process is strongly impacted by the spatial distributions of heterogeneous shale barriers. Though detailed compositional flow simulators are available for SAGD recovery performance evaluation, the simulation process is usually quite computationally demanding, rendering their use over a large number of reservoir models for assessing the impacts of heterogeneity (uncertainties) impractical. In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, the proxy must be trained using a large data set consisting of many flow simulation cases that ideally span the model parameter spaces. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help to construct a training data set with less redundancy; they can also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two particular distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy’s law and energy balance are integrated to mimic the steam chamber expansion process; steam particles located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and the flow-based distance between two given reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used: it is often used in image processing to compare two images according to the spatial arrangement and shapes of their various objects.
A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy’s Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The grouping results according to the proposed flow- and static-based distance measures match reasonably well those obtained from detailed flow simulations. Significant improvement in computational efficiency is achieved with the proposed techniques. They can be used to efficiently screen a large number of reservoir models and facilitate the clustering of these models into groups with distinct shale heterogeneity characteristics. The approach presents significant potential to be integrated with other data-driven approaches for reducing the computational load typically associated with detailed flow simulations involving multiple heterogeneous reservoir realizations.
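The static-based (Hausdorff) distance is straightforward to sketch. The point sets below are toy stand-ins for binarized shale-barrier maps (coordinates of shale cells), not actual Firebag models.

```python
import numpy as np

def directed_hausdorff(A, B):
    """Max over points a in A of the distance to the nearest point in B."""
    # Pairwise Euclidean distances between the two point sets.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy stand-in for two shale-barrier maps, each given as the (row, col)
# coordinates of its shale cells.
model_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
model_b = np.array([[0.0, 0.0], [3.0, 4.0]])
print(hausdorff(model_a, model_b))  # sqrt(18) ~ 4.243
```

A distance matrix built this way over all model pairs can feed any standard clustering routine to form the groups described above.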


Author(s):  
Alexandr V. Yablokov ◽  
Aleksander S. Serdyukov ◽  
Georgy N. Loginov ◽  
...  

We propose a new method for the inversion of surface wave dispersion curves based on the application of an artificial neural network, and we suggest a data-driven approach for selecting the parameter-space ranges used to calculate the training data set. The synthetic data processing results showed that the accuracy of the proposed method is superior to local search methods and equivalent to global search methods, while the proposed method is more robust in the presence of noise.


Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Taejin Kim

Traditional data-driven prognostics often requires a large amount of failure data for offline training in order to achieve good accuracy in online prediction. However, in many engineered systems, failure data are fairly expensive and time-consuming to obtain, while suspension data are readily available. In such cases, it becomes critically important to utilize suspension data, which may carry rich information regarding the degradation trend and help achieve more accurate remaining useful life (RUL) prediction. To this end, this paper proposes a co-training-based data-driven prognostic algorithm, denoted Coprog, which uses two individual data-driven algorithms, each predicting RULs of suspension units for the other. The confidence of an individual data-driven algorithm in predicting the RUL of a suspension unit is quantified by the extent to which the inclusion of that unit in the training data set reduces the sum of squared errors (SSE) in RUL prediction on the failure units. After a suspension unit is chosen and its RUL is predicted by an individual algorithm, it becomes a virtual failure unit that is added to the training data set. Results obtained from two case studies suggest that Coprog gives more accurate RUL predictions than any individual algorithm that does not consider suspension data, and that Coprog can effectively exploit suspension data to improve the accuracy of data-driven prognostics.
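The SSE-based selection criterion can be illustrated with a toy example. The linear degradation trend, the data, and the two candidate suspension units below are invented simplifications, not Coprog's actual learners.

```python
import numpy as np

def sse_on_failures(train_x, train_y, fail_x, fail_y):
    """Fit a linear degradation trend to the training units and return the
    sum of squared errors measured on the true failure units."""
    coeffs = np.polyfit(train_x, train_y, deg=1)
    resid = fail_y - np.polyval(coeffs, fail_x)
    return float(np.sum(resid ** 2))

# Failure units lying exactly on the toy trend y = 2x (a stand-in
# degradation feature vs. RUL relation).
fail_x = np.array([1.0, 2.0, 3.0, 4.0])
fail_y = 2.0 * fail_x

# Two candidate suspension units with RULs predicted by the other learner:
# unit A is consistent with the trend, unit B is not.
cand_a = (5.0, 10.0)   # on the trend
cand_b = (5.0, 30.0)   # far off the trend

def sse_with(cand):
    """SSE on the failure units after adding one virtual failure unit."""
    x = np.append(fail_x, cand[0])
    y = np.append(fail_y, cand[1])
    return sse_on_failures(x, y, fail_x, fail_y)

# Including the consistent unit keeps the fit (and SSE) intact; including
# the inconsistent one degrades it, so unit A earns the higher confidence.
print(sse_with(cand_a), sse_with(cand_b))
```

The comparison captures the spirit of the criterion: candidates whose inclusion keeps the SSE on the failure units low are the ones an individual learner can confidently add as virtual failure units.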


Author(s):  
Ignasi Echaniz Soldevila ◽  
Victor L. Knoop ◽  
Serge Hoogendoorn

Traffic engineers rely on microscopic traffic models to design, plan, and operate a wide range of traffic applications. Recently, large data sets, albeit incomplete and covering only small spatial regions, have become available thanks to technological improvements and governmental efforts. With this study we aim to gain new empirical insights into longitudinal driving behavior and to formulate a model that can benefit from these new, challenging data sources. This paper proposes an application of an existing formulation, Gaussian process regression (GPR), to describe the individual longitudinal driving behavior of drivers. The method integrates a parametric and a non-parametric mathematical formulation. The model predicts an individual driver’s acceleration given a set of variables, using GPR to make predictions when the new input is correlated with the training data set. The data-driven model benefits from a large training data set to capture the full range of driver longitudinal behavior, which would be difficult to fit with fixed parametric equation(s). The methodology allows us to train models with new variables without altering the model formulation. Importantly, the model also falls back on existing traditional parametric car-following models to predict acceleration when no similar situations are found in the training data set. A case study using radar data in an urban environment shows that the hybrid model performs better than a parametric model alone and suggests that traffic light status over time influences drivers’ acceleration. This methodology can help engineers use large data sets and find new variables to describe traffic behavior.
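A minimal numpy sketch of the GPR component and its fall-back logic, assuming a squared-exponential kernel and a zero prior mean. The gap/acceleration values and the length scale are invented for illustration, not the study's radar data.

```python
import numpy as np

ELL = 10.0  # kernel length scale in metres of gap (illustrative choice)

def rbf_kernel(x1, x2, ell=ELL):
    """Squared-exponential covariance between two sets of 1D inputs."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell ** 2)

# Toy training data: driver acceleration (m/s^2) vs. gap to the leader (m).
gap = np.array([5.0, 10.0, 20.0, 40.0])
accel = np.array([-1.5, -0.5, 0.2, 0.6])

# Precompute the inverse kernel matrix (tiny jitter for stability).
K_inv = np.linalg.inv(rbf_kernel(gap, gap) + 1e-6 * np.eye(len(gap)))

def predict(g):
    """GP posterior mean at new gap values (zero prior mean)."""
    k_star = rbf_kernel(np.atleast_1d(g), gap)
    return k_star @ K_inv @ accel

def uncertainty(g):
    """GP posterior variance; approaches the prior variance (1.0) when g
    is far from the training data."""
    k_star = rbf_kernel(np.atleast_1d(g), gap)
    return 1.0 - np.sum((k_star @ K_inv) * k_star, axis=1)

# Near the data the GP interpolates; at an unseen 200 m gap the variance
# is near 1.0, which is the cue to fall back on a parametric
# car-following model instead.
print(predict(np.array([5.0, 15.0])), uncertainty(np.array([5.0, 200.0])))
```

The variance test is one plausible way to implement the hybrid switch the abstract describes: GPR where the training set has support, parametric car-following elsewhere.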


Author(s):  
Marie Lefranc ◽  
◽  
Zikri Bayraktar ◽  
Morten Kristensen ◽  
Hedi Driss ◽  
...  

Sedimentary geometry on borehole images usually summarizes the arrangement of bed boundaries, erosive surfaces, crossbedding, sedimentary dip, and/or deformed beds. The interpretation, very often manual, requires a good level of expertise, is time consuming, can suffer from user bias, and becomes very challenging when dealing with highly deviated wells. Bedform geometry interpretation from crossbed data is rarely completed from a borehole image. The purpose of this study is to develop an automated method to interpret sedimentary structures from borehole images, including the bedform geometry resulting from changes in flow direction. Automation is achieved in this unique interpretation methodology using deep learning (DL). The first task comprised the creation of a training data set of 2D borehole images. This library of images was then used to train deep neural network models. Testing different architectures of convolutional neural networks (CNNs) showed the ResNet architecture to give the best performance for the classification of the different sedimentary structures. The validation accuracy was very high, in the range of 93 to 96%. To test the developed method, additional logs of synthetic data were created as sequences of different sedimentary structures (i.e., classes) associated with different well deviations, with the addition of gaps. The model was able to predict the proper class in these composite logs and highlight the transitions accurately.


2021 ◽  
Author(s):  
Wenwen Zhao ◽  
Lijian Jiang ◽  
Shaobo Yao ◽  
Weifang Chen

Abstract To overcome the defects of traditional rarefied numerical methods such as the Direct Simulation Monte Carlo (DSMC) method and unified Boltzmann equation schemes, and to extend the covering range of macroscopic equations to high-Knudsen-number flows, data-driven nonlinear constitutive relations (DNCR) are first proposed through a machine learning method. Based on training data from both a Navier-Stokes (NS) solver and a unified gas kinetic scheme (UGKS) solver, the map between the discrepancies in stress tensors and heat flux and the feature vectors is established after the training phase. Through the obtained offline training model, new test cases excluded from the training data set can be predicted rapidly and accurately by solving the conventional equations with the modified stress tensor and heat flux. Finally, conventional one-dimensional shock wave cases and two-dimensional hypersonic flows around a blunt circular cylinder are presented to assess the capability of the developed method through various comparisons among DNCR, NS, UGKS, DSMC, and experimental results. The improved predictive capability of the coarse-grained model could make the DNCR method an effective tool in the rarefied gas community, especially for hypersonic engineering applications.
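The DNCR idea of learning a stress discrepancy from feature vectors and then correcting the conventional solution can be sketched with a toy linear fit. The features, the "true" discrepancy, and the linear model are illustrative stand-ins for the paper's machine-learning map, not its actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature vectors (e.g., a local Knudsen number and a velocity-gradient
# invariant -- illustrative choices, not the paper's exact features).
features = rng.uniform(0.0, 1.0, size=(200, 2))
# Hypothetical "true" stress discrepancy (UGKS minus NS) the model learns.
delta_true = 0.8 * features[:, 0] - 0.3 * features[:, 1]

# Offline training phase: least-squares fit of discrepancy vs. features.
w, *_ = np.linalg.lstsq(features, delta_true, rcond=None)

# Online phase: correct the NS stress with the predicted discrepancy for
# new states outside the training set.
tau_ns = np.array([0.10, 0.20])
new_features = np.array([[0.5, 0.5], [0.2, 0.9]])
tau_corrected = tau_ns + new_features @ w
print(tau_corrected)
```

The structure mirrors the abstract: train a map from features to the NS/UGKS discrepancy once, then reuse it to modify the stress tensor (and, analogously, the heat flux) in the conventional equations.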


2009 ◽  
Vol 60 (1) ◽  
pp. 19-28 ◽  
Author(s):  
T. Opher ◽  
A. Ostfeld ◽  
E. Friedler

Pollutants accumulated on road pavement during dry periods are washed off the surface with runoff water during rainfall events, presenting a potentially hazardous non-point source of pollution. Estimation of pollutant loads in these runoff waters is required for developing mitigation and management strategies, yet the numerous factors involved and their complex interconnected influences make straightforward assessment almost impossible. Data-driven models (DDMs) have lately been used in water and environmental research and have shown very good prediction ability. The proposed methodology of a coupled MT-GA model provides an effective, accurate, and easily calibrated predictive model for the event mean concentration (EMC) of highway runoff pollutants. The models were trained and verified using a comprehensive data set of runoff events monitored on various highways in California, USA. EMCs of Cr, Pb, Zn, TOC, and TSS were modeled using different combinations of explanatory variables. The models' prediction ability, in terms of the correlation between predicted and actual values for both training and verification data, was mostly higher than previously reported values. Total Pb was modeled with an R2 of 0.95 on training data and 0.43 on verification data. The developed model for TOC achieved R2 values of 0.91 and 0.49 on training and verification data, respectively.


Geophysics ◽  
2019 ◽  
Vol 85 (1) ◽  
pp. V33-V43 ◽  
Author(s):  
Min Jun Park ◽  
Mauricio D. Sacchi

Velocity analysis can be a time-consuming task when performed manually, and methods proposed to automate it typically still require significant manual effort. We have developed a convolutional neural network (CNN) to estimate stacking velocities directly from the semblance. Our CNN model uses two images as one input data set for training: an entire semblance (guide image) and a small patch (target image) extracted from the semblance at a specific time step. The labels for each input data set are the root-mean-square velocities. We generate the training data set using synthetic data. After training the CNN model with synthetic data, we test the trained model with another synthetic data set that was not used in the training step. The results indicate that the model can predict a consistent velocity model. We also noticed that when the input data are extremely different from those used for training, the CNN model can hardly pick the correct velocities. In this case, we adopt transfer learning to update the trained model (base model) with a small portion of the target data to improve the accuracy of the predicted velocity model. A marine data set from the Gulf of Mexico is used to validate our new model. The updated model performed a reasonable velocity analysis in seconds.
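The guide-image/target-patch input construction can be sketched as simple array slicing. The patch size, panel dimensions, and random semblance values are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def make_cnn_inputs(semblance, t_indices, patch_half=8):
    """Pair the full semblance panel (guide image) with a small patch
    (target image) centred near each requested time step."""
    pairs = []
    # Pad along the time axis so patches near the edges stay full-size.
    padded = np.pad(semblance, ((patch_half, patch_half), (0, 0)))
    for t in t_indices:
        patch = padded[t:t + 2 * patch_half, :]
        pairs.append((semblance, patch))
    return pairs

# Toy semblance panel: 100 time samples x 30 trial velocities.
semblance = np.random.default_rng(2).random((100, 30))
pairs = make_cnn_inputs(semblance, t_indices=[10, 50, 90])
print(pairs[0][0].shape, pairs[0][1].shape)
```

Each (guide, patch) pair would then be labelled with the root-mean-square velocity at that time step, as the abstract describes.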


Geophysics ◽  
2019 ◽  
Vol 84 (6) ◽  
pp. U45-U57 ◽  
Author(s):  
Lianlian Hu ◽  
Xiaodong Zheng ◽  
Yanting Duan ◽  
Xinfei Yan ◽  
Ying Hu ◽  
...  

In exploration geophysics, the first arrivals in data acquired under complicated near-surface conditions are often characterized by significant static corrections, weak energy, low signal-to-noise ratio, and dramatic phase changes, and they are difficult to pick accurately with traditional automatic procedures. We approach this problem by applying a U-shaped fully convolutional network (U-net) to first-arrival picking, which is formulated as a binary segmentation problem. U-net has the ability to recognize inherent patterns of the first arrivals by combining attributes of arrivals in space and time on data of varying quality. An effective workflow based on U-net is presented for fast and accurate picking. A set of seismic waveform data and their corresponding first-arrival times are used to train the network in a supervised learning approach; the trained model is then used to detect the first arrivals in other seismic data. Our method is applied to one synthetic data set and three low-quality field data sets to identify the first arrivals. Results indicate that U-net needs only a few annotated samples for learning and is able to efficiently detect first-arrival times with high precision on complicated seismic data from a large survey. With increasing training data covering various first arrivals, a trained U-net has the potential to directly identify the first arrivals in new seismic data.
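A plausible downstream step of such a workflow, assuming the trained network outputs a per-trace probability map, is converting the binary segmentation into pick times. The threshold and sampling interval below are illustrative; the U-net itself is not implemented here.

```python
import numpy as np

def pick_first_arrivals(prob, threshold=0.5, dt=0.002):
    """Convert a (time x trace) probability map from a segmentation
    network into first-arrival times: the first sample per trace whose
    probability exceeds the threshold."""
    mask = prob >= threshold
    # argmax returns the index of the first True along the time axis.
    idx = mask.argmax(axis=0)
    # Traces where the network found no arrival are flagged with NaN.
    has_pick = mask.any(axis=0)
    return np.where(has_pick, idx * dt, np.nan)

# Toy probability map: 3 traces, 100 time samples at dt = 2 ms.
prob = np.zeros((100, 3))
prob[40:, 0] = 0.9   # arrival at sample 40  -> 0.080 s
prob[55:, 1] = 0.8   # arrival at sample 55  -> 0.110 s
# trace 2: no arrival detected
picks = pick_first_arrivals(prob)
print(picks)
```

Flagging no-pick traces explicitly keeps gaps in data quality visible instead of silently assigning a time of zero.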


2020 ◽  
Vol 499 (4) ◽  
pp. 5447-5485
Author(s):  
Victor F Ksoll ◽  
Lynton Ardizzone ◽  
Ralf Klessen ◽  
Ullrich Koethe ◽  
Elena Sabbi ◽  
...  

ABSTRACT Photometric surveys with the Hubble Space Telescope (HST) allow us to study stellar populations with high resolution and deep coverage, with estimates of the physical parameters of the constituent stars being typically obtained by comparing the survey data with adequate stellar evolutionary models. This is a highly non-trivial task due to effects such as differential extinction, photometric errors, low filter coverage, or uncertainties in the stellar evolution calculations. These introduce degeneracies that are difficult to detect and break. To improve this situation, we introduce a novel deep learning approach, called conditional invertible neural network (cINN), to solve the inverse problem of predicting physical parameters from photometry on an individual-star basis and to obtain the full posterior distributions. We build a carefully curated synthetic training data set derived from the PARSEC stellar evolution models to predict stellar age, initial/current mass, luminosity, effective temperature, and surface gravity. We perform tests on synthetic data from the MIST and Dartmouth models, and benchmark our approach on HST data of two well-studied stellar clusters, Westerlund 2 and NGC 6397. For the synthetic data, we find overall excellent performance, and note that age is the most difficult parameter to constrain. For the benchmark clusters, we retrieve reasonable results and confirm previous findings for Westerlund 2 on cluster age ($1.04_{-0.90}^{+8.48}\, \mathrm{Myr}$), mass segregation, and the stellar initial mass function. For NGC 6397, we recover plausible estimates for masses, luminosities, and temperatures; however, discrepancies between stellar evolution models and observations prevent an acceptable recovery of age for old stars.

