Monte Carlo Simulations for Uncertainty Estimation in 3D Geological Modeling: A Guide for Disturbance Distribution Selection and Parameterization


Solid Earth ◽  
2018 ◽  
Vol 9 (2) ◽  
pp. 385-402 ◽  
Author(s):  
Evren Pakyuz-Charrier ◽  
Mark Lindsay ◽  
Vitaliy Ogarko ◽  
Jeremie Giraud ◽  
Mark Jessell

Abstract. Three-dimensional (3-D) geological structural modeling aims to determine geological information in a 3-D space using structural data (foliations and interfaces) and topological rules as inputs. This is necessary in any project in which the properties of the subsurface matter; such models express our understanding of geometries at depth. For that reason, 3-D geological models have a wide range of practical applications including but not restricted to civil engineering, the oil and gas industry, the mining industry, and water management. These models, however, are fraught with uncertainties originating from the inherent flaws of the modeling engines (working hypotheses, interpolator's parameterization) and the inherent lack of knowledge in areas where there are no observations, combined with input uncertainty (observational, conceptual and technical errors). Because 3-D geological models are often used for impactful decision-making, it is critical that they provide accurate estimates of uncertainty. This paper focuses on the effect of structural input data measurement uncertainty propagation in implicit 3-D geological modeling. This aim is achieved using Monte Carlo simulation for uncertainty estimation (MCUE), a stochastic method which samples from predefined disturbance probability distributions that represent the uncertainty of the original input data set. MCUE is used to produce hundreds to thousands of altered unique data sets. The altered data sets are used as inputs to produce a range of plausible 3-D models. The plausible models are then combined into a single probabilistic model as a means to propagate uncertainty from the input data to the final model. In this paper, several improved methods for MCUE are proposed. The methods pertain to distribution selection for input uncertainty, sample analysis and statistical consistency of the sampled distribution. 
Pole vector sampling is proposed as a more rigorous alternative to dip vector sampling for planar features, and a Bayesian approach to disturbance distribution parameterization is suggested. The influence of incorrect disturbance distributions is discussed, and propositions to address the identified issues are made and evaluated on synthetic and realistic cases. The distribution of the errors of the observed data (i.e., scedasticity) is shown to affect the quality of prior distributions for MCUE. Results demonstrate that the proposed workflows improve the reliability of uncertainty estimation and diminish the occurrence of artifacts.
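As an illustration of the pole vector sampling advocated above, the sketch below draws disturbed unit pole vectors from a von Mises-Fisher distribution centered on a measured orientation. The function name, concentration value, and example orientation are hypothetical; this is a minimal stand-in for MCUE's disturbance step, not the authors' implementation.

```python
import numpy as np

def sample_vmf_poles(mu, kappa, n, rng=None):
    """Draw n unit vectors from a von Mises-Fisher distribution on the sphere,
    centered on the unit pole vector `mu` with concentration `kappa`."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, float) / np.linalg.norm(mu)
    # Exact inverse-CDF sampling of w = cos(angular deviation), valid on S^2
    u = rng.random(n)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    # Uniform azimuth in the tangent plane around the z-axis
    phi = rng.random(n) * 2.0 * np.pi
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    samples = np.column_stack([r * np.cos(phi), r * np.sin(phi), w])
    # Rotate the z-axis onto mu (Rodrigues' formula; assumes mu is not -z)
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(z, mu), z @ mu
    if np.isclose(c, 1.0):
        return samples
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    rot = np.eye(3) + vx + vx @ vx / (1.0 + c)
    return samples @ rot.T

# Disturb the pole of a plane dipping 30 degrees toward the east
pole = np.array([-np.sin(np.radians(30)), 0.0, np.cos(np.radians(30))])
draws = sample_vmf_poles(pole, kappa=100.0, n=500, rng=0)
```

Each draw is a plausible disturbed pole; converting the draws back to dip/dip-direction pairs yields one altered structural data set per Monte Carlo iteration.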


2019 ◽  
Author(s):  
Evren Pakyuz-Charrier ◽  
Mark Jessell ◽  
Jérémie Giraud ◽  
Mark Lindsay ◽  
Vitaliy Ogarko

Abstract. This paper proposes and demonstrates improvements to the Monte Carlo simulation for uncertainty estimation (MCUE) method. MCUE is a type of Bayesian Monte Carlo aimed at input data uncertainty propagation in implicit 3D geological modeling. In the Monte Carlo process, a series of statistically plausible models is built from the input data set whose uncertainty is to be propagated to a final probabilistic geological model (PGM) or uncertainty index model (UIM). Significant differences in terms of topology are observed in the plausible model suite that is generated as an intermediary step in MCUE. These differences are interpreted as analogous to population heterogeneity. The source of this heterogeneity is traced to the non-linear relationship between the variability of the plausible data sets and that of the plausible models. Non-linearity is shown to arise from the effect of the geometrical rule set on model building, which transforms continuous lithological interfaces into discontinuous piecewise ones. Plausible model heterogeneity induces geological incompatibility and challenges the assumption of homogeneity on which global uncertainty estimates rely. To address this issue, a method for topological analysis applied to the plausible model suite in MCUE is introduced. Boolean topological signatures recording lithological units' adjacency are used as n-dimensional points to be considered individually or clustered using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The proposed method is tested on two challenging synthetic examples with varying levels of confidence in the structural input data. Results indicate that topological signatures constitute a powerful discriminant to address plausible model heterogeneity. Basic topological signatures appear to be a reliable indicator of the structural behavior of the plausible models and provide useful geological insights. 
Moreover, ignoring heterogeneity was found to be detrimental to the accuracy and relevance of the PGMs and UIMs.
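The Boolean adjacency signatures described above can be sketched as follows. The grid encoding, function names, and toy models are hypothetical illustrations of the idea; the paper clusters such signatures with DBSCAN, which this minimal sketch replaces with exact grouping.

```python
import numpy as np

def topo_signature(model, n_units):
    """Boolean adjacency signature of a labeled 2-D lithology grid: the
    flattened upper triangle of the unit-adjacency matrix (4-connectivity)."""
    adj = np.zeros((n_units, n_units), dtype=bool)
    for a, b in ((model[:, :-1], model[:, 1:]), (model[:-1, :], model[1:, :])):
        mask = a != b                      # cell pairs with different units
        adj[a[mask], b[mask]] = True
        adj[b[mask], a[mask]] = True
    return adj[np.triu_indices(n_units, k=1)]

def group_by_signature(models, n_units):
    """Partition a plausible-model suite into topologically identical groups."""
    groups = {}
    for i, m in enumerate(models):
        groups.setdefault(topo_signature(m, n_units).tobytes(), []).append(i)
    return list(groups.values())

# Two flat-layered models and one where unit 0 is brought against unit 2
flat = np.repeat([[0], [1], [2]], 4, axis=1)
faulted = flat.copy()
faulted[1, :2] = 0                          # unit 0 now touches unit 2
```

Models sharing a signature are topologically compatible and can be averaged into one probabilistic model; distinct signatures mark the heterogeneous sub-populations the paper warns about.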


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3406
Author(s):  
Jie Jiang ◽  
Yin Zou ◽  
Lidong Chen ◽  
Yujie Fang

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be addressed via visual-based localization using a pre-built 3D model. The increase in search space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep-learning-based image retrieval methods require labeled data, which increases data annotation costs and complicates the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network, the variational autoencoder (VAE), with a visual-based structure-from-motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for image retrieval at the level of the scene map in order to obtain candidate images, and local features are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. Only RGB images are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° on the 7-Scenes data sets and 32.8% of images within 5 m and 20° on the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.
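A minimal sketch of the coarse retrieval stage described above, assuming global descriptors have already been extracted (e.g., by the VAE): candidate images are shortlisted by cosine similarity before any 2D-3D pose estimation. Function and variable names are hypothetical.

```python
import numpy as np

def retrieve_candidates(query_feat, db_feats, k=5):
    """Shortlist the k database images whose global descriptors are most
    similar (cosine similarity) to the query descriptor."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                       # cosine similarity to every image
    order = np.argsort(-sims)[:k]       # indices of the k best candidates
    return order, sims[order]
```

Restricting the expensive 2D-3D matching to this shortlist is what keeps the hierarchical scheme tractable in large scenes.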


2019 ◽  
Author(s):  
Matthew Gard ◽  
Derrick Hasterok ◽  
Jacqueline Halpin

Abstract. Dissemination and collation of geochemical data are critical to promote rapid, creative and accurate research and place new results in an appropriate global context. To this end, we have assembled a global whole-rock geochemical database, with other associated sample information and properties, sourced from various existing databases and supplemented with numerous individual publications and corrections. Currently the database stands at 1,023,490 samples with varying amounts of associated information including major and trace element concentrations, isotopic ratios, and location data. The distribution both spatially and temporally is quite heterogeneous; however, temporal distributions are enhanced over some previous database compilations, particularly for ages older than ~1000 Ma. Also included are a wide range of computed geochemical indices, physical property estimates and naming schema on a major element normalized version of the geochemical data for quick reference. This compilation will be useful for geochemical studies requiring extensive data sets, in particular those wishing to investigate secular temporal trends. The addition of physical properties, estimated by sample chemistry, represents a unique contribution to otherwise similar geochemical databases. The data are published in .csv format for the purposes of simple distribution but exist in a format acceptable for database management systems (e.g. SQL). One can either manipulate the data using conventional analysis tools such as MATLAB®, Microsoft® Excel, or R, or upload them to a relational database management system for easy querying and management, as unique keys already exist. This data set will continue to grow, and we encourage readers to contact us, or the compilers of the constituent databases, about any data yet to be included. The data files described in this paper are available at https://doi.org/10.5281/zenodo.2592823 (Gard et al., 2019).
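As a hedged illustration of working with such a .csv distribution, the sketch below computes one classic geochemical index, the molar Mg#, from a two-sample extract. The column names and sample values are hypothetical; the real compilation's schema has many more columns.

```python
import csv
import io

# Molar masses (g/mol) of the oxides involved
M_MGO, M_FEO = 40.304, 71.844

def mg_number(mgo_wt, feo_wt):
    """Molar Mg# = 100 * MgO / (MgO + FeO), a standard geochemical index."""
    mgo, feo = mgo_wt / M_MGO, feo_wt / M_FEO
    return 100.0 * mgo / (mgo + feo)

# Hypothetical extract in the database's .csv distribution format
raw = "sample_id,mgo,feo\nA1,38.5,8.1\nB2,7.9,9.8\n"
rows = list(csv.DictReader(io.StringIO(raw)))
indices = {r["sample_id"]: mg_number(float(r["mgo"]), float(r["feo"]))
           for r in rows}
```

Sample A1 (Mg# near 89) would read as a typical mantle peridotite, B2 as a more evolved composition; the published compilation precomputes indices of this kind alongside the raw analyses.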


2018 ◽  
Author(s):  
Brian Hie ◽  
Bryan Bryson ◽  
Bonnie Berger

Abstract. Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems1–4 and every cell type in the human body.5 Leveraging these data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist6,7, they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, inspired by algorithms for panorama stitching, which overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
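Scanorama's panorama-stitching analogy rests on matching shared cell types between data sets via mutual nearest neighbors. A minimal, hypothetical sketch of that matching step (not Scanorama's optimized implementation, which works in a reduced embedding with approximate neighbor search) is:

```python
import numpy as np

def mutual_nearest_neighbors(X, Y, k=3):
    """Return (i, j) pairs where cell i of data set X is among the k nearest
    neighbors of cell j in Y and vice versa, i.e. candidate shared-type
    matches between the two data sets."""
    # Squared Euclidean distances between every x and every y
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    nn_xy = np.argsort(d, axis=1)[:, :k]      # for each x, its k nearest ys
    nn_yx = np.argsort(d, axis=0)[:k, :].T    # for each y, its k nearest xs
    return [(i, j) for i in range(len(X)) for j in nn_xy[i] if i in nn_yx[j]]
```

Cell types present in only one data set produce no mutual pairs, which is why this criterion avoids the naive forced merging the abstract criticizes.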


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
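As a small illustration of the kind of structural characterization proposed, the sketch below computes predicate usage and mean subject out-degree over a toy triple set. The metric names and toy data are hypothetical, not the paper's actual metric suite.

```python
from collections import Counter

def rdf_metrics(triples):
    """Simple structural metrics for a set of (subject, predicate, object)
    triples: predicate usage counts and mean subject out-degree."""
    preds = Counter(p for _, p, _ in triples)
    out_deg = Counter(s for s, _, _ in triples)
    return {
        "n_triples": len(triples),
        "n_subjects": len(out_deg),
        "predicate_counts": dict(preds),
        "mean_out_degree": len(triples) / len(out_deg),
    }

toy = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]
stats = rdf_metrics(toy)
```

Skewed predicate counts and out-degree distributions of this kind are exactly the regularities that compressors and indexes can exploit, which is the paper's motivation for measuring them.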


2021 ◽  
Author(s):  
Daniel Pflieger ◽  
Miguel de la Varga Hormazabal ◽  
Simon Virgo ◽  
Jan von Harten ◽  
Florian Wellmann

Three-dimensional modeling is a rapidly developing field in geological scientific and commercial applications. The combination of modeling and uncertainty analysis aids in understanding and quantitatively assessing complex subsurface structures. In recent years, many methods have been developed to facilitate this combined analysis, usually either through an extension of existing desktop applications or by making use of Jupyter notebooks as frontends. We evaluate here whether modern web browser technology, linked to high-performance cloud services, can also be used for these types of analyses.

For this purpose, we developed a web application as a proof of concept with the aim of visualizing three-dimensional geological models provided by a server. The implementation enables the modification of input parameters with assigned probability distributions. This step enables the generation of randomized realizations of models and the quantification and visualization of propagated uncertainties. The software is implemented using HTML Web Components on the client side and a Python server providing a RESTful API to the open-source geological modeling tool "GemPy". Encapsulating the main components in custom elements, in combination with a minimalistic state-management approach and a template parser, allows for high modularity. This enables rapid extension of the components' functionality depending on the user's needs and easy integration into existing web platforms.

Our implementation shows that it is possible to extend and simplify modeling processes by creating an expandable web-based platform for probabilistic modeling, with the aim of increasing usability and facilitating access to this functionality for a wide range of scientific analyses. The ability to compute models rapidly, with any given device, in a web browser makes the platform flexible to use and more accessible to a broader range of users.
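The server-side step described above (disturb the inputs, rerun the model, summarize the spread) might be sketched as follows. The request schema, endpoint logic, and stand-in model function are hypothetical and do not reflect the actual GemPy REST API.

```python
import json
import random
import statistics

def handle_simulation_request(body, model_fn):
    """Core of a hypothetical /simulate endpoint: draw randomized realizations
    of the input parameters and summarize the propagated spread."""
    req = json.loads(body)
    rng = random.Random(req.get("seed"))
    outputs = []
    for _ in range(req["n_realizations"]):
        # Each parameter carries its own normal disturbance distribution
        params = {name: rng.gauss(p["mean"], p["std"])
                  for name, p in req["parameters"].items()}
        outputs.append(model_fn(params))
    return json.dumps({"mean": statistics.fmean(outputs),
                       "stdev": statistics.pstdev(outputs)})

# Stand-in for a geological model evaluated at one probe location
depth_at_probe = lambda p: p["layer_top"] - 0.5 * p["dip"]
body = json.dumps({"seed": 1, "n_realizations": 200,
                   "parameters": {"layer_top": {"mean": 100.0, "std": 2.0},
                                  "dip": {"mean": 30.0, "std": 1.0}}})
resp = json.loads(handle_simulation_request(body, depth_at_probe))
```

Keeping the endpoint a pure function of the request body is what lets the browser client stay thin: it only edits distributions and renders the returned uncertainty summaries.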


2019 ◽  
Vol 12 (1) ◽  
pp. 1-32 ◽  
Author(s):  
Miguel de la Varga ◽  
Alexander Schaaf ◽  
Florian Wellmann

Abstract. The representation of subsurface structures is an essential aspect of a wide variety of geoscientific investigations and applications, ranging from geofluid reservoir studies, through raw material investigations, to geosequestration, as well as many branches of geoscientific research and applications in geological surveys. A wide range of methods exist to generate geological models; however, the most powerful of these are available only in expensive commercial packages. We present here a fully open-source geomodeling method, based on an implicit potential-field interpolation approach. The interpolation algorithm is comparable to implementations in commercial packages and capable of constructing complex full 3-D geological models, including fault networks, fault–surface interactions, unconformities and dome structures. This algorithm is implemented in the programming language Python, making use of an underlying library for efficient code generation (Theano) that enables direct execution on GPUs. The functionality can be separated into the core aspects required to generate 3-D geological models and additional assets for advanced scientific investigations. These assets provide the full power behind our approach, as they enable the link to machine-learning and Bayesian inference frameworks and thus a path to stochastic geological modeling and inversions. In addition, we provide methods to analyze model topology and to compute gravity fields on the basis of the geological models and assigned density values. In summary, we provide a basis for open scientific research using geological models, with the aim of fostering reproducible research in the field of geomodeling.
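The implicit-isosurface idea behind the potential-field approach can be hinted at with a deliberately simplified sketch: a scalar field is interpolated through interface points so that each horizon shares one field value, and the modeled surface is the corresponding isosurface. The real method co-krigs potential-field increments together with orientation gradients; the Gaussian RBF, the point set, and the parameters below are hypothetical simplifications.

```python
import numpy as np

def fit_rbf_field(points, values, eps=1.0):
    """Interpolate a scalar field through interface points with Gaussian
    radial basis functions. Horizons are encoded as iso-values of the field,
    a (strongly) simplified stand-in for potential-field co-kriging."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    weights = np.linalg.solve(np.exp(-eps * d2), values)
    def field(x):
        d2x = ((x[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        return np.exp(-eps * d2x) @ weights
    return field

# Two horizons sampled along x; field values 0 and 1 label the two interfaces
pts = np.array([[0.0, 1.0], [1.0, 1.1], [2.0, 0.9],    # horizon A
                [0.0, 3.0], [1.0, 3.2], [2.0, 2.8]])   # horizon B
vals = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
field = fit_rbf_field(pts, vals, eps=0.5)
```

Evaluating the field on a grid and contouring the 0 and 1 iso-lines reproduces the two horizons; between them the field takes intermediate values, which is the property that makes the implicit representation amenable to faults, unconformities and stochastic perturbation.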


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mohammed Alguraibawi ◽  
Habshah Midi ◽  
A. H. M. Rahmatullah Imon

Identification of high leverage points is crucial because they are responsible for inaccurate predictions and invalid inferential statements, as they have a large impact on the computed values of various estimates. It is essential to classify the high leverage points into good and bad leverage points because only the bad leverage points have an undue effect on the parameter estimates. It is now evident that when a group of high leverage points is present in a data set, the existing robust diagnostic plot fails to classify them correctly. This problem is due to the masking and swamping effects. In this paper, we propose a new robust diagnostic plot to correctly classify the good and bad leverage points by reducing both masking and swamping effects. The formulation of the proposed plot is based on the Modified Generalized Studentized Residuals. We investigate the performance of our proposed method by employing a Monte Carlo simulation study and some well-known data sets. The results indicate that the proposed method is able to improve the rate of detection of bad leverage points and also to reduce swamping and masking effects.
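The raw ingredients of such diagnostic plots, hat-matrix leverages and internally studentized residuals, can be computed as below. The toy data and the single bad leverage point are hypothetical, and this sketch uses classical (non-robust) OLS quantities rather than the paper's Modified Generalized Studentized Residuals.

```python
import numpy as np

def leverage_and_residuals(x, y):
    """Hat-matrix leverages and internally studentized residuals for simple
    OLS, the classical ingredients of a leverage-outlier diagnostic plot."""
    X = np.column_stack([np.ones(len(x)), x])      # add an intercept column
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    h = np.diag(H)                                 # leverages
    resid = y - H @ y
    n, p = X.shape
    s2 = (resid @ resid) / (n - p)
    t = resid / np.sqrt(s2 * (1.0 - h))            # studentized residuals
    return h, t

# y = 2x plus noise, with one high-leverage point shifted off the line
rng = np.random.default_rng(0)
x = np.concatenate([rng.uniform(0, 1, 20), [5.0]])
y = 2 * x + rng.normal(0, 0.05, 21)
y[-1] += 3.0                                       # bad leverage point
h, t = leverage_and_residuals(x, y)
```

A point with leverage above the usual 2p/n cutoff and a small |t| would be a good leverage point; the shifted point shows up with both high leverage and a large |t|. With groups of such points these classical quantities suffer the masking and swamping the abstract describes, which is what motivates the robust alternative.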


2017 ◽  
Author(s):  
João C. Marques ◽  
Michael B. Orger

Abstract. How to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data vary widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on the search for density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic and fails on some simple data distributions. We propose an alternative approach, clusterdv, which estimates density dips between points and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental data sets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data.
Author summary. It is common that natural phenomena produce groupings, or clusters, in data that can reveal the underlying processes. However, the form of these clusters can vary arbitrarily, making it challenging to find a single algorithm that identifies their structure correctly without prior knowledge of the number of groupings or their distribution. We describe a simple clustering algorithm that is fully automatic and is able to correctly identify the number and shape of groupings in data of many types. We expect this algorithm to be useful in finding unknown natural phenomena present in data from a wide range of scientific fields.
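For context, the density-peaks idea that clusterdp is based on (and that clusterdv refines with density-dip estimation) can be sketched as follows. This is a simplified, hypothetical implementation, not the authors' code: it fixes the number of clusters and assumes the densest point is selected as a center, which is exactly the kind of manual choice clusterdv aims to remove.

```python
import numpy as np

def density_peaks(X, dc=1.0, n_clusters=2):
    """Simplified density-peaks clustering: density rho counts neighbors
    within radius dc, delta is the distance to the nearest denser point,
    the n_clusters points with the largest rho * delta become centers, and
    every other point follows its nearest denser neighbor."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    rho = (d < dc).sum(1) - 1                 # neighbors within dc
    order = np.argsort(-rho)                  # densest first
    delta = np.zeros(len(X))
    parent = np.full(len(X), -1)
    delta[order[0]] = d[order[0]].max()       # convention for the densest point
    for rank, i in enumerate(order[1:], 1):
        denser = order[:rank]
        j = denser[np.argmin(d[i, denser])]
        delta[i], parent[i] = d[i, j], j
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                           # assign in decreasing density
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels

# Two tight, well-separated clusters
A = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.5, 0.5], [0.25, 0.25]], float)
X = np.vstack([A, A + 10.0])
labels = density_peaks(X, dc=2.0, n_clusters=2)
```

Points of high density that are far from any denser point stand out as centers; clusterdv replaces the manual center selection and the dc parameter with automatic density-dip estimation.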

