Fuzzy Reliability in Spatial Databases

Improving Real-Time Drilling Data Quality Using Artificial Intelligence and Machine Learning Techniques

10.2118/204658-ms ◽

2021 ◽

Author(s):

S. H. Al Gharbi ◽

A. A. Al-Majed ◽

A. Abdulraheem ◽

S. Patil ◽

S. M. Elkatatny

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Data Quality ◽

Real Time ◽

Input Data ◽

Support Vector ◽

The Real ◽

Drilling Data ◽

Drilling Operations

Abstract Due to high demand for energy, oil and gas companies started to drill wells in remote areas and unconventional environments. This raised the complexity of drilling operations, which were already challenging and complex. To adapt, drilling companies expanded their use of the real-time operation center (RTOC) concept, in which real-time drilling data are transmitted from remote sites to companies’ headquarters. In RTOC, groups of subject matter experts monitor the drilling live and provide real-time advice to improve operations. With the increase of drilling operations, processing the volume of generated data is beyond a human's capability, limiting the RTOC impact on certain components of drilling operations. To overcome this limitation, artificial intelligence and machine learning (AI/ML) technologies were introduced to monitor and analyze the real-time drilling data, discover hidden patterns, and provide fast decision-support responses. AI/ML technologies are data-driven technologies, and their quality relies on the quality of the input data: if the quality of the input data is good, the generated output will be good; if not, the generated output will be bad. Unfortunately, due to the harsh environments of drilling sites and the transmission setups, not all of the drilling data is good, which negatively affects the AI/ML results. The objective of this paper is to utilize AI/ML technologies to improve the quality of real-time drilling data. The paper fed a large real-time drilling dataset, consisting of over 150,000 raw data points, into Artificial Neural Network (ANN), Support Vector Machine (SVM) and Decision Tree (DT) models. The models were trained on the valid and not-valid datapoints. The confusion matrix was used to evaluate the different AI/ML models including different internal architectures. Despite the slowness of ANN, it achieved the best result with an accuracy of 78%, compared to 73% and 41% for DT and SVM, respectively. The paper concludes by presenting a process for using AI technology to improve real-time drilling data quality. To the author's knowledge based on literature in the public domain, this paper is one of the first to compare the use of multiple AI/ML techniques for quality improvement of real-time drilling data. The paper provides a guide for improving the quality of real-time drilling data.

Download Full-text

DATA QUALITY CONSIDERATIONS FOR PETROPHYSICAL MACHINE LEARNING MODELS

10.30632/spwla-2021-0036 ◽

2021 ◽

Author(s):

Andrew McDonald ◽

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Quality ◽

Input Data ◽

Well Log ◽

Learning Models ◽

Log Data ◽

Quality Issues ◽

Machine Learning Models

Decades of subsurface exploration and characterisation have led to the collation and storage of large volumes of well related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data is of poor quality, the impact on precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field. This study focuses on well log data, which can be highly multi-dimensional, diverse and stored in a variety of file formats. Well log data exhibits key characteristics of Big Data: Volume, Variety, Velocity, Veracity and Value. Well data can include numeric values, text values, waveform data, image arrays, maps, volumes, etc. All of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering data and quality checking it prior to carrying out petrophysical interpretations and applying machine learning models. Well log data can be affected by numerous issues causing a degradation in data quality. These include missing data - ranging from single data points to entire curves; noisy data from tool related issues; borehole washout; processing issues; incorrect environmental corrections; and mislabelled data. Having vast quantities of data does not mean it can all be passed into a machine learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data is passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, it also reduces computational time and can provide a better understanding of how the models reach their conclusion. This paper reviews data quality issues typically faced by petrophysicists when working with well log data and deploying machine learning models. First, an overview of machine learning and Big Data is covered in relation to petrophysical applications. Secondly, data quality issues commonly faced with well log data are discussed. Thirdly, methods are suggested on how to deal with data issues prior to modelling. Finally, multiple case studies are discussed covering the impacts of data quality on predictive capability.

Download Full-text

An evaluation of input data quality of lifelog analysis application with a framework based on quantitative index

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication - ICUIMC '12 ◽

10.1145/2184751.2184861 ◽

2012 ◽

Author(s):

Akika Yamashita ◽

Saeko Iwaki ◽

Masato Oguchi

Keyword(s):

Data Quality ◽

Input Data ◽

Quantitative Index ◽

Analysis Application

Download Full-text

Metody i modele oceny jakości danych przestrzennych

10.15576/978-83-66602-30-4 ◽

2017 ◽

Author(s):

Marek Ślusarski ◽

Keyword(s):

Data Quality ◽

Spatial Data ◽

Large Scale ◽

Spatial Databases ◽

Test Sample ◽

Data Sets ◽

Infrastructure Network ◽

Underground Infrastructure

The quality of data collected in official spatial databases is crucial in making strategic decisions as well as in the implementation of planning and design works. Awareness of the level of the quality of these data is also important for individual users of official spatial data. The author presents methods and models of description and evaluation of the quality of spatial data collected in public registers. Data describing the space in the highest degree of detail, which are collected in three databases: land and buildings registry (EGiB), geodetic registry of the land infrastructure network (GESUT) and in database of topographic objects (BDOT500) were analyzed. The results of the research concerned selected aspects of activities in terms of the spatial data quality. These activities include: the assessment of the accuracy of data collected in official spatial databases; determination of the uncertainty of the area of registry parcels, analysis of the risk of damage to the underground infrastructure network due to the quality of spatial data, construction of the quality model of data collected in official databases and visualization of the phenomenon of uncertainty in spatial data. The evaluation of the accuracy of data collected in official, large-scale spatial databases was based on a representative sample of data. The test sample was a set of deviations of coordinates with three variables dX, dY and Dl – deviations from the X and Y coordinates and the length of the point offset vector of the test sample in relation to its position recognized as a faultless. The compatibility of empirical data accuracy distributions with models (theoretical distributions of random variables) was investigated and also the accuracy of the spatial data has been assessed by means of the methods resistant to the outliers. In the process of determination of the accuracy of spatial data collected in public registers, the author’s solution was used – resistant method of the relative frequency. Weight functions, which modify (to varying degree) the sizes of the vectors Dl – the lengths of the points offset vector of the test sample in relation to their position recognized as a faultless were proposed. From the scope of the uncertainty of estimation of the area of registry parcels the impact of the errors of the geodetic network points was determined (points of reference and of the higher class networks) and the effect of the correlation between the coordinates of the same point on the accuracy of the determined plot area. The scope of the correction was determined (in EGiB database) of the plots area, calculated on the basis of re-measurements, performed using equivalent techniques (in terms of accuracy). The analysis of the risk of damage to the underground infrastructure network due to the low quality of spatial data is another research topic presented in the paper. Three main factors have been identified that influence the value of this risk: incompleteness of spatial data sets and insufficient accuracy of determination of the horizontal and vertical position of underground infrastructure. A method for estimation of the project risk has been developed (quantitative and qualitative) and the author’s risk estimation technique, based on the idea of fuzzy logic was proposed. Maps (2D and 3D) of the risk of damage to the underground infrastructure network were developed in the form of large-scale thematic maps, presenting the design risk in qualitative and quantitative form. The data quality model is a set of rules used to describe the quality of these data sets. The model that has been proposed defines a standardized approach for assessing and reporting the quality of EGiB, GESUT and BDOT500 spatial data bases. Quantitative and qualitative rules (automatic, office and field) of data sets control were defined. The minimum sample size and the number of eligible nonconformities in random samples were determined. The data quality elements were described using the following descriptors: range, measure, result, and type and unit of value. Data quality studies were performed according to the users needs. The values of impact weights were determined by the hierarchical analytical process method (AHP). The harmonization of conceptual models of EGiB, GESUT and BDOT500 databases with BDOT10k database was analysed too. It was found that the downloading and supplying of the information in BDOT10k creation and update processes from the analyzed registers are limited. An effective approach to providing spatial data sets users with information concerning data uncertainty are cartographic visualization techniques. Based on the author’s own experience and research works on the quality of official spatial database data examination, the set of methods for visualization of the uncertainty of data bases EGiB, GESUT and BDOT500 was defined. This set includes visualization techniques designed to present three types of uncertainty: location, attribute values and time. Uncertainty of the position was defined (for surface, line, and point objects) using several (three to five) visual variables. Uncertainty of attribute values and time uncertainty, describing (for example) completeness or timeliness of sets, are presented by means of three graphical variables. The research problems presented in the paper are of cognitive and application importance. They indicate on the possibility of effective evaluation of the quality of spatial data collected in public registers and may be an important element of the expert system.

Download Full-text

Data Quality Considerations for Petrophysical Machine-Learning Models

Petrophysics – The SPWLA Journal of Formation Evaluation and Reservoir Description ◽

10.30632/pjv62n6-2020a1 ◽

2021 ◽

Vol 62 (6) ◽

pp. 585-613

Author(s):

Andrew McDonald ◽

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Quality ◽

Input Data ◽

Computational Time ◽

Well Log ◽

Learning Models ◽

Log Data ◽

Machine Learning Models

Decades of subsurface exploration and characterization have led to the collation and storage of large volumes of well-related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine-learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data are of poor quality, the impact on precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field. This study focuses on well-log data, which can be highly multidimensional, diverse, and stored in a variety of file formats. Well-log data exhibits key characteristics of big data: volume, variety, velocity, veracity, and value. Well data can include numeric values, text values, waveform data, image arrays, maps, and volumes. All of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering data and quality checking it prior to carrying out petrophysical interpretations and applying machine-learning models. Well-log data can be affected by numerous issues causing a degradation in data quality. These include missing data ranging from single data points to entire curves, noisy data from tool-related issues, borehole washout, processing issues, incorrect environmental corrections, and mislabeled data. Having vast quantities of data does not mean it can all be passed into a machine-learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data are passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, but it also reduces computational time and can provide a better understanding of how the models reach their conclusion. This paper reviews data quality issues typically faced by petrophysicists when working with well-log data and deploying machine-learning models. This is achieved by first providing an overview of machine learning and big data within the petrophysical domain, followed by a review of the common well-log data issues, their impact on machine-learning algorithms, and methods for mitigating their influence.

Download Full-text

Bankruptcy Model Construction and its Limitation in Input Data Quality

Journal of Business and Economics ◽

10.15341/jbe(2155-7950)/02.10.2019/003 ◽

2019 ◽

Vol 10 (2) ◽

pp. 117-125

Author(s):

Dana Kubíčková ◽

◽

Vladimír Nulíček ◽

Keyword(s):

Data Quality ◽

Normal Distribution ◽

Input Data ◽

Model Construction ◽

The Third ◽

Third Stage ◽

Discriminant Analyses ◽

One Year ◽

The University ◽

Multivariate Discriminant

The aim of the research project solved at the University of Finance and administration is to construct a new bankruptcy model. The intention is to use data of the firms that have to cease their activities due to bankruptcy. The most common method for bankruptcy model construction is multivariate discriminant analyses (MDA). It allows to derive the indicators most sensitive to the future companies’ failure as a parts of the bankruptcy model. One of the assumptions for using the MDA method and reassuring the reliable results is the normal distribution and independence of the input data. The results of verification of this assumption as the third stage of the project are presented in this article. We have revealed that this assumption is met only in a few selected indicators. Better results were achieved in the indicators in the set of prosperous companies and one year prior the failure. The selected indicators intended for the bankruptcy model construction thus cannot be considered as suitable for using the MDA method.

Download Full-text

Improving data quality of a trauma register

10.26226/morressier.58f5b02fd462b80296c9e0d7 ◽

2017 ◽

Author(s):

Estefania Rabaneda Romero

Keyword(s):

Data Quality

Download Full-text

Designing Information Product (IP) Maps On the Process of Data Processing and Academic Information

International Journal of New Media Technology ◽

10.31937/ijnmt.v4i1.534 ◽

2017 ◽

Vol 4 (1) ◽

pp. 25-31 ◽

Cited By ~ 1

Author(s):

Diana Effendi

Keyword(s):

Data Quality ◽

Data Management ◽

Information Management ◽

Information Quality ◽

Quality Data ◽

Management Approach ◽

Quality Of Data ◽

Information Product ◽

Academic Activities

Information Product Approach (IP Approach) is an information management approach. It can be used to manage product information and data quality analysis. IP-Map can be used by organizations to facilitate the management of knowledge in collecting, storing, maintaining, and using the data in an organized. The process of data management of academic activities in X University has not yet used the IP approach. X University has not given attention to the management of information quality of its. During this time X University just concern to system applications used to support the automation of data management in the process of academic activities. IP-Map that made in this paper can be used as a basis for analyzing the quality of data and information. By the IP-MAP, X University is expected to know which parts of the process that need improvement in the quality of data and information management. Index term: IP Approach, IP-Map, information quality, data quality. REFERENCES[1] H. Zhu, S. Madnick, Y. Lee, and R. Wang, “Data and Information Quality Research: Its Evolution and Future,” Working Paper, MIT, USA, 2012.[2] Lee, Yang W; at al, Journey To Data Quality, MIT Press: Cambridge, 2006.[3] L. Al-Hakim, Information Quality Management: Theory and Applications. Idea Group Inc (IGI), 2007.[4] “Access : A semiotic information quality framework: development and comparative analysis : Journal ofInformation Technology.” [Online]. Available: http://www.palgravejournals.com/jit/journal/v20/n2/full/2000038a.html. [Accessed: 18-Sep-2015].[5] Effendi, Diana, Pengukuran Dan Perbaikan Kualitas Data Dan Informasi Di Perguruan Tinggi MenggunakanCALDEA Dan EVAMECAL (Studi Kasus X University), Proceeding Seminar Nasional RESASTEK, 2012, pp.TIG.1-TI-G.6.

Download Full-text

Faculty Opinions recommendation of Measuring quality of life in women with endometriosis: tests of data quality, score reliability, response rate and scaling assumptions of the Endometriosis Health Profile Questionnaire.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1058951.510906 ◽

2007 ◽

Author(s):

Thomas d'Hooghe

Keyword(s):

Quality Of Life ◽

Data Quality ◽

Response Rate ◽

Quality Score ◽

Health Profile ◽

Score Reliability

Download Full-text

Accuracy analysis in borders description for municipal entities in the Republic Bashkortostan

Geodesy and Cartography ◽

10.22389/0016-7126-2017-924-6-2-5 ◽

2017 ◽

Vol 924 (6) ◽

pp. 2-5

Author(s):

V.N. Puchkov ◽

R.S. Musalimov ◽

D.S. Zavarnov

Keyword(s):

Russian Federation ◽

Input Data ◽

Accuracy Analysis ◽

Rural Settlements ◽

Regulatory Requirements ◽

Republic Of Bashkortostan ◽

Weak Points ◽

The Republic

In this work the analysis on description of rural settlements boundaries of the Republic of Bashkortostan, based on the experience of other sub-federal units of Russian Federation was made. A range of weak points in collected input data was defined. In total, of 54 municipal districts of the Republic of Bashkortostan (818 rural settlements), 44 districts showed nonconformity of feed data details to regulatory requirements. And the main reason for this is a low quality of input materials such as base maps at scale 1

Download Full-text