AI-Based Estimation of Hydraulic Fracturing Effect

SPE Journal ◽

10.2118/205479-pa ◽

2021 ◽

pp. 1-12

Author(s):

A. S. Erofeev ◽

D. M. Orlov ◽

D. S. Perets ◽

D. A. Koroteev

Keyword(s):

Hydraulic Fracturing ◽

Candidate Selection ◽

Oil Fields ◽

Gradient Boosting ◽

Data Set ◽

Test Set ◽

New Approach ◽

Boosting Method ◽

And Storage ◽

Storage Properties

Summary: We studied the applicability of a gradient-boosting machine-learning (ML) algorithm for forecasting oil and total liquid production after hydraulic fracturing (HF). The raw data were studied thoroughly and processed with data-preprocessing algorithms. The data set included 10 oil fields with more than 2,000 HF events. Each event was characterized by well coordinates, geology, transport and storage properties, depths, and oil/liquid rates before fracturing for target and neighboring wells. Each ML model was trained to predict monthly production rates right after fracturing and after the flows have stabilized. The gradient-boosting method proved a justified choice, with R2 of approximately 0.7 to 0.8 on the test set for oil/total liquid production after HF. The developed ML prediction model does not require preliminary numerical simulations of a future HF design. The applied algorithm could be used as a new approach for HF candidate selection based on the real-time state of the field.
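The gradient-boosting idea referred to in this summary can be illustrated with a minimal sketch. It is not the authors' model: their features, library and hyperparameters are not given here, so the single-feature decision stumps, the toy data, and the names `fit_stump`, `fit_gbm`, `predict` and `mse` are all assumptions made for illustration. Each round fits a stump to the current residuals and adds a damped copy of it to the prediction.

```python
from statistics import mean


def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error on one feature.

    Assumes x contains at least two distinct values.
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighboring values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the residuals."""
    base = mean(y)  # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the damped contributions of all stumps."""
    base, stumps, lr = model
    return [base + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)
            for xi in x]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented): a nonlinear response of a single feature.
x = list(range(20))
y = [xi ** 2 for xi in x]
model = fit_gbm(x, y)
baseline_mse = mse(y, [mean(y)] * len(y))
boosted_mse = mse(y, predict(model, x))
print(baseline_mse, boosted_mse)
```

In practice a library implementation with multi-feature trees and a held-out test set would be used; the sketch only shows the residual-fitting loop.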

Fiber-optic distributed acoustic sensing of microseismicity, strain and temperature during hydraulic fracturing

Geophysics ◽

10.1190/geo2017-0396.1 ◽

2019 ◽

Vol 84 (1) ◽

pp. D11-D23 ◽

Cited By ~ 20

Author(s):

Martin Karrenbach ◽

Steve Cole ◽

Andrew Ridge ◽

Kevin Boone ◽

Dan Kahn ◽

...

Keyword(s):

Hydraulic Fracturing ◽

Hydraulic Fracture ◽

Fiber Optic ◽

Data Set ◽

New Approach ◽

Acoustic Sensing ◽

Fiber Optic Technology ◽

Physical Effects ◽

Distributed Acoustic Sensing ◽

Temperature Strain

Hydraulic fracturing operations in unconventional reservoirs are typically monitored using geophones located either at the surface or in the adjacent wellbores. A new approach to record hydraulic stimulations uses fiber-optic distributed acoustic sensing (DAS). A fiber-optic cable was installed in a treatment well in the Meramec formation to monitor the hydraulic fracture stimulation of an unconventional reservoir. A variety of physical effects, such as temperature, strain, and microseismicity, are measured and correlated with the treatment program during hydraulic fracturing of the well containing the fiber and also an adjacent well. The analysis of this DAS data set demonstrates that current fiber-optic technology provides enough sensitivity to detect a considerable number of microseismic events and that these events can be integrated with temperature and strain measurements for comprehensive hydraulic fracture monitoring.

Estimation and updating methods for hedonic valuation

Journal of European Real Estate Research ◽

10.1108/jerer-08-2018-0035 ◽

2019 ◽

Vol 12 (1) ◽

pp. 134-150 ◽

Cited By ~ 3

Author(s):

Michael Mayer ◽

Steven C. Bourassa ◽

Martin Hoesli ◽

Donato Scognamiglio

Keyword(s):

Robust Regression ◽

Machine Learning Techniques ◽

Estimation Methods ◽

Gradient Boosting ◽

Single Family ◽

Data Set ◽

Content Type ◽

Hedonic Valuation ◽

Boosting Method ◽

Rich Data

Purpose: The purpose of this paper is to investigate the accuracy and volatility of different methods for estimating and updating hedonic valuation models. Design/methodology/approach: The authors apply six estimation methods (linear least squares, robust regression, mixed-effects regression, random forests, gradient boosting and neural networks) and two updating methods (moving and extending windows). They use a large and rich data set consisting of over 123,000 single-family houses sold in Switzerland between 2005 and 2017. Findings: The gradient boosting method yields the greatest accuracy, while the robust method provides the least volatile predictions. There is a clear trade-off across methods depending on whether the goal is to improve accuracy or avoid volatility. The choice between moving and extending windows has only a modest effect on the results. Originality/value: This paper compares a range of linear and machine learning techniques in the context of moving or extending window scenarios that are used in practice but which have not been considered in prior research. The techniques include robust regression, which has not previously been used in this context. The data updating allows for analysis of the volatility in addition to the accuracy of predictions. The results should prove useful in improving hedonic models used by property tax assessors, mortgage underwriters, valuation firms and regulatory authorities.
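The difference between the two updating methods can be sketched as follows. This is an illustration only: the abstract does not state the window length or the refit schedule, so the period indexing, the `window` parameter and the function name `window_indices` are assumptions. A moving window always trains on the most recent `window` periods, while an extending window trains on everything from the first period onward.

```python
def window_indices(n_periods, window, mode):
    """Yield (training_periods, test_period) pairs for one updating scheme.

    mode="moving":    train on the last `window` periods before the test period.
    mode="extending": train on all periods before the test period.
    """
    if mode not in ("moving", "extending"):
        raise ValueError("mode must be 'moving' or 'extending'")
    for test in range(window, n_periods):
        start = test - window if mode == "moving" else 0
        yield list(range(start, test)), test


moving = list(window_indices(5, 2, "moving"))
extending = list(window_indices(5, 2, "extending"))
print(moving)     # [([0, 1], 2), ([1, 2], 3), ([2, 3], 4)]
print(extending)  # [([0, 1], 2), ([0, 1, 2], 3), ([0, 1, 2, 3], 4)]
```

Both schemes start from the same first training set; they diverge only as older periods either drop out (moving) or accumulate (extending).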

GRADIENT BOOSTING METHOD APPLICATION TO SUPPORT PROCESS DECISIONS IN THE ELECTRON-BEAM WELDING PROCESS

Siberian Journal of Science and Technology ◽

10.31772/2587-6066-2020-21-2-206-214 ◽

2020 ◽

Vol 21 (2) ◽

pp. 206-214

Author(s):

V. S. Tynchenko ◽

I. A. Golovenok ◽

V. E. Petrenko ◽

A. V. Milov ◽

...

Keyword(s):

Electron Beam ◽

Electron Beam Welding ◽

Welding Process ◽

Gradient Boosting ◽

Boosting Method

A New Approach for the Analysis of Mixture Toxicity Data

Water Science & Technology ◽

10.2166/wst.1992.0733 ◽

1992 ◽

Vol 26 (9-11) ◽

pp. 2345-2348 ◽

Cited By ~ 2

Author(s):

C. N. Haas

Keyword(s):

Quantitative Analysis ◽

Mixture Toxicity ◽

New Method ◽

Metal Exposure ◽

Positive Interactions ◽

Toxicity Data ◽

Data Set ◽

New Approach ◽

Negative Interactions

A new method for the quantitative analysis of multiple toxicity data is described and illustrated using a data set on metal exposure to copepods. Positive interactions are observed for Ni-Pb and Pb-Cr, with weak negative interactions observed for Ni-Cr.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.

QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

Current Analytical Chemistry ◽

10.2174/1573411016999200518083359 ◽

2020 ◽

Vol 16 (8) ◽

pp. 1088-1105

Author(s):

Nafiseh Vahedi ◽

Majid Mohammadhosseini ◽

Mehdi Nekoei

Keyword(s):

Present Report ◽

Principal Component ◽

Parp Inhibitors ◽

Support Vector ◽

Ann Model ◽

Statistical Parameters ◽

Qsar Study ◽

Data Set ◽

Test Set ◽

Non Linear

Background: The poly(ADP-ribose) polymerases (PARP) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset with the most significant contributions to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlations) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, the GA-SVM approach had the better predictive power in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), and R2 and F-statistic for the training set. However, compared with MLR and SVM, the statistical parameters for the test set were better for the GA-ANN model.
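The Y-randomization (chance correlation) check mentioned above can be sketched in a few lines. This is a simplified stand-in, not the study's procedure: it uses a one-descriptor least-squares line instead of the GA-MLR, GA-SVM or GA-ANN models, and the data, the seed and the names `r_squared_linear` and `y_randomization` are invented for illustration. The idea is to refit the model on shuffled responses; if the shuffled fits score nearly as well as the real one, the original correlation is likely due to chance.

```python
import random


def r_squared_linear(x, y):
    """R^2 of an ordinary least-squares line fitted to (x, y).

    For a single descriptor with an intercept this equals the squared
    Pearson correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, n_shuffles=200, seed=0):
    """Return the real R^2 and the R^2 values obtained after shuffling y."""
    rng = random.Random(seed)
    real = r_squared_linear(x, y)
    shuffled = []
    for _ in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-response link
        shuffled.append(r_squared_linear(x, y_perm))
    return real, shuffled


# Invented data: a nearly linear response with a small alternating perturbation.
x = list(range(12))
y = [2 * xi + 1 + 0.3 * (-1) ** xi for xi in x]
real_r2, shuffled_r2 = y_randomization(x, y)
print(real_r2, sum(shuffled_r2) / len(shuffled_r2))
```

A model passes the check when the real score sits far above the bulk of the shuffled scores.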

Correlation between the structure and skin permeability of compounds

Scientific Reports ◽

10.1038/s41598-021-89587-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ruolan Zeng ◽

Jiyong Deng ◽

Limin Dang ◽

Xinliang Yu

Keyword(s):

Large Data ◽

Qsar Model ◽

Coefficient Of Determination ◽

Support Vector ◽

Skin Permeability ◽

Data Set ◽

Test Set ◽

Svm Algorithm ◽

Svm Model ◽

Toxicity Relationship

Abstract: A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while dealing with more samples in the test set. Therefore, applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was successful.
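The two statistics quoted here can be computed as in the sketch below. It assumes the common definition of the coefficient of determination, R2 = 1 - SS_res/SS_tot; the abstract does not say which variant the authors used (some QSAR work reports the squared correlation instead). The function names and the sample values are invented for illustration.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rms_error(y_true, y_pred):
    """Root mean square error of the predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r2_score(observed, predicted), rms_error(observed, predicted))
```

Reporting both statistics separately for the training and test sets, as the abstract does, shows how much performance drops on compounds the model has not seen.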

A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
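For reference, NDVI is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that standard formula only; it says nothing about how the study's satellite product was processed, and the function name and reflectance values are invented for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / total


# Invented reflectance values: dense vegetation reflects strongly in the NIR.
print(ndvi(0.5, 0.1))
print(ndvi(0.3, 0.3))
```

Values lie between -1 and 1 for non-negative reflectances, with higher values indicating denser green vegetation.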

Wheat Grinding Process with Low Moisture Content: A New Approach for Wholemeal Flour Production

Processes ◽

10.3390/pr9010032 ◽

2020 ◽

Vol 9 (1) ◽

pp. 32

Author(s):

Waleed H. Hassoon ◽

Dariusz Dziki ◽

Antoni Miś ◽

Beata Biernacka

Keyword(s):

Particle Size ◽

Moisture Content ◽

Average Particle Size ◽

Average Particle ◽

Grinding Process ◽

Mixing Properties ◽

New Approach ◽

Grain Moisture ◽

Flour Production ◽

And Storage

The objective of this study was to determine the grinding characteristics of wheat with a low moisture content. Two kinds of wheat—soft spelt wheat and hard Khorasan wheat—were dried at 45 °C to reduce the moisture content from 12% to 5% (wet basis). Air drying at 45 °C and storage in a climatic chamber (45 °C, 10% relative humidity) were the methods used for grain dehydration. The grinding process was carried out using a knife mill. After grinding, the particle size distribution, average particle size and grinding energy indices were determined. In addition, the dough mixing properties of wholemeal flour dough were studied using a farinograph. It was observed that decreasing the moisture content in wheat grains from 12% to 5% made the grinding process more effective. As a result, the average particle size of the ground material was decreased. This effect was found in both soft and hard wheat. Importantly, lowering the grain moisture led to about a twofold decrease in the required grinding energy. Moreover, the flour obtained from the dried grains showed higher water absorption and higher dough stability during mixing. However, the method of grain dehydration had little or no effect on the results of the grinding process or dough properties.

Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-03658-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Toktam Khatibi ◽

Elham Hanifi ◽

Mohammad Mehdi Sepehri ◽

Leila Allahqoli

Keyword(s):

Machine Learning ◽

External Validation ◽

Fetal Loss ◽

Null Distribution ◽

Training Dataset ◽

Gradient Boosting ◽

Support Vector ◽

Cross Sectional ◽

Boosting Method ◽

Demographic Features

Abstract. Background: Stillbirth is defined by the WHO as fetal loss in pregnancy beyond 28 weeks. In this study, a machine-learning-based method is proposed to distinguish stillbirth from livebirth, to discriminate stillbirth before delivery from stillbirth during delivery, and to rank the features. Method: A two-step stack ensemble (SE) classifier is proposed that classifies the instances into stillbirth and livebirth in the first step and then separates stillbirth before delivery from stillbirth during labor in the second step. The proposed SE has two consecutive layers containing the same classifiers. The base classifiers in each layer are decision tree, gradient boosting classifier, logistic regression, random forest and support vector machines, which are trained independently and aggregated with the Vote boosting method. Moreover, a new feature ranking method based on mean decrease accuracy, Gini index and model coefficients is proposed to find high-ranked features. Results: The IMAN registry dataset is used in this study, considering all births at or beyond the 28th gestational week from 2016/04/01 to 2017/01/01, including 1,415,623 live birth and 5502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors and socio-demographic features is considered. The experimental results show that the proposed SE outperforms the compared classifiers, with an average accuracy of 90%, sensitivity of 91% and specificity of 88%. The discrimination of the proposed SE is assessed, and an average AUC (±95% CI) of 90.51% ±1.08 and 90% ±1.12 is obtained on the training dataset for model development and the test dataset for external validation, respectively. The proposed SE is calibrated using the isotopic nonparametric calibration method, with a score of 0.07. The process is repeated 10,000 times, and the AUCs of SE classifiers trained on different random training datasets are used as the null distribution. The obtained p-value to assess the specificity of the proposed SE is 0.0126, which shows the significance of the proposed SE. Conclusions: Gestational age and fetal height are the two most important features for discriminating livebirth from stillbirth. Moreover, hospital, province, delivery main cause, perinatal abnormality, miscarriage number and maternal age are the most important features for classifying stillbirth before and during delivery.
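The accuracy, sensitivity and specificity figures reported above are standard functions of a binary confusion matrix. The sketch below shows those definitions only; the counts are invented and are not the study's results, and the function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
print(metrics)
```

With classes as imbalanced as in this registry, accuracy alone can be misleading, which is why sensitivity and specificity are reported alongside it.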
