Comparison of Machine Learning Classifiers for Accurate Prediction of Real-Time Stuck Pipe Incidents

Javed Akbar Khan; Muhammad Irfan; Sonny Irawan; Fong Kam Yao; Md Shokor Abdul Rahaman; Ahmad Radzi Shahari; Adam Glowacz; Nazia Zeb

doi:10.3390/en13143683

Energies ◽

2020 ◽

Vol 13 (14) ◽

pp. 3683 ◽

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Sensitivity And Specificity ◽

Activation Function ◽

Ann Model ◽

Learning Models ◽

Drilling Operation ◽

Svm Model ◽

Drilling Operations ◽

Machine Learning Models

Stuck pipe incidents are one contributor to non-productive time (NPT) and can result in higher well costs. This research investigates the feasibility of applying machine learning to predict stuck pipe events during drilling operations in petroleum fields. The predictive model aims to predict the occurrence of stuck pipes so that the relevant drilling personnel can be warned to enact a mitigation plan that prevents them. Two machine learning methodologies were studied in this research: the artificial neural network (ANN) and the support vector machine (SVM). A total of 268 data sets were collected by extracting data from well drilling operations. The data also include the parameters under which the stuck pipes occurred during the drilling operations. These drilling parameters include information such as the properties of the drilling fluid, the bottom-hole assembly (BHA) specification, the state of the borehole and the operating conditions. The R programming language was used to construct both the ANN and SVM models. The prediction performance of the models was evaluated in terms of accuracy, sensitivity and specificity, and a sensitivity analysis was conducted on both models. For the ANN, two activation functions, the logistic and the hyperbolic tangent, were tested. Additionally, all possible network structures from [19, 1, 1, 1, 1] to [19, 10, 10, 10, 1] (19 inputs, three hidden layers of 1 to 10 nodes each, and one output) were tested for each activation function. For the SVM, three kernel functions, linear, Radial Basis Function (RBF) and polynomial, were tested. In addition, the SVM hyper-parameters, namely the regularization factor (C), sigma (σ) and degree (D), were varied in the sensitivity analysis.
The results of the sensitivity analysis show that the best ANN model achieved 88.89% accuracy, 91.89% sensitivity and 86.36% specificity, whereas the best SVM model achieved 83.95% accuracy, 86.49% sensitivity and 81.82% specificity. The ANN is therefore the better machine learning model in this study, because its accuracy, sensitivity and specificity are all higher than those of the best SVM model. In conclusion, given the promising prediction accuracy demonstrated in this study, stuck pipe prediction using machine learning appears to be practical.
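
Accuracy, sensitivity and specificity all derive from a binary confusion matrix. The Python sketch below computes them; the counts in the usage example (a hypothetical test set of 37 stuck and 44 non-stuck cases) are not reported in the abstract and were chosen only because they reproduce the stated ANN and SVM percentages.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts; 'positive' means a stuck pipe event."""
    tp: int  # stuck pipe correctly predicted
    fn: int  # stuck pipe missed
    tn: int  # normal operation correctly predicted
    fp: int  # false alarm


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: Confusion) -> float:
    # True-positive rate: share of actual stuck pipe events that are flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # True-negative rate: share of normal cases that raise no alarm.
    return c.tn / (c.tn + c.fp)


def report(name: str, c: Confusion) -> str:
    return (f"{name}: accuracy {accuracy(c):.2%}, "
            f"sensitivity {sensitivity(c):.2%}, specificity {specificity(c):.2%}")


if __name__ == "__main__":
    # Hypothetical counts (37 positives, 44 negatives), not taken from the paper.
    ann = Confusion(tp=34, fn=3, tn=38, fp=6)
    svm = Confusion(tp=32, fn=5, tn=36, fp=8)
    print(report("ANN", ann))
    print(report("SVM", svm))
```

Reporting sensitivity and specificity separately shows how each model handles the two classes, which a single accuracy figure hides.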

3132 Machine Learning for Prediction of Pathologic Pneumatosis Intestinalis Using CT Scans

Journal of Clinical and Translational Science ◽

10.1017/cts.2019.142 ◽

2019 ◽

Vol 3 (s1) ◽

pp. 60-61

Author(s):

Kadie Clancy ◽

Esmaeel Dadashzadeh ◽

Christof Kaltenmeier ◽

JB Moses ◽

Shandong Wu

Keyword(s):

Machine Learning ◽

Decision Making ◽

Pneumatosis Intestinalis ◽

Learning Models ◽

Surgical Decision ◽

Prediction Task ◽

Svm Model ◽

Surgical Decision Making ◽

Intraoperative Visualization ◽

Machine Learning Models

OBJECTIVES/SPECIFIC AIMS: This retrospective study aims to create and train machine learning models using a radiomic-based feature extraction method for two classification tasks: benign vs. pathologic pneumatosis intestinalis (PI), and operation of benefit vs. operation not needed. The long-term goal of our study is to build a computerized model that incorporates both radiomic features and critical non-imaging clinical factors to improve current surgical decision-making when managing PI patients. METHODS/STUDY POPULATION: We searched radiology reports from 2010-2012 in the UPMC MARS Database for reports containing the term “pneumatosis” (subsequently accounting for negations and age restrictions). Our inclusion criteria were: patient age 18 or older, clinical data available at the time of CT diagnosis, and PI visualized on manual review of imaging. Cases with intra-abdominal free air were excluded. We collected CT imaging data and 149 additional clinical data elements per patient, for a total of 75 PI cases. Data collection for an additional 225 patients is ongoing. We trained models for two clinically relevant prediction tasks. The first (prediction task 1) classifies between benign and pathologic PI. Benign PI is defined as either lack of intraoperative visualization of transmural intestinal necrosis or successful non-operative management until discharge. Pathologic PI is defined as either intraoperative visualization of transmural PI or withdrawal of care and subsequent death during hospitalization. The distribution of data samples for prediction task 1 is 47 benign cases and 38 pathologic cases. The second (prediction task 2) classifies whether or not the patient benefited from an operation. “Operation of benefit” is defined as patients with PI, be it transmural or simply mucosal, who benefited from an operation.
“Operation not needed” is defined as patients who were safely discharged without an operation or patients who had an operation in which nothing was found. The distribution of data samples for prediction task 2 is 37 operation-not-needed cases and 38 operation-of-benefit cases. An experienced surgical resident from UPMC manually segmented 3D PI regions of interest (ROIs) from the CT scans (5 mm axial cuts) for each case, selecting the ~10-15 cm bowel segment most concerning for necrosis with a 1 cm margin. For consistency, 7 slices were segmented per patient. For both prediction tasks, we independently completed the following training and testing procedure: (1) extracted radiomic features from the 3D PI ROIs, yielding 99 features in total; (2) used LASSO feature selection to determine the subset of the 99 features most significant for the prediction task; (3) used leave-one-out cross-validation for training and testing to account for the small size of the preliminary dataset, and implemented and trained several machine learning models (AdaBoost, SVM and Naive Bayes); (4) evaluated the trained models in terms of AUC and accuracy and determined the ideal model structure from these metrics. RESULTS/ANTICIPATED RESULTS: Prediction task 1: the top-performing model was an SVM trained on 19 features, with an AUC of 0.79 and an accuracy of 75%. Prediction task 2: the top-performing model was an SVM trained on 28 features, with an AUC of 0.74 and an accuracy of 64%. DISCUSSION/SIGNIFICANCE OF IMPACT: To the best of our knowledge, this is the first study to use radiomic-based machine learning models to predict tissue ischemia, specifically intestinal ischemia in the setting of PI.
In this preliminary, proof-of-concept study, the performance of our models suggests that machine learning based only on radiomic imaging features has potential discriminative power for surgical decision-making problems. While many non-imaging clinical factors contribute to the gestalt of clinical decision-making when PI presents, radiomic-based models such as these may augment this process, especially in more difficult cases in which clinical features indicating an acute abdomen are absent. Notably, prediction task 2, whether or not a patient presenting with PI would benefit from an operation, shows lower performance than prediction task 1 and is also the more challenging task for physicians in real clinical environments. While our results are promising, we are currently expanding our dataset to 300 patients to further train and assess our models.
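
Leave-one-out cross-validation (LOOCV) trains on all cases but one, scores the held-out case, and repeats once per case, which suits a 75-case dataset. The minimal Python sketch below assumes a nearest-centroid classifier as a stand-in for the AdaBoost, SVM and Naive Bayes models used in the study, and pools the held-out scores to compute AUC and accuracy. The feature vectors stand in for the LASSO-selected radiomic features; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Model = Tuple[List[float], List[float]]  # (class-0 centroid, class-1 centroid)


def fit_centroids(X: List[List[float]], y: List[int]) -> Model:
    """Return the mean feature vector of class 0 and of class 1."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    return centroid(0), centroid(1)


def score(model: Model, x: List[float]) -> float:
    """Positive when x is closer to the class-1 centroid (e.g. pathologic PI)."""
    c0, c1 = model
    return math.dist(x, c0) - math.dist(x, c1)


def loocv_scores(X: List[List[float]], y: List[int]) -> List[float]:
    """One held-out score per case, each from a model that never saw that case."""
    scores = []
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(score(model, X[i]))
    return scores


def auc(scores: List[float], y: List[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: List[float], y: List[int]) -> float:
    """Share of held-out cases whose score sign matches the true label."""
    return sum((s > 0) == (t == 1) for s, t in zip(scores, y)) / len(y)
```

Any classifier with a fit step and a real-valued score can replace the centroid model without changing the LOOCV loop or the metric functions.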

AB0652 MACHINE LEARNING TO PREDICT EARLY TNF INHIBITOR USERS IN PATIENTS WITH ANKYLOSING SPONDYLITIS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3743 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1620.1-1621

Author(s):

J. Lee ◽

H. Kim ◽

S. Y. Kang ◽

S. Lee ◽

Y. H. Eun ◽

...

Keyword(s):

Machine Learning ◽

Ankylosing Spondylitis ◽

Tnf Inhibitors ◽

Tnf Inhibitor ◽

Ann Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

Background: Tumor necrosis factor (TNF) inhibitors are important drugs for treating patients with ankylosing spondylitis (AS). However, they are not used as a first-line treatment for AS. Over 40% of patients respond insufficiently to the first-line treatment, non-steroidal anti-inflammatory drugs (NSAIDs). If we can predict at an earlier phase who will need TNF inhibitors, adequate treatment can be provided at an appropriate time and potential damage can be avoided. No precise predictive model exists at present. Recently, various machine learning methods have shown strong performance in predictions using clinical data. Objectives: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. Methods: The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early TNF inhibitor users, treated with TNF inhibitors within six months of their follow-up (early-TNF users), and the others (non-early-TNF users). Machine learning models were formulated to predict the early-TNF users from the baseline data. Additionally, feature importance analysis was performed to delineate significant baseline characteristics. Results: The numbers of early-TNF and non-early-TNF users were 90 and 509, respectively. The best-performing ANN model used 3 hidden layers with 50 hidden nodes each; its performance (area under the curve (AUC) = 0.75) was superior to that of the logistic regression, support vector machine, and random forest models (AUC = 0.72, 0.65, and 0.71, respectively) in predicting early-TNF users. Feature importance analysis revealed erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and height as the most significant baseline characteristics for predicting early-TNF users.
Among these characteristics, height was identified by the machine learning models but not by conventional statistical techniques. Conclusion: Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases. Disclosure of Interests: None declared
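
The best ANN in this abstract used three hidden layers of 50 nodes each. The Python sketch below shows what such a network computes for one patient's baseline vector and how many weights it carries. The input size, the ReLU hidden activations, the sigmoid output and the random initialisation are assumptions for illustration; the abstract does not report them.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_mlp(n_inputs: int, hidden: Sequence[int] = (50, 50, 50), seed: int = 0) -> List[Layer]:
    """Random weights and zero biases per layer; the last layer has one output unit."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        w = [[rng.uniform(-scale, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((w, [0.0] * fan_out))
    return layers


def forward(layers: List[Layer], x: List[float]) -> float:
    """ReLU hidden layers; sigmoid output read as P(early TNF inhibitor use)."""
    a = x
    for i, (w, b) in enumerate(layers):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi for row, bi in zip(w, b)]
        if i == len(layers) - 1:
            a = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            a = [max(0.0, v) for v in z]
    return a[0]


def n_parameters(layers: List[Layer]) -> int:
    """Total count of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

For a hypothetical 10-feature input, `n_parameters` gives 10·50 + 50 + 2·(50·50 + 50) + 50 + 1 = 5,701 weights and biases.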

High performance logistic regression for privacy-preserving genome analysis

BMC Medical Genomics ◽

10.1186/s12920-020-00869-9 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Martine De Cock ◽

Rafael Dowsley ◽

Anderson C. A. Nascimento ◽

Davis Railsback ◽

Jianwei Shen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Genome Analysis ◽

Local Area Network ◽

Local Area ◽

Activation Function ◽

Area Network ◽

Learning Models ◽

Data Set ◽

Machine Learning Models

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When machine learning models are trained collaboratively with the cryptographic technique known as secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm to train a logistic-regression-like model with a clipped ReLU activation function, and we break the algorithm down into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations that improve performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.
Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
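
A common form that a trusted initializer's correlated randomness takes in two-party protocols is the Beaver multiplication triple: the initializer deals secret shares of random a, b and c = ab, and the parties consume one triple per secure multiplication. The Python sketch below shows additive secret sharing modulo 2^64 and one Beaver multiplication. It is a generic textbook construction assumed for illustration, not the authors' optimized protocol, and it omits fixed-point encoding of real-valued features and the clipped ReLU subprotocol. Both parties are simulated in one process.

```python
import secrets
from typing import Tuple

MOD = 2 ** 64  # ring Z_{2^64}; all share arithmetic wraps modulo MOD
Share = Tuple[int, int]  # (share held by party 0, share held by party 1)


def share(value: int) -> Share:
    """Split value into two additive shares, each individually uniform."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def reveal(s: Share) -> int:
    return (s[0] + s[1]) % MOD


def deal_triple() -> Tuple[Share, Share, Share]:
    """Trusted initializer: shares of random a, b and of c = a * b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def beaver_multiply(x: Share, y: Share, triple: Tuple[Share, Share, Share]) -> Share:
    a, b, c = triple
    # Each party masks its own shares locally; only d = x - a and e = y - b are opened.
    # They reveal nothing about x and y because a and b are uniform and unknown to either party.
    d = reveal(((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD))
    e = reveal(((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD))
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by party 0 only.
    z0 = (c[0] + d * b[0] + e * a[0] + d * e) % MOD
    z1 = (c[1] + d * b[1] + e * a[1]) % MOD
    return z0, z1


def to_signed(v: int) -> int:
    """Interpret a ring element as a signed 64-bit integer."""
    return v - MOD if v >= MOD // 2 else v
```

In this sketch every multiplication consumes one fresh triple, so under that assumption each of the more than 7 billion secure multiplications reported above would need its own triple from the initializer.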

A Sensitivity Analysis of Poisoning and Evasion Attacks in Network Intrusion Detection System Machine Learning Models

10.1109/milcom52596.2021.9652959 ◽

2021 ◽

Author(s):

Kevin Talty ◽

John Stockdale ◽

Nathaniel D. Bastian

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Network Intrusion Detection ◽

Learning Models ◽

Network Intrusion ◽

Network Intrusion Detection System ◽

Machine Learning Models

Machine Learning Models for Predicting Hearing Prognosis in Unilateral Idiopathic Sudden Sensorineural Hearing Loss

Clinical and Experimental Otorhinolaryngology ◽

10.21053/ceo.2019.01858 ◽

2020 ◽

Vol 13 (2) ◽

pp. 148-156

Author(s):

Keon Vin Park ◽

Kyoung Ho Oh ◽

Yong Jun Jeong ◽

Jihye Rhee ◽

Mun Soo Han ◽

...

Keyword(s):

Machine Learning ◽

Hearing Loss ◽

Sensorineural Hearing Loss ◽

Sudden Sensorineural Hearing Loss ◽

Sensorineural Hearing ◽

Statistical Hypothesis ◽

Support Vector ◽

Learning Models ◽

Svm Model ◽

Machine Learning Models

Objectives. Prognosticating idiopathic sudden sensorineural hearing loss (ISSNHL) is an important challenge. In our study, the dataset was split into training and test sets, and cross-validation was performed on the training set to determine hyperparameters that give the machine learning models high test accuracy and low bias. The effectiveness of the following five machine learning models for predicting the hearing prognosis of patients with ISSNHL after 1 month of treatment was assessed: adaptive boosting, K-nearest neighbor, multilayer perceptron, random forest (RF), and support vector machine (SVM). Methods. The medical records of 523 patients with ISSNHL admitted to Korea University Ansan Hospital between January 2010 and October 2017 were retrospectively reviewed. After excluding those with missing data, we analyzed data from 227 patients (recovery, 106; no recovery, 121). To determine risk factors, statistical hypothesis tests (e.g., the two-sample t-test for continuous variables and the chi-square test for categorical variables) were conducted to compare patients who did or did not recover. Variables were selected using an RF model based on two criteria (mean decreases in the Gini index and in accuracy). Results. The SVM model using selected predictors achieved both the highest accuracy (75.36%) and the highest F-score (0.74) on the test set. The RF model with selected variables demonstrated the second-highest accuracy (73.91%) and F-score (0.74). The RF model with the original variables showed the same accuracy (73.91%) as the RF model with selected variables, but a lower F-score (0.73). All the tested models except RF performed better after RF-based variable selection. Conclusion. The SVM model with selected predictors was the best-performing of the tested prediction models, and the RF model with selected predictors was the second best.
Therefore, machine learning models can be used to predict hearing recovery in patients with ISSNHL.
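
The study's protocol, a fixed test set plus cross-validation on the training set to choose hyperparameters, can be sketched for one of its five models, K-nearest neighbor. The Python sketch below tunes the neighbor count by cross-validation that touches only the training data. The interleaved fold assignment, squared-Euclidean distance and candidate values are assumptions for illustration, not the paper's settings.

```python
from collections import Counter
from typing import List, Sequence


def knn_predict(X_train: List[List[float]], y_train: List[int], x: List[float], k: int) -> int:
    """Majority label among the k training points closest to x (squared Euclidean)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X: List[List[float]], y: List[int], k: int, folds: int) -> float:
    """Pooled accuracy over `folds` interleaved folds, using only the data passed in."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        correct += sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


def select_k(X_train: List[List[float]], y_train: List[int],
             candidates: Sequence[int], folds: int = 5) -> int:
    """Neighbor count with the best CV accuracy on the training set only;
    ties go to the first candidate listed."""
    return max(candidates, key=lambda k: cv_accuracy(X_train, y_train, k, folds))
```

After `select_k` returns, the model would be refit on the full training set with the chosen value and scored once on the held-out test set, so the reported test-set figures play no part in tuning.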

Data-driven sensitivity analysis of complex machine learning models: A case study of directional drilling

Journal of Petroleum Science and Engineering ◽

10.1016/j.petrol.2020.107630 ◽

2020 ◽

Vol 195 ◽

pp. 107630

Author(s):

Andrzej T. Tunkiel ◽

Dan Sui ◽

Tomasz Wiktorski

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Data Driven ◽

Directional Drilling ◽

Learning Models ◽

Machine Learning Models

Machine learning to predict early TNF inhibitor users in patients with ankylosing spondylitis

Scientific Reports ◽

10.1038/s41598-020-75352-7 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Seulkee Lee ◽

Yeonghee Eun ◽

Hyungjin Kim ◽

Hoon-Suk Cha ◽

Eun-Mi Koh ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

Tnf Inhibitor ◽

Ann Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

Abstract We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users from the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The ANN model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.783, superior to the logistic regression, support vector machine, random forest, and XGBoost models (AUC of 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) as the most significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.
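
The abstract does not say how feature importance was computed. One common model-agnostic option is permutation importance: shuffle one feature column, re-score the fitted model, and record the drop in performance. The Python sketch below assumes that method, uses accuracy rather than AUC for brevity, and treats `predict` as any fitted model's prediction function; baseline characteristics such as CRP and ESR would simply label the columns.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean accuracy drop when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled version.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(mean(drops))
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on loses accuracy whenever its values are shuffled away from their rows.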

A Support Vector Machine Model with Hyperparameters Optimised by Mind Evolutionary Algorithm for Assessing Permeability of Rock

Advances in Civil Engineering ◽

10.1155/2020/4718493 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Wenjin Zhu ◽

Zhiming Chao ◽

Guotao Ma

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Evolutionary Algorithm ◽

Predictive Accuracy ◽

Support Vector ◽

Particle Swarm Algorithm ◽

Learning Models ◽

Machine Model ◽

Svm Model ◽

Machine Learning Models

In this paper, a database on the permeability of rock was compiled from the existing literature. Based on this database, a Support Vector Machine (SVM) model with hyperparameters optimised by the Mind Evolutionary Algorithm (MEA) was proposed to predict the permeability of rock. Genetic Algorithm- (GA-) and Particle Swarm Algorithm- (PSO-) SVM models were also constructed to compare the improvement that MEA brings to the predictive accuracy of machine learning models with that brought by GA and PSO. The following conclusions were drawn. MEA can remarkably increase the predictive accuracy of the constructed machine learning models within a few iterations, and its optimisation performance is better than that of GA and PSO. MEA-SVM has the best forecasting performance, followed by PSO-SVM, while GA-SVM is the least accurate of the three. The proposed MEA-SVM model can accurately predict the permeability of rock, indicating satisfactory generalization and extrapolation capacity.
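
The Mind Evolutionary Algorithm searches with groups of candidate solutions: similartaxis is a local competition in which each group regenerates around its current winner, and dissimilation is a global competition in which better temporary groups replace weaker superior ones. The Python sketch below is a simplified version of that loop, assumed for illustration. In MEA-SVM the objective would be a cross-validated SVM error over hyperparameters such as C and the kernel parameter, whereas here any callable can stand in. Group sizes, the Gaussian step and the iteration count are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = List[float]
Group = Tuple[Point, float]  # (winner of the group, its objective value)


def mea_minimize(objective: Callable[[Point], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_superior: int = 3, n_temporary: int = 2, group_size: int = 10,
                 step: float = 0.1, iterations: int = 30, seed: int = 0) -> Group:
    rng = random.Random(seed)

    def random_point() -> Point:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def around(center: Point) -> Point:
        # Gaussian perturbation scaled to each dimension's range, clipped to bounds.
        return [min(max(c + rng.gauss(0.0, step * (hi - lo)), lo), hi)
                for c, (lo, hi) in zip(center, bounds)]

    def similartaxis(center: Point) -> Group:
        # Local competition: the centre competes with individuals scattered around it,
        # so a group's best value never gets worse.
        members = [center] + [around(center) for _ in range(group_size - 1)]
        return min(((m, objective(m)) for m in members), key=lambda g: g[1])

    superior = [similartaxis(random_point()) for _ in range(n_superior)]
    for _ in range(iterations):
        superior = [similartaxis(winner) for winner, _ in superior]
        # Fresh temporary groups keep exploring the rest of the search space.
        temporary = [similartaxis(random_point()) for _ in range(n_temporary)]
        # Dissimilation: global competition; the best groups survive as superior groups.
        superior = sorted(superior + temporary, key=lambda g: g[1])[:n_superior]
    return superior[0]
```

The sketch omits the maturity test and group bookkeeping of the full algorithm; it keeps only the local and global competition steps that distinguish MEA from GA and PSO.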

Risk Assessment in Energy Infrastructure Installations by Horizontal Directional Drilling Using Machine Learning

Energies ◽

10.3390/en14020289 ◽

2021 ◽

Vol 14 (2) ◽

pp. 289

Author(s):

Maria Krechowicz ◽

Adam Krechowicz

Keyword(s):

Machine Learning ◽

Risk Assessment ◽

Assessment Process ◽

Future Research ◽

Directional Drilling ◽

Ann Model ◽

Learning Models ◽

Horizontal Directional Drilling ◽

Risk Assessment Process ◽

Machine Learning Models

There is a growing demand for new gas pipeline installations in Europe, and many of them are installed using trenchless Horizontal Directional Drilling (HDD) technology. The aim of this work was to develop and compare new machine learning models for risk assessment in HDD projects. Data from 133 HDD projects in eight countries were gathered, profiled, and preprocessed. Three machine learning models, namely logistic regression, random forests, and an Artificial Neural Network (ANN), were developed to predict the overall HDD project outcome (failure-free installation or installation likely to fail) and the occurrence of identified unwanted events. The developed ANN model achieved the best performance in terms of recall and accuracy and proved to be efficient, fast and robust in predicting risks in HDD projects. Applying machine learning in the proposed models eliminated the need to involve a group of experts in the risk assessment process and therefore significantly lowers the costs of that process. Future research may be oriented towards developing a comprehensive risk management system that enables dynamic risk assessment, taking into account various combinations of risk mitigation actions.
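
Of the three models compared, logistic regression is the simplest to write out. The Python sketch below fits it by batch gradient descent on the mean log-loss for a binary outcome (1 = installation likely to fail). The feature vectors, learning rate and epoch count are hypothetical; the paper's 133-project dataset and preprocessing are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Model:
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (prediction - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Estimated probability that the installation is likely to fail."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Thresholding `predict_proba` at 0.5 gives the binary outcome; lowering the threshold would trade accuracy for higher recall on likely failures.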

Detecting Myocardial Infarction by Electrocardiogram Machine Learning Models With Greater Accuracy; A Technical Advance Article

10.21203/rs.3.rs-150700/v1 ◽

2021 ◽

Author(s):

M.D.S. Sudaraka ◽

I. Abeyagunawardena ◽

E. S. De Silva ◽

S Abeyagunawardena

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Random Forest ◽

Decision Trees ◽

Sensitivity And Specificity ◽

Random Forest Model ◽

Multi Layer Perceptron ◽

Learning Models ◽

Forest Model ◽

Machine Learning Models

Abstract Background The electrocardiogram (ECG) is a key diagnostic test in cardiac investigation. Interpretation of the ECG is based on an understanding of the normal electrical patterns produced by the heart and of how those patterns change in specific disease conditions. With machine learning techniques, it is possible to interpret ECGs with increased accuracy. However, there is a gap in machine learning models that detect myocardial infarction (MI) together with the affected territories of the heart. Methods The dataset was obtained from the University of California, Irvine, Machine Learning Repository. It was filtered to obtain observations categorized as Normal, Ischemic changes, Old Anterior MI and Old Inferior MI. The dataset was randomly split into a training set (70%) and a test set (30%). Of the 270 ECG features, 73 were selected based on the changes observed following MI, after excluding predictors with near-zero variance across the observations. Three machine learning classification models (Bootstrap Aggregation Decision Trees, Random Forest, Multi-layer Perceptron) were trained on the training dataset, optimizing for the Kappa statistic, and parameter tuning was performed with repeated 10-fold cross-validation. Accuracy and Kappa were used to compare performance between the models. Results The Random Forest model identified old anterior and old inferior MIs with 100% sensitivity and specificity, and classified all 4 categories of observations with an overall accuracy of 0.9167 (95% CI 0.8424 - 0.9633).
Both the Bootstrap Aggregation Decision Trees and the Multi-layer Perceptron models identified old anterior MIs with 100% sensitivity and specificity, and their overall accuracies across all 4 categories were 0.8958 (95% CI 0.8168 - 0.9489) and 0.8542 (95% CI 0.7674 - 0.9179), respectively. Conclusion With medically informed feature selection, all three models in this study identified old anterior MI with 100% sensitivity and specificity, and the Random Forest model also identified old inferior MI with 100% sensitivity and specificity. If the dataset can be improved, these machine learning models could be used in hospital settings to identify cardiac emergencies by incorporating them into cardiac monitors until trained personnel become available.
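
Cohen's kappa, the statistic the models were tuned for, corrects raw accuracy for the agreement expected by chance from the row and column totals of the confusion matrix: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes it for any number of classes, such as the four ECG categories here; the example matrices in the tests are hypothetical.

```python
from typing import Sequence


def cohens_kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows = true, columns = predicted)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: share of cases on the diagonal (plain accuracy).
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)
```

For a two-class matrix [[20, 5], [10, 15]], accuracy is 0.7 but chance agreement is 0.5, so kappa is 0.4. This sketch does not handle the degenerate case p_e = 1, where every case falls in one class.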
