Machine learning based recursive partitioning for simplifying OPC model building complexity

Author(s):  
Apoorva Oak ◽  
Soobin Hwang ◽  
Ruoxia Chen ◽  
Shinill Kang ◽  
Ryan Ryoung-Han Kim
2019 ◽  
Author(s):  
Qiannan Duan ◽  
Jianchao Lee ◽  
Jinhong Gao ◽  
Jiayuan Chen ◽  
Yachao Lian ◽  
...  

Machine learning (ML) has brought significant technological innovation to many fields, but it has not yet been widely embraced by researchers in the natural sciences. Conventional approaches to chemical analysis cannot meet the big-data requirements for running ML. Over the years, we have focused on building a versatile, low-cost approach to acquiring the copious amounts of data contained in a chemical reaction. The resulting data satisfy the demands of ML when exploring the vast space of chemical effects. As a proof of concept, we carried out an acute-toxicity test across the whole routine, from model building and chip preparation to data collection and ML training. Such a strategy will likely play an important role in connecting ML with much research in the natural sciences in the future.


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in Chemical Abstracts Service. According to a recent study, roughly 10^60 molecules can be classified as new drug-like molecules. The library of such molecules is now referred to as ‘dark chemical space’ or ‘dark chemistry.’ To explore these hidden molecules scientifically, a good number of live, regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, genomics, proteomics, and in-silico simulation, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of candidate drugs is equally important for drug development. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques used extensively in VS. QSAR is well known for high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) collecting chemo-genomics data from a database or the literature, (ii) calculating the right descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) applying the QSAR model to predict the biological property of new molecules. All hits obtained by VS need to be experimentally verified. This mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
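Step (iii) of the QSAR workflow above, establishing a relationship between a descriptor and biological activity, can be sketched as a simple least-squares fit. This is a minimal illustration, not code from the review; the descriptor (logP) and activity (pIC50) values below are invented for the example.

```python
# Minimal sketch of QSAR step (iii): fit activity = a * descriptor + b
# by ordinary least squares, then use the model for step (iv), prediction.

def fit_linear_qsar(descriptor, activity):
    """One-descriptor ordinary least squares: returns slope a and intercept b."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    var = sum((x - mean_x) ** 2 for x in descriptor)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical logP descriptor values and measured activities (pIC50)
logp = [1.2, 2.0, 2.9, 3.5, 4.1]
pic50 = [4.8, 5.5, 6.2, 6.9, 7.4]

a, b = fit_linear_qsar(logp, pic50)
# Step (iv): predict activity for candidate molecules from their descriptor
predicted = [a * x + b for x in logp]
```

Real QSAR models use many descriptors and more flexible learners, but the same pipeline shape (descriptors in, fitted model, predicted activity out) applies.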


2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy clinical and economic burden on healthcare (an estimated US$55 billion and 23,000 deaths annually in the United States), as well as increased length of stay and morbidity. Machine-learning–based methods have of late been used to leverage a patient's clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity-versus-resistance classification system compared with existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-nearest neighbors (k-NN) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base-learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen's κ. Results: We validated performance on the MIMIC-III database, which harbors deidentified clinical data from 53,423 distinct patient admissions between 2001 and 2012 in the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, to evaluate model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction with a 70:30 split of the data. We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method and some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
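The evaluation metrics named above (accuracy, precision, recall, F1 score, Cohen's κ) can be computed directly from a binary confusion matrix. The sketch below is illustrative only, not the study's code, and the sensitive/resistant labels are invented for the example.

```python
# Minimal sketch of the standard binary-classification metrics the abstract
# lists, computed from true/predicted labels (1 = resistant, 0 = sensitive).

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return accuracy, precision, recall, f1, kappa

# Toy labels for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec, rec, f1, kappa = binary_metrics(y_true, y_pred)
```

Unlike raw accuracy, Cohen's κ discounts agreement expected by chance, which matters when, as here, class balance varies across specimen types.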


2021 ◽  
Author(s):  
Tao Lin ◽  
Mokhles Mezghani ◽  
Chicheng Xu ◽  
Weichang Li

Abstract Reservoir characterization requires accurate prediction of multiple petrophysical properties such as bulk density (or acoustic impedance), porosity, and permeability. However, this remains a big challenge in heterogeneous reservoirs due to significant diagenetic impacts, including dissolution, dolomitization, cementation, and fracturing. Most well logs lack the resolution to capture rock properties in detail in a heterogeneous formation. It is therefore pertinent to integrate core images into the prediction workflow. This study presents a new approach to obtaining multiple high-resolution petrophysical properties by combining machine learning (ML) algorithms and computer vision (CV) techniques. The methodology can be used to automate the process of core data analysis with a minimum number of plugs, thus reducing human effort and cost while improving accuracy. The workflow consists of conditioning and extracting features from core images, correlating well logs and core analysis with those features to build ML models, and applying the models to new cores to predict petrophysical properties. The core images are preprocessed and analyzed using color models and texture recognition to extract image characteristics and core textures. The image features are then aggregated into a depth profile, resampled, and aligned with well logs and core analysis. The ML regression models, including classification and regression trees (CART) and deep neural networks (DNN), are trained and validated on filtered training samples of relevant features and target petrophysical properties. The models are then tested on a blind test dataset to evaluate prediction performance for the target petrophysical properties of grain density, porosity, and permeability. Histograms of each target property are computed in depth to analyze the data distribution. The feature vectors are extracted from CV analysis of core images and gamma ray logs. The importance of each feature for each individual target is generated by the CART model, which may be used to reduce model complexity in future model building. The model performances are evaluated and compared for each target. We achieved reasonably good correlation and accuracy with the models, for example, porosity R2=49.7% and RMSE=2.4 p.u., and logarithmic permeability R2=57.8% and RMSE=0.53. The field case demonstrates that including core image attributes can improve petrophysical regression in heterogeneous reservoirs. The approach can be extended to a multi-well setting to generate vertical distributions of petrophysical properties, which can be integrated into reservoir modeling and characterization. Machine learning algorithms help automate the workflow and can flexibly be adjusted to take various inputs for prediction.
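The R2 and RMSE figures quoted above are standard regression metrics and can be computed as follows. This is a minimal sketch, not the study's code; the porosity values (in porosity units, p.u.) are invented for illustration.

```python
# Minimal sketch of the two regression metrics reported in the abstract:
# coefficient of determination (R^2) and root-mean-square error (RMSE).
import math

def r2_and_rmse(actual, predicted):
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot          # fraction of variance explained
    rmse = math.sqrt(ss_res / n)      # error in the units of the target
    return r2, rmse

# Hypothetical core-plug porosity (p.u.) versus model prediction
porosity = [8.0, 12.5, 15.0, 10.0, 18.5]
pred = [9.0, 11.5, 14.0, 11.5, 17.0]
r2, rmse = r2_and_rmse(porosity, pred)
```

Note that RMSE carries the target's units (p.u. for porosity, log units for logarithmic permeability), which is why the two RMSE values in the abstract are not directly comparable.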


2021 ◽  
Author(s):  
Haibin Di ◽  
Chakib Kada Kloucha ◽  
Cen Li ◽  
Aria Abubakar ◽  
Zhun Li ◽  
...  

Abstract Delineating seismic stratigraphic features and depositional facies is important for successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation is confronted with two major challenges. The first is to maximally automate the process, particularly given the increasing size of seismic data and the complexity of target stratigraphies, while the second is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly the convolutional neural network (CNN), has been introduced to assist seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labeling greatly restricts the performance of such supervised CNNs. Moreover, most existing CNN implementations are based on amplitude only, which fails to use necessary structural information such as faults to constrain the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through an unsupervised convolutional autoencoder (CAE); the second is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted by the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. The two components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB learning to be based on features common to the entire study area rather than those present only in the limited training data; correspondingly, the risk of overfitting is greatly reduced. More innovatively, the fault constraint is introduced by customizing the SMB CNN with two output branches, one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues to contribute to the SMB learning process. The performance of this fault-guided seismic stratigraphy interpretation is validated by application to a real seismic dataset; the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.


2021 ◽  
Vol 73 (03) ◽  
pp. 25-30
Author(s):  
Srikanta Mishra ◽  
Jared Schuetter ◽  
Akhil Datta-Gupta ◽  
Grant Bromhal

Algorithms are taking over the world, or so we are led to believe, given their growing pervasiveness in multiple fields of human endeavor such as consumer marketing, finance, design and manufacturing, health care, politics, and sports. The focus of this article is to examine where things stand with regard to the application of these techniques for managing subsurface energy resources in domains such as conventional and unconventional oil and gas, geologic carbon sequestration, and geothermal energy. It is useful to start with some definitions to establish a common vocabulary.

Data analytics (DA): Sophisticated data collection and analysis to understand and model hidden patterns and relationships in complex, multivariate data sets.
Machine learning (ML): Building a model between predictors and response, where an algorithm (often a black box) is used to infer the underlying input/output relationship from the data.
Artificial intelligence (AI): Applying a predictive model with new data to make decisions without human intervention (and with the possibility of feedback for model updating).

Thus, DA can be thought of as a broad framework that helps determine what happened (descriptive analytics), why it happened (diagnostic analytics), what will happen (predictive analytics), or how we can make something happen (prescriptive analytics) (Sankaran et al. 2019). Although DA is built upon a foundation of classical statistics and optimization, it has increasingly come to rely upon ML, especially for predictive and prescriptive analytics (Donoho 2017). While the terms DA, ML, and AI are often used interchangeably, it is important to recognize that ML is basically a subset of DA and a core enabling element of the broader decision-making construct that is AI. In recent years, there has been a proliferation of studies using ML for predictive analytics in the context of subsurface energy resources. Consider how the number of papers on ML in the OnePetro database has been increasing exponentially since 1990 (Fig. 1). These trends are also reflected in the number of technical sessions devoted to ML/AI topics at conferences organized by SPE, AAPG, and SEG, among others, as well as in books targeted at practitioners in these professions (Holdaway 2014; Mishra and Datta-Gupta 2017; Mohaghegh 2017; Misra et al. 2019). Given these high levels of activity, our goal is to provide some observations and recommendations on the practice of data-driven model building using ML techniques. The observations are motivated by our belief that some geoscientists and petroleum engineers may be jumping the gun by applying these techniques in an ad hoc manner without any foundational understanding, whereas others may be holding off on using these methods because they lack formal ML training and could benefit from some concrete advice on the subject. The recommendations are conditioned by our experience in applying both conventional statistical modeling and data analytics approaches to practical problems.


2022 ◽  
pp. 27-50
Author(s):  
Rajalaxmi Prabhu B. ◽  
Seema S.

A lot of user-generated data is available these days from large platforms, blogs, websites, and other review sites. These data are usually unstructured, and automatically analyzing the sentiments they express is an important challenge. Several machine learning algorithms have been implemented to extract opinions from large data sets, and a lot of research has gone into understanding machine learning approaches to sentiment analysis. Machine learning depends mainly on the data required for model building, and hence suitable feature extraction techniques also need to be applied. In this chapter, several deep learning approaches, their challenges, and future issues are addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze deep learning techniques for predicting sentiments and to understand the importance of several approaches for mining opinions and determining sentiment polarity.


Author(s):  
Mosammat Tahnin Tariq ◽  
Aidin Massahi ◽  
Rajib Saha ◽  
Mohammed Hadi

Events such as surges in demand or lane blockages can create queue spillbacks even during off-peak periods, resulting in delays and in queues backing up to upstream intersections. To address this issue, some transportation agencies have started implementing processes to change signal timings in real time based on traffic signal engineers' observations of incident and traffic conditions at the intersections upstream and downstream of the congested locations. Decisions to change the signal timing are governed by many factors, such as queue length, conditions on the main and side streets, the potential for traffic to spill back to upstream intersections, the importance of upstream cross streets, and the potential for the queue to back up to a freeway ramp. This paper investigates and assesses automating the process of updating signal timing plans during non-recurrent conditions by capturing the history of traffic signal engineers' responses to such conditions and utilizing this experience to train a machine learning model. A combination of recursive partitioning and regression trees (RPART) and a fuzzy rule-based system (FRBS) is utilized in this study to deal with the vagueness and uncertainty of human decisions. Comparing the decisions made by the resulting fuzzy rules with previously recorded expert decisions for a project case study indicates accurate recommendations for shifts in the green phases of traffic signals. The simulation results indicate that changing the green times based on the output of the fuzzy rules decreased delays caused by lane blockages or demand surges.
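The core move in recursive partitioning is choosing, at each node, the split that best separates the target values. A minimal sketch of that split search is below; it is illustrative only, not the study's RPART/FRBS implementation, and the queue lengths and green-time shifts are invented.

```python
# Minimal sketch of one recursive-partitioning step: pick the threshold on a
# single feature (queue length) that minimizes the summed squared error of
# the two resulting groups of target values (green-time shifts).

def best_split(x, y):
    """Return (threshold, sse) of the best binary split x <= threshold."""
    best = (None, float("inf"))
    for t in sorted(set(x))[:-1]:          # candidate thresholds
        left = [yy for xx, yy in zip(x, y) if xx <= t]
        right = [yy for xx, yy in zip(x, y) if xx > t]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + \
              sum((v - mean_r) ** 2 for v in right)
        if sse < best[1]:
            best = (t, sse)
    return best

# Hypothetical queue lengths (vehicles) and expert-recommended shifts (s)
queue = [5, 8, 12, 20, 25, 30]
shift = [0, 0, 5, 10, 10, 15]
threshold, sse = best_split(queue, shift)
```

A full tree applies this search recursively to each resulting group (and over all features); the FRBS layer described above then smooths the resulting crisp thresholds into fuzzy rules.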

