Statistical model building, machine learning, and the ah-ha moment

2019 ◽  
Author(s):  
Qiannan Duan ◽  
Jianchao Lee ◽  
Jinhong Gao ◽  
Jiayuan Chen ◽  
Yachao Lian ◽  
...  

Machine learning (ML) has brought significant technological innovation to many fields, but it has not yet been widely embraced by most researchers in the natural sciences. Conventional approaches to chemical analysis cannot generate data at the scale and in the form that ML requires. Over the years, we have focused on building a versatile, low-cost approach to acquiring the copious amounts of data contained in a chemical reaction. The resulting datasets are well suited to ML exploration of the vast space of chemical effects. As a proof of concept, we carried out an acute toxicity test through the whole routine, from model building and chip preparation to data collection and ML training. Such a strategy may play an important role in connecting ML with much research in the natural sciences in the future.
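The final step of such a routine, ML training on the collected reaction data, could be sketched as follows. The features and labels here are invented stand-ins (the study itself reads responses from reaction chips), so this is a minimal illustration, not the authors' pipeline:

```python
# Hypothetical sketch: fitting a classifier for an acute-toxicity screen.
# The feature columns (e.g. per-well colorimetric channel intensities) and
# the toxicity labels are synthetic assumptions, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))  # invented reaction-response features
# invented rule: "toxic" when a combination of channels is elevated
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

With real chip data, the same train/test pattern applies; only the feature extraction step changes.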


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules are registered in the Chemical Abstracts Service. According to a recent study, there are an estimated 10^60 possible drug-like molecules. The library of such molecules is now referred to as 'dark chemical space' or 'dark chemistry.' To explore these hidden molecules scientifically, a good number of live, regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, genomics, proteomics, and in-silico simulation, will revolutionize the drug discovery process. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the candidate drugs is equally important for drug development. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques used extensively in VS. QSAR is well known for fast, high-throughput screening with a satisfactory hit rate. QSAR model building involves (i) collecting chemo-genomics data from a database or the literature, (ii) calculating suitable descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) applying the QSAR model to predict the biological property of new molecules. All hits obtained by the VS technique need to be verified experimentally. This mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
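The four-step QSAR workflow above could be sketched as follows. Synthetic descriptor values stand in for steps (i)-(ii), where a real workflow would compute descriptors (logP, molecular weight, TPSA, and so on) with a cheminformatics toolkit such as RDKit; steps (iii)-(iv) fit and apply the model:

```python
# Minimal QSAR sketch. The descriptor matrix and activity values are
# synthetic assumptions; only the modeling pattern mirrors steps (i)-(iv).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_mol, n_desc = 200, 5
descriptors = rng.normal(size=(n_mol, n_desc))   # steps (i)-(ii): stand-in descriptors
true_coeffs = np.array([1.2, -0.8, 0.5, 0.0, 0.3])
# pIC50-like activity generated from the descriptors plus noise
activity = descriptors @ true_coeffs + rng.normal(scale=0.2, size=n_mol)

X_tr, X_te, y_tr, y_te = train_test_split(descriptors, activity,
                                          test_size=0.25, random_state=42)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)         # step (iii): build the model
r2 = model.score(X_te, y_te)                     # step (iv): predict and validate
print(f"test R^2: {r2:.2f}")
```

In a real VS campaign, `model.predict` would then rank the unscreened library, and the top-ranked hits would go to experimental verification.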


Author(s):  
Wei-Chun Wang ◽  
Ting-Yu Lin ◽  
Sherry Yueh-Hsia Chiu ◽  
Chiung-Nien Chen ◽  
Pongdech Sarakarn ◽  
...  

2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy clinical and economic burden on healthcare, with an estimated US$55 billion in costs and 23,000 deaths annually in the United States, as well as increased length of stay and morbidity. Machine-learning-based methods have, of late, been used to leverage patients' clinical histories and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-nearest neighbors (k-NN) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base-learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen's kappa. Results: We validated performance on the MIMIC-III database, which harbors deidentified clinical data from 53,423 distinct patient admissions between 2001 and 2012 to the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, to evaluate model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on a 70:30 split of the data.
We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
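The described setup, k-NN and GBM base learners combined into an ensemble and scored on a 70:30 split with the named metrics, could be sketched as below. The data are synthetic; the MIMIC-III patient features are not reproduced here, and soft voting is one plausible combination scheme, not necessarily the authors':

```python
# Sketch of a k-NN + GBM ensemble evaluated with accuracy, precision,
# recall, F1, Cohen's kappa, and ROC AUC on a 70:30 split.
# The classification dataset is synthetic, standing in for patient features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)

X, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=7)),
                ("gbm", GradientBoostingClassifier(random_state=0))],
    voting="soft")  # average predicted probabilities across base learners
ensemble.fit(X_tr, y_tr)

pred = ensemble.predict(X_te)
prob = ensemble.predict_proba(X_te)[:, 1]
metrics = {"accuracy": accuracy_score(y_te, pred),
           "precision": precision_score(y_te, pred),
           "recall": recall_score(y_te, pred),
           "F1": f1_score(y_te, pred),
           "kappa": cohen_kappa_score(y_te, pred),
           "AUC": roc_auc_score(y_te, prob)}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The per-specimen AUCs reported in the abstract would come from running this evaluation separately on each specimen type's cultures.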


2021 ◽  
Author(s):  
Tao Lin ◽  
Mokhles Mezghani ◽  
Chicheng Xu ◽  
Weichang Li

Abstract Reservoir characterization requires accurate prediction of multiple petrophysical properties such as bulk density (or acoustic impedance), porosity, and permeability. However, this remains a big challenge in heterogeneous reservoirs due to significant diagenetic impacts, including dissolution, dolomitization, cementation, and fracturing. Most well logs lack the resolution to capture rock properties in detail in a heterogeneous formation. Therefore, it is pertinent to integrate core images into the prediction workflow. This study presents a new approach to obtaining multiple high-resolution petrophysical properties by combining machine learning (ML) algorithms and computer vision (CV) techniques. The methodology can be used to automate core data analysis with a minimum number of plugs, thus reducing human effort and cost while improving accuracy. The workflow consists of conditioning and extracting features from core images, correlating well logs and core analysis with those features to build ML models, and applying the models to new cores to predict petrophysical properties. The core images are preprocessed and analyzed using color models and texture recognition to extract image characteristics and core textures. The image features are then aggregated into a depth profile, resampled, and aligned with well logs and core analysis. The ML regression models, including classification and regression trees (CART) and deep neural networks (DNN), are trained and validated on filtered training samples of relevant features and the target petrophysical properties. The models are then tested on a blind test dataset to evaluate prediction performance for the target properties of grain density, porosity, and permeability. The histogram profile of each target property is computed to analyze the data distribution. The feature vectors are extracted from CV analysis of core images and gamma ray logs.
The importance of each feature for each target is generated by the CART model, which may be used to reduce model complexity in future model building. The model performances are evaluated and compared on each target. We achieved reasonably good correlation and accuracy with the models, for example, porosity R2 = 49.7% with RMSE = 2.4 p.u., and logarithmic permeability R2 = 57.8% with RMSE = 0.53. The field case demonstrates that including core image attributes can improve petrophysical regression in heterogeneous reservoirs. The approach can be extended to a multi-well setting to generate vertical distributions of petrophysical properties, which can be integrated into reservoir modeling and characterization. Machine learning algorithms can help automate the workflow and are flexible enough to accept various inputs for prediction.
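The CART regression-plus-feature-importance step described above could be sketched as follows. The image-derived features (e.g. mean color-channel values, texture energy) and the porosity targets are invented stand-ins for the field data, so the R2/RMSE values here are illustrative only:

```python
# Sketch of the CART step: regress porosity on core-image features, then
# report held-out R^2, RMSE, and per-feature importances.
# Feature matrix and porosity values are synthetic assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))  # invented image features per depth sample
# porosity in porosity units (p.u.), driven mostly by features 0 and 2
porosity = 15 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, porosity, test_size=0.3,
                                          random_state=1)
cart = DecisionTreeRegressor(max_depth=5, random_state=1).fit(X_tr, y_tr)

r2 = r2_score(y_te, cart.predict(X_te))
rmse = mean_squared_error(y_te, cart.predict(X_te)) ** 0.5
print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f} p.u.")
print("feature importances:", cart.feature_importances_)
```

The importances (which sum to 1) identify which image attributes drive each target, which is what the text proposes using to prune features in future model building.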

