scholarly journals RNAPosers: Machine Learning Classifiers For RNA-Ligand Poses

2019 ◽  
Author(s):  
Sahil Chhabra ◽  
Jingru Xie ◽  
Aaron T. Frank

ABSTRACTDetermining the 3-dimensional (3D) structures of ribonucleic acid (RNA)-small molecule complexes is critical to understanding molecular recognition in RNA. Computer docking can, in principle, be used to predict the 3D structure of RNA-small molecule complexes. Unfortunately, retrospective analysis has shown that the scoring functions that are typically used to rank poses tend to misclassify non-native poses as native, and vice versa. This misclassification of non-native poses severely limits the utility of computer docking in the context pose prediction, as well as in virtual screening. Here, we use machine learning to train a set of pose classifiers that estimate the relative “nativeness” of a set of RNA-ligand poses. At the heart of our approach is the use of a pose “fingerprint” that is a composite of a set of atomic fingerprints, which individually encode the local “RNA environment” around ligand atoms. We found that by ranking poses based on the classification scores from our machine learning classifiers, we were able to recover native-like poses better than when we ranked poses based on their docking scores. With a leave-one-out training and testing approach, we found that one of our classifiers could recover poses that were within 2.5 Å of the native poses in ∼80% of the 88 cases we examined, and similarly, on a separate validation set, we could recover such poses in ∼70% of the cases. Our set of classifiers, which we refer to as RNAPosers, should find utility as a tool to aid in RNA-ligand pose prediction and so we make RNAPosers open to the academic community via https://github.com/atfrank/RNAPosers.

BMJ Open ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. e043964
Author(s):  
Vishal Sharma ◽  
Vinaykumar Kulkarni ◽  
Dean T Eurich ◽  
Luke Kumar ◽  
Salim Samanani

ObjectiveTo develop machine learning models employing administrative health data that can estimate risk of adverse outcomes within 30 days of an opioid dispensation for use by health departments or prescription monitoring programmes.Design, setting and participantsThis prognostic study was conducted in Alberta, Canada between 2017 and 2018. Participants included all patients 18 years of age and older who received at least one opioid dispensation. Pregnant and cancer patients were excluded.ExposureEach opioid dispensation served as an exposure.Main outcomes/measuresOpioid-related adverse outcomes were identified from linked administrative health data. Machine learning algorithms were trained using 2017 data to predict risk of hospitalisation, emergency department visit and mortality within 30 days of an opioid dispensation. Two validation sets, using 2017 and 2018 data, were used to evaluate model performance. Model discrimination and calibration performance were assessed for all patients and those at higher risk. Machine learning discrimination was compared with current opioid guidelines.ResultsParticipants in the 2017 training set (n=275 150) and validation set (n=117 829) had similar baseline characteristics. In the 2017 validation set, c-statistics for the XGBoost, logistic regression and neural network classifiers were 0.87, 0.87 and 0.80, respectively. In the 2018 validation set (n=393 023), the corresponding c-statistics were 0.88, 0.88 and 0.82. C-statistics from the Canadian guidelines ranged from 0.54 to 0.69 while the US guidelines ranged from 0.50 to 0.62. The top five percentile of predicted risk for the XGBoost and logistic regression classifiers captured 42% of all events and translated into post-test probabilities of 13.38% and 13.45%, respectively, up from the pretest probability of 1.6%.ConclusionMachine learning classifiers, especially incorporating hospitalisation/physician claims data, have better predictive performance compared with guideline or prescription history only approaches when predicting 30-day risk of adverse outcomes. Prescription monitoring programmes and health departments with access to administrative data can use machine learning classifiers to effectively identify those at higher risk compared with current guideline-based approaches.


2021 ◽  
Vol 13 (8) ◽  
pp. 1433
Author(s):  
Shobitha Shetty ◽  
Prasun Kumar Gupta ◽  
Mariana Belgiu ◽  
S. K. Srivastav

Machine learning classifiers are being increasingly used nowadays for Land Use and Land Cover (LULC) mapping from remote sensing images. However, arriving at the right choice of classifier requires understanding the main factors influencing their performance. The present study investigated firstly the effect of training sampling design on the classification results obtained by Random Forest (RF) classifier and, secondly, it compared its performance with other machine learning classifiers for LULC mapping using multi-temporal satellite remote sensing data and the Google Earth Engine (GEE) platform. We evaluated the impact of three sampling methods, namely Stratified Equal Random Sampling (SRS(Eq)), Stratified Proportional Random Sampling (SRS(Prop)), and Stratified Systematic Sampling (SSS) upon the classification results obtained by the RF trained LULC model. Our results showed that the SRS(Prop) method favors major classes while achieving good overall accuracy. The SRS(Eq) method provides good class-level accuracies, even for minority classes, whereas the SSS method performs well for areas with large intra-class variability. Toward evaluating the performance of machine learning classifiers, RF outperformed Classification and Regression Trees (CART), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) with a >95% confidence level. The performance of CART and SVM classifiers were found to be similar. RVM achieved good classification results with a limited number of training samples.


Author(s):  
Chunyan Ji ◽  
Thosini Bamunu Mudiyanselage ◽  
Yutong Gao ◽  
Yi Pan

AbstractThis paper reviews recent research works in infant cry signal analysis and classification tasks. A broad range of literatures are reviewed mainly from the aspects of data acquisition, cross domain signal processing techniques, and machine learning classification methods. We introduce pre-processing approaches and describe a diversity of features such as MFCC, spectrogram, and fundamental frequency, etc. Both acoustic features and prosodic features extracted from different domains can discriminate frame-based signals from one another and can be used to train machine learning classifiers. Together with traditional machine learning classifiers such as KNN, SVM, and GMM, newly developed neural network architectures such as CNN and RNN are applied in infant cry research. We present some significant experimental results on pathological cry identification, cry reason classification, and cry sound detection with some typical databases. This survey systematically studies the previous research in all relevant areas of infant cry and provides an insight on the current cutting-edge works in infant cry signal analysis and classification. We also propose future research directions in data processing, feature extraction, and neural network classification fields to better understand, interpret, and process infant cry signals.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Joshua E. Lewis ◽  
Melissa L. Kemp

AbstractResistance to ionizing radiation, a first-line therapy for many cancers, is a major clinical challenge. Personalized prediction of tumor radiosensitivity is not currently implemented clinically due to insufficient accuracy of existing machine learning classifiers. Despite the acknowledged role of tumor metabolism in radiation response, metabolomics data is rarely collected in large multi-omics initiatives such as The Cancer Genome Atlas (TCGA) and consequently omitted from algorithm development. In this study, we circumvent the paucity of personalized metabolomics information by characterizing 915 TCGA patient tumors with genome-scale metabolic Flux Balance Analysis models generated from transcriptomic and genomic datasets. Metabolic biomarkers differentiating radiation-sensitive and -resistant tumors are predicted and experimentally validated, enabling integration of metabolic features with other multi-omics datasets into ensemble-based machine learning classifiers for radiation response. These multi-omics classifiers show improved classification accuracy, identify clinical patient subgroups, and demonstrate the utility of personalized blood-based metabolic biomarkers for radiation sensitivity. The integration of machine learning with genome-scale metabolic modeling represents a significant methodological advancement for identifying prognostic metabolite biomarkers and predicting radiosensitivity for individual patients.


Sign in / Sign up

Export Citation Format

Share Document