Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms

A feature-based hybrid recommender system for risk prediction : Machine learning approach (Preprint)

10.2196/preprints.11010 ◽

2020 ◽

Author(s):

Uzair Bhatti

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Predictive Accuracy ◽

Correct Diagnosis ◽

Recommendation Systems ◽

Data Integrity ◽

Machine Learning Algorithms ◽

Patient Counseling ◽

Hybrid Filtering ◽

Novel Algorithm

BACKGROUND In the era of health informatics, the exponential growth of information generated by health information systems and healthcare organizations demands expert and intelligent recommendation systems. Such systems have become one of the most valuable tools because they reduce problems such as information overload when selecting and suggesting doctors, hospitals, medicines, diagnoses, etc. according to patients’ interests. OBJECTIVE Hybrid filtering is one of the most popular approaches to recommendation, but its major limitations are selectivity and data integrity issues. Most existing recommendation systems and risk prediction algorithms focus on a single domain; cross-domain hybrid filtering, by contrast, can alleviate the selectivity and data integrity problems to a greater extent. METHODS We propose a novel algorithm for recommendation and a predictive model that uses the KNN algorithm together with machine learning algorithms and artificial intelligence (AI). We identify the factors that directly affect diseases and propose an approach for predicting the correct diagnosis of different diseases. We constructed a series of models with good reliability for predicting different surgery complications and identified several novel clinical associations. We propose a novel algorithm, pr-KNN, that uses KNN for the prediction and recommendation of diseases. RESULTS We also compared the performance of our algorithm with that of other machine learning algorithms and found that ours performed better, with predictive accuracy improving by 3.61%. CONCLUSIONS The potential to directly integrate these predictive tools into EHRs may enable personalized medicine and decision-making at the point of care, for patient counseling and as a teaching tool. CLINICALTRIAL dataset for the trials of patient attached
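To illustrate the nearest-neighbour voting that KNN-based predictors rely on, here is a minimal sketch in Python. It is plain k-nearest-neighbours with Euclidean distance and majority vote, not the authors' pr-KNN algorithm, whose details the abstract does not give. The function name `knn_predict`, the feature vectors, and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k closest points (sorted by distance only)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up feature vectors (e.g. two normalised clinical measurements); illustration only
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = ["low risk", "low risk", "low risk", "high risk", "high risk", "high risk"]

prediction = knn_predict(train_x, train_y, (0.82, 0.88), k=3)
```

An odd `k` avoids tied votes in a two-class problem; in practice `k` would be chosen by validation rather than fixed in advance.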


Remote sensing inversion of water quality in coastal sea area based on machine learning: a case study of Shenzhen bay, China

10.5194/egusphere-egu21-1972 ◽

2021 ◽

Author(s):

Xiaotong Zhu ◽

Jinhui Jeanne Huang

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Water Quality ◽

Predictive Accuracy ◽

Water Environment ◽

Quality Parameters ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Support Vector ◽

Seawater Quality

Remote sensing offers a wide monitoring range, speed, and low cost for long-term dynamic monitoring of the water environment. With the rise of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, owing to the physicochemical properties of the water quality parameters, algorithm performance varies considerably. To improve the predictive accuracy for seawater quality parameters, we propose a technical framework to identify the optimal machine learning algorithm using Sentinel-2 satellite data and in-situ seawater sample data. In the study, we select three algorithms, i.e. support vector regression (SVR), XGBoost and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity (TUR) and chlorophyll-a (Chla). The results show that SVR is the more precise algorithm for DO inversion (R² = 0.81). XGBoost has the best accuracy for Chla and TUR inversion (R² = 0.75 and 0.78, respectively), while DL performs better for TDS (R² = 0.789). Overall, this research provides theoretical support for high-precision remote sensing inversion of offshore seawater quality parameters based on machine learning.
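The per-parameter comparison above rests on the coefficient of determination. As a minimal sketch, assuming the usual definition (one minus the ratio of residual to total sum of squares), model selection for a single parameter could look like the following. The model names and all values are hypothetical and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical in-situ measurements of one parameter and two models' predictions
observed_do = [6.0, 7.0, 8.0, 9.0]
predictions = {
    "model_a": [6.1, 6.9, 8.2, 8.8],
    "model_b": [7.0, 7.0, 8.0, 8.0],
}

# Score every candidate model, then keep the one with the highest R²
scores = {name: r_squared(observed_do, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
```

Repeating the same selection for each water quality parameter would give one preferred algorithm per parameter, which is the shape of the result the abstract reports.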


Predicting dengue importation into Europe, using machine learning and model-agnostic methods

10.1101/19013383 ◽

2019 ◽

Author(s):

Donald Salami ◽

Carla Alexandra Sousa ◽

Maria do Rosário Oliveira Martins ◽

César Capinha

Keyword(s):

Machine Learning ◽

Operating Characteristic ◽

Predictive Accuracy ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Transport Network ◽

Air Transport ◽

Health Concern ◽

Centrality Measures ◽

Network Centrality

ABSTRACT The geographical spread of dengue is a global public health concern. This is largely mediated by the importation of dengue from endemic to non-endemic areas via the increasing connectivity of the global air transport network. The dynamic nature and intrinsic heterogeneity of the air transport network make it challenging to predict dengue importation. Here, we explore the capabilities of state-of-the-art machine learning algorithms to predict dengue importation. We trained four machine learning classifiers using 6 years of historical dengue importation data for 21 countries in Europe, together with connectivity indices mediating importation and air transport network centrality measures. Predictive performance of the classifiers was evaluated using the area under the receiver operating characteristic curve, sensitivity, and specificity. Finally, we applied practical model-agnostic methods to provide an in-depth explanation of our optimal model’s predictions on a global and local scale. Our best-performing model achieved high predictive accuracy, with an area under the receiver operating characteristic curve of 0.94 and a maximized sensitivity of 0.88. The predictor variables identified as most important were the source country’s dengue incidence rate, population size, and volume of air passengers. Network centrality measures, describing the positioning of European countries within the air travel network, were also influential to the predictions. We demonstrated the high predictive performance of a machine learning model in predicting dengue importation and the utility of model-agnostic methods in offering a comprehensive understanding of the reasons behind the predictions. Similar approaches can be utilized in the development of an operational early warning surveillance system for dengue importation.
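The three evaluation measures named above can be computed from labels and classifier scores alone. The sketch below assumes binary labels (1 = importation, 0 = none) and uses the rank interpretation of the area under the ROC curve: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, the 0.5 threshold, and the data are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is one way a "maximized sensitivity" figure can be obtained from a fixed model.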


Lead-based virtual screening and prediction of EGFR inhibitors using PubChem’s database with data mining and machine learning algorithms

10.1021/scimeetings.0c03836 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kedan He

Keyword(s):

Machine Learning ◽

Data Mining ◽

Virtual Screening ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Egfr Inhibitors


Selecting Machine-Learning Scoring Functions for Structure-Based Virtual Screening

10.26434/chemrxiv.12967160 ◽

2020 ◽

Author(s):

Pedro Ballester

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Virtual Screening ◽

Predictive Accuracy ◽

Scoring Function ◽

3D Models ◽

Large Datasets ◽

Scoring Functions ◽

Discovery Process ◽

Drug Discovery Process

Interest in docking technologies has grown parallel to the ever increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target along with a third approach consisting in generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS and how to generate them or use them using freely-available software.


Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

Advances in Fuzzy Systems ◽

10.1155/2020/8581202 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Peter Appiahene ◽

Yaw Marfo Missah ◽

Ussiph Najim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Banking Sector ◽

Banking Industry ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

And Performance

The financial crisis that hit Ghana from 2015 to 2018 has raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders have to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the 134-sample holdout dataset (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network (86.6% accuracy), with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
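The reported accuracies can be related back to the holdout size with simple arithmetic. The sketch below searches for the numbers of correct predictions out of 134 that round to each reported percentage. The resulting counts are inferred from the figures in the abstract, not reported by the authors, and the helper name `consistent_counts` is hypothetical.

```python
def consistent_counts(n, reported_pct, decimals=1):
    """Return every count k (0..n) whose accuracy k/n rounds to the reported percentage."""
    return [k for k in range(n + 1) if round(100 * k / n, decimals) == reported_pct]


HOLDOUT = 134   # holdout sample size stated in the abstract
TOTAL = 444     # number of bank branches stated in the abstract

# Share of branches held out (the abstract describes it as 30%)
holdout_share = HOLDOUT / TOTAL

# Inferred counts of correct predictions; these are not figures from the paper
decision_tree = consistent_counts(HOLDOUT, 100.0)
random_forest = consistent_counts(HOLDOUT, 98.5)
neural_network = consistent_counts(HOLDOUT, 86.6)
```

Under this reading, each reported accuracy corresponds to exactly one whole number of correctly classified branches, so the three figures are mutually consistent with a holdout of 134.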


Using machine learning algorithms to predict response and toxicity to immune checkpoint inhibitors (ICIs) in melanoma patients.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.2581 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 2581-2581 ◽

Cited By ~ 1

Author(s):

Paul Johannet ◽

Nicolas Coudray ◽

George Jour ◽

Douglas MacArthur Donnelly ◽

Shirin Bajaj ◽

...

Keyword(s):

Machine Learning ◽

Immune Checkpoint ◽

Immune Checkpoint Inhibitors ◽

Predictive Accuracy ◽

Checkpoint Inhibitors ◽

Learning Algorithms ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Operating Characteristics ◽

Melanoma Patients

2581 Background: There is growing interest in optimizing patient selection for treatment with immune checkpoint inhibitors (ICIs). We postulate that phenotypic features present in metastatic melanoma tissue reflect the biology of tumor cells, immune cells, and stromal tissue, and hence can provide predictive information about tumor behavior. Here, we test the hypothesis that machine learning algorithms can be trained to predict the likelihood of response and/or toxicity to ICIs. Methods: We examined 124 stage III/IV melanoma patients who received anti-CTLA-4 (n = 81), anti-PD-1 (n = 25), or combination (n = 18) therapy as first line. The tissue analyzed was resected before treatment with ICIs. In total, 340 H&E slides were digitized and annotated for three regions of interest: tumor, lymphocytes, and stroma. The slides were then partitioned into training (n = 285), validation (n = 26), and test (n = 29) sets. Slides were tiled (299x299 pixels) at 20X magnification. We trained a deep convolutional neural network (DCNN) to automatically segment the images into each of the three regions and then deconstruct images into their component features to detect non-obvious patterns with objectivity and reproducibility. We then trained the DCNN for two classifications: 1) complete/partial response versus progression of disease (POD), and 2) severe versus no immune-related adverse events (irAEs). Predictive accuracy was estimated by area under the curve (AUC) of receiver operating characteristics (ROC). Results: The DCNN identified tumor within LN with AUC 0.987 and within ST with AUC 0.943. Prediction of POD based on ST-only always performed better than prediction based on LN-only (AUC 0.84 compared to 0.61, respectively). The DCNN had an average AUC 0.69 when analyzing only tumor regions from both LN and ST data sets and AUC 0.68 when analyzing tumor and lymphocyte regions. Severe irAEs were predicted with limited accuracy (AUC 0.53). 
Conclusions: Our results support the potential application of machine learning on pre-treatment histologic slides to predict response to ICIs. It also revealed their limited value in predicting toxicity. We are currently investigating whether the predictive capability of the algorithm can be further improved by incorporating additional immunologic biomarkers.
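The tiling step in the Methods can be pictured as laying a grid of 299x299-pixel windows over each digitized slide. The sketch below computes the top-left corner of each tile. It assumes non-overlapping tiles and drops partial tiles at the edges, neither of which the abstract specifies. The function name `tile_origins` and the slide dimensions are hypothetical.

```python
def tile_origins(width, height, tile=299):
    """Return (x, y) top-left corners of non-overlapping tile x tile windows, row by row."""
    origins = []
    # Stop early enough that every tile lies fully inside the image
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            origins.append((x, y))
    return origins


# A made-up 1000 x 700 pixel region, for illustration only
origins = tile_origins(1000, 700)

# The slide partition stated in the abstract: training, validation, test
split_sizes = {"training": 285, "validation": 26, "test": 29}
total_slides = sum(split_sizes.values())
```

A fixed tile size keeps the network input shape constant regardless of slide dimensions; the three partition sizes add up to the 340 slides the abstract reports.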


Amyloid PET-Positive Predictability of Machine Learning Algorithm Based on MDS-OAβ Levels

10.21203/rs.3.rs-578834/v1 ◽

2021 ◽

Author(s):

Young Chul Youn ◽

Jung-Min Pyun ◽

Hye Ryoun Kim ◽

Sungmin Kang ◽

Nayoung Ryoo ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Detection System ◽

Amyloid Β ◽

Machine Learning Algorithms ◽

Amyloid Pet ◽

Amyloid Positron Emission Tomography ◽

Positron Emission ◽

Negative Findings

Abstract Background: The Multimer Detection System-Oligomeric amyloid-β (MDS-OAβ) level is a valuable blood-based biomarker for Alzheimer’s disease (AD). We used machine learning algorithms trained using multi-center datasets to examine whether blood MDS-OAβ values can predict AD-associated changes in the brain. Methods: A logistic regression model using TensorFlow (ver. 2.3.0) was applied to data obtained from 163 participants (amyloid positron emission tomography [PET]-positive and -negative findings in 102 and 61 participants, respectively). Algorithms with various combinations of features (MDS-OAβ levels, age, gender, and anticoagulant type) were tested 50 times on each dataset. Results: The predictive accuracy, sensitivity, and specificity values of blood MDS-OAβ levels for amyloid PET positivity were 78.16±4.97%, 83.87±9.40%, and 70.00±13.13%, respectively. Conclusions: The findings from this multi-center machine learning-based study suggest that MDS-OAβ values may be used to predict amyloid PET-positivity.
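Results of the form "mean±SD" over repeated tests can be produced by summarising the per-run metric values. The sketch below assumes the sample standard deviation; the abstract does not say whether sample or population SD was used. The function name `summarise` and the run values are hypothetical.

```python
import statistics


def summarise(values, decimals=2):
    """Format a list of fractions in [0, 1] as 'mean±sd%' using the sample standard deviation."""
    mean_pct = 100 * statistics.mean(values)
    sd_pct = 100 * statistics.stdev(values)
    return f"{mean_pct:.{decimals}f}±{sd_pct:.{decimals}f}%"


# Made-up accuracies from four repeated runs; the study reports 50 runs per dataset
run_accuracies = [0.75, 0.80, 0.85, 0.80]
summary = summarise(run_accuracies)
```

A wide SD relative to the mean, as with the specificity figure above, indicates that the metric varied substantially from run to run.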


Reverse-engineering human olfactory perception from chemical features of odor molecules

10.1101/082495 ◽

2016 ◽

Cited By ~ 2

Author(s):

Andreas Keller ◽

Richard C. Gerkin ◽

Yuanfang Guan ◽

Amit Dhurandhar ◽

Gabor Turu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Molecular Mechanisms ◽

Linear Models ◽

Predictive Accuracy ◽

High Accuracy ◽

Machine Learning Algorithms ◽

Olfactory Perception ◽

Theoretical Limit ◽

Reverse Engineer

Abstract Despite 25 years of progress in understanding the molecular mechanisms of olfaction, it is still not possible to predict whether a given molecule will have a perceived odor, or what olfactory percept it will produce. To address this stimulus-percept problem for olfaction, we organized the crowd-sourced DREAM Olfaction Prediction Challenge. Working from a large olfactory psychophysical dataset, teams developed machine learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models predicted odor intensity and pleasantness with high accuracy, and also successfully predicted eight semantic descriptors (“garlic”, “fish”, “sweet”, “fruit”, “burnt”, “spices”, “flower”, “sour”). Regularized linear models performed nearly as well as random-forest-based approaches, with a predictive accuracy that closely approaches a key theoretical limit. The models presented here make it possible to predict the perceptual qualities of virtually any molecule with an impressive degree of accuracy to reverse-engineer the smell of a molecule. One Sentence Summary: Results of a crowdsourcing competition show that it is possible to accurately predict and reverse-engineer the smell of a molecule.

