NbX: Machine Learning-Guided Re-Ranking of Nanobody–Antigen Binding Poses

Chunlai Tam; Ashutosh Kumar; Kam Y. J. Zhang

doi:10.3390/ph14100968

Pharmaceuticals ◽

10.3390/ph14100968 ◽

2021 ◽

Vol 14 (10) ◽

pp. 968

Author(s):

Chunlai Tam ◽

Ashutosh Kumar ◽

Kam Y. J. Zhang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Affinity Maturation ◽

Antigen Binding ◽

Pose Prediction ◽

Decision Tree Classifier ◽

Single Domain Antibody ◽

Protein Protein Interaction ◽

Tree Classifier ◽

3D CNN

Modeling the binding pose of an antibody is a prerequisite to structure-based affinity maturation and design. Without a reliable binding pose, the subsequent structural simulation is largely futile. In this study, we have developed a method for machine learning-guided re-ranking of the antigen binding poses of nanobodies, the single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment on nanobody–antigen complexes. By training a decision tree classifier to map a feature set consisting of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, we achieved a significant, eightfold improvement in the median ranking of native-like nanobody poses compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interactions. We further interpreted our model by identifying the features that made relatively important contributions to prediction performance. This study demonstrates a useful method for improving our current ability to predict nanobody binding poses.
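
The re-ranking idea in this abstract can be illustrated with a small sketch. It is not the NbX implementation: the pose records, the `dock_score` and `clf_prob` fields, and the helper `first_native_rank` are all hypothetical, and the numbers are made up. It only shows how the rank of the first native-like pose, and the median of that rank over several docking runs, changes when poses are sorted by a classifier score instead of the docking score.

```python
from statistics import median

def first_native_rank(poses, key):
    """Return the 1-based rank of the best-placed native-like pose
    when poses are sorted by `key` in descending order."""
    ordered = sorted(poses, key=key, reverse=True)
    for rank, pose in enumerate(ordered, start=1):
        if pose["native_like"]:
            return rank
    return None  # no native-like pose in this run

# Hypothetical docking runs: each pose has a docking score, a classifier
# probability of being native-like, and a ground-truth label.
runs = [
    [
        {"dock_score": 0.9, "clf_prob": 0.2, "native_like": False},
        {"dock_score": 0.8, "clf_prob": 0.1, "native_like": False},
        {"dock_score": 0.7, "clf_prob": 0.3, "native_like": False},
        {"dock_score": 0.6, "clf_prob": 0.9, "native_like": True},
    ],
    [
        {"dock_score": 0.9, "clf_prob": 0.4, "native_like": False},
        {"dock_score": 0.5, "clf_prob": 0.8, "native_like": True},
        {"dock_score": 0.4, "clf_prob": 0.1, "native_like": False},
    ],
]

# Median rank of the first native-like pose before and after re-ranking.
median_before = median(first_native_rank(r, key=lambda p: p["dock_score"]) for r in runs)
median_after = median(first_native_rank(r, key=lambda p: p["clf_prob"]) for r in runs)
print(median_before, median_after)
```

A lower median rank after re-ranking means native-like poses surface earlier in the list, which is the quantity the abstract reports as improved.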

A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction

10.1101/2021.07.06.451258 ◽

2021 ◽

Author(s):

Philippe Auguste Robert ◽

Rahmad Akbar ◽

Robert Frank ◽

Milena Pavlović ◽

Michael Widrich ◽

...

Keyword(s):

Machine Learning ◽

In Silico ◽

Prediction Accuracy ◽

Large Scale ◽

Structural Information ◽

Antigen Binding ◽

Antibody Specificity ◽

Binding Prediction ◽

Information Encoding ◽

Prediction Problems

Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.

Heart Disease Prediction using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9780.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 700-704

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Nearest Neighbour ◽

Decision Tree Classifier ◽

Support Vector Classifier ◽

Learning Techniques ◽

Tree Classifier

This work derives methodologies to detect heart problems at an earlier stage and to advise patients to improve their health. To address this problem, we use machine learning techniques to predict the incidence of heart disease at an earlier stage. For prediction we use parameters such as age, sex, height, weight, case history, smoking and alcohol consumption, together with tests such as blood pressure, cholesterol, diabetes, ECG and ECHO. Many machine learning algorithms can be applied to this problem, including K-Nearest Neighbour, support vector classifier, decision tree classifier, logistic regression and random forest classifier. Using these parameters and algorithms, we aim to predict whether or not the patient has heart disease and to recommend that the patient improve his/her health.

Analyzing Behavior of Cancer Patients using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8414.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1547-1556

Keyword(s):

Machine Learning ◽

Natural Language ◽

Cancer Patients ◽

Language Processing ◽

Machine Learning Techniques ◽

Support Vector ◽

SVM Classifier ◽

Operating Characteristics ◽

Decision Tree Classifier ◽

Tree Classifier

Online discussion forums and blogs are vibrant platforms on which cancer patients express their views in the form of stories. These stories sometimes become a source of inspiration for other patients who are anxious to find similar cases. This paper proposes a method that uses natural language processing and machine learning to analyze unstructured texts gathered from patients’ reviews and stories. The proposed methodology aims to identify the behavior, emotions, side effects, decisions and demographics associated with cancer patients. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which some special characters and symbols are removed, and finally tagging of the texts using the NLTK (Natural Language Toolkit) POS (part-of-speech) tagger. The post-processing phase trains seven machine learning classifiers (see Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) is highest for the Support Vector Machine (SVM) classifier (0.98).
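
The two metrics quoted in this abstract, precision and AUC, can be computed as in the following minimal sketch. The labels and scores are invented for illustration and are not taken from the paper; `precision` and `roc_auc` are hypothetical helper names. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive
    and negative scores (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and classifier scores, thresholded at 0.5 for precision.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(precision(y_true, y_pred), roc_auc(y_true, scores))
```

Precision depends on the chosen threshold while AUC does not, which is one reason two different classifiers can each come out best on one of the two metrics.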

Successful Case Study of Machine Learning Application to Streamline and Improve History Matching Process for Complex Gas-Condensate Reservoirs in Hai Thach Field, Offshore Vietnam

10.2118/204835-ms ◽

2021 ◽

Author(s):

Son Hoang ◽

Tung Tran ◽

Tan Nguyen ◽

Tu Truong ◽

Duy Pham ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

History Matching ◽

Dynamic Models ◽

Naive Bayes ◽

Gas Condensate ◽

Decision Tree Classifier ◽

Matching Process ◽

Tree Classifier

This paper reports a successful case study of applying machine learning to improve the history matching process, making it easier, less time-consuming, and more accurate, by determining whether Local Grid Refinement (LGR) with transmissibility multiplier is needed to history match gas-condensate wells producing from geologically complex reservoirs as well as determining the required LGR setup to history match those gas-condensate producers. History matching Hai Thach gas-condensate production wells is extremely challenging due to the combined effect of condensate banking, sub-seismic fault network, complex reservoir distribution and connectivity, uncertain HIIP, and lack of PVT data for most reservoirs. In fact, for some wells, many trial simulation runs were conducted before it became clear that LGR with transmissibility multiplier was required to obtain good history matching. In order to minimize this time-consuming trial-and-error process, machine learning was applied in this study to analyze production data using synthetic samples generated by a very large number of compositional sector models so that the need for LGR could be identified before the history matching process begins. Furthermore, machine learning application could also determine the required LGR setup. The method helped provide better models in a much shorter time, and greatly improved the efficiency and reliability of the dynamic modeling process. More than 500 synthetic samples were generated using compositional sector models and divided into separate training and test sets. Multiple classification algorithms such as logistic regression, Gaussian Naive Bayes, Bernoulli Naive Bayes, multinomial Naive Bayes, linear discriminant analysis, support vector machine, K-nearest neighbors, and Decision Tree as well as artificial neural networks were applied to predict whether LGR was used in the sector models. The best algorithm was found to be the Decision Tree classifier, with 100% accuracy on the training set and 99% accuracy on the test set. The LGR setup (size of LGR area and range of transmissibility multiplier) was also predicted best by the Decision Tree classifier with 91% accuracy on the training set and 88% accuracy on the test set. The machine learning model was validated using actual production data and the dynamic models of history-matched wells. Finally, using the machine learning prediction on wells with poor history matching results, their dynamic models were updated and significantly improved.

Improved argumentative paragraphs detection in academic theses supported with unit segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219237 ◽

2021 ◽

pp. 1-11

Author(s):

Jesús Miguel García-Gorrostieta ◽

Aurelio López-López ◽

Samuel González-López ◽

Adrián Pastor López-Monroy

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Automatic Detection ◽

Machine Learning Techniques ◽

SVM Classifier ◽

Complex Task ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier ◽

Academic Author

Academic thesis writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches: a more descriptive one, which uses a decision tree classifier with indicators and lexical features, and a more efficient one, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative does indeed contain segments with argumentation. We achieved encouraging results for both approaches.

An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers

Applied Sciences ◽

10.3390/app9112375 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2375 ◽

Cited By ~ 7

Author(s):

Riaz Ullah Khan ◽

Xiaosong Zhang ◽

Rajesh Kumar ◽

Abubakar Sharif ◽

Noorbakhsh Amiri Golilarz ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Network Traffic ◽

Traffic Classification ◽

Decision Tree Classifier ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Average Accuracy ◽

Final Layer ◽

Tree Classifier

In recent years, botnets have been the most common threats to network security because they exploit multiple kinds of malicious code, such as worms, Trojans and rootkits. Botnets have been used to carry phishing links, to perform attacks and to provide malicious services on the internet. Peer-to-peer (P2P) botnets are more challenging to identify than Internet Relay Chat (IRC), Hypertext Transfer Protocol (HTTP) and other types of botnets because P2P traffic has typical features of centralization and distribution. To resolve the issues of P2P botnet identification, we propose an effective multi-layer traffic classification method that applies machine learning classifiers to features of network traffic. Our work presents a framework based on decision trees which effectively detects P2P botnets. A decision tree algorithm is applied for feature selection to extract the most relevant features and ignore the irrelevant ones. At the first layer, we filter non-P2P packets to reduce the amount of network traffic, using well-known ports, Domain Name System (DNS) queries, and flow counting. The second layer further characterizes the captured network traffic as non-P2P or P2P. At the third layer of our model, we remove the features that only marginally affect the classification. At the final layer, we successfully detect P2P botnets using a decision tree classifier by extracting network communication features. Furthermore, our experimental evaluations show the significance of the proposed method in P2P botnet detection and demonstrate an average accuracy of 98.7%.

Functional and Structural Connectome Features for Machine Learning Chemo-Brain Prediction in Women Treated for Breast Cancer with Chemotherapy

Brain Sciences ◽

10.3390/brainsci10110851 ◽

2020 ◽

Vol 10 (11) ◽

pp. 851

Author(s):

Vincent Chin-Hung Chen ◽

Tung-Yeh Lin ◽

Dah-Cherng Yeh ◽

Jyh-Wen Chai ◽

Jun-Cheng Weng

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cognitive Disorders ◽

Breast Cancer Patients ◽

Functional Magnetic Resonance ◽

Decision Tree Classifier ◽

Global Efficiency ◽

Tree Classifier ◽

Study Participants ◽

Chemo Brain

Breast cancer is the leading cancer among women worldwide, and a high number of breast cancer patients are struggling with psychological and cognitive disorders. In this study, we aim to use machine learning models to discriminate between chemo-brain participants and healthy controls (HCs) using connectomes (connectivity matrices) and topological coefficients. Nineteen female post-chemotherapy breast cancer (BC) survivors and 20 female HCs were recruited for this study. Participants in both groups underwent resting-state functional magnetic resonance imaging (rs-fMRI) and generalized q-sampling imaging (GQI). Logistic regression (LR), decision tree classifier (CART), and XGBoost (XGB) were the models we adopted for classification. In connectome analysis, LR achieved an accuracy of 79.49% with the functional connectomes and an accuracy of 71.05% with the structural connectomes. In the topological coefficient analysis, accuracies of 87.18%, 82.05%, and 83.78% were obtained by the functional global efficiency with CART, the functional global efficiency with XGB, and the structural transitivity with CART, respectively. The areas under the curves (AUCs) were 0.93, 0.94, 0.87, 0.88, and 0.84, respectively. Our study showed the discriminating ability of functional connectomes, structural connectomes, and global efficiency. We hope our findings can contribute to an understanding of the chemo brain and the establishment of a clinical system for tracking chemo brain.
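
Global efficiency, one of the topological coefficients named in this abstract, is commonly defined as the average inverse shortest-path length over all ordered pairs of distinct nodes, E = 1/(N(N-1)) * sum over i != j of 1/d(i, j). The sketch below computes it for a small unweighted, undirected graph. It is an illustration only: the study works with connectivity matrices, which would first have to be thresholded or treated as weighted graphs, and the adjacency dictionary and function names here are assumptions made for the example.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def global_efficiency(adj):
    """Mean of 1/d(i, j) over all ordered pairs of distinct nodes.
    Unreachable pairs contribute 0 because they never appear in `dist`."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for u in nodes:
        dist = shortest_path_lengths(adj, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))

# Toy three-node path graph A - B - C (hypothetical, not connectome data).
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(global_efficiency(path_graph))
```

For the path graph the two adjacent pairs contribute 1 each and the end-to-end pair contributes 1/2, so the efficiency is 5/6; a fully connected graph would score 1.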

Trial by trial, machine learning approach identifies temporally discrete Aδ- and C-fibre mediated laser evoked potentials that predict pain behaviour in rats.

10.1101/2021.08.10.455801 ◽

2021 ◽

Author(s):

Anna C Sales ◽

Anthony James Blockeel ◽

John R Huxter ◽

James P Dunham ◽

Robert AR Drake ◽

...

Keyword(s):

Machine Learning ◽

Evoked Potentials ◽

Experimental Pain ◽

High Dose ◽

Single Trial ◽

Decision Tree Classifier ◽

Laser Evoked Potentials ◽

Behavioural Responses ◽

Trial Analysis ◽

Tree Classifier

Laser evoked potentials (LEPs) – the EEG response to temporally-discrete thermal stimuli – are commonly used in experimental pain studies in humans. Such stimuli selectively activate nociceptors and produce EEG features which correlate with pain intensity. The rodent LEP has been proposed to be a translational biomarker of nociception and pain; however, its validity has been questioned because of reported differences in the classes of nociceptive fibres mediating the response. Here we use a machine learning, trial-by-trial analysis approach on wavelet-denoised LEPs generated by stimulation of the plantar hindpaw of rats. The LEP amplitude was more strongly related to behavioural response than to laser stimulus energy. A simple decision tree classifier using LEP features was able to predict behavioural responses with 73% accuracy. An examination of the features used by the classifier showed that mutually exclusive short and long latency LEP peaks were clearly seen in single-trial data, yet were not evident in grand average data pooled from multiple trials. This bimodal distribution of LEP latencies was mirrored in the paw withdrawal latencies, which were preceded and predicted by the LEP responses. The proportion of short latency events was increased after intradermal application of high dose capsaicin (to defunctionalise TRPV1 expressing nociceptors), suggesting they were mediated by Aδ-fibres (specifically AMH-I). These findings demonstrate that both C- and Aδ-fibres contribute to rodent LEPs and concomitant behavioural responses, providing a real-time assay of specific fibre function in conscious animals. Single-trial analysis approaches can improve the utility of LEPs as a translatable biomarker of pain.
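
To make the notion of a "simple decision tree classifier" concrete, the following sketch fits a one-level tree (a decision stump) on a single feature by choosing the threshold that minimises the weighted Gini impurity, in the style of CART. It is not the classifier from the study: the feature values, labels and function names are invented, and a real tree would consider several features and more than one split.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def fit_stump(values, labels):
    """Pick the midpoint threshold with the lowest weighted Gini impurity
    and the majority label on each side of it."""
    xs = sorted(set(values))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            left_label = int(sum(left) * 2 >= len(left))
            right_label = int(sum(right) * 2 >= len(right))
            best = (score, thr, left_label, right_label)
    return best[1], best[2], best[3]

def predict(stump, x):
    thr, left_label, right_label = stump
    return left_label if x <= thr else right_label

# Hypothetical single-trial feature (e.g. a peak amplitude) and a binary
# behavioural response; the numbers are illustrative only.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

stump = fit_stump(values, labels)
accuracy = sum(predict(stump, x) == y for x, y in zip(values, labels)) / len(labels)
print(stump, accuracy)
```

Because the fitted model is a single readable rule (feature above or below a threshold), inspecting it shows directly which feature ranges drive the prediction, which is the kind of examination the abstract describes.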

A Preliminary Look at Heuristic Analysis for Assessing Artificial Intelligence Explainability

WSEAS TRANSACTIONS ON COMPUTER RESEARCH ◽

10.37394/232018.2020.8.9 ◽

2020 ◽

Vol 8 ◽

pp. 61-72

Author(s):

Kara Combs ◽

Mary Fendley ◽

Trevor Bihl

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Decision Tree ◽

Human Factors ◽

Black Box ◽

Decision Processes ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Explainable AI ◽

Heuristic Analysis

Artificial Intelligence and Machine Learning (AI/ML) models are increasingly criticized for their “black-box” nature. Therefore, eXplainable AI (XAI) approaches to extract human-interpretable decision processes from algorithms have been explored. However, XAI research lacks an understanding of algorithmic explainability from a human factors perspective. This paper presents a repeatable human factors heuristic analysis for XAI with a demonstration on four decision tree classifier algorithms.

Clustering Visualization and Class Prediction using Flask of Benchmark Dataset for Unsupervised Techniques in Machine learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5943.059720 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1297-1302 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Dimensionality Reduction ◽

Support Vector ◽

Decision Tree Classifier ◽

Class Prediction ◽

Linear Discriminant ◽

Reduction Techniques ◽

Tree Classifier ◽

Dimensionality Reduction Techniques ◽

Clustering Pattern

Recent advances have increased the value of Artificial Intelligence (AI) and Machine Learning (ML), which are rapidly attracting interest across many types of research. Clustering and dimensionality reduction techniques are among the trending methods used in machine learning today. Clustering techniques such as K-means and hierarchical clustering are used to assign data points to groups. Clustering can be used in recommendation frameworks, in the analysis of social media users, in categorizing patients with particular diseases by age group, and so on. Dimensionality reduction methods such as Principal Component Analysis and Linear Discriminant Analysis resemble clustering in some respects, but they reduce the size of the data before the clusters are plotted. In this paper, a comparative and predictive analysis is carried out with four distinct techniques on three datasets, namely IRIS, Wine, and Seed, from the UCI machine learning benchmark. The class prediction analysis of the datasets is done using a Flask app. The main aim is to form a good clustering pattern for each dataset with the given techniques. The experimental analysis calculates the accuracy of the resulting clusters using different machine learning classifiers, namely Logistic Regression, K-nearest neighbors, Support Vector Machine, Gaussian Naïve Bayes, Decision Tree Classifier, and Random Forest Classifier. Cohen's kappa is another accuracy indicator used to compare the obtained classification results. It is observed that K-means and hierarchical clustering provide a better clustering pattern of the input dataset than the dimensionality reduction techniques. The clustering pattern is well formed for all the techniques. The KNN classifier provides improved accuracy for all the techniques on the datasets.
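
Cohen's kappa, mentioned above as a second accuracy indicator, corrects the observed agreement p_o between two label sequences for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal stdlib-only version with made-up labels; the function name and the data are assumptions for illustration, not values from the paper.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over classes.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Made-up true classes and predicted classes for six samples.
true_classes = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

kappa = cohen_kappa(true_classes, predicted)
print(kappa)
```

Here the raw agreement is 5/6 and the chance agreement is 1/3, giving a kappa of 0.75, which is lower than the plain accuracy because part of the agreement would be expected at random.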
