Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction

Abstract: Existing methods ignore the adverse effect of knowledge graph incompleteness on knowledge graph embedding. In addition, the complexity and large scale of knowledge information hinder the knowledge graph embedding performance of the classic graph convolutional network. In this paper, we analyze the structural characteristics of knowledge graphs and the imbalance of knowledge information. Complex knowledge information requires a model with better learnability rather than linearly weighted qualitative constraints, so we propose an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. First, we construct a relation-enhanced adjacency matrix to account for the incompleteness of the knowledge graph. Second, a graph self-attention network is employed to obtain the global encoding and relevance ranking of entity node information. Third, we propose the concept of a convolutional knowledge subgraph, which is constructed according to the entity relevance ranking. Finally, we improve the training of the ConvKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. Experimental results on the FB15k-237 and WN18RR data sets show that the proposed method represents knowledge information more comprehensively than existing methods in terms of Hits@10 and MRR.
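
The two evaluation metrics named in this abstract, Hits@10 and MRR, can be computed from the rank that a model assigns to the correct entity for each test triple. The following is a minimal sketch under that assumption; the function names and the example ranks are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for five test triples.
example_ranks = [1, 3, 12, 2, 50]
mrr = mean_reciprocal_rank(example_ranks)
hits10 = hits_at_k(example_ranks, k=10)
```

Both metrics lie between 0 and 1, and higher is better. MRR rewards placing the correct entity near the top, while Hits@10 only checks whether it appears among the first ten candidates.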

OPTIMAL METHODS FOR RE-ORDERING DATA MATRICES IN SYSTEMS BIOLOGY AND DRUG DISCOVERY APPLICATIONS

Biophysical Reviews and Letters, 2008, Vol 03 (01n02), pp. 19-42, DOI: 10.1142/s1793048008000605

Author(s): PETER A. DIMAGGIO, SCOTT R. MCALLISTER, CHRISTODOULOS A. FLOUDAS, XIAO-JIANG FENG, JOSHUA D. RABINOWITZ, ...

Keyword(s): Systems Biology, Drug Discovery, Large Scale, Mixed Integer, Data Sets, Clustering Methods, Clustering Techniques, Large Scale Data, Heuristic Strategies, Data Matrices

The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Many of the methods developed employ local search or heuristic strategies for identifying the "best" arrangement of features according to some metric. In this article, we present rigorous clustering methods based on the optimal re-ordering of data matrices. Distinct mixed-integer linear programming (MILP) models are utilized for the clustering of (a) dense data matrices, such as gene expression data, and (b) sparse data matrices, which are commonly encountered in the field of drug discovery. Both methods can be used in an iterative framework to bicluster data and assist in the synthesis of drug compounds, respectively. We demonstrate the capability of the proposed optimal re-ordering methods on several data sets from both systems biology and molecular discovery studies and compare our results to other clustering techniques when applicable.
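
To illustrate what re-ordering a data matrix means, the sketch below exhaustively searches all row permutations of a tiny matrix and keeps the one that minimizes the summed squared difference between neighbouring rows. This is a brute-force stand-in for illustration only: it is not the MILP formulation described in the article, the cost function is an assumption, and the names `adjacent_cost` and `best_row_order` are hypothetical.

```python
from itertools import permutations


def adjacent_cost(matrix, order):
    """Sum of squared differences between each pair of neighbouring rows."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += sum((x - y) ** 2 for x, y in zip(matrix[a], matrix[b]))
    return total


def best_row_order(matrix):
    """Try every row permutation; feasible only for a handful of rows."""
    n = len(matrix)
    return min(permutations(range(n)), key=lambda order: adjacent_cost(matrix, order))


# Toy data: rows 0 and 2 are similar, and rows 1 and 3 are similar.
data = [
    [0.0, 0.1],
    [5.0, 5.2],
    [0.2, 0.0],
    [5.1, 5.0],
]
order = best_row_order(data)
```

In the returned order, similar rows end up next to each other, which is the effect that makes clusters visible in a re-ordered matrix. Exhaustive search grows factorially with the number of rows, which is why exact methods for realistic matrices rely on mathematical programming rather than enumeration.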

HTX: a tool for the exploration and visualization of high-throughput image assays

2017, DOI: 10.1101/204016

Author(s): Carlos Arteta, Victor Lempitsky, Jaroslav Zak, Xin Lu, J. Alison Noble, ...

Keyword(s): High Throughput, High Throughput Screening, Domain Knowledge, Large Scale, Data Sets, Imaging Data, Specific Domain, Small Molecule Libraries, Algorithmic Techniques, 2D Data

Abstract: High-throughput screening (HTS) techniques have enabled large-scale image-based studies, but extracting biological insights from the imaging data in an exploratory setting remains a challenge. Existing packages for this task either require expert annotations, which can bias the outcome of the study, or are completely unsupervised, failing to leverage the information present in the assay design. We present HTX, an interactive tool to aid in the exploration of large microscopy data sets by allowing the visualization of entire image-based assays according to visual similarities between the samples in an intuitive and navigable manner. Underlying HTX is a collection of novel algorithmic techniques for deep texture descriptor learning, 2D data visualization, adversarial suppression of batch effects, and backprop-based image saliency estimation. We demonstrate that HTX can exploit the screen meta-data in order to learn screen-specific image descriptors, which are then used to quantify the visual similarity between samples in the assay. Given these similarities and the different visualization resources of HTX, it is shown that screens of small-molecule libraries on cell data can be easily explored, reproducing the results of previous studies where highly specific domain knowledge was required.

Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects

2020, DOI: 10.26434/chemrxiv.11364884.v2, Cited By ~ 3

Author(s): Christina Schindler, Hannah Baumann, Andreas Blum, Dietrich Böse, Hans-Peter Buchstaller, ...

Keyword(s): Free Energy, Drug Discovery, Large Scale, Active Drug, Binding Free Energy, Free Energy Calculations, Energy Calculation, Large Scale Assessment, New Public, Binding Free Energy Calculations

Here we present an evaluation of the binding affinity prediction accuracy of the free energy calculation method FEP+ on internal active drug discovery projects and on a large new public benchmark set.

Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

2019, DOI: 10.26434/chemrxiv.8266745.v1

Author(s): Wengong Jin, Regina Barzilay, Tommi S Jaakkola

Keyword(s): Drug Discovery, State Of The Art, Molecular Graph, Biochemical Properties, Large Margin, Previous State, Translation Methods, Atom Level, Precursor Molecules, Prior State

The problem of accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to give them better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.

Reaction-based Enumeration, Active Learning, and Free Energy Calculations to Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin Dependent Kinase 2 Inhibitors

2019, DOI: 10.26434/chemrxiv.7841270.v2

Author(s): Kyle Konze, Pieter Bos, Markus Dahlgren, Karl Leswing, Ivan Tubert-Brohman, ...

Keyword(s): Free Energy, Drug Discovery, Active Learning, Large Scale, Chemical Space, Population Based, Free Energy Calculations, Computational Technique, Cyclin Dependent Kinase, Energy Calculations

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, and four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.
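
The potency cutoff quoted in this abstract (predicted IC50 < 100 nM) is often expressed on the logarithmic pIC50 scale, where pIC50 = -log10(IC50 in mol/L). The sketch below shows that standard conversion; the function names and the default cutoff argument are hypothetical helpers for illustration, not part of PathFinder.

```python
import math


def ic50_nm_to_pic50(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); with IC50 given in nM this is 9 - log10(IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def passes_potency_cutoff(ic50_nm, cutoff_nm=100.0):
    """True when the predicted IC50 is strictly below the cutoff (both in nM)."""
    return ic50_nm < cutoff_nm


cutoff_pic50 = ic50_nm_to_pic50(100.0)  # the 100 nM cutoff on the pIC50 scale
```

On this scale, an IC50 below 100 nM corresponds to a pIC50 above 7, and each additional pIC50 unit is a tenfold gain in potency.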

Faculty Opinions recommendation of Comparative assessment of large-scale data sets of protein-protein interactions.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2002, DOI: 10.3410/f.1006598.82257

Author(s): Rob Russell

Keyword(s): Protein Interactions, Large Scale, Comparative Assessment, Data Sets, Protein Protein Interactions, Large Scale Data, Scale Data, Large Scale Data Sets

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry, 2019, Vol 19 (1), pp. 4-16, DOI: 10.2174/1568026619666190122151634, Cited By ~ 6

Author(s): Qihui Wu, Hanzhong Ke, Dongli Li, Qi Wang, Jiansong Fang, ...

Keyword(s): Machine Learning, Drug Discovery, Large Scale, Recent Progress, High Specificity, Learning Approaches, Anticancer Peptides, The Past, Traditional Approaches, Large Scale Screening

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are considered able to regulate various complex diseases that were previously untreatable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared with small organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine learning-based prediction of peptide activity, which will be of great benefit to the discovery of potentially active AMPs, ACPs and AIPs.
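
Machine learning models of the kind surveyed here need a fixed-length numeric representation of each peptide. One simple and widely used choice is amino acid composition, the fraction of each of the 20 standard residues in the sequence. The sketch below is an illustrative assumption rather than a method attributed to this review; the function name and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # silently skip non-standard symbols
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Arbitrary illustrative sequence, not a peptide discussed in the review.
features = amino_acid_composition("KKLLKA")
```

The resulting vector can be passed to any standard classifier. Composition discards residue order, so richer descriptors are typically needed when the position of residues matters.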

NPU RGB+D Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players

Applied Sciences, 2021, Vol 11 (10), pp. 4426, DOI: 10.3390/app11104426

Author(s): Chunyan Ma, Ji Fan, Jinghao Yao, Tao Zhang

Keyword(s): Action Recognition, Large Scale, Short Term Memory, Evaluation Criteria, Image Data, Basketball Player, Basketball Players, Convolutional Network, Atomic Actions, New Feature

Computer vision-based action recognition of basketball players in training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions of basketball players and built the dataset NPU RGB+D (a large-scale dataset for basketball action recognition with RGB image data and depth data captured at Northwestern Polytechnical University) for 12 kinds of actions of 10 professional basketball players with 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. By extracting spatial features from the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is very competitive with the current action recognition algorithms and that our LSTM-DGCN outperforms the state-of-the-art action recognition methods on various evaluation criteria on our dataset. Our action classifications and the NPU RGB+D dataset are valuable for basketball player action recognition techniques. The feature-enhanced LSTM-DGCN recognizes actions more accurately, which improves the motion expression ability of the skeleton data.
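
The spatial features mentioned in this abstract are distances and angles between skeleton joint points. The sketch below shows one plausible way to compute them from 3D joint coordinates; the function names, the joint labels, and the example coordinates are assumptions for illustration and do not reproduce the paper's feature definitions. It requires Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.

```python
import math


def joint_distance(p, q):
    """Euclidean distance between two joints given as (x, y, z) coordinates."""
    return math.dist(p, q)


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("joints must not coincide")
    cos_theta = sum(u * v for u, v in zip(ba, bc)) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Hypothetical coordinates for one arm in a single frame.
shoulder, elbow, wrist = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
upper_arm_length = joint_distance(shoulder, elbow)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Computed per frame, such values can be appended to the raw joint coordinates as additional input features for a skeleton-based model.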
