Prioritizing Small Molecule as Candidates for Drug Repositioning using Machine Learning

Mapping Intimacies ◽

10.1101/331975 ◽

2018 ◽

Author(s):

Khader Shameer ◽

Kipp W. Johnson ◽

Benjamin S. Glicksberg ◽

Rachel Hodos ◽

Ben Readhead ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Small Molecule ◽

Chemical Space ◽

Drug Repositioning ◽

Chemical Properties ◽

Support Vector ◽

Feature Engineering ◽

Connectivity Map ◽

Molecular Features

ABSTRACT Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec features were used to train a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines and an ensemble model combining the three algorithms). Models were built from three datasets: Connectivity Map compounds, DrugBank approved compounds, and the full set of DrugBank compounds. Of these, the Random Forest trained on Connectivity Map data performed best (AUC = 0.674). In brief, our study represents a novel approach to evaluating a small molecule for drug repositioning opportunity and may further improve the discovery of pleiotropic drugs, i.e. drugs that treat multiple indications.
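The abstract reports model quality as an AUC. As a rough illustration of what that number measures, the sketch below computes the area under the ROC curve from the pairwise-ranking identity: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = repositioned drug, 0 = not repositioned.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value such as 0.674 indicates a classifier that ranks better than chance but far from perfectly.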

Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review

Mini-Reviews in Medicinal Chemistry ◽

10.2174/1389557520666200429102334 ◽

2020 ◽

Vol 20 (14) ◽

pp. 1375-1388 ◽

Cited By ~ 2

Author(s):

Patnala Ganga Raju Achary

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Virtual Screening ◽

Model Building ◽

Chemical Space ◽

Qsar Model ◽

Quantitative Structure ◽

Efficient Manner ◽

Qsar Analysis ◽

Structure Activity

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in the Chemical Abstracts Service. According to a recent study, there are around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry’. To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge, and it must be handled in an efficient manner. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques used extensively in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) collection of chemo-genomics data from a database or the literature, (ii) calculation of the right descriptors from the molecular representation, (iii) establishment of a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights the web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
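The four model-building steps listed in the abstract can be sketched in miniature as follows. This is only an illustrative toy, assuming a single precomputed descriptor and an ordinary least-squares line as the "model"; real QSAR work uses many descriptors and richer learners. The helper `fit_line` and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (i) Data collection: hypothetical measured activities for four molecules.
activity = [5.1, 5.9, 7.1, 7.9]
# (ii) Descriptor calculation: assumed already done; one descriptor per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
# (iii) Model: relate activity to the selected descriptor.
slope, intercept = fit_line(descriptor, activity)
# (iv) Application: predict the activity of an unseen molecule.
predicted = slope * 2.5 + intercept
```

As the abstract stresses, a prediction like `predicted` is only a screening hit and would still need experimental verification.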

Prediction of drug-protein interaction and drug repositioning using machine learning model

10.1101/2020.07.29.218826 ◽

2020 ◽

Author(s):

Yu-Ting Lin ◽

Sheh-Yi Sheu ◽

Chen-Ching Lin

Keyword(s):

Machine Learning ◽

Betweenness Centrality ◽

Drug Repositioning ◽

Protein Network ◽

Training Data ◽

Support Vector ◽

Binding Interaction ◽

Binding Strength ◽

Edge Betweenness ◽

Protein Models

Abstract Background: Traditional drug development is time-consuming and expensive, while computer-aided drug repositioning can improve efficiency and productivity. In this study, we proposed a machine learning pipeline to predict the binding interaction between proteins and marketed or studied drugs. We then extended the predicted interactions to construct a protein network that could be applied to discover potentially shared drugs between proteins and thus predict drug repositioning. Methods: Binding information between proteins and drugs from the Binding Database and the physicochemical properties of drugs from the ChEMBL database were used to build the machine learning models, i.e. support vector regression. We further measured proportionalities between proteins by the predicted binding affinity and introduced edge betweenness centrality to construct a protein similarity network for drug repositioning. Results: As a proof of concept, we demonstrated that our machine learning approach is capable of reflecting the binding strength between drugs and the target protein. When comparing coefficients of protein models, we found that the proteins SYUA and TAU may share common ligands that were not in our training data. Using the edge betweenness centrality network based on the prediction proportionality of protein models, we found a potential target of aspirin, AK1C2, for which the binding interaction had been validated. Conclusions: Our study could be applied not only to drug repositioning, by comparing protein models or searching the protein-protein network, but also to predicting binding strength once sufficient experimental data are provided to train the protein models.

Deep learning on chaos game representation for proteins

Bioinformatics ◽

10.1093/bioinformatics/btz493 ◽

2019 ◽

Vol 36 (1) ◽

pp. 272-279 ◽

Cited By ~ 5

Author(s):

Hannah F Löchel ◽

Dominic Eger ◽

Theodor Sperlea ◽

Dominik Heider

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Chemical Properties ◽

Protein Sequences ◽

Machine Learning Techniques ◽

Supplementary Information ◽

Support Vector ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Abstract Motivation: Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods exist and are applied to these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences have to be made machine-readable and comparable in the first step, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, due to the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) for encoding protein sequences into images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work for protein sequences as well, resulting in the n-flakes representation, an image with several icosagons. Results: We show that all applied machine learning techniques (RF, SVM and DNN) give promising results compared to the state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and that FCGR is a promising new encoding method for protein sequences. Availability and implementation: https://cran.r-project.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
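For orientation, here is a minimal sketch of the original chaos game representation for nucleotide sequences that the abstract mentions, turned into a frequency matrix. It is not the authors' protein n-flakes variant: the four-corner layout (`CORNERS`), the function name `fcgr` and the example sequences are assumptions made for illustration only.

```python
# Corner assignment for the classic four-letter chaos game (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def fcgr(sequence, k):
    """Frequency matrix (2**k x 2**k) of chaos game points for a DNA string."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start in the centre of the unit square
    for i, base in enumerate(sequence):
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        # After k steps the grid cell depends only on the last k bases (a k-mer).
        if i >= k - 1:
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] += 1
    return grid


matrix = fcgr("ACGTAC", 2)  # 4 x 4 matrix counting the five 2-mers
```

The resulting matrix can be treated as a small grayscale image, which is what makes image-oriented models such as deep neural networks applicable to sequence data.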

Pushing the Frontiers of Accessible Chemical Space to Unleash Design Creativity and Accelerate Drug Discovery

CHIMIA International Journal for Chemistry ◽

10.2533/chimia.2020.803 ◽

2020 ◽

Vol 74 (10) ◽

pp. 803-807

Author(s):

Thomas C. Fessard ◽

Kristina Goncharenko ◽

Quentin Lefebvre ◽

Christophe Salomé

Keyword(s):

Drug Discovery ◽

Small Molecule ◽

Chemical Space ◽

Molecular Structures ◽

Complex Structures ◽

Small Molecule Drug ◽

Design Creativity ◽

Research Environments ◽

Complex Structural ◽

Synthetic Methodologies

In highly competitive research environments, the ability to access more complex structural spaces efficiently is a predictor of a company's ability to generate novel IP-protected small molecule candidates with adequate properties, and hence to fill its development pipeline. SpiroChem is consistently developing new synthetic methodologies and strategies to access complex molecular structures, thereby facilitating and accelerating small molecule drug discovery. Pushing the limits of what are perceived as complex molecular structures allows SpiroChem and its clients to unleash creativity and explore meaningful chemical spaces, which are under-exploited sources of novel active molecules. In this article, we explain how we differentiated ourselves in a globalized R&D environment and we provide several snapshots of how efficient methodologies can rapidly generate complex structures.

Applications of Machine Learning in Drug Discovery I: Target Discovery and Small Molecule Drug Design

Artificial Intelligence in Oncology Drug Discovery and Development ◽

10.5772/intechopen.93159 ◽

2020 ◽

Author(s):

John W. Cassidy

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Drug Design ◽

Small Molecule ◽

Target Discovery ◽

Small Molecule Drug ◽

Applications Of Machine Learning

Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques (Preprint)

10.2196/preprints.13611 ◽

2019 ◽

Author(s):

Max Wang ◽

Wenbo Ge ◽

Deborah Apthorp ◽

Hanna Suominen

Keyword(s):

Machine Learning ◽

Parkinson Disease ◽

Neurodegenerative Disorder ◽

Disease Diagnosis ◽

Machine Learning Techniques ◽

Support Vector ◽

Feature Engineering ◽

Data Set ◽

Performance Improvements ◽

Number Of Patients

BACKGROUND Parkinson disease (PD) is a common neurodegenerative disorder that affects between 7 and 10 million people worldwide. No objective test for PD currently exists, and studies suggest misdiagnosis rates of up to 34%. Machine learning (ML) presents an opportunity to improve diagnosis; however, the size and nature of data sets make it difficult to generalize the performance of ML models to real-world applications. OBJECTIVE This study aims to consolidate prior work and introduce new techniques in feature engineering and ML for diagnosis based on vowel phonation. Additional features and ML techniques were introduced, showing major performance improvements on the large mPower vocal phonation data set. METHODS We used 1600 randomly selected /aa/ phonation samples from the entire data set to derive rules for filtering out faulty samples from the data set. The application of these rules, along with a joint age-gender balancing filter, results in a data set of 511 PD patients and 511 controls. We calculated features on a 1.5-second window of audio, beginning at the 1-second mark, for a support vector machine. This was evaluated with 10-fold cross-validation (CV), with stratification for balancing the number of patients and controls for each CV fold. RESULTS We showed that the features used in prior literature do not perform well when extrapolated to the much larger mPower data set. Owing to the natural variation in speech, the separation of patients and controls is not as simple as previously believed. We presented significant performance improvements using additional novel features (with 88.6% certainty, derived from a Bayesian correlated t test) in separating patients and controls, with accuracy exceeding 58%. CONCLUSIONS The results are promising, showing the potential for ML in detecting symptoms imperceptible to a neurologist.
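The evaluation protocol described in METHODS, 10-fold cross-validation stratified so that each fold holds the same number of patients and controls, can be sketched as below. This is a generic stratified split written with the standard library, not the authors' code; `stratified_folds`, the seed and the small label list are hypothetical.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with classes spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical balanced data set: 1 = PD patient, 0 = control.
labels = [1] * 20 + [0] * 20
folds = stratified_folds(labels, k=10)
```

Each fold then serves once as the test set while the other nine are used for training, so every sample is tested exactly once.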

Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

10.26434/chemrxiv.12902288 ◽

2020 ◽

Author(s):

Alice Capecchi ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Natural Products ◽

Chemical Space ◽

Chemical Properties ◽

Structural Diversity ◽

Physico Chemical ◽

Machine Learning Model ◽

Interactive Map ◽

Microbial Natural Products ◽

Tree Map

Microbial natural products (NPs) are an important source of drugs. However, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin downloaded from https://www.npatlas.org/joomla/. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP) (http://tmap.gdb.tools). The resulting interactive map (https://tm.gdb.tools/map4/npatlas_map_tmap/) organizes molecules by physico-chemical properties and compound families such as peptides, glycosides, polyphenols or terpenoids. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
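To give a feel for the MinHash idea behind fingerprints such as MAP4, the sketch below estimates the Jaccard similarity of two sets of string features from fixed-length signatures. It is a generic MinHash written with `hashlib`, not the MAP4 algorithm itself; the feature strings, `minhash_signature` and `estimated_jaccard` are hypothetical placeholders.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """One minimum per seeded hash function; items must be a non-empty set of strings."""
    signature = []
    for seed in range(num_hashes):
        smallest = min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        )
        signature.append(smallest)
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical feature sets standing in for atom-pair descriptors of two molecules.
sig_1 = minhash_signature({"feat_a", "feat_b", "feat_c", "feat_d"})
sig_2 = minhash_signature({"feat_a", "feat_b", "feat_c", "feat_e"})
similarity = estimated_jaccard(sig_1, sig_2)
```

Because every molecule is reduced to a signature of the same length regardless of how many features it has, this kind of encoding can compare molecules of very different sizes.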

Proposed Improvements for Automated Chemical Safety Evaluations Using In-Silico Techniques

10.20944/preprints202005.0408.v1 ◽

2020 ◽

Author(s):

Bryan Jordan

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Chemical Space ◽

Machine Learning Techniques ◽

Training Dataset ◽

Generative Adversarial Networks ◽

Chemical Safety ◽

Adversarial Networks ◽

Wide Range ◽

Traditional Drug

The vastness of chemical space constrains traditional drug-discovery methods to the organic laws that guide the chemistry involved in filtering through candidates. Leveraging computing with machine learning to intelligently generate compounds that meet a wide range of objectives can bring significant gains in the time and effort needed to filter through a broad range of candidates. This paper details how the use of Generative-Adversarial-Networks, novel machine learning techniques to format the training dataset, and the use of quantum computing offer new ways to expedite drug discovery.

Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

Biomolecules ◽

10.3390/biom10101385 ◽

2020 ◽

Vol 10 (10) ◽

pp. 1385

Author(s):

Alice Capecchi ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Natural Products ◽

Chemical Space ◽

Chemical Properties ◽

Structural Diversity ◽

Physico Chemical ◽

Machine Learning Model ◽

Interactive Map ◽

Microbial Natural Products ◽

Tree Map

Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.

Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions

Nature Communications ◽

10.1038/s41467-019-12875-2 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 64

Author(s):

K. T. Schütt ◽

M. Gastegger ◽

A. Tkatchenko ◽

K.-R. Müller ◽

R. J. Maurer

Keyword(s):

Machine Learning ◽

Quantum Chemistry ◽

Degrees Of Freedom ◽

Large Scale ◽

Materials Science ◽

Chemical Space ◽

Chemical Properties ◽

Molecular Structures ◽

Learning Framework ◽

Molecular Wavefunctions

Abstract Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.
