scholarly journals Evaluation of network architecture and data augmentation methods for deep learning in chemogenomics

2019 ◽  
Author(s):  
Benoit Playe ◽  
Véronique Stoven

AbstractAmong virtual screening methods that have been developed to facilitate the drug discovery process, chemogenomics presents the particularity to tackle the question of predicting ligands for proteins, at at scales both in the protein and chemical spaces. Therefore, in addition to to predict drug candidates for a given therapeutic protein target, like more classical ligand-based or receptor-based methods do, chemogenomics can also predict off-targets at the proteome level, and therefore, identify potential side-effects or drug repositioning opportunities. In this study, we study and compare machine-learning and deep learning approaches for chemogenomics, that are applicable to screen large sets of compounds against large sets of druggable proteins. State-of-the-art drug chemogenomics methods rely on expert-based chemical and protein descriptors or similarity measures. The recent development of deep learning approaches enabled to design algorithms that learn numerical abstract representations of molecular graphs and protein sequences in an end-to-end fashion, i.e., so that the learnt features optimise the objective function of the drug-target interaction prediction task. In this paper, we address drug-target interaction prediction at the druggable proteome-level, with what we define as the chemogenomic neuron network. This network consists of a feed-forward neuron network taking as input the combination of molecular and protein representations learnt by molecular graph and protein sequence encoders. We first propose a standard formulation of this chemogenomic neuron network. Then, we compare the performances of the standard chemogenomic network to reference deep learning or shallow (machine-learning without deep learning) methods. In particular, we show that such a representation learning approach is competitive with state-of-the-art chemogenomics with shallow methods, but not ultimately superior. We evaluate the most promising neuron network architectures and data augmentation techniques, such as multi-view and transfer learning, to improve the prediction performance of the chemogenomic network. Our results shed new insights on the design of chemogenomics approaches based on representation learning algorithms. Most importantly, we conclude from our observations that a promising research direction is to integrate heterogeneous sources of data such as various bioactivity datasets, or independently, multiple molecule and protein attribute views, instead of focusing on sophisticated, yet intuitively relevant, encoder’s neuron network architecture.

2021 ◽  
Vol 11 (15) ◽  
pp. 7148
Author(s):  
Bedada Endale ◽  
Abera Tullu ◽  
Hayoung Shi ◽  
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions: in both civilian and military sectors. Many of these missions demand UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifice in terms of time and computational resources. Collecting big input data, pre-training processes, such as labeling training data, and the need for a high performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission specific input data augmentation techniques and the design of light-weight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.


2021 ◽  
Vol 11 (11) ◽  
pp. 4753
Author(s):  
Gen Ye ◽  
Chen Du ◽  
Tong Lin ◽  
Yan Yan ◽  
Jack Jiang

(1) Background: Deep learning has become ubiquitous due to its impressive performance in various domains, such as varied as computer vision, natural language and speech processing, and game-playing. In this work, we investigated the performance of recent deep learning approaches on the laryngopharyngeal reflux (LPR) diagnosis task. (2) Methods: Our dataset is composed of 114 subjects with 37 pH-positive cases and 77 control cases. In contrast to prior work based on either reflux finding score (RFS) or pH monitoring, we directly take laryngoscope images as inputs to neural networks, as laryngoscopy is the most common and simple diagnostic method. The diagnosis task is formulated as a binary classification problem. We first tested a powerful backbone network that incorporates residual modules, attention mechanism and data augmentation. Furthermore, recent methods in transfer learning and few-shot learning were investigated. (3) Results: On our dataset, the performance is the best test classification accuracy is 73.4%, while the best AUC value is 76.2%. (4) Conclusions: This study demonstrates that deep learning techniques can be applied to classify LPR images automatically. Although the number of pH-positive images used for training is limited, deep network can still be capable of learning discriminant features with the advantage of technique.


2020 ◽  
Author(s):  
Ming Chen ◽  
Xiuze Zhou

Abstract Background: Because it is so laborious and expensive to experimentally identify Drug-Target Interactions (DTIs), only a few DTIs have been verified. Computational methods are useful for identifying DTIs in biological studies of drug discovery and development. Results: For drug-target interaction prediction, we propose a novel neural network architecture, DAEi, extended from Denoising AutoEncoder (DAE). We assume that a set of verified DTIs is a corrupted version of the full interaction set. We use DAEi to learn latent features from corrupted DTIs to reconstruct the full input. Also, to better predict DTIs, we add some similarities to DAEi and adopt a new nonlinear method for calculation. Similarity information is very effective at improving the prediction of DTIs. Conclusion: Results of the extensive experiments we conducted on four real data sets show that our proposed methods are superior to other baseline approaches.Availability: All codes in this paper are open-sourced, and our projects are available at: https://github.com/XiuzeZhou/DAEi.


Author(s):  
Kexin Huang ◽  
Tianfan Fu ◽  
Lucas M Glass ◽  
Marinka Zitnik ◽  
Cao Xiao ◽  
...  

Abstract Summary Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models for show promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation https://github.com/kexinhuang12345/DeepPurpose. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Kexin Huang ◽  
Cao Xiao ◽  
Lucas M Glass ◽  
Jimeng Sun

Abstract Motivation Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need of experimental search over large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) knowledge inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show it improved DTI prediction performance compared to state-of-the-art baselines. Availability and implementation The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 2019 ◽  
pp. 1-7 ◽  
Author(s):  
Okeke Stephen ◽  
Mangal Sain ◽  
Uchenna Joseph Maduh ◽  
Do-Un Jeong

This study proposes a convolutional neural network model trained from scratch to classify and detect the presence of pneumonia from a collection of chest X-ray image samples. Unlike other methods that rely solely on transfer learning approaches or traditional handcrafted techniques to achieve a remarkable classification performance, we constructed a convolutional neural network model from scratch to extract features from a given chest X-ray image and classify it to determine if a person is infected with pneumonia. This model could help mitigate the reliability and interpretability challenges often faced when dealing with medical imagery. Unlike other deep learning classification tasks with sufficient image repository, it is difficult to obtain a large amount of pneumonia dataset for this classification task; therefore, we deployed several data augmentation algorithms to improve the validation and classification accuracy of the CNN model and achieved remarkable validation accuracy.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Daniel Griffith ◽  
Alex S Holehouse

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.


2021 ◽  
Vol 12 ◽  
Author(s):  
Prabhakar Maheswari ◽  
Purushothaman Raja ◽  
Orly Enrique Apolo-Apolo ◽  
Manuel Pérez-Ruiz

Smart farming employs intelligent systems for every domain of agriculture to obtain sustainable economic growth with the available resources using advanced technologies. Deep Learning (DL) is a sophisticated artificial neural network architecture that provides state-of-the-art results in smart farming applications. One of the main tasks in this domain is yield estimation. Manual yield estimation undergoes many hurdles such as labor-intensive, time-consuming, imprecise results, etc. These issues motivate the development of an intelligent fruit yield estimation system that offers more benefits to the farmers in deciding harvesting, marketing, etc. Semantic segmentation combined with DL adds promising results in fruit detection and localization by performing pixel-based prediction. This paper reviews the different literature employing various techniques for fruit yield estimation using DL-based semantic segmentation architectures. It also discusses the challenging issues that occur during intelligent fruit yield estimation such as sampling, collection, annotation and data augmentation, fruit detection, and counting. Results show that the fruit yield estimation employing DL-based semantic segmentation techniques yields better performance than earlier techniques because of human cognition incorporated into the architecture. Future directions like customization of DL architecture for smart-phone applications to predict the yield, development of more comprehensive model encompassing challenging situations like occlusion, overlapping and illumination variation, etc., were also discussed.


2016 ◽  
Author(s):  
Fangping Wan ◽  
Jianyang (Michael) Zeng

AbstractAccurately identifying compound-protein interactions in silico can deepen our understanding of the mechanisms of drug action and significantly facilitate the drug discovery and development process. Traditional similarity-based computational models for compound-protein interaction prediction rarely exploit the latent features from current available large-scale unlabelled compound and protein data, and often limit their usage on relatively small-scale datasets. We propose a new scheme that combines feature embedding (a technique of representation learning) with deep learning for predicting compound-protein interactions. Our method automatically learns the low-dimensional implicit but expressive features for compounds and proteins from the massive amount of unlabelled data. Combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline for accurate compound-protein interaction prediction, even when the interaction knowledge of compounds and proteins is entirely unknown. Evaluations on current large-scale databases of the measured compound-protein affinities, such as ChEMBL and BindingDB, as well as known drug-target interactions from DrugBank have demonstrated the superior prediction performance of our method, and suggested that it can offer a useful tool for drug development and drug repositioning.


BMC Genomics ◽  
2018 ◽  
Vol 19 (S7) ◽  
Author(s):  
Lingwei Xie ◽  
Song He ◽  
Xinyu Song ◽  
Xiaochen Bo ◽  
Zhongnan Zhang

Sign in / Sign up

Export Citation Format

Share Document