Drug–drug interaction prediction with Wasserstein Adversarial Autoencoder-based knowledge graph embeddings

Author(s):  
Yuanfei Dai ◽  
Chenhao Guo ◽  
Wenzhong Guo ◽  
Carsten Eickhoff

Abstract: An interaction between pharmacological agents can trigger unexpected adverse events. Capturing richer and more comprehensive information about drug–drug interactions (DDIs) is one of the key tasks in public health and drug development. Recently, several knowledge graph (KG) embedding approaches have received increasing attention in the DDI domain because they can project drugs and interactions into a low-dimensional feature space for predicting links and classifying triplets. However, existing methods construct negative samples only by uniform random sampling. As a consequence, these samples are often too simplistic to train an effective model. In this paper, we propose a new KG embedding framework for DDI tasks that introduces adversarial autoencoders (AAEs) based on Wasserstein distances and Gumbel-Softmax relaxation. In our framework, the autoencoder is employed to generate high-quality negative samples, and the hidden vector of the autoencoder is regarded as a plausible drug candidate. The discriminator then learns the embeddings of drugs and interactions from both positive and negative triplets. Meanwhile, to solve the vanishing gradient problem on discrete representations (an inherent flaw in traditional generative models), we use the Gumbel-Softmax relaxation and the Wasserstein distance to train the embedding model stably. We empirically evaluate our method on two tasks: link prediction and DDI classification. The experimental results show that our framework attains significant improvements and noticeably outperforms competitive baselines. Supplementary information: Supplementary data and code are available at https://github.com/dyf0631/AAE_FOR_KG.
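The Gumbel-Softmax relaxation named in this abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `gumbel_softmax`, the temperature value and the example scores are assumptions chosen for illustration. The sketch shows how adding Gumbel noise to candidate scores and applying a temperature-scaled softmax yields a soft, differentiable stand-in for a discrete choice.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Return a relaxed (soft) one-hot sample over the candidates in `logits`."""
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        perturbed.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(perturbed)
    exps = [math.exp(value - peak) for value in perturbed]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate drugs (illustrative values only).
weights = gumbel_softmax([1.0, 2.0, 0.5], temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the returned weights toward a one-hot vector, while higher temperatures make them more uniform. In a real model the same computation would run inside an automatic-differentiation framework so that gradients can flow through the sample.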

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution over the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks: link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
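The two steps described in this abstract (a circular convolution that mixes an entity embedding with its type embedding, followed by a translation-based score) can be sketched as follows. The abstract does not give TransET's exact formulation, so the operand order, the L1 norm and the function names here are assumptions, not the published model.

```python
def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def translation_score(head, head_type, relation, tail, tail_type):
    """Score a triple; values closer to zero mean a more plausible triple."""
    # Map head and tail to type-specific representations (assumed operand order).
    h = circular_convolution(head, head_type)
    t = circular_convolution(tail, tail_type)
    # Translation idea: h + r should land near t. Negated L1 distance (assumed norm).
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, relation, t))


# Toy 2-dimensional embeddings (hypothetical values). The type vector [1, 0]
# acts as the identity under circular convolution.
good = translation_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0])
bad = translation_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, -2.0], [1.0, 0.0])
```

Here `good` is zero because head plus relation equals tail exactly, while `bad` is negative because the corrupted tail lies far from the translated head.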


Author(s):  
Zhuohang Yu ◽  
Zengrui Wu ◽  
Weihua Li ◽  
Guixia Liu ◽  
Yun Tang

Abstract Summary: MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers. However, it has not been updated for more than seven years. Here, we report its second version, which collects more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and the Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, an increase of over 40% in drug–ADE associations compared with the previous version. We also developed a new, user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 will provide a useful tool for drug safety assessment and related studies in drug discovery and development. Availability and implementation: The database is freely available at: http://lmmd.ecust.edu.cn/metaadedb/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 29 (07) ◽  
pp. 1850058 ◽  
Author(s):  
Juan M. Górriz ◽  
Javier Ramírez ◽  
F. Segovia ◽  
Francisco J. Martínez ◽  
Meng-Chuan Lai ◽  
...  

Although much research has been undertaken, the spatial patterns, developmental course, and sexual dimorphism of brain structure associated with autism remain enigmatic. One of the difficulties in investigating differences between the sexes in autism is the small sample sizes of available imaging datasets with mixed sex. Thus, the majority of the investigations have involved male samples, with females somewhat overlooked. This paper deploys machine learning on partial least squares feature extraction to reveal differences in regional brain structure between individuals with autism and typically developing participants. A four-class classification problem (sex and condition) is specified, with theoretical restrictions based on the evaluation of a novel upper bound in the resubstitution estimate. These conditions were imposed on the classifier complexity and feature space dimension to ensure that results generalize from the training set to test samples. Accuracies above [Formula: see text] on gray and white matter tissues estimated from voxel-based morphometry (VBM) features are obtained in a sample of equal-sized high-functioning male and female adults with and without autism ([Formula: see text], [Formula: see text]/group). The proposed learning machine revealed how autism is modulated by biological sex using a low-dimensional feature space extracted from VBM. In addition, a spatial overlap analysis on reference maps partially corroborated predictions of the “extreme male brain” theory of autism in sexually dimorphic areas.


2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large-scale data and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes. Therefore, it becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping of the nodes' metadata to their position in a latent feature space corresponds to a classic, low-dimensional machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques that map the nodes onto their latent positions make it possible to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.
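The prediction step of a Random Dot Product Graph model can be sketched in a few lines: the probability of an interaction between two nodes is the dot product of their latent positions. The sketch below assumes an undirected network and takes the latent positions as given. In the procedure described above they would be estimated from the observed network or predicted from node metadata. The node names, positions and threshold are hypothetical.

```python
def edge_probability(x_i, x_j):
    """Interaction probability as the dot product of two latent positions."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    # Clip to [0, 1] because estimated positions can give values outside it.
    return min(1.0, max(0.0, dot))


def predict_links(positions, threshold=0.5):
    """Return node pairs whose predicted probability reaches the threshold."""
    names = sorted(positions)
    return [
        (u, v)
        for index, u in enumerate(names)
        for v in names[index + 1:]
        if edge_probability(positions[u], positions[v]) >= threshold
    ]


# Hypothetical 2-dimensional latent positions for three nodes.
positions = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.3]}
links = predict_links(positions)
```

With these toy positions only the pair `("a", "b")` reaches the threshold, since its dot product is about 0.74 while the pairs involving `"c"` stay below 0.15.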


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted considerable attention from data mining researchers as its use has grown steadily in practical domains such as product marketing, fraud detection, medical diagnosis, fault detection and many other fields. Outlier detection in high-dimensional data poses exceptional challenges for data mining experts because of the curse of dimensionality and the resulting similarity between distant and neighboring points. Traditional algorithms and techniques for outlier detection were tested on the full feature space. Conventional methodologies concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a high number of dimensions. Finding the anomalies present in a high-dimensional data set becomes difficult and tedious when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property: the contrast between the distances from an observation to its nearest and farthest neighbors approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings in well-established density-based techniques. The technique opens a new line of research toward resolving the inherent problems of high-dimensional data in which outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
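The concentration of distances behind the curse of dimensionality can be checked numerically. The sketch below is an illustration only and is not part of the proposed technique. It draws uniformly random points and measures the relative contrast, (farthest - nearest) / nearest, of the distances from a query point. The function name, sample size and seed are assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest neighbor of a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


low_dim = relative_contrast(2)     # nearest and farthest points differ a lot
high_dim = relative_contrast(500)  # all points look almost equally far away
```

The contrast is much smaller in 500 dimensions than in 2, which is why distance-based and density-based outlier scores lose discriminative power on the full feature space.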


2020 ◽  
Vol 34 (04) ◽  
pp. 5620-5627 ◽  
Author(s):  
Murat Sensoy ◽  
Lance Kaplan ◽  
Federico Cerutti ◽  
Maryam Saleki

Deep neural networks are often ignorant about what they do not know and overconfident when they make uninformed predictions. Some recent approaches quantify classification uncertainty directly by training the model to output high uncertainty for data samples close to class boundaries or from outside the training distribution. These approaches use an auxiliary data set during training to represent out-of-distribution samples. However, selecting or creating such an auxiliary data set is non-trivial, especially for high-dimensional data such as images. In this work we develop a novel neural network model that is able to express both aleatoric and epistemic uncertainty to distinguish decision boundary and out-of-distribution regions of the feature space. To this end, variational autoencoders and generative adversarial networks are incorporated to automatically generate out-of-distribution exemplars for training. Through extensive analysis, we demonstrate that the proposed approach provides better estimates of uncertainty for in- and out-of-distribution samples and for adversarial examples on well-known data sets than state-of-the-art approaches, including recent Bayesian approaches for neural networks and anomaly detection methods.


Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation: https://github.com/smelton/SMD. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (14) ◽  
pp. 2395-2402 ◽  
Author(s):  
Xiaoying Wang ◽  
Bin Yu ◽  
Anjun Ma ◽  
Cheng Chen ◽  
Bingqiang Liu ◽  
...  

Abstract Motivation: The prediction of protein–protein interaction (PPI) sites is key to mutation design, catalytic reactions and the reconstruction of PPI networks. It is a challenging task given the great abundance of sequences and the class imbalance in the samples. Results: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI site prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2–15.7% and 6.1–18.9% higher than the other existing tools, respectively. Availability and implementation: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. Supplementary information: Supplementary data are available at Bioinformatics online.
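The oversampling step can be illustrated with a minimal SMOTE sketch in plain Python: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. This is not the EL-SMURF code. The function name, the choice of `k` and the toy feature vectors are assumptions, and the sketch expects at least two minority samples.

```python
import random


def smote(minority, n_synthetic, k=2, rng=None):
    """Generate synthetic minority samples by interpolating toward neighbors."""
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random position between base and the chosen neighbor.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical 2-dimensional feature vectors of interface (minority) residues.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_synthetic=5, k=2, rng=random.Random(1))
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region the minority class already occupies instead of duplicating existing points.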

