Persona2vec: a flexible multi-role representations learning framework for graphs

PeerJ Computer Science ◽

10.7717/peerj-cs.439 ◽

2021 ◽

Vol 7 ◽

pp. e439

Author(s):

Jisung Yoon ◽

Kai-Cheng Yang ◽

Woo-Sung Jung ◽

Yong-Yeol Ahn

Keyword(s):

Community Structure ◽

Link Prediction ◽

Graph Mining ◽

State Of The Art ◽

Multiple Representations ◽

Graph Embedding ◽

Learning Framework ◽

Overlapping Community ◽

Art Performance ◽

Low Dimensional

Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively overlapping community structure, where most nodes belong to multiple communities, playing different roles depending on the contexts. Here, we propose persona2vec, a graph embedding framework that efficiently learns multiple representations of nodes based on their structural contexts. Using link prediction-based evaluation, we show that our framework is significantly faster than the existing state-of-the-art model while achieving better performance.

Download Full-text

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entity and entity types is utilized to map head entity and tail entity to type-specific representations, then translation-based score function is used to learn the presentation triples. We evaluated our model on real-world datasets with two benchmark tasks of link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.

Download Full-text

A Novel Negative Sampling Based on Frequency of Relational Association Entities for Knowledge Graph Embedding

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2068 ◽

2021 ◽

Author(s):

Wanhua Cao ◽

Yi Zhang ◽

Juntao Liu ◽

Ziyun Rao

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Evaluation Criteria ◽

Relation Extraction ◽

Graph Embedding ◽

Semantic Space ◽

Knowledge Graph ◽

Knowledge Reasoning ◽

Relation Prediction ◽

Low Dimensional

Knowledge graph embedding improves the performance of relation extraction and knowledge reasoning by encoding entities and relationships in low-dimensional semantic space. During training, negative samples are usually constructed by replacing the head/tail entity. And the different replacing relationships lead to different accuracy of the prediction results. This paper develops a negative triplets construction framework according to the frequency of relational association entities. The proposed construction framework can fully consider the quantitative of relations and entities in the dataset to assign the proportion of relation and entity replacement and the frequency of the entities associated with each relationship to set reasonable proportions for different relations. To verify the validity of the proposed construction framework, it is integrated into the state-of-the-art knowledge graph embedding models, such as TransE, TransH, DistMult, ComplEx, and Analogy. And both the evaluation criteria of relation prediction and entity prediction are used to evaluate the performance of link prediction more comprehensively. The experimental results on two commonly used datasets, WN18 and FB15K, show that the proposed method improves entity link and triplet classification accuracy, especially the accuracy of relational link prediction.

Download Full-text

How much topological structure is preserved by graph embeddings?

Computer Science and Information Systems ◽

10.2298/csis181001011l ◽

2019 ◽

Vol 16 (2) ◽

pp. 597-614 ◽

Cited By ~ 1

Author(s):

Xin Liu ◽

Chenyi Zhuang ◽

Tsuyoshi Murata ◽

Kyoung-Sook Kim ◽

Natthawut Kertkeidkachorn

Keyword(s):

Topological Structure ◽

Link Prediction ◽

State Of The Art ◽

Graph Embedding ◽

Graph Embeddings ◽

Dimensional Vector ◽

Original Graph ◽

Dimensional Vector Space ◽

Low Dimensional ◽

Link Distribution

Graph embedding aims at learning representations of nodes in a low dimensional vector space. Good embeddings should preserve the graph topological structure. To study how much such structure can be preserved, we propose evaluation methods from four aspects: 1) How well the graph can be reconstructed based on the embeddings, 2) The divergence of the original link distribution and the embedding-derived distribution, 3) The consistency of communities discovered from the graph and embeddings, and 4) To what extent we can employ embeddings to facilitate link prediction. We find that it is insufficient to rely on the embeddings to reconstruct the original graph, to discover communities, and to predict links at a high precision. Thus, the embeddings by the state-of-the-art approaches can only preserve part of the topological structure.

Download Full-text

Exploiting node metadata to predict interactions in large networks using graph embedding and neural networks

10.1101/2021.06.10.447991 ◽

2021 ◽

Author(s):

Rogini Runghen ◽

Daniel B Stouffer ◽

Giulio Valentino Dalla Riva

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Link Prediction ◽

Graph Embedding ◽

Feature Space ◽

Machine Learning Techniques ◽

Large Networks ◽

Data Set ◽

Learning Techniques ◽

Low Dimensional

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large scale data, and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another, and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes. Therefore, it becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the base of nodes' metadata. Graph embedding methods project the nodes of a network onto a---low dimensional---latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping of the nodes' metadata to their position in a latent feature space corresponds to a classic---and low dimensional---machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural networks architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques to map the nodes onto their latent positions allow to account for multivariate and possibly complex nodes' metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand. We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.

Download Full-text

Semisupervised Community Preserving Network Embedding with Pairwise Constraints

Complexity ◽

10.1155/2020/7953758 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Dong Liu ◽

Yan Ru ◽

Qinpeng Li ◽

Shibin Wang ◽

Jianwei Niu

Keyword(s):

Community Structure ◽

Link Prediction ◽

Learning Algorithms ◽

Nonnegative Matrix ◽

Machine Learning Algorithms ◽

Network Visualization ◽

Network Embedding ◽

Pairwise Constraints ◽

Node Clustering ◽

Low Dimensional

Network embedding aims to learn the low-dimensional representations of nodes in networks. It preserves the structure and internal attributes of the networks while representing nodes as low-dimensional dense real-valued vectors. These vectors are used as inputs of machine learning algorithms for network analysis tasks such as node clustering, classification, link prediction, and network visualization. The network embedding algorithms, which considered the community structure, impose a higher level of constraint on the similarity of nodes, and they make the learned node embedding results more discriminative. However, the existing network representation learning algorithms are mostly unsupervised models; the pairwise constraint information, which represents community membership, is not effectively utilized to obtain node embedding results that are more consistent with prior knowledge. This paper proposes a semisupervised modularized nonnegative matrix factorization model, SMNMF, while preserving the community structure for network embedding; the pairwise constraints (must-link and cannot-link) information are effectively fused with the adjacency matrix and node similarity matrix of the network so that the node representations learned by the model are more interpretable. Experimental results on eight real network datasets show that, comparing with the representative network embedding methods, the node representations learned after incorporating the pairwise constraints can obtain higher accuracy in node clustering task and the results of link prediction, and network visualization tasks indicate that the semisupervised model SMNMF is more discriminative than unsupervised ones.

Download Full-text

GLEE: Geometric Laplacian Eigenmap Embedding

Journal of Complex Networks ◽

10.1093/comnet/cnaa007 ◽

2020 ◽

Vol 8 (2) ◽

Author(s):

Leo Torres ◽

Kevin S Chan ◽

Tina Eliassi-Rad

Keyword(s):

Link Prediction ◽

Graph Embedding ◽

Laplacian Matrix ◽

Dimensional Representation ◽

Laplacian Eigenmaps ◽

New Approach ◽

Graph Reconstruction ◽

Node Similarity ◽

Distance Minimization ◽

Low Dimensional

Abstract Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding, and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction.

Download Full-text

Link Prediction via Ranking Metric Dual-Level Attention Network Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/493 ◽

2017 ◽

Cited By ~ 3

Author(s):

Zhou Zhao ◽

Ben Gao ◽

Vincent W. Zheng ◽

Deng Cai ◽

Xiaofei He ◽

...

Keyword(s):

Link Prediction ◽

Large Scale ◽

State Of The Art ◽

Feature Representation ◽

Telecommunication Networks ◽

Metric Embedding ◽

Network Learning ◽

Learning Framework ◽

Discriminative Feature ◽

Attentional Learning

Link prediction is a challenging problem for complex network analysis, arising in many disciplines such as social networks and telecommunication networks. Currently, many existing approaches estimate the proximity of the link endpoints for link prediction from their feature or the local neighborhood around them, which suffer from the localized view of network connections and insufficiency of discriminative feature representation. In this paper, we consider the problem of link prediction from the viewpoint of learning discriminative path-based proximity ranking metric embedding. We propose a novel ranking metric network learning framework by jointly exploiting both node-level and path-level attentional proximity of the endpoints for link prediction. We then develop the path-based dual-level reasoning attentional learning method with recurrent neural network for proximity ranking metric embedding. The extensive experiments on two large-scale datasets show that our method achieves better performance than other state-of-the-art solutions to the problem.

Download Full-text

Learning Network Embedding with Community Structural Information

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/407 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Li ◽

Ying Wang ◽

Tingting Zhang ◽

Jiawei Zhang ◽

Yi Chang

Keyword(s):

Community Structure ◽

Link Prediction ◽

Structural Information ◽

Representation Learning ◽

Network Embedding ◽

Learning Network ◽

Optimization Framework ◽

Vertex Representation ◽

Low Dimensional ◽

Embedding Methods

Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.

Download Full-text

Understanding Negative Sampling in Knowledge Graph Embedding

International Journal of Artificial Intelligence & Applications ◽

10.5121/ijaia.2021.12105 ◽

2021 ◽

Vol 12 (1) ◽

pp. 71-81

Author(s):

Jing Qian ◽

Gangmin Li ◽

Katie Atkinson ◽

Yong Yue

Keyword(s):

Link Prediction ◽

Graph Embedding ◽

Knowledge Graph ◽

Direct Impact ◽

Dimensional Vector Space ◽

Dynamic Distribution ◽

Space Efficiency ◽

Node Classification ◽

Low Dimensional

Knowledge graph embedding (KGE) is to project entities and relations of a knowledge graph (KG) into a low-dimensional vector space, which has made steady progress in recent years. Conventional KGE methods, especially translational distance-based models, are trained through discriminating positive samples from negative ones. Most KGs store only positive samples for space efficiency. Negative sampling thus plays a crucial role in encoding triples of a KG. The quality of generated negative samples has a direct impact on the performance of learnt knowledge representation in a myriad of downstream tasks, such as recommendation, link prediction and node classification. We summarize current negative sampling approaches in KGE into three categories, static distribution-based, dynamic distribution-based and custom cluster-based respectively. Based on this categorization we discuss the most prevalent existing approaches and their characteristics. It is a hope that this review can provide some guidelines for new thoughts about negative sampling in KGE.

Download Full-text

Graph embedding on biomedical networks: methods, applications and evaluations

Bioinformatics ◽

10.1093/bioinformatics/btz718 ◽

2019 ◽

Cited By ~ 14

Author(s):

Xiang Yue ◽

Zhen Wang ◽

Jingong Huang ◽

Srinivasan Parthasarathy ◽

Soheil Moosavinasab ◽

...

Keyword(s):

Protein Function ◽

State Of The Art ◽

Protein Function Prediction ◽

Graph Embedding ◽

Experimental Results ◽

Supplementary Information ◽

Protein Protein Interaction ◽

Biological Features ◽

Low Dimensional ◽

Embedding Methods

Abstract Motivation Graph embedding learning that aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and are not comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding methods) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. Results We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction, protein–protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. Availability and implementation As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text