Semisupervised Community Preserving Network Embedding with Pairwise Constraints

Learning Network Embedding with Community Structural Information

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/407 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Li ◽

Ying Wang ◽

Tingting Zhang ◽

Jiawei Zhang ◽

Yi Chang

Keyword(s):

Community Structure ◽

Link Prediction ◽

Structural Information ◽

Representation Learning ◽

Network Embedding ◽

Learning Network ◽

Optimization Framework ◽

Vertex Representation ◽

Low Dimensional ◽

Embedding Methods

Network embedding is an effective approach to learning low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods focus exclusively on vertex proximity, ignoring the internal community structure of the network. However, the homophily principle indicates that vertices within the same community are more similar to each other than to those from different communities, so vertices within the same community should have similar representations. Motivated by this, we propose a novel network embedding framework, NECS, to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem as a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks, including network reconstruction, link prediction, and vertex classification.

Attributed Network Embedding with Micro-Meso Structure

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441486 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-26

Author(s):

Juan-Hui Li ◽

Ling Huang ◽

Chang-Dong Wang ◽

Dong Huang ◽

Jian-Huang Lai ◽

...

Keyword(s):

Community Structure ◽

Optimization Problems ◽

Nonnegative Matrix ◽

Superior Performance ◽

Network Embedding ◽

Attributed Network ◽

Attribute Information ◽

Low Dimensional ◽

Embedding Methods ◽

Proximity Structure

Recently, network embedding has received a large amount of attention in network analysis. Although network embedding methods have been developed from different perspectives, most existing methods leverage only the plain network structure and ignore the abundant attribute information of nodes. Among the methods that do integrate attribute information, only the lower-order proximities (e.g., microscopic proximity structure) are taken into account, so performance may suffer when the network is sparse and the attribute information is noisy. To overcome this problem, the attribute information and the mesoscopic community structure are used together. In this article, we propose a novel network embedding method termed Attributed Network Embedding with Micro-Meso structure, which is capable of preserving both the attribute information and the structural information, including the microscopic proximity structure and the mesoscopic community structure. In particular, both the microscopic proximity structure and the node attributes are factorized by Nonnegative Matrix Factorization (NMF), from which the low-dimensional node representations can be obtained. For the mesoscopic community structure, a community membership strength matrix is inferred from the linkage structure by a generative model (i.e., BigCLAM) or by modularity, and is then factorized by NMF to obtain the low-dimensional node representations. The three components are jointly correlated through the low-dimensional node representations, from which two objective functions (i.e., ANEM_B and ANEM_M) can be defined. Two efficient alternating optimization schemes are proposed to solve the optimization problems. Extensive experiments have been conducted to confirm the superior performance of the proposed models over state-of-the-art network embedding methods.
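As a rough illustration of the NMF building block mentioned in the abstract above, the sketch below factorizes a small adjacency matrix with the standard Lee-Seung multiplicative updates. It is not the ANEM_B or ANEM_M objective from the article; the toy graph, the `nmf` helper, the rank, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X as W @ H with W, H >= 0.

    Uses the classic multiplicative updates for the Frobenius-norm
    objective. This is a generic sketch, not the article's method.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization (hypothetical choice).
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy graph (assumption): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.array(
    [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ],
    dtype=float,
)

W, H = nmf(A, rank=2)
err = np.linalg.norm(A - W @ H)  # Frobenius reconstruction error
print("node representations (rows of W):")
print(np.round(W, 3))
print("reconstruction error:", round(float(err), 3))
```

Each row of `W` can be read as a two-dimensional nonnegative representation of one node. In a joint model such as the one the abstract describes, several matrices would share such a factor, which this sketch does not attempt.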

Network Embedding on Hierarchical Community Structure Network

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3434747 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-23

Author(s):

Guojie Song ◽

Yun Wang ◽

Lun Du ◽

Yi Li ◽

Junshan Wang

Keyword(s):

Community Structure ◽

Structural Information ◽

Spherical Surface ◽

Network Embedding ◽

The Galaxy ◽

Community Information ◽

The Hierarchical Structure ◽

Network Properties ◽

Multi Class Classification ◽

Low Dimensional

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low-dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification, network visualization, and link prediction. The source code of GNE is available online.

Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods

Information ◽

10.3390/info13010035 ◽

2022 ◽

Vol 13 (1) ◽

pp. 35

Author(s):

Jibouni Ayoub ◽

Dounia Lotfi ◽

Ahmed Hammouch

Keyword(s):

Machine Learning ◽

Link Prediction ◽

Learning Algorithms ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Actual State ◽

The Future ◽

Auc Value ◽

The Mean ◽

Analysis Of Social Networks

The analysis of social networks has attracted a lot of attention during the last two decades. These networks are dynamic: links appear and disappear over time. Link prediction is the problem of inferring which links will appear in the future from the current state of the network. We use information from nodes and edges to calculate the similarity between users. The more similar two users are, the higher the probability that they will connect in the future. Similarity metrics therefore play an important role in the link prediction field. Because of their simplicity and flexibility, many authors have proposed such metrics, including Jaccard, Adamic-Adar (AA), and Katz, and evaluated them using the area under the curve (AUC). In this paper, we propose a new parameterized method to enhance the AUC value of the link prediction metrics by combining them with the mean received resources (MRRs). Experiments show that the proposed method improves the performance of the state-of-the-art metrics. Moreover, we used machine learning algorithms to classify links and confirm the efficiency of the proposed combination.
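The similarity metrics and the AUC evaluation named in the abstract above can be sketched in a few lines. The example below computes the Jaccard and Adamic-Adar scores on a toy graph and a pairwise AUC over scored node pairs. It does not implement the paper's MRR combination; the graph, the helper names, and the held-out pairs are assumptions made for illustration.

```python
import math


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by the union of both neighborhoods."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) = 0
    )


def auc(pos_scores, neg_scores):
    """Probability that a true link outscores a non-link (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy observed graph (assumption): two triangles joined by edge 2-3.
observed = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build_adjacency(observed)

# Hypothetical held-out links and sampled non-links.
future_links = [(0, 3), (2, 4)]
non_links = [(0, 5), (1, 4)]

pos = [adamic_adar(adj, u, v) for u, v in future_links]
neg = [adamic_adar(adj, u, v) for u, v in non_links]
print("Jaccard(0, 3):", jaccard(adj, 0, 3))
print("Adamic-Adar AUC:", auc(pos, neg))
```

The pairwise AUC used here is the usual link-prediction definition: the fraction of (link, non-link) pairs in which the link receives the higher score.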

JANE: Jointly Adversarial Network Embedding

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/192 ◽

2020 ◽

Author(s):

Liang Yang ◽

Yuexue Wang ◽

Junhua Gu ◽

Chuan Wang ◽

Xiaochun Cao ◽

...

Keyword(s):

Link Prediction ◽

Real Data ◽

Semantic Space ◽

Network Embedding ◽

Generative Adversarial Network ◽

Adversarial Learning ◽

Adversarial Network ◽

Node Clustering ◽

Topology Information ◽

Embedding Methods

Motivated by the capability of Generative Adversarial Networks to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this important ability is lost in existing adversarially regularized network embedding methods, because their embedding results are directly compared to samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed to jointly distinguish the real and fake combinations of the embeddings, topology information, and node features. JANE contains three pluggable components: an Embedding module, a Generator module, and a Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient. Extensive experiments demonstrate the superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).

Deep learning for predicting disease status using genomic data

10.7287/peerj.preprints.27123 ◽

2018 ◽

Cited By ~ 1

Author(s):

Qianfan Wu ◽

Adel Boueiz ◽

Alican Bozkurt ◽

Arya Masoomi ◽

Allan Wang ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Rapid Development ◽

Learning Algorithms ◽

Genomic Data ◽

Disease Status ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Learning Approach ◽

Low Dimensional

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality results in unsatisfactory performance of many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked and hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status using genomic datasets is still not well studied. In this article, we review the four relevant articles that we found through a thorough literature search. All four articles used auto-encoders to project high-dimensional genomic data to a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status based on the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements are also discussed.

Galaxy Network Embedding: A Hierarchical Community Structure Preserving Approach

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/287 ◽

2018 ◽

Cited By ~ 12

Author(s):

Lun Du ◽

Zhicong Lu ◽

Yun Wang ◽

Guojie Song ◽

Yiming Wang ◽

...

Keyword(s):

Community Structure ◽

Structural Information ◽

Spherical Surface ◽

Network Embedding ◽

Structure Preserving ◽

The Galaxy ◽

Community Information ◽

The Hierarchical Structure ◽

Multi Class Classification ◽

Low Dimensional

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low-dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.

Proximity Measures as Graph Convolution Matrices for Link Prediction in Biological Networks

10.1101/2020.11.14.382655 ◽

2020 ◽

Author(s):

Mustafa Coşkun ◽

Mehmet Koyutürk

Keyword(s):

Link Prediction ◽

Similarity Measures ◽

Graph Representation ◽

Supplementary Information ◽

Great Promise ◽

Network Embedding ◽

Common Neighbor ◽

Node Similarity ◽

Topological Characteristics ◽

Low Dimensional

Motivation: Link prediction is an important and well-studied problem in computational biology, with a broad range of applications including disease gene prioritization, drug-disease associations, and drug response in cancer. The general principle in link prediction is to use the topological characteristics and, if available, the attributes of the nodes in the network to predict new links that are likely to emerge or disappear. Recently, graph representation learning methods, which aim to learn a low-dimensional representation of the topological characteristics and the attributes of the nodes, have drawn increasing attention as a way to solve the link prediction problem via learnt low-dimensional features. Most prominently, Graph Convolution Network (GCN)-based network embedding methods have demonstrated great promise in link prediction due to their ability to capture non-linear information of the network. To date, GCN-based network embedding algorithms use a Laplacian matrix as the convolution matrix in their convolution layers, and the effect of the convolution matrix on algorithm performance has not been comprehensively characterized in the context of link prediction in biomedical networks. On the other hand, for a variety of biomedical link prediction tasks, traditional node similarity measures such as Common Neighbor, Adamic-Adar, and others have shown promising results. Hence there is a need to systematically evaluate the node similarity measures as convolution matrices in terms of their usability and potential to further the state-of-the-art.

Results: We select 8 representative node similarity measures as convolution matrices within the single-layered GCN graph embedding method and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction, and protein–protein interaction (PPI) prediction. Our experimental results demonstrate that the node similarity-based convolution matrices significantly improve GCN-based embedding algorithms and deserve more attention in future biomedical link prediction.

Availability: Our method is implemented as a Python library and is available at [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
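To make the idea of using a node similarity measure as a convolution matrix concrete, the sketch below builds a Common Neighbor matrix from a toy adjacency matrix, normalizes it symmetrically, and runs one graph-convolution forward pass. It is not the authors' library: the toy graph, the self-loop and normalization choices, the identity features, and the random untrained weights are all assumptions.

```python
import numpy as np


def common_neighbor_matrix(A):
    """Entry (i, j) counts the neighbors shared by nodes i and j."""
    cn = A @ A
    np.fill_diagonal(cn, 0.0)  # the diagonal of A @ A holds degrees, not similarities
    return cn


def symmetric_normalize(S):
    """Add self-loops and scale as D^{-1/2} (S + I) D^{-1/2} (assumed choice)."""
    S = S + np.eye(S.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(S_norm, X, W):
    """One convolution layer: ReLU(S_norm @ X @ W)."""
    return np.maximum(S_norm @ X @ W, 0.0)


# Toy graph (assumption): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.array(
    [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ],
    dtype=float,
)

CN = common_neighbor_matrix(A)
S_norm = symmetric_normalize(CN)

rng = np.random.default_rng(0)
X = np.eye(6)                  # featureless nodes: one-hot inputs
W = rng.normal(size=(6, 2))    # untrained weights, for shape illustration only
Z = gcn_layer(S_norm, X, W)    # 6 nodes embedded in 2 dimensions
print(np.round(Z, 3))
```

In a trained model the weights `W` would be fitted to a link prediction loss. Swapping `common_neighbor_matrix` for another similarity measure changes only the convolution matrix, which is the comparison the abstract describes.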

Literal2Feature: An Automatic Scalable RDF Graph Feature Extractor

10.3233/ssw210036 ◽

2021 ◽

Author(s):

Farshad Bakhshandegan Moghaddam ◽

Carsten Draschner ◽

Jens Lehmann ◽

Hajira Jabeen

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Generation ◽

Fully Integrated ◽

Rdf Graph ◽

Big Data Technologies ◽

Rdf Data ◽

Low Dimensional ◽

Rdf Graphs

The last decades have witnessed significant advancements in data generation, management, and maintenance. This has resulted in vast amounts of data becoming available in a variety of forms and formats, including RDF. As RDF data is represented as a graph structure, applying machine learning algorithms to extract valuable knowledge and insights from it is not straightforward, especially when the size of the data is enormous. Although Knowledge Graph Embedding models (KGEs) convert RDF graphs to low-dimensional vector spaces, these vectors often lack explainability. In contrast, in this paper we introduce a generic, distributed, and scalable software framework that is capable of transforming large RDF data into an explainable feature matrix. This matrix can be used by many standard machine learning algorithms. Our approach, by exploiting semantic web and big data technologies, is able to extract a variety of existing features by deeply traversing a given large RDF graph. The proposed framework is open-source, well-documented, and fully integrated into the active community project Semantic Analytics Stack (SANSA). Experiments on real-world use cases show that the extracted features can be successfully used in machine learning tasks such as classification and clustering.
