Network embedding in biomedical data science

Chang Su; Jie Tong; Yongjun Zhu; Peng Cui; Fei Wang

doi:10.1093/bib/bby117

Briefings in Bioinformatics ◽

10.1093/bib/bby117 ◽

2018 ◽

Vol 21 (1) ◽

pp. 182-197 ◽

Cited By ~ 17

Author(s):

Chang Su ◽

Jie Tong ◽

Yongjun Zhu ◽

Peng Cui ◽

Fei Wang

Keyword(s):

Data Science ◽

Dimensional Space ◽

Rapid Development ◽

Deep Understanding ◽

Biomedical Science ◽

Biomedical Data ◽

Network Embedding ◽

Learning Methods ◽

Low Dimensional ◽

Human Healthcare

Abstract: Owing to the rapid development of computer technologies, an increasing amount of relational data has been emerging in modern biomedical research. Many network-based learning methods have been proposed to analyze such data; they give a deep understanding of the topology and knowledge behind biomedical networks and benefit many applications in human healthcare. However, most network-based methods suffer from high computational and space costs, and challenges remain in handling the high dimensionality and sparsity of biomedical networks. Recent advances in network embedding provide effective new paradigms for network analysis. Network embedding converts a network into a low-dimensional space while maximally preserving its structural properties. In this way, downstream tasks such as link prediction and node classification can be done by traditional machine learning methods. In this survey, we conduct a comprehensive review of the literature on applying network embedding to advance the biomedical domain. We first briefly introduce the widely used network embedding models. We then discuss how network embedding approaches have been applied to biomedical networks and how they have accelerated downstream tasks in biomedical science. Finally, we discuss the challenges facing existing network embedding applications in biomedical domains and suggest several promising future directions for improving human healthcare.
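
Once nodes are embedded as vectors, a downstream task such as link prediction can reduce to ordinary vector arithmetic. The following is a minimal sketch, assuming embeddings have already been learned by some model; the node names and the two-dimensional vectors are hypothetical toy values, not data from the survey, and cosine similarity is only one of several possible scoring functions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-d embeddings of three nodes in a toy biomedical network.
embeddings = {
    "drug_A": (0.9, 0.1),
    "protein_X": (0.8, 0.2),
    "protein_Y": (-0.1, 0.9),
}

# Rank candidate links by the similarity of their endpoint embeddings.
candidates = [("drug_A", "protein_Y"), ("drug_A", "protein_X")]
ranked = sorted(
    candidates,
    key=lambda pair: cosine(embeddings[pair[0]], embeddings[pair[1]]),
    reverse=True,
)
```

Here the pair whose vectors point in nearly the same direction is ranked first; in practice the score would more likely feed a trained classifier than a plain sort.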

Network Embedding via a Bi-Mode and Deep Neural Network Model

10.20944/preprints201712.0156.v1 ◽

2017 ◽

Author(s):

Yang Fang ◽

Xiang Zhao ◽

Zhen Tan

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Semantic Information ◽

Dimensional Space ◽

Relation Extraction ◽

Network Embedding ◽

Structure Information ◽

Second Mode ◽

Real World Datasets ◽

Low Dimensional

Network Embedding (NE) is an important method for learning representations of a network in a low-dimensional space. Conventional NE models focus on capturing the structure and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structure and semantic information of edges. BimoNet is composed of two parts: the bi-mode embedding part and the deep neural network part. In the bi-mode embedding part, the first mode, named add-mode, expresses the entity-shared features of edges, and the second mode, named subtract-mode, represents the entity-specific features of edges. These features reflect the semantic information. In the deep neural network part, we first regard the edges in a network as nodes and the vertices as links, which does not change the overall structure of the network. We then take the nodes' adjacency matrix as the input of the deep neural network, as it can obtain similar representations for nodes with similar structure. By jointly optimizing the objective functions of these two parts, BimoNet preserves both the semantic and structure information of edges. In experiments, we evaluate BimoNet on three real-world datasets on the task of relation extraction, and BimoNet is shown to outperform state-of-the-art baseline models consistently and significantly.
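
Treating edges as nodes and vertices as links corresponds to what graph theory calls a line graph. The sketch below builds the adjacency matrix of that transformed graph for an undirected edge list. The function name `line_graph_adjacency` and the toy path graph are illustrative assumptions; the abstract does not give BimoNet's actual construction, and the add-mode and subtract-mode embeddings are not reproduced here.

```python
from itertools import combinations

def line_graph_adjacency(edges):
    """Adjacency matrix of the line graph of an undirected graph.

    Each original edge becomes a node; two such nodes are linked when
    the original edges share an endpoint.
    """
    n = len(edges)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i][j] = adj[j][i] = 1
    return adj

# Toy path graph a - b - c - d (hypothetical data).
edges = [("a", "b"), ("b", "c"), ("c", "d")]
adj = line_graph_adjacency(edges)
```

Each row of `adj` describes one original edge by the edges it touches, which is the kind of input a structure-preserving autoencoder could consume.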

Community detection in complex network by network embedding and density clustering

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202961 ◽

2021 ◽

pp. 1-12

Author(s):

JinFang Sheng ◽

Huaiyu Zuo ◽

Bin Wang ◽

Qiong Li

Keyword(s):

Complex Network ◽

Community Detection ◽

Dimensional Space ◽

Detection Algorithm ◽

Superior Performance ◽

Network Embedding ◽

Detection Algorithms ◽

Density Clustering ◽

Community Detection Algorithm ◽

Low Dimensional

In a complex network system, the structure of the network is an extremely important element for the analysis of the system, and the study of community detection algorithms is key to exploring the structure of the complex network. Traditional community detection algorithms would represent the network using an adjacency matrix based on observations, which may contain redundant information or noise that interferes with the detection results. In this paper, we propose a community detection algorithm based on density clustering. In order to improve the performance of density clustering, we consider an algorithmic framework for learning the continuous representation of network nodes in a low-dimensional space. The network structure is effectively preserved through network embedding, and density clustering is applied in the embedded low-dimensional space to compute the similarity of nodes in the network, which in turn reveals the implied structure in a given network. Experiments show that the algorithm has superior performance compared to other advanced community detection algorithms for real-world networks in multiple domains as well as synthetic networks, especially when the network data chaos is high.
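
To illustrate density clustering in an embedded space, the sketch below runs a small DBSCAN over toy two-dimensional points. DBSCAN is used here only as a familiar stand-in: the abstract does not say which density clustering algorithm the authors use, and the points, `eps`, and `min_pts` values are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices within eps of point i (the point itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels

# Hypothetical node embeddings: two dense groups and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Nodes with the same label would be reported as one community, and nodes labelled -1 are left unassigned.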

Learning Manifolds

Machine Learning in Computer-Aided Diagnosis - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-4666-0059-1.ch018 ◽

2012 ◽

pp. 374-402

Author(s):

Diana Mateus ◽

Christian Wachinger ◽

Selen Atasoy ◽

Loren Schwarz ◽

Nassir Navab

Keyword(s):

Manifold Learning ◽

Domain Knowledge ◽

Dimensional Space ◽

Human Motion ◽

Motion Modeling ◽

Learning Methods ◽

Data Representations ◽

Non Linear ◽

Data Points ◽

Low Dimensional

Computer-aided diagnosis is often confronted with processing and analyzing high-dimensional data. One way to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods to create low-dimensional data representations adapted to a given application. From pairwise non-linear relations between neighboring data points, manifold learning algorithms first approximate the low-dimensional manifold on which the data lie with a graph; then, they find a non-linear map to embed this graph into a low-dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low-dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples for visualization, clustering, classification, registration, and human-motion modeling.
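
The first of the two steps described above, approximating the manifold with a neighborhood graph, can be sketched as a k-nearest-neighbor graph. This is a minimal illustration under assumptions: Euclidean distance, a symmetrized graph, and toy points sampled along an arc. The second step (finding the embedding map, typically through an eigendecomposition) is not shown.

```python
import math

def knn_graph(points, k):
    """Return a symmetric k-nearest-neighbour graph as {index: set(indices)}."""
    n = len(points)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrise so the graph is undirected
    return graph

# Toy data: five points on a unit-circle arc, at unevenly spaced angles.
angles = (0.0, 0.4, 0.9, 1.5, 2.2)
points = [(math.cos(t), math.sin(t)) for t in angles]
graph = knn_graph(points, k=1)
```

With these points the graph is a simple chain that follows the arc, which is the one-dimensional structure an embedding step would then unroll.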

Deep Learning in Biomedical Data Science

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-080917-013343 ◽

2018 ◽

Vol 1 (1) ◽

pp. 181-205 ◽

Cited By ~ 27

Author(s):

Pierre Baldi

Keyword(s):

Health Care ◽

Deep Learning ◽

Spatial Scale ◽

Biomedical Imaging ◽

Data Science ◽

Black Box ◽

Biomedical Data ◽

Learning Methods

Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.

Relation Structure-Aware Heterogeneous Information Network Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014456 ◽

2019 ◽

Vol 33 ◽

pp. 4456-4463 ◽

Cited By ~ 8

Author(s):

Yuanfu Lu ◽

Chuan Shi ◽

Linmei Hu ◽

Zhiyuan Liu

Keyword(s):

Real World ◽

Dimensional Space ◽

Structural Characteristics ◽

Information Network ◽

Network Embedding ◽

Heterogeneous Information Network ◽

Heterogeneous Information ◽

Real World Datasets ◽

Low Dimensional ◽

Embedding Methods

Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures that consistently separate heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of these relations, RHINE uses different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.

Semisupervised, Multilabel, Multi-Instance Learning for Structured Data

Neural Computation ◽

10.1162/neco_a_00939 ◽

2017 ◽

Vol 29 (4) ◽

pp. 1053-1102 ◽

Cited By ~ 1

Author(s):

Hossein Soleimani ◽

David J. Miller

Keyword(s):

Dimensional Space ◽

Binary Classification ◽

Data Sets ◽

Learning Methods ◽

Practical Applications ◽

Bayes Methods ◽

Text Document ◽

Standard Classification ◽

Low Dimensional ◽

Inference Methods

Many classification tasks require both labeling objects and determining label associations for parts of each object. Example applications include labeling segments of images or determining relevant parts of a text document when the training labels are available only at the image or document level. This task is usually referred to as multi-instance (MI) learning, where the learner typically receives a collection of labeled (or sometimes unlabeled) bags, each containing several segments (instances). We propose a semisupervised MI learning method for multilabel classification. Most MI learning methods treat instances in each bag as independent and identically distributed samples. However, in many practical applications, instances are related to each other and should not be considered independent. Our model discovers a latent low-dimensional space that captures structure within each bag. Further, unlike many other MI learning methods, which are primarily developed for binary classification, we model multiple classes jointly, thus also capturing possible dependencies between different classes. We develop our model within a semisupervised framework, which leverages both labeled and, typically, a larger set of unlabeled bags for training. We develop several efficient inference methods for our model. We first introduce a Markov chain Monte Carlo method for inference, which can handle arbitrary relations between bag labels and instance labels, including the standard hard-max MI assumption. We also develop an extension of our model that uses stochastic variational Bayes methods for inference, and thus scales better to massive data sets. Experiments show that our approach outperforms several MI learning and standard classification methods on both bag-level and instance-level label prediction. All code for replicating our experiments is available from https://github.com/hsoleimani/MLTM.

User Profile Preserving Social Network Embedding

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/472 ◽

2017 ◽

Cited By ~ 29

Author(s):

Daokun Zhang ◽

Jie Yin ◽

Xingquan Zhu ◽

Chengqi Zhang

Keyword(s):

Social Networks ◽

Social Network ◽

Network Structure ◽

Dimensional Space ◽

User Profile ◽

Network Embedding ◽

Network Nodes ◽

Profile Information ◽

Low Dimensional ◽

Performance Gains

This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most of the existing works on network embedding only consider network structure, but ignore user-generated content that could be potentially helpful in learning a better joint network representation. Different from rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile with network structure to jointly learn a vector representation of a social network. The theme of UPP-SNE is to embed user profile information via a nonlinear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.

Motif-Preserving Temporal Network Embedding

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/172 ◽

2020 ◽

Author(s):

Hong Huang ◽

Zixuan Fang ◽

Xiao Wang ◽

Youshan Miao ◽

Hai Jin

Keyword(s):

Link Prediction ◽

Temporal Dynamics ◽

Dimensional Space ◽

Temporal Network ◽

Hawkes Process ◽

Temporal Networks ◽

Network Embedding ◽

Different Types ◽

Low Dimensional ◽

Link Recommendation

Network embedding, which maps nodes in a network to a low-dimensional space, achieves powerful performance. An increasing number of works focus on static network embedding; however, little attention has been paid to temporal network embedding, and existing work rarely considers the effect of mesoscopic dynamics as the network evolves. In light of this, we concentrate on a particular motif, the triad, and its temporal dynamics to study temporal network embedding. Specifically, we propose MTNE, a novel embedding model for temporal networks. MTNE not only integrates the Hawkes process to stimulate the triad evolution process that preserves motif-aware high-order proximities, but also incorporates an attention mechanism to better distinguish the importance of different types of triads. Experiments on various real-world temporal networks demonstrate that, compared with several state-of-the-art methods, our model achieves the best performance in both static and dynamic tasks, including node classification, link prediction, and link recommendation.
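
For readers unfamiliar with the Hawkes process, the sketch below computes the conditional intensity of the textbook univariate form with an exponential kernel, where each past event temporarily raises the rate of future events. It is not MTNE's triad-level formulation, which the abstract does not specify; the parameter names `mu`, `alpha`, and `delta` and the event times are illustrative assumptions.

```python
import math

def hawkes_intensity(t, history, mu, alpha, delta):
    """Conditional intensity of a univariate Hawkes process at time t.

    mu    : constant base rate
    alpha : jump in intensity caused by each past event
    delta : exponential decay rate of that excitation
    Only events strictly before t contribute.
    """
    excitation = sum(alpha * math.exp(-delta * (t - s)) for s in history if s < t)
    return mu + excitation

# Hypothetical event times, e.g. moments at which earlier links formed.
history = [1.0, 2.0]
before_events = hawkes_intensity(0.5, history, mu=0.1, alpha=0.5, delta=1.0)
after_events = hawkes_intensity(2.5, history, mu=0.1, alpha=0.5, delta=1.0)
```

Before any event the intensity equals the base rate `mu`; shortly after the two events it is higher, and it decays back toward `mu` as time passes.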

It’s all in the timing: calibrating temporal penalties for biomedical data sharing

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx101 ◽

2017 ◽

Vol 25 (1) ◽

pp. 25-31 ◽

Cited By ~ 1

Author(s):

Weiyi Xia ◽

Zhiyu Wan ◽

Zhijun Yin ◽

James Gaupp ◽

Yongtai Liu ◽

...

Keyword(s):

Data Sharing ◽

Data Science ◽

Data Use ◽

Policy Implications ◽

Biomedical Science ◽

Impact Factors ◽

Biomedical Data ◽

Data Usage ◽

The Impact ◽

Over Time

Abstract. Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient’s data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and the data science policy implications. Methods: This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a mixed linear effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health. Results: The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average. Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
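
A quick back-of-the-envelope calculation shows why a depreciation rate of about 10% per year makes a revocation of several months a weak penalty. The sketch assumes the rate compounds geometrically and applies smoothly within a year; the abstract reports only the average annual rate, so the compounding model and the six-month duration are illustrative assumptions.

```python
def remaining_value(rate_per_year, months):
    """Fraction of the data's value left after a delay of `months` months,
    assuming geometric depreciation at `rate_per_year`."""
    return (1.0 - rate_per_year) ** (months / 12.0)

# Value retained after a hypothetical six-month revocation at 10% per year.
six_month_fraction = remaining_value(0.10, 6)
six_month_loss = 1.0 - six_month_fraction
```

Under these assumptions `six_month_fraction` is roughly 0.95, so the violator loses only about 5% of the data's value by waiting out the penalty.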

Learning Sentiment over Network Embedding for Recommendation System

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.1.1008 ◽

2021 ◽

Vol 11 (1) ◽

pp. 12-20

Author(s):

Phatpicha Yochum ◽

Liang Chang ◽

Tianlong Gu ◽

Manli Zhu

Keyword(s):

Large Scale ◽

Recommendation System ◽

Rapid Development ◽

Knowledge Bases ◽

Network Embedding ◽

Embedding Method ◽

Novel Approach ◽

Structured Knowledge ◽

Low Dimensional ◽

Unstructured Information

With the rapid development of the Internet, various kinds of unstructured information, such as user-generated content, textual reviews, and implicit or explicit feedback, have grown continuously. Meanwhile, structured knowledge bases (KBs), which consist of a large number of triples, have recently shown great advantages in the recommendation field. In this paper, we propose a novel approach to learn sentiment over network embedding for a recommendation system based on a knowledge graph that we have built; that is, we integrate the network embedding method with the sentiment of user reviews. Specifically, we use the typical network embedding method node2vec to embed the large-scale structured data into a low-dimensional vector space to capture the internal semantic information of users and attractions, and we apply user weight scoring, which combines user review ratings and textual reviews, to find similar attractions among users. Experimental results on a real-world dataset verify the superior recommendation performance of our approach in precision, recall, and F-measure compared with state-of-the-art baselines.
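
The core of node2vec is a second-order biased random walk whose output sequences are then fed to a skip-gram model. The sketch below shows only the walk, using the standard return parameter `p` and in-out parameter `q`. The toy user-attraction graph, node names, and parameter values are hypothetical; the skip-gram training step and the authors' sentiment weighting are not shown.

```python
import random

def node2vec_walk(graph, start, length, p, q, rng):
    """One biased random walk over `graph` ({node: set(neighbours)}).

    Unnormalised weight of moving from the current node to `nxt`,
    given the previous node `prev`:
      1/p if nxt == prev (step back),
      1   if nxt is also a neighbour of prev (stay close),
      1/q otherwise (move outward).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical bipartite graph of users (u*) and attractions (a*).
graph = {
    "u1": {"a1", "a2"},
    "u2": {"a2", "a3"},
    "a1": {"u1"},
    "a2": {"u1", "u2"},
    "a3": {"u2"},
}
walk = node2vec_walk(graph, "u1", length=6, p=1.0, q=2.0, rng=random.Random(0))
```

A low `q` pushes walks outward toward distant nodes, while a low `p` keeps them near the start; many such walks per node would form the training corpus for the embedding.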
