Network embedding in biomedical data science

2018, Vol 21 (1), pp. 182-197
Author(s): Chang Su, Jie Tong, Yongjun Zhu, Peng Cui, Fei Wang

Abstract: Owing to the rapid development of computer technologies, an increasing amount of relational data has been emerging in modern biomedical research. Many network-based learning methods have been proposed to analyze such data; they give researchers a deeper understanding of the topology and knowledge behind biomedical networks and benefit many healthcare applications. However, most network-based methods suffer from high computational and space costs, and handling the high dimensionality and sparsity of biomedical networks remains challenging. Recent advances in network embedding provide effective new paradigms for network analysis. Network embedding maps a network into a low-dimensional space while maximally preserving its structural properties, so that downstream tasks such as link prediction and node classification can be performed with traditional machine learning methods. In this survey, we conduct a comprehensive review of the literature on applying network embedding to advance the biomedical domain. We first briefly introduce the widely used network embedding models. We then discuss in detail how network embedding approaches have been applied to biomedical networks and how they have accelerated downstream tasks in biomedical science. Finally, we discuss the challenges that existing network embedding applications in the biomedical domain face and suggest several promising future directions for improving human healthcare.
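Once an embedding model has produced node vectors, a downstream task such as node classification reduces to ordinary machine learning on those vectors. The sketch below assumes the embeddings already exist (the toy vectors and the `knn_classify` helper are hypothetical, not taken from the survey) and uses a k-nearest-neighbour vote as the "traditional" classifier.

```python
import math
from collections import Counter

def knn_classify(query, labeled, k=3):
    """Predict a node's label by majority vote among its k nearest labeled embeddings.

    query:   embedding vector of the unlabeled node
    labeled: list of (embedding_vector, label) pairs
    """
    # Rank labeled nodes by Euclidean distance in the embedding space.
    ranked = sorted(labeled, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D embeddings for a handful of labeled biomedical entities.
labeled = [
    ([0.90, 0.10], "gene"), ([0.80, 0.20], "gene"), ([0.85, 0.15], "gene"),
    ([0.10, 0.90], "drug"), ([0.20, 0.80], "drug"), ([0.15, 0.85], "drug"),
]
print(knn_classify([0.70, 0.30], labeled))  # gene
```

Any off-the-shelf classifier (logistic regression, SVM) could replace the vote; the point is only that the embedding turns a graph problem into a vector problem.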

Author(s): Yang Fang, Xiang Zhao, Zhen Tan

Network Embedding (NE) is an important method for learning representations of a network in a low-dimensional space. Conventional NE models focus on capturing the structural and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structural and semantic information of edges. BimoNet is composed of two parts: a bi-mode embedding part and a deep neural network part. In the bi-mode embedding part, the first mode, named add-mode, expresses the entity-shared features of edges, and the second mode, named subtract-mode, represents the entity-specific features of edges; together, these features reflect the semantic information of edges. In the deep neural network part, we first regard the edges in a network as nodes and the vertices as links, which does not change the overall structure of the network. We then take the nodes' adjacency matrix as the input of the deep neural network, because this yields similar representations for nodes with similar structures. Finally, by jointly optimizing the objective functions of these two parts, BimoNet preserves both the semantic and structural information of edges. In experiments on three real-world datasets for the task of relation extraction, BimoNet consistently and significantly outperforms state-of-the-art baseline models.
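Regarding edges as nodes and vertices as links amounts to building the line graph of the network. A minimal sketch, assuming an undirected edge list without duplicates; the name `line_graph` is hypothetical, and each row of the returned matrix is the kind of per-edge adjacency vector the abstract describes feeding into the deep neural network part (the network itself is not shown).

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of an undirected edge list.

    Each original edge becomes a node; two such nodes are linked when the
    original edges share an endpoint (the shared vertex acts as the link).
    Returns a 0/1 adjacency matrix indexed in the order of `edges`.
    """
    m = len(edges)
    adj = [[0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i][j] = adj[j][i] = 1
    return adj

# Toy path network a-b-c-d: edge (a,b) touches (b,c), which touches (c,d).
edges = [("a", "b"), ("b", "c"), ("c", "d")]
for row in line_graph(edges):
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```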


2021, pp. 1-12
Author(s): JinFang Sheng, Huaiyu Zuo, Bin Wang, Qiong Li

In a complex network system, the structure of the network is an extremely important element for analyzing the system, and community detection algorithms are key to exploring that structure. Traditional community detection algorithms represent the network with an adjacency matrix built from observations, which may contain redundant information or noise that interferes with the detection results. In this paper, we propose a community detection algorithm based on density clustering. To improve the performance of density clustering, we adopt an algorithmic framework that learns continuous representations of network nodes in a low-dimensional space. Network embedding effectively preserves the network structure, and density clustering is applied in the embedded low-dimensional space to compute the similarity of nodes, which in turn reveals the implicit structure of a given network. Experiments show that the algorithm outperforms other advanced community detection algorithms on real-world networks from multiple domains as well as on synthetic networks, especially when the network data are highly disordered.
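The abstract does not name its density-clustering algorithm. As a stand-in, the sketch below runs DBSCAN, a standard density-based method, over node embeddings; the coordinates, `eps`, and `min_pts` values are illustrative assumptions, and each resulting cluster id would be read as a community.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: a cluster id >= 0, or -1 for noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`; clusters grow outward from core points.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(j_seeds)
    return labels

# Hypothetical 2-D node embeddings: two tight groups and one isolated node.
node_vecs = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, -10)]
print(dbscan(node_vecs, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Clustering in the embedded space rather than on raw adjacency rows is what lets noisy or redundant links be smoothed out before densities are compared.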


Author(s): Diana Mateus, Christian Wachinger, Selen Atasoy, Loren Schwarz, Nassir Navab

Computer-aided diagnosis is often confronted with processing and analyzing high-dimensional data. One way to deal with such data is dimensionality reduction. This chapter focuses on manifold learning methods that create low-dimensional data representations adapted to a given application. Starting from pairwise non-linear relations between neighboring data points, manifold learning algorithms first approximate the low-dimensional manifold on which the data lie with a graph; they then find a non-linear map that embeds this graph into a low-dimensional space. Since the explicit pairwise relations and the neighborhood system can be designed according to the application, manifold learning methods are very flexible and allow easy incorporation of domain knowledge. The authors describe different assumptions and design elements that are crucial to building successful low-dimensional data representations with manifold learning for a variety of applications. In particular, they discuss examples of visualization, clustering, classification, registration, and human-motion modeling.
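The two-stage recipe (approximate the manifold with a neighbourhood graph, then embed that graph) can be illustrated with Laplacian eigenmaps, one common manifold-learning method; the chapter discusses several, and this particular choice is ours. The sketch builds a symmetric k-nearest-neighbour graph with 0/1 weights and computes a one-dimensional embedding by power iteration using only the standard library; `knn_graph`, `laplacian_eigenmap_1d`, and the toy points are hypothetical.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with 0/1 edge weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = w[j][i] = 1.0  # symmetrize: keep the edge if either side chose it
    return w

def laplacian_eigenmap_1d(w, iters=2000):
    """1-D Laplacian-eigenmap coordinates for a graph with weight matrix w.

    Finds the eigenvector of L = D - W with the smallest eigenvalue among
    vectors orthogonal to the all-ones vector, via power iteration on c*I - L.
    Choosing c = 2 * max degree keeps c*I - L positive semidefinite.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg)
    x = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the trivial constant eigenvector
        # y = (c*I - L) x, i.e. y_i = (c - deg_i) * x_i + sum_j w_ij * x_j
        y = [(c - deg[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Two hypothetical, well-separated groups of 2-D feature vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
emb = laplacian_eigenmap_1d(knn_graph(points, k=2))
print(["A" if v * emb[0] > 0 else "B" for v in emb])  # ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
```

The sign of an eigenvector is arbitrary, so the print compares each coordinate with the first one rather than testing for positive or negative values. Changing `k` or the edge weights is exactly the kind of application-specific design choice the chapter emphasizes.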


2018, Vol 1 (1), pp. 181-205
Author(s): Pierre Baldi

Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black-box problem of deep learning methods is also briefly discussed.


Author(s): Yuanfu Lu, Chuan Shi, Linmei Hu, Zhiyuan Liu

Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ a single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures that consistently divide heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of these relations, RHINE uses different models specifically tailored to ARs and IRs, which better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
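The abstract does not spell out its two structure-related measures. As a hedged illustration only, the sketch computes two plausible per-relation statistics: the ratio between the average degrees of a relation's two endpoint types, and the fraction of possible node pairs actually linked. The function names, the toy relations, and the idea that a lopsided degree ratio hints at an affiliation-like relation are our assumptions, not RHINE's published definitions.

```python
from collections import Counter

def degree_ratio(edges):
    """Larger-to-smaller ratio of the average degrees of a relation's two
    endpoint types. `edges` holds (u, v) pairs, u of the first type, v of the second."""
    deg_u = Counter(u for u, _ in edges)
    deg_v = Counter(v for _, v in edges)
    avg_u = sum(deg_u.values()) / len(deg_u)
    avg_v = sum(deg_v.values()) / len(deg_v)
    return max(avg_u, avg_v) / min(avg_u, avg_v)

def sparsity(edges):
    """Observed distinct edges divided by all possible (u, v) pairs for the relation."""
    n_u = len({u for u, _ in edges})
    n_v = len({v for _, v in edges})
    return len(set(edges)) / (n_u * n_v)

# Hypothetical relations: papers published at conferences vs. authors writing papers.
paper_conf = [("p1", "c1"), ("p2", "c1"), ("p3", "c1"),
              ("p4", "c2"), ("p5", "c2"), ("p6", "c2")]
author_paper = [("a1", "p1"), ("a1", "p2"), ("a2", "p1"), ("a2", "p2")]
print(degree_ratio(paper_conf), sparsity(paper_conf))      # 3.0 0.5
print(degree_ratio(author_paper), sparsity(author_paper))  # 1.0 1.0
```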


2017, Vol 29 (4), pp. 1053-1102
Author(s): Hossein Soleimani, David J. Miller

Many classification tasks require both labeling objects and determining label associations for parts of each object. Example applications include labeling segments of images or determining relevant parts of a text document when the training labels are available only at the image or document level. This task is usually referred to as multi-instance (MI) learning, where the learner typically receives a collection of labeled (or sometimes unlabeled) bags, each containing several segments (instances). We propose a semisupervised MI learning method for multilabel classification. Most MI learning methods treat instances in each bag as independent and identically distributed samples. However, in many practical applications, instances are related to each other and should not be considered independent. Our model discovers a latent low-dimensional space that captures structure within each bag. Further, unlike many other MI learning methods, which are primarily developed for binary classification, we model multiple classes jointly, thus also capturing possible dependencies between different classes. We develop our model within a semisupervised framework, which leverages both labeled and, typically, a larger set of unlabeled bags for training. We develop several efficient inference methods for our model. We first introduce a Markov chain Monte Carlo method for inference, which can handle arbitrary relations between bag labels and instance labels, including the standard hard-max MI assumption. We also develop an extension of our model that uses stochastic variational Bayes methods for inference, and thus scales better to massive data sets. Experiments show that our approach outperforms several MI learning and standard classification methods on both bag-level and instance-level label prediction. All code for replicating our experiments is available from https://github.com/hsoleimani/MLTM .
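The standard hard-max MI assumption mentioned above ties instance labels to bag labels: a bag is positive for a class exactly when at least one of its instances is. A minimal sketch of that rule in its binary and multilabel forms (the function names are hypothetical); the paper's model infers instance labels probabilistically, which this deterministic rule does not attempt.

```python
def bag_is_positive(instance_flags):
    """Binary hard-max rule: the bag label is the maximum instance label (0 or 1)."""
    return max(instance_flags) == 1

def bag_labels(instance_label_sets):
    """Multilabel form: a bag carries class c iff at least one instance carries c."""
    labels = set()
    for inst in instance_label_sets:
        labels |= set(inst)
    return labels

# An image split into three segments (instances).
print(bag_is_positive([0, 0, 1]))                              # True
print(sorted(bag_labels([{"cat"}, set(), {"dog", "cat"}])))    # ['cat', 'dog']
```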


Author(s): Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang

This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most existing work on network embedding considers only network structure and ignores user-generated content that could help in learning a better joint network representation. Unlike the rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profiles with network structure to jointly learn a vector representation of a social network. The key idea of UPP-SNE is to embed user profile information via a nonlinear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that, compared with state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.


Author(s): Hong Huang, Zixuan Fang, Xiao Wang, Youshan Miao, Hai Jin

Network embedding, which maps nodes in a network to a low-dimensional space, achieves strong performance. An increasing number of works focus on static network embedding; however, little attention has been paid to temporal network embedding, and especially to the effect of mesoscopic dynamics as the network evolves. In light of this, we concentrate on a particular motif, the triad, and its temporal dynamics to study temporal network embedding. Specifically, we propose MTNE, a novel embedding model for temporal networks. MTNE not only integrates the Hawkes process to simulate the triad evolution process, which preserves motif-aware high-order proximities, but also uses an attention mechanism to better distinguish the importance of different types of triads. Experiments on various real-world temporal networks demonstrate that, compared with several state-of-the-art methods, our model achieves the best performance in both static and dynamic tasks, including node classification, link prediction, and link recommendation.
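The Hawkes process integrated into MTNE is a self-exciting point process: each past event temporarily raises the rate of future events. A minimal sketch of its conditional intensity with the commonly used exponential kernel, λ(t) = μ + Σ_{t_i < t} α·exp(−δ(t − t_i)); the parameter names and values are ours, and the abstract does not say how MTNE parameterizes the base rate, excitation, or decay for triads.

```python
import math

def hawkes_intensity(t, history, mu, alpha, delta):
    """Conditional intensity of a univariate Hawkes process, exponential kernel.

    mu:    base rate of events
    alpha: jump in intensity contributed by each past event
    delta: decay rate of that contribution
    Only events strictly before t contribute.
    """
    return mu + sum(alpha * math.exp(-delta * (t - ti)) for ti in history if ti < t)

events = [0.0, 1.0]  # hypothetical timestamps of earlier triad-forming events
for t in (1.5, 3.0, 10.0):
    print(t, round(hawkes_intensity(t, events, mu=0.2, alpha=0.8, delta=1.0), 4))
```

Right after a burst of events the intensity is well above `mu`, and it relaxes back toward `mu` as the excitation decays, which is what lets recent structural changes weigh more than old ones.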


2017, Vol 25 (1), pp. 25-31
Author(s): Weiyi Xia, Zhiyu Wan, Zhijun Yin, James Gaupp, Yongtai Liu, ...

Abstract
Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. Various initiatives make these datasets more widely available to recipients who sign Data Use Certificate agreements, under which penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and examines its data science policy implications.
Methods: This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a linear mixed-effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
Results: The analysis shows that the impact factors of publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.
Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
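A back-of-the-envelope calculation shows why a slow depreciation rate undercuts short revocations. It assumes, beyond what the abstract states, that the reported ~10% average annual decline compounds multiplicatively; `remaining_value` is a hypothetical helper.

```python
def remaining_value(years, annual_rate=0.10):
    """Fraction of a dataset's initial value left after `years`,
    assuming compound depreciation at `annual_rate` per year."""
    return (1.0 - annual_rate) ** years

# A revocation of roughly six months removes only about 5% of the data's value.
for months in (3, 6, 12):
    lost = 1.0 - remaining_value(months / 12)
    print(f"{months:2d} months: {lost:.1%} of value lost")
```

Under this assumption a three-month revocation forfeits about 2.6% of the data's value and a six-month one about 5.1%, which is consistent with the authors' conclusion that short revocations are a weak deterrent.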


2021, Vol 11 (1), pp. 12-20
Author(s): Phatpicha Yochum, Liang Chang, Tianlong Gu, Manli Zhu

With the rapid development of the Internet, various kinds of unstructured information, such as user-generated content, textual reviews, and implicit or explicit feedback, have grown continuously. Meanwhile, structured knowledge bases (KBs), which consist of a large number of triples, have recently shown great advantages in the recommendation field. In this paper, we propose a novel approach that learns sentiment over network embedding for a recommendation system based on a knowledge graph that we have built; that is, we integrate the network embedding method with the sentiment of user reviews. Specifically, we use node2vec, a typical network embedding method, to embed the large-scale structured data into a low-dimensional vector space that captures the internal semantic information of users and attractions, and we apply user weight scoring, which combines user review ratings and textual reviews, to find similar attractions among users. Experimental results on a real-world dataset verify that our approach achieves better recommendation performance in precision, recall, and F-measure than state-of-the-art baselines.
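node2vec generates training contexts with second-order biased random walks controlled by a return parameter p and an in-out parameter q. The sketch below produces a single walk on an unweighted, undirected graph; the toy user-attraction graph and the function name are hypothetical, and the skip-gram training that node2vec runs on many such walks is omitted.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec-style second-order random walk on an unweighted, undirected graph.

    graph: dict mapping each node to the set of its neighbours
    p:     return parameter (smaller p -> more likely to step back to the previous node)
    q:     in-out parameter (smaller q -> more likely to move away, DFS-like)
    """
    rng = rng if rng is not None else random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])  # sorted for reproducibility with a seeded rng
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step has no previous node: uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # distance 0 from prev: return
            elif x in graph[prev]:
                weights.append(1.0)      # distance 1 from prev: stay local
            else:
                weights.append(1.0 / q)  # distance 2 from prev: explore outward
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical user-attraction graph derived from a knowledge graph.
graph = {
    "u1": {"a1", "a2"}, "u2": {"a2", "a3"},
    "a1": {"u1"}, "a2": {"u1", "u2"}, "a3": {"u2"},
}
print(node2vec_walk(graph, "u1", length=8, p=0.5, q=2.0))
```

With p = 0.5 and q = 2.0 the walk favours revisiting nearby nodes (BFS-like), which tends to emphasize local neighbourhood similarity between users and attractions.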

