Pathogenic gene prediction based on network embedding

Briefings in Bioinformatics ◽

10.1093/bib/bbaa353 ◽

2020 ◽

Author(s):

Yang Liu ◽

Yuchen Guo ◽

Xiaoyan Liu ◽

Chunyu Wang ◽

Maozu Guo

Keyword(s):

Heterogeneous Network ◽

Large Scale ◽

State Of The Art ◽

Gene Prediction ◽

Data Sets ◽

Network Embedding ◽

Correlation Score ◽

Disease Research ◽

Pathogenic Genes ◽

Pathogenic Gene

Abstract In disease research, the study of gene–disease correlation has always been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between entities, which may come from different data sets, to build a biological heterogeneous network. We propose a new network embedding representation algorithm that calculates the correlation between diseases and genes and uses the correlation score to predict pathogenic genes. We then conduct several experiments comparing our method with other state-of-the-art methods. The results show that our method performs better than the traditional methods.

Large-Scale Multi-View Subspace Clustering in Linear Time

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5867 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4412-4419 ◽

Cited By ~ 3

Author(s):

Zhao Kang ◽

Wangtao Zhou ◽

Zhitong Zhao ◽

Junming Shao ◽

Meng Han ◽

...

Keyword(s):

Large Scale ◽

State Of The Art ◽

Linear Time ◽

Subspace Clustering ◽

Data Sets ◽

Clustering Methods ◽

Single View ◽

Novel Approach ◽

Points Of View ◽

Effectiveness And Efficiency

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years, and researchers have managed to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms typically have quadratic or even cubic complexity, which makes them inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of anchor graphs, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.
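
The anchor-graph idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' LMVSC implementation: the anchor selection (evenly spaced sample indices), the Gaussian-kernel anchor weights, the `sigma` bandwidth, the toy data, and the small `kmeans` helper are all assumptions made for illustration. Only the overall shape follows the abstract: build a small n-by-m graph per view, combine the graphs, and run the spectral step on the combined graph, so that the cost grows linearly with the number of points n for a fixed number of anchors m.

```python
import numpy as np

def anchor_graph(X, anchor_idx, sigma=1.0):
    """Row-normalised Gaussian similarity of every point to m anchors (n x m)."""
    anchors = X[anchor_idx]
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    d2 -= d2.min(axis=1, keepdims=True)  # stabilise exp() against underflow
    Z = np.exp(-d2 / (2 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

def kmeans(points, k, iters=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def cluster_views(views, k, m=10, sigma=1.0):
    """Concatenate per-view anchor graphs, then cluster the top-k left singular vectors."""
    n = views[0].shape[0]
    anchor_idx = np.linspace(0, n - 1, m).astype(int)  # assumed anchor choice
    Z = np.hstack([anchor_graph(X, anchor_idx, sigma) for X in views])
    Z /= np.sqrt(len(views))
    U, _, _ = np.linalg.svd(Z, full_matrices=False)  # Z is only n x (m * views)
    return kmeans(U[:, :k], k)

# Hypothetical toy data: two well-separated groups observed in two views.
rng = np.random.default_rng(0)
view1 = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(10, 0.3, (30, 2))])
view2 = np.vstack([rng.normal(-5, 0.3, (30, 3)), rng.normal(5, 0.3, (30, 3))])
labels = cluster_views([view1, view2], k=2)
print(labels)
```

The singular value decomposition is taken of the tall, thin matrix `Z` rather than of an n-by-n affinity matrix, which is what keeps the spectral step cheap when m is much smaller than n.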

A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network

BMC Medical Genomics ◽

10.1186/s12920-019-0627-z ◽

2019 ◽

Vol 12 (S10) ◽

Cited By ~ 1

Author(s):

Bo Xu ◽

Yu Liu ◽

Shuo Yu ◽

Lei Wang ◽

Jie Dong ◽

...

Keyword(s):

Heterogeneous Network ◽

Structural Information ◽

Single Gene ◽

Prediction Method ◽

Experimental Results ◽

Network Embedding ◽

Pathogenic Genes ◽

Phenotype Data ◽

Multiple Paths ◽

Complex Structural

Abstract Background: Prediction of pathogenic genes is crucial for disease prevention, diagnosis, and treatment, but traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main methods for finding candidate pathogenic genes. Methods: We propose a pathogenic gene prediction method based on network embedding, called Multipath2vec. First, we construct a heterogeneous network, called GP−network, based on three kinds of relationships between genes and phenotypes: correlations between phenotypes, interactions between genes, and known gene-phenotype pairs. Then, to embed the network better, we design a multi-path to guide random walks in GP−network. The multi-path comprises multiple paths between genes and phenotypes, which can capture the complex structural information of the heterogeneous network. Finally, we use the learned vector representation of each phenotype and protein to calculate similarities, and rank candidate genes by their similarity to the target phenotype. Results: We implemented Multipath2vec and four baseline approaches (CATAPULT, PRINCE, Deepwalk and Metapath2vec) on many-genes gene-phenotype data, single-gene gene-phenotype data and whole gene-phenotype data. Experimental results show that Multipath2vec outperformed the state-of-the-art baselines in the pathogenic gene prediction task. Conclusions: Multipath2vec can be used to predict pathogenic genes, and the experimental results show that it achieves higher prediction accuracy.
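
The walk-then-rank pipeline described in this abstract can be sketched with the Python standard library alone. This is an illustrative sketch, not Multipath2vec itself: the toy network, the node names, the treatment of each path as a repeating cycle of node types, and the use of windowed co-occurrence counts in place of learned skip-gram embeddings are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy gene-phenotype network; the prefix encodes the node type.
EDGES = [
    ("g:A", "g:B"),  # gene-gene interaction
    ("g:B", "g:C"),
    ("g:A", "p:X"),  # known gene-phenotype pair
    ("g:C", "p:Y"),
    ("p:X", "p:Y"),  # phenotype-phenotype correlation
]

def build_adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def node_type(node):
    return node.split(":", 1)[0]

def guided_walk(adj, start, path, length, rng):
    """Random walk in which step i may only move to a neighbour whose type
    is path[(i + 1) % len(path)]; stops early if no such neighbour exists."""
    walk = [start]
    for i in range(length):
        wanted = path[(i + 1) % len(path)]
        candidates = [n for n in adj[walk[-1]] if node_type(n) == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk

def cooccurrence_vectors(walks, window=2):
    """Stand-in for skip-gram training: count neighbours within a window."""
    vecs = defaultdict(Counter)
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[max(0, i - window): i + window + 1]:
                if v != u:
                    vecs[u][v] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_genes(vecs, phenotype, genes):
    """Order candidate genes by similarity to the target phenotype."""
    return sorted(genes, key=lambda g: cosine(vecs[g], vecs[phenotype]), reverse=True)

rng = random.Random(0)
adj = build_adjacency(EDGES)
genes = ["g:A", "g:B", "g:C"]
paths = [["g", "p"], ["g", "g", "p", "p"]]  # several type paths guide the walks
walks = [guided_walk(adj, g, p, 8, rng) for p in paths for g in genes for _ in range(20)]
vecs = cooccurrence_vectors(walks)
print(rank_genes(vecs, "p:X", genes))
```

Using more than one type path lets walks reach a phenotype both directly from a gene and indirectly through a gene-gene interaction, which is the intuition behind guiding the walk with multiple paths.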

SepNE: Bringing Separability to Network Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014261 ◽

2019 ◽

Vol 33 ◽

pp. 4261-4268 ◽

Cited By ~ 2

Author(s):

Ziyao Li ◽

Liang Zhang ◽

Guojie Song

Keyword(s):

Large Scale ◽

State Of The Art ◽

Dynamic Networks ◽

Distributed Learning ◽

Network Embedding ◽

Large Networks ◽

Comparable Accuracy ◽

Large Scale Networks ◽

Low Dimensional ◽

Almost All

Many successful methods have been proposed for learning low dimensional representations on large-scale networks, while almost all existing methods are designed in inseparable processes, learning embeddings for entire networks even when only a small proportion of nodes are of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm which independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant efforts to embed irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks with different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running times on large networks.

An evaluation of the accuracy and speed of metagenome analysis tools

10.1101/017830 ◽

2015 ◽

Cited By ~ 10

Author(s):

Stinus Lindgreen ◽

Karen L Adair ◽

Paul Gardner

Keyword(s):

Aquatic Ecosystems ◽

Large Scale ◽

High Throughput Sequencing ◽

State Of The Art ◽

Data Sets ◽

Metagenome Analysis ◽

Analysis Tools ◽

Sequencing Platforms ◽

High Degree ◽

Realistic Data

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html

Unsupervised Detection of Sub-Events in Large Scale Disasters

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5370 ◽

2020 ◽

Vol 34 (01) ◽

pp. 354-361 ◽

Cited By ~ 1

Author(s):

Chidubem Arachie ◽

Manas Gaur ◽

Sam Anzaroot ◽

William Groves ◽

Ke Zhang ◽

...

Keyword(s):

Social Media ◽

Large Scale ◽

State Of The Art ◽

Qualitative Evaluation ◽

Irrelevant Information ◽

Data Sets ◽

Emergency Responders ◽

Learning Framework ◽

Emergency Event ◽

2015 Nepal Earthquake

Social media plays a major role during and after major natural disasters (e.g., hurricanes, large-scale fires, etc.), as people “on the ground” post useful information on what is actually happening. Given the large number of posts, a major challenge is identifying the information that is useful and actionable. Emergency responders are largely interested in finding out what events are taking place so they can properly plan and deploy resources. In this paper we address the problem of automatically identifying important sub-events (within a large-scale emergency “event”, such as a hurricane). In particular, we present a novel, unsupervised learning framework to detect sub-events in tweets for retrospective crisis analysis. We first extract noun-verb pairs and phrases from raw tweets as sub-event candidates. Then, we learn a semantic embedding of the extracted noun-verb pairs and phrases, and rank them against a crisis-specific ontology. We filter out noisy and irrelevant information, then cluster the noun-verb pairs and phrases so that the top-ranked ones describe the most important sub-events. Through quantitative experiments on two large crisis data sets (Hurricane Harvey and the 2015 Nepal Earthquake), we demonstrate the effectiveness of our approach over the state-of-the-art. Our qualitative evaluation shows better performance compared to our baseline.

Multipath2vec: Predicting Pathogenic Genes via Heterogeneous Network Embedding

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2018.8621103 ◽

2018 ◽

Author(s):

Bo Xu ◽

Yu Liu ◽

Shuo Yu ◽

Lei Wang ◽

Lei Liu ◽

...

Keyword(s):

Heterogeneous Network ◽

Network Embedding ◽

Pathogenic Genes

Anchor Link Prediction across Attributed Networks via Network Embedding

Entropy ◽

10.3390/e21030254 ◽

2019 ◽

Vol 21 (3) ◽

pp. 254 ◽

Cited By ~ 4

Author(s):

Shaokai Wang ◽

Xutao Li ◽

Yunming Ye ◽

Shanshan Feng ◽

Raymond Lau ◽

...

Keyword(s):

Social Networks ◽

Link Prediction ◽

State Of The Art ◽

User Profiling ◽

Data Sets ◽

Network Embedding ◽

Real World Data ◽

Intrinsic Structure ◽

Multiple Networks ◽

Proposed Model

Presently, many users are involved in multiple social networks. Identifying the same user in different networks, also known as anchor link prediction, becomes an important problem, which can serve numerous applications, e.g., cross-network recommendation, user profiling, etc. Previous studies mainly use hand-crafted structure features, which, if not carefully designed, may fail to reflect the intrinsic structure regularities. Moreover, most of the methods neglect the attribute information of social networks. In this paper, we propose a novel semi-supervised network-embedding model to address the problem. In the model, each node of the multiple networks is represented by a vector for anchor link prediction, which is learnt with awareness of observed anchor links as semi-supervised information, and topology structure and attributes as input. Experimental results on the real-world data sets demonstrate the superiority of the proposed model compared to state-of-the-art techniques.

1Click1View: Interactive Visualization Methodology for RNAi Cell-Based Microscopic Screening

BioMed Research International ◽

10.1155/2013/156932 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Lukasz Zwolinski ◽

Marta Kozak ◽

Karol Kozak

Keyword(s):

Large Scale ◽

State Of The Art ◽

Interactive Visualization ◽

Image Data ◽

Data Sets ◽

Visualization Method ◽

Multiple Cell ◽

Visualization Tools ◽

Screening Assays ◽

Numeric Data

Technological advancements are constantly increasing the size and complexity of data resulting from large-scale RNA interference screens. This has led biologists to ask complex questions that the existing, fully automated analyses are often not adequate to answer. We present the concept of 1Click1View (1C1V) as a methodology for interactive analytic software tools. 1C1V can be applied to two-dimensional visualization of image-based screening data sets from High Content Screening (HCS). Through an easy-to-use interface, a one-click, one-view concept, and a workflow-based architecture, the visualization method facilitates the linking of image data with numeric data. The method utilizes state-of-the-art interactive visualization tools optimized for fast visualization of large-scale image data sets. We demonstrate our method on an HCS data set consisting of multiple cell features from two screening assays.

Factor Graph-aggregated Heterogeneous Network Embedding for Disease-gene Association Prediction

10.21203/rs.3.rs-124672/v1 ◽

2020 ◽

Author(s):

Ming He ◽

Chen Huang ◽

Bo Liu ◽

Yadong Wang ◽

Junyi Li

Keyword(s):

Heterogeneous Network ◽

Large Scale ◽

Disease Gene ◽

Heterogeneous Data ◽

Factor Graph ◽

Biological Entity ◽

Network Embedding ◽

Gene Association ◽

Semantic Relationships ◽

Entity Relationship

Abstract Background: Exploring the relationship between disease and gene is of great significance for understanding the pathogenesis of disease and developing corresponding therapeutic measures. The prediction of disease-gene association by computational methods accelerates the process. Results: Many existing methods cannot fully utilize the multi-dimensional biological entity relationship to predict disease-gene association due to multi-source heterogeneous data. This paper proposes FactorHNE, a factor graph-aggregated heterogeneous network embedding method for disease-gene association prediction, which captures a variety of semantic relationships between the heterogeneous nodes by factorization. It produces different semantic factor graphs and effectively aggregates a variety of semantic relationships, using an end-to-end multi-perspective loss function to optimize the model. It then produces good node embeddings for predicting disease-gene associations. Conclusions: Experimental verification and analysis show that FactorHNE has better performance and scalability than the existing models. It also has good interpretability and can be extended to large-scale biomedical network data analysis.

Factor graph-aggregated heterogeneous network embedding for disease-gene association prediction

BMC Bioinformatics ◽

10.1186/s12859-021-04099-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ming He ◽

Chen Huang ◽

Bo Liu ◽

Yadong Wang ◽

Junyi Li

Keyword(s):

Heterogeneous Network ◽

Large Scale ◽

Disease Gene ◽

Heterogeneous Data ◽

Factor Graph ◽

Biological Entity ◽

Network Embedding ◽

Gene Association ◽

Semantic Relationships ◽

Entity Relationship

Abstract Background: Exploring the relationship between disease and gene is of great significance for understanding the pathogenesis of disease and developing corresponding therapeutic measures. The prediction of disease-gene association by computational methods accelerates the process. Results: Many existing methods cannot fully utilize the multi-dimensional biological entity relationship to predict disease-gene association due to multi-source heterogeneous data. This paper proposes FactorHNE, a factor graph-aggregated heterogeneous network embedding method for disease-gene association prediction, which captures a variety of semantic relationships between the heterogeneous nodes by factorization. It produces different semantic factor graphs and effectively aggregates a variety of semantic relationships, using an end-to-end multi-perspective loss function to optimize the model. It then produces good node embeddings for predicting disease-gene associations. Conclusions: Experimental verification and analysis show that FactorHNE has better performance and scalability than the existing models. It also has good interpretability and can be extended to large-scale biomedical network data analysis.
