A Network of Networks Approach for Modeling Interconnected Brain Tissue-Specific Networks


2019 ◽  
Vol 35 (17) ◽  
pp. 3092-3101 ◽  
Author(s):  
Hideko Kawakubo ◽  
Yusuke Matsui ◽  
Itaru Kushima ◽  
Norio Ozaki ◽  
Teppei Shimamura

Abstract Motivation Recent sequence-based analyses have identified many gene variants that may contribute to neurogenetic disorders such as autism spectrum disorder and schizophrenia. Several state-of-the-art network-based analyses have been proposed for a mechanistic understanding of genetic variants in neurogenetic disorders. However, these methods were mainly designed for modeling and analyzing single networks that do not interact with or depend on other networks, and thus cannot capture the properties of interdependent systems in brain-specific tissues, circuits and regions, which are connected to each other and affect behavior and cognitive processes. Results We introduce a novel and efficient framework, called a ‘Network of Networks’ (NoN) approach, to infer the interconnectivity structure between multiple networks where the response and the predictor variables are topological information matrices of given networks. We also propose Graph-Oriented SParsE Learning (GOSPEL), a new sparse structural learning algorithm for network graph data to identify a subset of the topological information matrices of the predictors related to the response. We demonstrate on simulated data that GOSPEL outperforms existing kernel-based algorithms in terms of F-measure. On real data from human brain region-specific functional networks associated with the autism risk genes, we show that the NoN model provides insights on the autism-associated interconnectivity structure between functional interaction networks and a comprehensive understanding of the genetic basis of autism across diverse regions of the brain. Availability and implementation Our software is available from https://github.com/infinite-point/GOSPEL. Supplementary information Supplementary data are available at Bioinformatics online.
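GOSPEL's actual objective operates on topological information matrices of whole networks; as a loose illustration of the underlying idea, selecting a sparse subset of predictor networks related to a response network can be sketched as an l1-penalized regression on vectorized topology matrices. The ISTA solver and all data below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def select_predictor_networks(y, predictors, lam=0.05, iters=3000):
    """Pick a sparse subset of predictor networks related to the response.

    y          -- vectorized topology matrix of the response network
    predictors -- list of vectorized topology matrices, one per predictor
    """
    X = np.column_stack(predictors).astype(float)
    X -= X.mean(axis=0)            # center so no intercept is needed
    y = y - y.mean()
    n = len(y)
    L = np.linalg.eigvalsh(X.T @ X / n).max()  # Lipschitz constant of the gradient
    step = 1.0 / (L + 1e-12)
    w = np.zeros(X.shape[1])
    for _ in range(iters):         # ISTA: gradient step, then soft threshold
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w                       # nonzero entries mark selected networks
```

A predictor whose topology is unrelated to the response is driven exactly to zero by the threshold, which is the selection behavior the NoN framework aims for.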


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i831-i839
Author(s):  
Dong-gi Lee ◽  
Myungjun Kim ◽  
Sang Joon Son ◽  
Chang Hyung Hong ◽  
Hyunjung Shin

Abstract Motivation Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat it. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, computational-biology approaches that narrow the search space of candidate genes are being actively investigated. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. Results The proposed method was applied to a dataset extracted from public databases related to diseases and genes, with data collected from 186 patients. A portion of the key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. Availability and implementation The code for the framework will be available at http://www.alphaminers.net/. Supplementary information Supplementary data are available at Bioinformatics online.
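The building block of such network-based semi-supervised learners is label propagation: known disease-gene associations are spread along graph edges until unlabeled nodes acquire scores. A minimal single-layer sketch using the classic normalized propagation update (the graph, seeds and parameters below are made up; the paper's model additionally couples disease, gene and SNP layers):

```python
import numpy as np

def propagate_labels(W, y0, alpha=0.8, iters=200):
    """Semi-supervised label propagation on a weighted graph.

    W  -- symmetric adjacency/affinity matrix
    y0 -- initial labels (+1/-1 for labeled nodes, 0 for unlabeled)
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))            # symmetric normalization
    f = y0.astype(float).copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y0   # spread, then re-inject seeds
    return f
```

Nodes end up scored by their proximity to labeled seeds, which is how candidate genes would be ranked without exhaustive experiments.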


Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE including analyses of key results in the study can be found at: https://github.com/rpmccordlab/SMILE.
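Mutual-information maximization over paired cells is commonly implemented with a contrastive (InfoNCE-style) loss: embeddings of the same cell from two sources are pulled together while all other cells in the batch act as negatives. A generic numpy sketch of that loss, not SMILE's exact network or objective:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss: row i of z1 and z2 embed the same cell (positive pair);
    every other row in the batch serves as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # cross-entropy on matched pairs
```

Minimizing this loss makes matched cells nearest neighbors in the shared space, which is what removes batch and modality effects after integration.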


2020 ◽  
Vol 36 (19) ◽  
pp. 4894-4901 ◽  
Author(s):  
Yi Shi ◽  
Zehua Guo ◽  
Xianbin Su ◽  
Luming Meng ◽  
Mingxuan Zhang ◽  
...  

Abstract Motivation The mutations of cancers can encode the seeds of their own destruction, in the form of T-cell recognizable immunogenic peptides, also known as neoantigens. It is computationally challenging, however, to accurately prioritize the potential neoantigen candidates according to their ability to activate the T-cell immune response, especially when the somatic mutations are abundant. Although a few neoantigen prioritization methods have been proposed to address this issue, an advanced machine learning model specifically designed to tackle this problem is still lacking. Moreover, none of the existing methods considers the original DNA loci of the neoantigens from the perspective of the 3D genome, which may provide key information for inferring neoantigens’ immunogenicity. Results In this study, we discovered that DNA loci of the immunopositive and immunonegative MHC-I neoantigens have distinct spatial distribution patterns across the genome. We therefore used the 3D genome information along with an ensemble pMHC-I coding strategy, and developed a group feature selection-based deep sparse neural network model (DNN-GFS) that is optimized for neoantigen prioritization. DNN-GFS demonstrated increased neoantigen prioritization power compared to existing sequence-based approaches. We also developed a webserver named deepAntigen (http://yishi.sjtu.edu.cn/deepAntigen) that implements the DNN-GFS as well as other machine learning methods. We believe that this work provides a new perspective toward more accurate neoantigen prediction, which will eventually contribute to personalized cancer immunotherapy. Availability and implementation Data and implementation are available on webserver: http://yishi.sjtu.edu.cn/deepAntigen. Supplementary information Supplementary data are available at Bioinformatics online.
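Group feature selection of the kind DNN-GFS uses zeroes out whole groups of input features at once; the core operation in group-sparse training is the group soft-threshold (the proximal step of the group lasso). A minimal sketch with an illustrative group layout and penalty, not the paper's training procedure:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal step of the group lasso: shrink each group's joint norm by lam,
    zeroing any group whose norm falls below the penalty."""
    w = w.astype(float).copy()
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm <= lam:
            w[idx] = 0.0          # whole feature group eliminated
        else:
            w[idx] *= 1.0 - lam / norm
    return w
```

Applying this step during training removes uninformative feature groups entirely, rather than leaving many small scattered weights as plain l1 regularization would.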


2015 ◽  
Vol 25 (14) ◽  
pp. 1540034 ◽  
Author(s):  
Wei Gao ◽  
Linli Zhu ◽  
Kaiyun Wang

Ontology, a model of knowledge representation and storage, has had extensive applications in pharmaceutics, social science, chemistry and biology. In the age of “big data”, concepts are often represented as high-dimensional data, and sparse learning techniques have therefore been introduced into ontology algorithms. In this paper, based on the alternating direction augmented Lagrangian method, we present an ontology optimization algorithm for ontological sparse vector learning, together with a fast version of the technique. The optimal sparse vector is obtained by an iterative procedure, and the ontology function is then derived from the sparse vector. Four simulation experiments show that our ontological sparse vector learning model achieves a higher precision ratio on plant ontology, humanoid robotics ontology, biology ontology and physics education ontology data for similarity measuring and ontology mapping applications.
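An alternating direction augmented Lagrangian (ADMM) iteration for the l1-penalized least-squares problem at the heart of sparse vector learning can be written in a few lines. This is the textbook ADMM scheme, not the paper's specific ontology formulation; problem sizes and the penalty below are illustrative:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=500):
    """ADMM for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    n = A.shape[1]
    # factor once: the same ridge system is solved at every x-update
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))                            # ridge step
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # shrinkage step
        u += x - z                                               # dual update
    return z
```

Splitting the smooth least-squares term from the nonsmooth l1 term is what makes each sub-step closed-form, which is the appeal of the augmented Lagrangian approach for this class of problems.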


2021 ◽  
Author(s):  
Guisheng Wang

Sparse approximation is critical to applications in signal and image processing, and transformation analysis can aid in estimating sparse signals. In this study, a simultaneous Bayesian framework is extended for sparse approximation with structured shared support, and a simultaneous sparse learning algorithm of structured approximation (SSL-SA) is proposed, in which transformation analysis leads to more sensible feasible solutions. Improvements from sparse Bayesian learning and iterative reweighting are embedded in the framework to achieve fast convergence, high efficiency and robustness. Furthermore, iterative optimization and transformation analysis are embedded in the overall learning process to obtain relative optima for sparse approximation. Finally, compared to conventional reweighting algorithms for simultaneous sparse models with l1 and l2 penalties, simulation results demonstrate the advantage of the proposed approach in handling sparse structure and iterative redundancy when processing sparse signals. These results indicate that the proposed method can effectively approximate various signals and images, accurately analyzing the target under an optimal transformation. It is envisaged that the proposed model could be suitable for a wide range of data in sparse separation and signal denoising.
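The iterative reweighting mentioned here can be illustrated with the classic reweighted-l2 (FOCUSS-style) update, where each iteration re-solves a weighted least-squares problem and entries that stay small are progressively down-weighted until they vanish. A generic sketch with illustrative sizes and data, not the SSL-SA algorithm itself:

```python
import numpy as np

def reweighted_l2(A, b, iters=50, eps=1e-8):
    """FOCUSS-style sparse recovery for underdetermined A x = b:
    x <- W A^T (A W A^T)^+ b with W = diag(|x|), repeated to convergence."""
    n = A.shape[1]
    x = np.ones(n)                       # uninformative start: all entries equal
    for _ in range(iters):
        W = np.diag(np.abs(x) + eps)     # reweight by current magnitudes
        x = W @ A.T @ np.linalg.pinv(A @ W @ A.T) @ b
    return x
```

Each pass keeps the solution consistent with the measurements while concentrating energy on fewer coordinates, which is the mechanism behind reweighting schemes for sparse models.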


2020 ◽  
Vol 36 (16) ◽  
pp. 4440-4448 ◽  
Author(s):  
Zhenqin Wu ◽  
Nilah M Ioannidis ◽  
James Zou

Abstract Summary Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. Availability and implementation Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
Supplementary information Supplementary data are available at Bioinformatics online.
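The ROC-AUC figures quoted above reduce to a rank statistic: the probability that a randomly chosen positive variant-gene pair is scored above a randomly chosen negative one. A small numpy sketch of that computation (the scores and labels in the example are toy data):

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive outscores the negative.
    Assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # 1-based ranks
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 0.799 thus means a true regulatory pair outranks a non-regulatory one roughly four times out of five, which is the sense in which the scores prioritize candidate genes.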


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i417-i426
Author(s):  
Assya Trofimov ◽  
Joseph Paul Cohen ◽  
Yoshua Bengio ◽  
Claude Perreault ◽  
Sébastien Lemieux

Abstract Motivation The recent development of sequencing technologies revolutionized our understanding of the inner workings of the cell as well as the way disease is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, data analysis poses a challenge. Dimensionality reduction methods help with this task by extracting patterns from the data by compressing it into compact vector representations. Results We present the factorized embeddings (FE) model, a self-supervised deep learning algorithm that simultaneously learns gene and sample representation spaces by tensor factorization. We ran the model on RNA-Seq data from two large-scale cohorts and observed that the sample representation captures information on single gene and global gene expression patterns. Moreover, we found that the gene representation space was organized such that tissue-specific genes, highly correlated genes as well as genes participating in the same GO terms were grouped. Finally, we compared the vector representation of samples learned by the FE model to other similar models on 49 regression tasks. We report that the representations trained with FE rank first or second in all of the tasks, surpassing, sometimes by a considerable margin, other representations. Availability and implementation A toy example in the form of a Jupyter Notebook as well as the code and trained embeddings for this project can be found at: https://github.com/TrofimovAssya/FactorizedEmbeddings. Supplementary information Supplementary data are available at Bioinformatics online.
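The core idea of jointly learning sample and gene embedding spaces whose interaction reconstructs expression can be sketched as gradient-descent matrix factorization. The FE model uses a neural interaction function rather than a plain inner product, so this is only a bare-bones analogue with illustrative dimensions, rates and data:

```python
import numpy as np

def factorized_embeddings(M, k=3, lr=0.1, iters=4000, seed=0):
    """Jointly learn sample embeddings S and gene embeddings G such that
    S @ G.T reconstructs the expression matrix M (samples x genes)."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = M.shape
    S = 0.1 * rng.normal(size=(n_samples, k))   # one vector per sample
    G = 0.1 * rng.normal(size=(n_genes, k))     # one vector per gene
    for _ in range(iters):
        E = S @ G.T - M                   # reconstruction error
        S -= lr * (E @ G) / n_genes       # gradient of the mean squared error
        G -= lr * (E.T @ S) / n_samples
    return S, G
```

After training, rows of S place samples with similar expression profiles nearby, and rows of G group co-expressed genes, mirroring the structure the abstract reports in the learned spaces.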

