Network Inference with Granger Causality Ensembles on Single-Cell Transcriptomic Data

2019
Author(s): Atul Deshpande, Li-Fang Chu, Ron Stewart, Anthony Gitter

Abstract: Advances in single-cell transcriptomics enable measuring the gene expression of individual cells, allowing cells to be ordered by their state in a dynamic biological process. Many algorithms assign ‘pseudotimes’ to each cell, representing the progress along the biological process. Ordering the expression data according to such pseudotimes can be valuable for understanding the underlying regulator-gene interactions in a biological process, such as differentiation. However, the distribution of cells sampled along a transitional process, and hence that of the pseudotimes assigned to them, is not uniform. This prevents using many standard mathematical methods for analyzing the ordered gene expression states. We present Single-Cell Inference of Networks using Granger Ensembles (SCINGE), an algorithm for gene regulatory network inference from single-cell gene expression data. Given ordered single-cell data, SCINGE uses kernel-based Granger Causality regression, which smooths the irregular pseudotimes and missing expression values. It then aggregates the predictions from an ensemble of regression analyses with a modified Borda method to compile a ranked list of candidate interactions between transcriptional regulators and their target genes. In two mouse embryonic stem cell differentiation case studies, SCINGE outperforms other contemporary algorithms for gene network reconstruction. However, a more detailed examination reveals caveats about transcriptional network reconstruction with single-cell RNA-seq data. Network inference methods, including SCINGE, may have near random performance for predicting the targets of many individual regulators even if the aggregate performance is good. In addition, in some cases including cells’ pseudotime values can hurt the performance of network reconstruction methods. A MATLAB implementation of SCINGE is available at https://github.com/gitter-lab/SCINGE.
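The aggregation step described in this abstract can be illustrated with a plain Borda count over several ranked edge lists. This is only a sketch: SCINGE uses a modified Borda method whose details are not given here, and the function name `borda_aggregate`, the gene labels, and the three example rankings are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine ranked lists of (regulator, target) edges with a plain Borda count.

    Each ranking lists edges from most to least confident. An edge at
    position i in a list of length n earns n - i points; an edge that is
    missing from a list earns nothing from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, edge in enumerate(ranking):
            scores[edge] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda edge: (-scores[edge], edge))


# Hypothetical rankings, e.g. from three regressions run with different settings.
rankings = [
    [("A", "B"), ("A", "C"), ("B", "C")],
    [("A", "B"), ("B", "C"), ("A", "C")],
    [("A", "C"), ("A", "B"), ("C", "B")],
]
consensus = borda_aggregate(rankings)
```

In this toy input the edge A to B is ranked first or second by every list, so it ends up at the top of the consensus ranking.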

2017
Author(s): Princy Parsana, Claire Ruberman, Andrew E. Jaffe, Michael C. Schatz, Alexis Battle, ...

Abstract
Background: Gene co-expression networks capture diverse biological relationships between genes, and are important tools in predicting gene function and understanding disease mechanisms. Functional interactions between genes have not been fully characterized for most organisms, and therefore reconstruction of gene co-expression networks has been of common interest in a variety of settings. However, methods routinely used for reconstruction of gene co-expression networks do not account for confounding artifacts known to affect high dimensional gene expression measurements.
Results: In this study, we show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. Both theoretically and empirically, we demonstrate that removing the effects of top principal components from gene expression measurements prior to network inference can reduce false discoveries, especially when well annotated technical covariates are not available. Using expression data from the GTEx project in multiple tissues and hundreds of individuals, we show that this latent factor residualization approach often reduces false discoveries in the reconstructed networks.
Conclusion: Network reconstruction is susceptible to confounders that affect measurements of gene expression. Even controlling for major individual known technical covariates fails to fully eliminate confounding variation from the data. In studies where a wide range of annotated technical factors are measured and available, correcting gene expression data with multiple covariates can also improve network reconstruction, but such extensive annotations are not always available. Our study shows that principal component correction, which does not depend on study design or annotation of all relevant confounders, removes patterns of artifactual variation and improves network reconstruction in both simulated data and gene expression data from the GTEx project. We have implemented our PC correction approach in the Bioconductor package sva, which can be used prior to network reconstruction with a range of methods.
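The idea of removing top principal components before network inference can be sketched in a few lines of dependency-free Python. This is an illustration under simplifying assumptions, not the sva implementation: it removes only the single leading component, finds it by power iteration, and uses a made-up batch effect. The function names and the toy data are hypothetical.

```python
import math
import random


def center_columns(matrix):
    """Subtract each gene's (column's) mean across samples (rows)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def top_principal_axis(centered, iterations=200):
    """Estimate the leading principal axis by power iteration on X^T X."""
    n_genes = len(centered[0])
    rng = random.Random(0)
    axis = [rng.gauss(0, 1) for _ in range(n_genes)]
    for _ in range(iterations):
        # Project samples onto the current axis, then map back to gene space.
        scores = [sum(x * v for x, v in zip(row, axis)) for row in centered]
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    return axis


def remove_top_pc(matrix):
    """Return the centered data with its leading principal component removed."""
    centered = center_columns(matrix)
    axis = top_principal_axis(centered)
    residual = []
    for row in centered:
        score = sum(x * v for x, v in zip(row, axis))
        residual.append([x - score * v for x, v in zip(row, axis)])
    return residual, axis


# Toy data: 6 samples x 3 genes, with a shift of 5.0 added to the second batch.
rng = random.Random(1)
batch = [0, 0, 0, 1, 1, 1]
expression = [[rng.gauss(0, 0.1) + 5.0 * b for _ in range(3)] for b in batch]

residual, axis = remove_top_pc(expression)
# Largest remaining projection of any sample onto the removed axis.
leftover = max(abs(sum(x * v for x, v in zip(row, axis))) for row in residual)
# Total variation left after the correction.
residual_energy = sum(x * x for row in residual for x in row)
```

Because the simulated batch shift dominates the variation, the leading component captures it, and the residual matrix retains only the small gene-level noise. Real data would typically call for several components and a principled way of choosing how many.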


2019
Author(s): Junil Kim, Simon Toftholm Jakobsen, Kedar Nath Natarajan, Kyoung Jae Won

Abstract: Gene expression data have been widely used to infer gene regulatory networks (GRNs). Recent single-cell RNA sequencing (scRNAseq) data, containing the expression information of the individual cells (or status), are highly useful in blindly reconstructing regulatory mechanisms. However, it is still not easy to understand transcriptional cascades from large amounts of expression data. Besides, the reconstructed networks may not capture the major regulatory rules. Here, we propose a novel approach called TENET to reconstruct the GRNs from scRNAseq data by calculating causal relationships between genes using transfer entropy (TE). We show that known target genes have significantly higher TE values. Genes with higher TE values were more affected by various perturbations. Comprehensive benchmarking showed that TENET outperformed other GRN prediction algorithms. More importantly, TENET is uniquely capable of identifying key regulators. Applying TENET to scRNAseq data from embryonic stem cell differentiation to neural cells, we show that Nme2 is a critical factor for 2i condition-specific stem cell self-renewal.
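Transfer entropy from a candidate regulator to a target measures how much the regulator's past reduces uncertainty about the target's next state beyond what the target's own past already explains. The sketch below is a minimal plug-in estimate for discretized series with a lag of one step. It is not TENET's estimator, whose details are not given in this abstract; the function name and the toy binary series are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Lag-1 transfer entropy (bits) from `source` to `target`, discrete series."""
    n = len(target) - 1
    # Counts over (target_next, target_now, source_now) and its marginals.
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    next_and_now = Counter(zip(target[1:], target[:-1]))
    now_and_source = Counter(zip(target[:-1], source[:-1]))
    now_only = Counter(target[:-1])
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / now_and_source[(y_now, x_now)]
        p_given_self = next_and_now[(y_next, y_now)] / now_only[y_now]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Toy series ordered along a time-like axis (for example, pseudotime).
rng = random.Random(0)
regulator = [rng.randint(0, 1) for _ in range(2000)]
# Hypothetical target that copies the regulator's state one step later.
target_gene = [0] + regulator[:-1]

te_forward = transfer_entropy(regulator, target_gene)
te_reverse = transfer_entropy(target_gene, regulator)
```

In this toy setup the forward value is close to one bit while the reverse value is near zero, which is the asymmetry that lets transfer entropy suggest a direction of influence.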


2018
Author(s): Arnaud Bonnaffoux, Ulysse Herbach, Angélique Richard, Anissa Guillemin, Sandrine Giraud, ...

Abstract: Inference of gene regulatory networks from gene expression data has been a long-standing and notoriously difficult task in systems biology. Recently, single-cell transcriptomic data have been massively used for gene regulatory network inference, with both successes and limitations. In the present work we propose an iterative algorithm called WASABI, dedicated to inferring a causal dynamical network from time-stamped single-cell data, which tackles some of the limitations associated with current approaches. We first introduce the concept of waves, which posits that the information provided by an external stimulus will affect genes one-by-one through a cascade, like waves spreading through a network. This concept allows us to infer the network one gene at a time, after genes have been ordered by their time of regulation. We then demonstrate the ability of WASABI to correctly infer small networks, which have been simulated in silico using a mechanistic model consisting of coupled piecewise-deterministic Markov processes for the proper description of gene expression at the single-cell level. We finally apply WASABI to in vitro generated data on an avian model of erythroid differentiation. The structure of the resulting gene regulatory network sheds a fascinating new light on the molecular mechanisms controlling this process. In particular, we find no evidence for hub genes and a much more distributed network structure than expected. Interestingly, we find that a majority of genes are under the direct control of the differentiation-inducing stimulus. In conclusion, WASABI is a versatile algorithm which should help biologists to fully exploit the power of time-stamped single-cell data.


2016
Author(s): Thalia E. Chan, Michael P.H. Stumpf, Ann C. Babtie

Abstract: While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be exploited using information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell data sets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available here: https://github.com/Tchanders/network_inference_tutorials. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data.
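For orientation, the pairwise mutual information that this abstract uses as its point of comparison can be computed from discretized expression values as below. This sketch covers only the pairwise baseline, not the partial information decomposition over gene triplets that PIDC relies on. The function name and the binary toy profiles are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


# Toy discretized expression across 8 cells (0 = low, 1 = high).
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 0, 1, 1, 0, 0, 1, 1]  # identical to gene_a
gene_c = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to gene_a in this sample

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

Here `mi_ab` is one bit because the two profiles are identical and balanced, while `mi_ac` is zero because every combination of states occurs equally often.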


2014
Author(s): Mahdi Zamanighomi, Mostafa Zamanian, Michael J Kimber, Zhengdao Wang

The reconstruction of gene regulatory networks from gene expression data has been the subject of intense research activity. A variety of models and methods have been developed to address different aspects of this important problem. However, these techniques are often difficult to scale, are narrowly focused on particular biological and experimental platforms, and require experimental data that are typically unavailable and difficult to ascertain. The more recent availability of higher-throughput sequencing platforms, combined with more precise modes of genetic perturbation, presents an opportunity to formulate more robust and comprehensive approaches to gene network inference. Here, we propose a step-wise framework for identifying gene-gene regulatory interactions that expand from a known point of genetic or chemical perturbation using time series gene expression data. This novel approach sequentially identifies non-steady state genes post-perturbation and incorporates them into a growing series of low-complexity optimization problems. The governing ordinary differential equations of this model are rooted in the biophysics of stochastic molecular events that underlie gene regulation, delineating roles for both protein and RNA-mediated gene regulation. We show the successful application of our core algorithms for network inference using simulated and real datasets.

