Causal Network Inference from Gene Transcriptional Time Series Response to Glucocorticoids

2019 ◽  
Author(s):  
Jonathan Lu ◽  
Bianca Dumitrascu ◽  
Ian C. McDowell ◽  
Brian Jo ◽  
Alejandro Barrera ◽  
...  

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately paving the way for regulatory network re-engineering. Network inference from transcriptional time series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether the causal effects are activating or inhibitory. We apply BETS to transcriptional time series data of 2,768 differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2,768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is freely available as an open source software package at https://github.com/lujonathanh/BETS.
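The core loop this abstract describes (a lagged regression for each target gene, elastic-net selection, and repetition over bootstrap samples) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the BETS implementation: it uses a single lag, resamples (t-1, t) transition pairs with replacement, and fits the elastic net by coordinate descent. The toy three-gene data, the function names `simulate`, `elastic_net`, and `bootstrap_edges`, and the values of `lam`, `alpha`, and the 0.8 frequency cut-off are all hypothetical.

```python
import math
import random


def simulate(n_time, seed=0):
    """Hypothetical 3-gene toy series: gene 0 activates gene 1 and represses gene 2 at lag 1."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1), 0.0, 0.0]]
    for _ in range(n_time - 1):
        g0 = x[-1][0]
        x.append([rng.gauss(0, 1),
                  0.9 * g0 + rng.gauss(0, 0.1),
                  -0.9 * g0 + rng.gauss(0, 0.1)])
    return x


def elastic_net(X, y, lam=0.2, alpha=0.5, n_iter=30):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*||b||^2)."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]   # centred predictors
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]                             # residual of the all-zero model
    z = [sum(row[k] ** 2 for row in Xc) / n for k in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            # covariance of predictor k with the residual that has its own term added back
            rho = sum(Xc[i][k] * resid[i] for i in range(n)) / n + z[k] * b[k]
            # soft-threshold (L1 part), then shrink by the ridge (L2) part
            new = math.copysign(max(abs(rho) - lam * alpha, 0.0), rho) / (z[k] + lam * (1 - alpha))
            delta = new - b[k]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][k] * delta
            b[k] = new
    return b


def bootstrap_edges(x, n_boot=30, seed=1, **kw):
    """Selection frequency and mean-coefficient sign for every directed edge i -> j."""
    rng = random.Random(seed)
    pairs = [(x[t - 1], x[t]) for t in range(1, len(x))]   # (t-1, t) transitions
    p = len(x[0])
    counts = {(i, j): 0 for i in range(p) for j in range(p)}
    sums = dict.fromkeys(counts, 0.0)
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]       # resample transitions with replacement
        X = [prev for prev, _ in sample]
        for j in range(p):
            y = [cur[j] for _, cur in sample]
            for i, coef in enumerate(elastic_net(X, y, **kw)):
                if coef != 0.0:
                    counts[(i, j)] += 1
                    sums[(i, j)] += coef
    freq = {e: c / n_boot for e, c in counts.items()}
    sign = {e: (s > 0) - (s < 0) for e, s in sums.items()}  # +1 activating, -1 inhibitory
    return freq, sign


series = simulate(100)
freq, sign = bootstrap_edges(series, n_boot=30)
stable = sorted(e for e, f in freq.items() if f >= 0.8)
for i, j in stable:
    print(f"gene {i} -> gene {j}: freq={freq[(i, j)]:.2f}, sign={sign[(i, j)]:+d}")
```

An edge whose selection frequency stays high across bootstrap samples is treated as stable, and the sign of its averaged coefficient gives the activating/inhibitory label; the 0.8 cut-off here is only an illustrative choice.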

2021 ◽  
Vol 17 (1) ◽  
pp. e1008223
Author(s):  
Jonathan Lu ◽  
Bianca Dumitrascu ◽  
Ian C. McDowell ◽  
Brian Jo ◽  
Alejandro Barrera ◽  
...  

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
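For readers who want the regression written out, a generic lag-1 form of an elastic-net Granger model and a bootstrap selection frequency of the kind this abstract describes is given below. The single lag, the penalty parametrization with mixing weight alpha, and the symbols G (number of genes), T (number of time points), B (number of bootstrap samples), and pi-hat are assumptions chosen for illustration, not the paper's exact notation.

```latex
\[
\hat{\beta}_{\cdot j} = \arg\min_{\beta_{1j},\dots,\beta_{Gj}}\;
  \frac{1}{2(T-1)} \sum_{t=2}^{T} \Big( x_j(t) - \sum_{i=1}^{G} \beta_{ij}\, x_i(t-1) \Big)^2
  + \lambda \Big( \alpha \sum_{i=1}^{G} |\beta_{ij}| + \frac{1-\alpha}{2} \sum_{i=1}^{G} \beta_{ij}^2 \Big)
\]
\[
\hat{\pi}_{ij} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\big[ \hat{\beta}^{(b)}_{ij} \neq 0 \big]
\]
```

Here the superscript (b) marks the coefficient of gene i in the model for gene j fitted on bootstrap sample b. Under this reading, an edge i → j is kept when its selection frequency exceeds a threshold, and the sign of its coefficient labels it activating (positive) or inhibitory (negative).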


2004 ◽  
Vol 02 (04) ◽  
pp. 765-783 ◽  
Author(s):  
GUILLAUME BOURQUE ◽  
DAVID SANKOFF

We present a method for gene network inference and revision based on time-series data. Gene networks are modeled using linear differential equations and a generalized stepwise multiple linear regression procedure is used to recover the interaction coefficients. Our system is designed for the recovery of gene interactions concurrently in many gene regulatory networks related by a tree or a more general graph. We show how this comparative framework can facilitate the recovery of the networks and improve the quality of the solutions inferred.
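The regression step can be illustrated with a small sketch, assuming a single network with linear dynamics dx/dt = A x, derivatives approximated by forward differences, and plain forward stepwise selection (greedy addition only). The paper's generalized stepwise procedure and its comparative use across related networks are not reproduced here; the 3-gene matrix `A_TRUE`, the unit-vector initial conditions, and all function names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y, cols):
    """Least-squares fit (no intercept) of y on the chosen columns of X; returns (coefs, rss)."""
    A = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    rhs = [sum(row[i] * v for row, v in zip(X, y)) for i in cols]
    coef = solve(A, rhs)
    rss = sum((v - sum(c * row[i] for c, i in zip(coef, cols))) ** 2 for row, v in zip(X, y))
    return coef, rss


def forward_stepwise(X, y, tol=1e-9):
    """Greedily add the regressor that most reduces RSS until RSS <= tol * sum(y^2)."""
    sst = sum(v * v for v in y)
    chosen, coef, rss = [], [], sst
    remaining = list(range(len(X[0])))
    while remaining and rss > tol * sst:
        best = min(remaining, key=lambda k: ols(X, y, chosen + [k])[1])
        chosen.append(best)
        remaining.remove(best)
        coef, rss = ols(X, y, chosen)
    return dict(zip(chosen, coef))   # regulator index -> interaction coefficient


A_TRUE = [[-0.5, 0.0, 0.0],
          [0.8, -0.5, 0.0],    # hypothetical: gene 0 activates gene 1
          [0.0, -0.6, -0.5]]   # hypothetical: gene 1 represses gene 2


def simulate(A, x0, dt=0.1, steps=20):
    """Euler integration of dx/dt = A x from initial state x0."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([x[i] + dt * sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))])
    return xs


def infer(A, dt=0.1):
    """Stack three perturbation experiments, then run stepwise regression for each gene."""
    X, dX = [], []
    for x0 in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
        xs = simulate(A, x0, dt)
        for cur, nxt in zip(xs, xs[1:]):
            X.append(cur)
            dX.append([(nxt[i] - cur[i]) / dt for i in range(len(cur))])  # forward difference
    return [forward_stepwise(X, [d[i] for d in dX]) for i in range(len(A))]


for i, row in enumerate(infer(A_TRUE)):
    print(f"d x{i}/dt <-", {j: round(c, 3) for j, c in sorted(row.items())})
```

Because the toy data come from an Euler simulation with the same step used for the finite differences, the differences equal A x exactly and the recovered coefficients can be checked against `A_TRUE`; with noisy measurements a stopping rule based on RSS improvement or an information criterion would be needed instead of the near-zero tolerance used here.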


Author(s):  
Jose Eduardo H. da Silva ◽  
Heder S. Bernardino ◽
Helio J.C. Barbosa ◽  
Alex B. Vieira ◽  
Luciana C.D. Campos ◽  
...  

2016 ◽  
Vol 26 (4) ◽  
pp. 043102 ◽  
Author(s):  
E. Bianco-Martinez ◽  
N. Rubido ◽  
Ch. G. Antonopoulos ◽  
M. S. Baptista

2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Bo Yuan Chang ◽  
Mohamed A. Naiel ◽  
Steven Wardell ◽  
Stan Kleinikkink ◽  
John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data, as well as algorithms to fill in missing entries in time-series data. Little to no work has combined the two strategies to learn causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learned from unevenly sampled data (with missing entries) deviate from those learned from evenly sampled data (without missing entries). Because obtaining causal relationships from a time series requires evenly sampled data, the missing values must be filled in before the causal parameters are estimated. The proposed method therefore applies a Gaussian Process Regression (GPR) model to recover the missing data, followed by several pairwise Granger causality equations in Vector Autoregressive (VAR) form that are fitted to the recovered data to obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have much lower RMSE than those obtained with a dummy model (filling with the last seen entry) at every percentage of missing values tested, suggesting that GPR data filling preserves the causal relationships better than dummy filling and should be considered when learning causality from unevenly sampled time series.
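The two-stage pipeline (GPR imputation, then pairwise Granger fits in VAR form) can be sketched as follows. This is an illustration under assumptions, not the authors' code: time is taken as integer steps, the GP uses a fixed RBF kernel (length scale 1.5, small noise term) with a constant prior mean instead of fitted hyperparameters, and the VAR has a single lag. The toy series, the gap pattern, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rbf(t1, t2, length=1.5, var=1.0):
    """Squared-exponential (RBF) covariance between time points t1 and t2."""
    return var * math.exp(-((t1 - t2) ** 2) / (2 * length ** 2))


def gpr_fill(series, length=1.5, noise=1e-4):
    """Replace None entries with the GP posterior mean conditioned on the observed entries."""
    obs = [(t, v) for t, v in enumerate(series) if v is not None]
    mu = sum(v for _, v in obs) / len(obs)          # constant prior mean = observed average
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, (b, _) in enumerate(obs)] for i, (a, _) in enumerate(obs)]
    weights = solve(K, [v - mu for _, v in obs])     # (K + noise*I)^-1 (y - mu)
    return [v if v is not None else mu + sum(rbf(t, a, length) * w for (a, _), w in zip(obs, weights))
            for t, v in enumerate(series)]


def locf_fill(series):
    """Baseline 'dummy' filling: carry the last seen value forward."""
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out


def granger_pair(x, y):
    """Fit y_t = c + a*y_{t-1} + b*x_{t-1} by least squares; b is the x -> y causal parameter."""
    rows = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    target = y[1:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    _, a, b = solve(XtX, Xty)
    return a, b


def simulate(n=40):
    """Hypothetical smooth pair in which x drives y with a one-step delay."""
    x = [math.sin(0.5 * t) for t in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.2 * y[-1] + 0.8 * x[t - 1])
    return x, y


x, y = simulate()
gaps = set(range(3, len(x) - 1, 4))                  # hypothetical missing time points
x_obs = [None if t in gaps else v for t, v in enumerate(x)]
y_obs = [None if t in gaps else v for t, v in enumerate(y)]
print("complete :", granger_pair(x, y)[1])
print("GPR fill :", granger_pair(gpr_fill(x_obs), gpr_fill(y_obs))[1])
print("LOCF fill:", granger_pair(locf_fill(x_obs), locf_fill(y_obs))[1])
```

Comparing the x → y coefficient from the complete series with those from the GPR-filled and last-value-filled series mirrors, on a single toy example, the kind of deviation the paper measures with RMSE; no particular ranking is claimed for this case.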


2018 ◽  
Vol 115 (9) ◽  
pp. 2252-2257 ◽  
Author(s):  
Justin D. Finkle ◽  
Jia J. Wu ◽  
Neda Bagheri

Accurate inference of regulatory networks from experimental data facilitates the rapid characterization and understanding of biological systems. High-throughput technologies can provide a wealth of time-series data to better interrogate the complex regulatory dynamics inherent to organisms, but many network inference strategies do not effectively use temporal information. We address this limitation by introducing Sliding Window Inference for Network Generation (SWING), a generalized framework that incorporates multivariate Granger causality to infer network structure from time-series data. SWING moves beyond existing Granger methods by generating windowed models that simultaneously evaluate multiple upstream regulators at several potential time delays. We demonstrate that SWING elucidates network structure with greater accuracy in both in silico and experimentally validated in vitro systems. We estimate the apparent time delays present in each system and demonstrate that SWING infers time-delayed, gene–gene interactions that are distinct from baseline methods. By providing a temporal framework to infer the underlying directed network topology, SWING generates testable hypotheses for gene–gene influences.
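The windowing idea can be sketched as below: within each sliding window, every gene is regressed on all genes at several candidate delays at once, and each edge's score is its absolute coefficient averaged over windows, keeping the delay with the highest score. This is a simplified stand-in, assuming ordinary least squares with a tiny ridge term as the base regressor and plain averaging for aggregation; it is not SWING's actual estimator or ranking scheme, and the window width, delay set, toy data (gene 0 driving gene 1 at delay 2), and function names are hypothetical.

```python
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def simulate(n=60, seed=0):
    """Hypothetical 3-gene series: gene 1 follows gene 0 with a delay of 2 steps."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1), 0.0, rng.gauss(0, 1)] for _ in range(2)]
    for t in range(2, n):
        x.append([rng.gauss(0, 1), 0.9 * x[t - 2][0] + rng.gauss(0, 0.1), rng.gauss(0, 1)])
    return x


def fit(rows, target, ridge=1e-3):
    """Least squares with a tiny ridge term for numerical stability."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    return solve(A, b)


def swing_like(x, width=15, lags=(1, 2)):
    """Score edges i -> j by |coef| averaged over windows; keep the best-scoring delay per edge."""
    n, g = len(x), len(x[0])
    kmax = max(lags)
    totals, n_win = {}, 0                 # (i, j, lag) -> summed |coef| over windows
    for s in range(n - width + 1):
        n_win += 1
        ts = range(s + kmax, s + width)   # targets whose lagged inputs fall inside the window
        rows = [[1.0] + [x[t - k][i] for i in range(g) for k in lags] for t in ts]
        for j in range(g):
            coef = fit(rows, [x[t][j] for t in ts])
            for idx, c in enumerate(coef[1:]):          # skip the intercept
                i, k = divmod(idx, len(lags))
                key = (i, j, lags[k])
                totals[key] = totals.get(key, 0.0) + abs(c)
    edges = {}
    for (i, j, lag), tot in totals.items():
        score = tot / n_win
        if score > edges.get((i, j), (0.0, None))[0]:
            edges[(i, j)] = (score, lag)
    return edges                          # (i, j) -> (score, best delay)


edges = swing_like(simulate())
for (i, j), (score, lag) in sorted(edges.items(), key=lambda kv: -kv[1][0])[:3]:
    print(f"gene {i} -> gene {j}: score={score:.2f}, best delay={lag}")
```

Fitting all regulators and delays jointly inside each window is what lets a delayed effect (here at lag 2) stand out against the same regulator at other lags; in short windows the scores of edges into purely random genes stay noisy, which is one reason a real method needs a more careful ranking step.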


2019 ◽  
Vol 125 ◽  
pp. 357-363 ◽  
Author(s):  
Zhihong Zhang ◽  
Genzhou Zhang ◽  
Zhonghao Zhang ◽  
Guo Chen ◽  
Yangbin Zeng ◽  
...  

2020 ◽  
Author(s):  
Sachin Heerah ◽  
Roberto Molinari ◽  
Stéphane Guerrier ◽  
Amy Marshall-Colon

Motivation: Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signaling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points that make the identification of causal relationships across organs analytically challenging. Applying existing statistical models, designed for regular time series with abundant time points, to sparse data may fail to reveal biologically significant causal relationships. With increasing research interest in biological time-series data, there is a need for new statistical methods that can determine causality within and between time-series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships in unevenly spaced, multivariate time-series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal.

Results: This work delivers a statistical approach for modelling irregularly sampled bivariate signals that embeds functions from the engineering domain, allowing the model's dependence structure to adapt to the specific sampling times. After the parameters of this model are estimated by maximum likelihood for each bivariate time series, bootstrap procedures (for small samples) or asymptotic results (for large samples) can be used to test for Granger causality. When applied to the Arabidopsis thaliana data, the proposed approach produced 3,078 significant interactions, of which 2,012 have root causal genes and 1,066 have shoot causal genes. Many of the predicted causal and target genes are known players in local and long-distance nitrogen signaling, including genes encoding transcription factors, hormones, and signaling peptides. Of the 1,007 total causal genes (in either organ), 384 are known or predicted mobile transcripts, suggesting that the identified causal genes may be directly involved in long-distance nitrogen signaling through intercellular interactions. The model predictions and subsequent network analysis identified nitrogen-responsive genes whose specific roles in long-distance nitrogen signaling can be tested further.

Availability: The method was developed with the R statistical software and is available through the R package "irg" hosted on the GitHub repository https://github.com/SMAC-Group/irg. A sample data set is provided as an example for applying the method, and the complete Arabidopsis thaliana data can be found at: https://www.ncbi.nlm.nih.gov/geo/query/[email protected]
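A residual-bootstrap Granger test for a short bivariate series can be sketched as follows. It assumes evenly spaced samples and a lag-1 linear model, so it omits the paper's sampling-time-dependent dependence structure and maximum-likelihood fit and only illustrates the bootstrap testing step. The F-type statistic, the number of resamples, the toy data, and the function names are hypothetical choices.

```python
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rss_fit(rows, target):
    """Least-squares coefficients and residuals for target ~ rows."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    coef = solve(A, rhs)
    resid = [v - sum(c * ri for c, ri in zip(coef, r)) for r, v in zip(rows, target)]
    return coef, resid


def f_stat(x, y):
    """F statistic: restricted y_t ~ 1 + y_{t-1} versus full y_t ~ 1 + y_{t-1} + x_{t-1}."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = sum(e * e for e in rss_fit(restricted, target)[1])
    rss_f = sum(e * e for e in rss_fit(full, target)[1])
    return (rss_r - rss_f) / (rss_f / (len(target) - 3))


def bootstrap_granger(x, y, n_boot=199, seed=0):
    """Residual-bootstrap p-value for H0: x does not Granger-cause y."""
    rng = random.Random(seed)
    observed = f_stat(x, y)
    (c, a), resid = rss_fit([[1.0, y[t - 1]] for t in range(1, len(y))], y[1:])
    exceed = 0
    for _ in range(n_boot):
        # regenerate y under the null (restricted) model with resampled residuals
        y_star = [y[0]]
        for _t in range(len(y) - 1):
            y_star.append(c + a * y_star[-1] + rng.choice(resid))
        if f_stat(x, y_star) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


def simulate(n=20, seed=3):
    """Hypothetical short series in which x drives y at lag 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.3 * y[-1] + 0.8 * x[t - 1] + rng.gauss(0, 0.3))
    return x, y


x, y = simulate()
F, p = bootstrap_granger(x, y)
print(f"F = {F:.1f}, bootstrap p = {p:.3f}")
```

Resampling under the restricted model makes the bootstrap distribution reflect the absence of an x → y effect, so with only 20 time points the test does not rely on the asymptotic F distribution; this is the small-sample role the abstract assigns to the bootstrap.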

