Identification of Gene Regulation Models from Single-Cell Data

2017 ◽  
Author(s):  
Lisa Weber ◽  
William Raymond ◽  
Brian Munsky

In quantitative analyses of biological processes, one may use many different scales of models (e.g., spatial or non-spatial, deterministic or stochastic, time-varying or at steady-state) or many different approaches to match models to experimental data (e.g., model fitting or parameter uncertainty/sloppiness quantification with different experiment designs). These different analyses can lead to surprisingly different results, even when applied to the same data and the same model. We use a simplified gene regulation model to illustrate many of these concerns, especially for ODE analyses of deterministic processes, chemical master equation and finite state projection analyses of heterogeneous processes, and stochastic simulations. For each analysis, we employ Matlab and Python software to consider a time-dependent input signal (e.g., a kinase nuclear translocation) and several model hypotheses, along with simulated single-cell data. We illustrate different approaches (e.g., deterministic and stochastic) to identify the mechanisms and parameters of the same model from the same simulated data. For each approach, we explore how uncertainty in parameter space varies with respect to the chosen analysis approach or specific experiment design. We conclude with a discussion of how our simulated results relate to the integration of experimental and computational investigations to explore signal-activated gene expression models in yeast [1] and human cells [2].
PACS numbers: 87.10.+e, 87.15.Aa, 05.10.Gg, 05.40.Ca, 02.50.-r
Submitted to: Phys. Biol.
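
As an illustration of the finite state projection (FSP) analysis mentioned in this abstract, the following is a minimal Python sketch, not the authors' code. It assumes a hypothetical one-species birth-death model (constant production `k_prod`, first-order degradation `k_deg`), truncates the chemical master equation to `n_states` states, and integrates it by uniformization using only the standard library. The function name, the parameter values, and the choice of uniformization as the integrator are all assumptions made for this example.

```python
import math


def fsp_birth_death(k_prod, k_deg, n_states, t, p0=None):
    """Probabilities of 0..n_states-1 molecules at time t (truncated CME)."""
    if p0 is None:
        p0 = [1.0] + [0.0] * (n_states - 1)  # assumed start: zero molecules
    # Uniformization rate: at least the largest exit rate of any kept state.
    lam = k_prod + k_deg * (n_states - 1)

    def step(p):
        # Apply P = I + A/lam, where A is the truncated CME generator.
        q = [0.0] * n_states
        for n, pn in enumerate(p):
            exit_rate = k_prod + k_deg * n
            q[n] += pn * (1.0 - exit_rate / lam)
            if n + 1 < n_states:
                q[n + 1] += pn * k_prod / lam  # production
            # Production out of the last kept state is dropped: this lost
            # probability mass is what the FSP error bound measures.
            if n > 0:
                q[n - 1] += pn * k_deg * n / lam  # degradation
        return q

    mean_jumps = lam * t
    # Keep enough Poisson terms that the neglected tail is negligible.
    n_terms = int(mean_jumps + 10.0 * math.sqrt(mean_jumps) + 20)
    weight = math.exp(-mean_jumps)  # Poisson weight for zero jumps
    v = list(p0)
    result = [weight * x for x in v]
    for j in range(1, n_terms + 1):
        v = step(v)
        weight *= mean_jumps / j
        for n in range(n_states):
            result[n] += weight * v[n]
    return result


# Illustrative parameter values only.
p = fsp_birth_death(k_prod=10.0, k_deg=1.0, n_states=40, t=2.0)
fsp_error_bound = 1.0 - sum(p)  # probability that left the truncated space
mean_count = sum(n * pn for n, pn in enumerate(p))
```

For this toy model the exact solution starting from zero molecules is a Poisson distribution with mean `(k_prod / k_deg) * (1 - exp(-k_deg * t))`, so `mean_count` can be compared against it, and `fsp_error_bound` indicates whether `n_states` was chosen large enough.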

2018 ◽  
Author(s):  
Huy D. Vo ◽  
Zachary Fox ◽  
Ania Baetica ◽  
Brian Munsky

The finite state projection (FSP) approach to solving the chemical master equation has enabled successful inference of discrete stochastic models to predict single-cell gene regulation dynamics. Unfortunately, the FSP approach is highly computationally intensive for all but the simplest models, an issue that is highly problematic when parameter inference and uncertainty quantification take enormous numbers of parameter evaluations. To address this issue, we propose two new computational methods for the Bayesian inference of stochastic gene expression parameters given single-cell experiments. We formulate and verify an Adaptive Delayed Acceptance Metropolis-Hastings (ADAMH) algorithm for use with reduced Krylov-basis projections of the FSP. We then introduce an extension of the ADAMH into a Hybrid scheme that consists of an initial phase to construct a reduced model and a faster second phase to sample from the approximate posterior distribution determined by the constructed model. We test and compare both algorithms to an adaptive Metropolis algorithm with full FSP-based likelihood evaluations on three example models and simulated data to show that the new ADAMH variants achieve substantial speedup in comparison to the full FSP approach. By reducing the computational costs of parameter estimation, we expect the ADAMH approach to enable efficient data-driven estimation for more complex gene regulation models.
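
The two-stage idea behind delayed acceptance can be sketched in a few lines of Python. This is a basic, non-adaptive delayed acceptance Metropolis-Hastings sampler with a symmetric random-walk proposal, not the ADAMH algorithm itself: the Gaussian `log_exact` and `log_approx` densities are hypothetical stand-ins for the full FSP likelihood and a reduced model, and every name and value here is an assumption made for illustration.

```python
import math
import random


def delayed_acceptance_mh(log_exact, log_approx, x0, n_iter, step, seed=0):
    """Sample from exp(log_exact), screening proposals with log_approx."""
    rng = random.Random(seed)
    x = x0
    lx_exact, lx_approx = log_exact(x), log_approx(x)
    samples, n_exact_calls = [], 1
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        ly_approx = log_approx(y)
        # Stage 1: cheap screening with the approximate density only.
        if math.log(1.0 - rng.random()) < ly_approx - lx_approx:
            ly_exact = log_exact(y)  # expensive call, only for survivors
            n_exact_calls += 1
            # Stage 2: correct for the approximation so that the chain
            # still targets the exact density.
            log_ratio = (ly_exact - lx_exact) - (ly_approx - lx_approx)
            if math.log(1.0 - rng.random()) < log_ratio:
                x, lx_exact, lx_approx = y, ly_exact, ly_approx
        samples.append(x)
    return samples, n_exact_calls


def log_exact(x):
    # Hypothetical "expensive" target: unnormalized N(1.0, 0.5^2).
    return -0.5 * ((x - 1.0) / 0.5) ** 2


def log_approx(x):
    # Hypothetical cheap surrogate, deliberately slightly wrong.
    return -0.5 * ((x - 0.8) / 0.6) ** 2


n_iter = 20000
samples, n_exact_calls = delayed_acceptance_mh(
    log_exact, log_approx, x0=0.0, n_iter=n_iter, step=0.5
)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

Proposals rejected in stage 1 never trigger an exact evaluation, so `n_exact_calls` stays below `n_iter`; the saving depends on how well the surrogate matches the exact density.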


2021 ◽  
Vol 17 (12) ◽  
pp. e1009466
Author(s):  
Stephen Zhang ◽  
Anton Afanassiev ◽  
Laura Greenstreet ◽  
Tetsuya Matsumoto ◽  
Geoffrey Schiebinger

Understanding how cells change their identity and behaviour in living systems is an important question in many fields of biology. The problem of inferring cell trajectories from single-cell measurements has been a major topic in the single-cell analysis community, with different methods developed for equilibrium and non-equilibrium systems (e.g. haematopoiesis vs. embryonic development). We show that optimal transport analysis, a technique originally designed for analysing time-courses, may also be applied to infer cellular trajectories from a single snapshot of a population in equilibrium. Therefore, optimal transport provides a unified approach to inferring trajectories that is applicable to both stationary and non-stationary systems. Our method, StationaryOT, is mathematically motivated in a natural way from the hypothesis of Waddington’s epigenetic landscape. We implement StationaryOT as a software package and demonstrate its efficacy in applications to simulated data as well as single-cell data from Arabidopsis thaliana root development.


2019 ◽  
Vol 2 (4) ◽  
pp. e201900443 ◽  
Author(s):  
Jun Woo ◽  
Boris J. Winterhoff ◽  
Timothy K. Starr ◽  
Constantin Aliferis ◽  
Jinhua Wang

Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.


2017 ◽  
Author(s):  
Carolin Loos ◽  
Katharina Moeller ◽  
Fabian Fröhlich ◽  
Tim Hucho ◽  
Jan Hasenauer

All biological systems exhibit cell-to-cell variability, and this variability often has functional implications. To gain a thorough understanding of biological processes, the latent causes and underlying mechanisms of this variability must be elucidated. Cell populations comprising multiple distinct subpopulations are commonplace in biology, yet no current methods allow the sources of variability between and within individual subpopulations to be identified. This limits the analysis of single-cell data, for example data provided by flow cytometry and microscopy. In this study, we present a data-driven modeling framework for the analysis of populations comprising heterogeneous subpopulations. Our approach combines mixture modeling with frameworks for distribution approximation, facilitating the integration of multiple single-cell datasets and the detection of causal differences between and within subpopulations. The computational efficiency of our framework allows hundreds of competing hypotheses to be compared, enabling studies of unprecedented depth. We demonstrated the ability of our method to capture multiple levels of heterogeneity in the analysis of simulated data and data from highly heterogeneous sensory neurons involved in pain initiation. Our approach identified the sources of cell-to-cell variability and revealed mechanisms that underlie the modulation of nerve growth factor-induced Erk1/2 signaling by extracellular scaffolds.


2021 ◽  
Author(s):  
Stephen Zhang ◽  
Anton Afanassiev ◽  
Laura Greenstreet ◽  
Tetsuya Matsumoto ◽  
Geoffrey Schiebinger

Understanding how cells change their identity and behaviour in living systems is an important question in many fields of biology. The problem of inferring cell trajectories from single-cell measurements has been a major topic in the single-cell analysis community, with different methods developed for equilibrium and non-equilibrium systems (e.g. haematopoiesis vs. embryonic development). We show that optimal transport analysis, a technique originally designed for analysing time-courses, may also be applied to infer cellular trajectories from a single snapshot of a population in equilibrium. Therefore, optimal transport provides a unified approach to inferring trajectories, applicable to both stationary and non-stationary systems. Our method, StationaryOT, is mathematically motivated in a natural way from the hypothesis of Waddington’s epigenetic landscape. We implement StationaryOT as a software package and demonstrate its efficacy when applied to simulated data as well as single-cell data from Arabidopsis thaliana root development.


2018 ◽  
Author(s):  
Bianca Dumitrascu ◽  
Karen Feng ◽  
Barbara E Engelhardt

We present the Good-Toulmin-like estimator via Thompson sampling (GT-TS), a computational method for iterative experimental design in multi-tissue single-cell RNA-seq (scRNA-seq) data. Given a budget and a model of cell type information across tissues, GT-TS estimates how many cells should be sampled from each tissue, with the goal of maximizing cell type discovery across samples from multiple iterations. In both real and simulated data, we demonstrate the advantages of GT-TS in data collection planning when compared to a random strategy in the absence of experimental design.


2018 ◽  
Vol 15 (5) ◽  
pp. 055001 ◽  
Author(s):  
Lisa Weber ◽  
William Raymond ◽  
Brian Munsky
