Machine learning methods to reverse engineer dynamic gene regulatory networks governing cell state transitions

2018 ◽  
Author(s):  
P. Tsakanikas ◽  
D. Manatakis ◽  
E. S. Manolakos

ABSTRACT: Deciphering the dynamic gene regulatory mechanisms driving cells to make fate decisions remains elusive. We present a novel unsupervised machine learning methodology that can be used to analyze a dataset of heterogeneous single-cell gene expression profiles, determine the most probable number of states (major cellular phenotypes) represented, and extract the corresponding cell sub-populations. Most importantly, for any transition of interest from a source to a destination state, our methodology can zoom in, identify the cells most specific for studying the dynamics of this transition, order them along a trajectory of biological progression in posterior probability space, determine the "key-player" genes governing the transition dynamics, partition the trajectory into consecutive phases (transition "micro-states"), and finally reconstruct causal gene regulatory networks for each phase. Application of the end-to-end methodology provides new insights into key-player genes and their dynamic interactions during the important HSC-to-LMPP cell state transition involved in hematopoiesis. Moreover, it allows us to reconstruct a probabilistic representation of the “epigenetic landscape” of transitions and correctly identify the major ones in the hematopoiesis hierarchy of states.

2021 ◽  
Vol 12 ◽  
Author(s):  
Jiyoung Lee ◽  
Shuo Geng ◽  
Song Li ◽  
Liwu Li

Subclinical doses of LPS (SD-LPS) are known to cause low-grade inflammatory activation of monocytes, which could lead to inflammatory diseases including atherosclerosis and metabolic syndrome. Sodium 4-phenylbutyrate is a potential therapeutic compound which can reduce the inflammation caused by SD-LPS. To understand the gene regulatory networks of these processes, we have generated scRNA-seq data from mouse monocytes treated with these compounds and identified 11 novel cell clusters. We have developed a machine learning method to integrate scRNA-seq, ATAC-seq, and binding motifs to characterize gene regulatory networks underlying these cell clusters. Using guided regularized random forest and feature selection, our method achieved high performance and outperformed a traditional enrichment-based method in selecting candidate regulatory genes. Our method is particularly efficient in selecting a few candidate genes to explain the observed expression patterns. In particular, among 531 candidate TFs, our method achieves an auROC of 0.961 with only 10 motifs. Finally, we found two novel subpopulations of monocyte cells in response to SD-LPS and we confirmed our analysis using independent flow cytometry experiments. Our results suggest that our new machine learning method can select candidate regulatory genes as potential targets for developing new therapeutics against low-grade inflammation.


Patterns ◽  
2020 ◽  
Vol 1 (9) ◽  
pp. 100139
Author(s):  
Daniel Osorio ◽  
Yan Zhong ◽  
Guanxun Li ◽  
Jianhua Z. Huang ◽  
James J. Cai

2021 ◽  
Author(s):  
Ewen Burban ◽  
Maud Irene Tenaillon ◽  
Arnaud Le Rouzic

The domestication of plant and animal species leads to repeatable morphological evolution, often referred to as the phenotypic domestication syndrome. Domestication is also associated with important genomic changes, such as the loss of genetic diversity and modifications of gene expression patterns. Here, we explored theoretically the effect of domestication at the genomic level by characterizing the impact of a domestication-like scenario on gene regulatory networks. We ran population genetics simulations in which individuals were characterized by their genotype (an interaction matrix encoding a gene regulatory network) and their gene expressions, representing the phenotypic level. Our domestication scenario included a population bottleneck and a selection switch (change in the optimal gene expression level) mimicking canalizing selection, i.e. evolution towards more stable expression to parallel enhanced environmental stability in man-made habitats. We showed that domestication profoundly alters genetic architectures. Based on the well-documented example of maize (Zea mays ssp. mays) domestication, our simulations predicted (i) a drop in neutral allelic diversity, (ii) a change in gene expression variance that depended upon the domestication scenario, (iii) transient maladaptive plasticity, (iv) a deep rewiring of the gene regulatory networks, with a trend towards gain of regulatory interactions between genes, and (v) a global increase in the genetic correlations among gene expressions, with a loss of modularity in the resulting coexpression patterns and in the underlying networks. Extending the range of parameters, we provide empirically testable predictions on the differences in genetic architecture between wild and domesticated forms. The characterization of such systematic evolutionary changes in the genetic architecture of traits contributes to defining a molecular domestication syndrome.
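To make the simulation setup above more concrete, the following is a minimal sketch of how a genotype stored as an interaction matrix can be turned into gene expression levels and scored against an optimum. The abstract does not specify the model, so everything here is an assumption made for illustration: the sigmoid update rule, the Gaussian fitness function, and the names develop, fitness, basal and strength are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def develop(W, n_steps=24, basal=0.15):
    """Iterate expression dynamics for a regulatory matrix W.

    W[i, j] is the (assumed) effect of gene j on gene i. Expression
    starts at a basal level and is squashed into (0, 1) by a sigmoid
    at every step; the final expression vector is returned.
    """
    s = np.full(W.shape[0], basal)
    for _ in range(n_steps):
        s = 1.0 / (1.0 + np.exp(-(W @ s)))
    return s

def fitness(expression, optimum, strength=10.0):
    """Gaussian stabilizing selection around an optimal expression vector."""
    return float(np.exp(-strength * np.sum((expression - optimum) ** 2)))

# Illustrative "selection switch": the same genotype is scored against
# an old optimum (which it matches) and a shifted new optimum.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))
expression = develop(W)
old_optimum = expression.copy()
new_optimum = np.clip(expression + 0.2, 0.0, 1.0)
w_before = fitness(expression, old_optimum)
w_after = fitness(expression, new_optimum)
```

In a full simulation, mutations would perturb entries of W each generation and individuals would reproduce in proportion to their fitness; a bottleneck would correspond to a temporary reduction in population size.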


Author(s):  
H. Chatrabgoun ◽  
A. R. Soltanian ◽  
H. Mahjub ◽  
F. Bahreini

Considerable research effort has focused on learning gene regulatory networks (GRNs) from gene expression data to understand the functional basis of a living organism. Under the assumption that the joint distribution of the gene expressions of interest is multivariate normal, such networks can be constructed by assessing the nonzero elements of the inverse covariance matrix, the so-called precision matrix or concentration matrix. Because this approach considers only pairwise linear correlations, it may not reflect the true connectivity between genes. To relax this restrictive constraint, we employ a Gaussian process (GP) model, which is well known as a computationally efficient non-parametric Bayesian machine learning technique. GPs belong to a class of methods known as kernel machines, which can be used to approximate complex problems by tuning their hyperparameters. In effect, the GP makes it possible to exploit the capacity of different kernels in constructing the precision matrix and GRNs. In this paper, we first choose a GP with an appropriate kernel to learn the GRNs of interest from the observed genetic data, and then estimate the kernel hyperparameters using a rule-of-thumb technique. These hyperparameters also allow us to control the degree of sparseness in the precision matrix. We then obtain a kernel-based precision matrix, similar to GLASSO, to construct a kernel-based GRN. We use this approach to construct high-performance GRNs for different species of Drosophila fly rather than simply relying on the assumption of a multivariate normal distribution; the GPs, drawing on the capacity of the kernels, perform much better than the multivariate Gaussian distribution assumption.
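The multivariate normal baseline described above can be illustrated with a short sketch: under that assumption, an edge between two genes corresponds to a nonzero entry of the precision matrix, which can be rescaled into a partial correlation. This sketch covers only the Gaussian baseline, not the kernel-based GP method of the paper. The function name partial_correlation_network, the ridge term and the threshold are assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def partial_correlation_network(X, threshold=0.1, ridge=1e-3):
    """Infer an undirected network from a samples-by-genes matrix X.

    Returns (partial_corr, adjacency). An edge is drawn where the
    absolute partial correlation exceeds `threshold`.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    # A small ridge term keeps the covariance invertible (sketch assumption).
    cov = cov + ridge * np.eye(cov.shape[0])
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    # Partial correlation: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    partial_corr = -precision / np.outer(d, d)
    np.fill_diagonal(partial_corr, 1.0)
    adjacency = np.abs(partial_corr) > threshold
    np.fill_diagonal(adjacency, False)
    return partial_corr, adjacency

# Synthetic chain g0 -> g1 -> g2: g0 and g2 are correlated, but only
# through g1, so they should not be linked directly.
rng = np.random.default_rng(42)
n = 5000
g0 = rng.normal(size=n)
g1 = 0.8 * g0 + 0.5 * rng.normal(size=n)
g2 = 0.8 * g1 + 0.5 * rng.normal(size=n)
data = np.column_stack([g0, g1, g2])
partial_corr, adjacency = partial_correlation_network(data)
```

A sparse estimator such as the graphical lasso would replace the plain matrix inverse and the hard threshold in practice; the plain inverse is used here only to keep the example dependency-free.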


2020 ◽  
Author(s):  
Daniel Osorio ◽  
Yan Zhong ◽  
Guanxun Li ◽  
Jianhua Z. Huang ◽  
James J. Cai

Abstract

Constructing and comparing gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNAseq) data has the potential to reveal critical components in the underlying regulatory networks regulating different cellular transcriptional activities. Here, we present a robust and powerful machine learning workflow, scTenifoldNet, for comparative GRN analysis of single cells. The scTenifoldNet workflow, consisting of principal component regression, low-rank tensor approximation, and manifold alignment, constructs and compares transcriptome-wide single-cell GRNs (scGRNs) from different samples to identify gene expression signatures shifting with cellular activity changes such as those associated with pathophysiological processes and responses to environmental perturbations. We used simulated data to benchmark scTenifoldNet’s performance, and then applied scTenifoldNet to several real data sets. In real-data applications, scTenifoldNet identified highly specific changes in gene regulation in response to acute morphine treatment, an antibody anticancer drug, gene knockout, double-stranded RNA stimulus, and amyloid-beta plaques in various types of mouse and human cells. We anticipate that scTenifoldNet can help achieve breakthroughs through constructing and comparing scGRNs in poorly characterized biological systems, by deciphering the full cellular and molecular complexity of the data.

Highlights

scTenifoldNet is a machine learning workflow built upon principal component regression, low-rank tensor approximation, and manifold alignment.
scTenifoldNet uses single-cell RNA sequencing (scRNAseq) data to construct single-cell gene regulatory networks (scGRNs).
scTenifoldNet compares scGRNs of different samples to identify differentially regulated genes.
Real-data applications demonstrate that scTenifoldNet accurately detects specific signatures of gene expression relevant to the cellular systems tested.

Short abstract

We present scTenifoldNet, a machine learning workflow built upon principal component regression, low-rank tensor approximation, and manifold alignment, for constructing and comparing single-cell gene regulatory networks (scGRNs) using data from single-cell RNA sequencing (scRNAseq). scTenifoldNet reveals regulatory changes in gene expression between samples by comparing the constructed scGRNs. With real data, scTenifoldNet identifies specific gene expression programs associated with different biological processes, providing critical insights into the underlying mechanism of regulatory networks governing cellular transcriptional activities.
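As an illustration of the first ingredient named above, principal component regression, the sketch below builds a gene-by-gene weight matrix by regressing each gene on the leading principal components of all other genes and projecting the coefficients back onto gene space. It is a simplified sketch under stated assumptions, not the scTenifoldNet implementation: the function name pc_regression_network and the parameter n_components are hypothetical, the data are synthetic, and the tensor approximation and manifold alignment steps are not shown.

```python
import numpy as np

def pc_regression_network(X, n_components=2):
    """Build a genes-by-genes weight matrix from a cells-by-genes matrix X.

    W[i, j] is the back-projected coefficient of gene i when gene j is
    predicted from the top principal components of all other genes.
    The diagonal stays zero because a gene never predicts itself.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    n_genes = X.shape[1]
    W = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        others = [i for i in range(n_genes) if i != j]
        Z = X[:, others]
        # Principal axes of the predictor genes via SVD.
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        V = Vt[:n_components].T            # (n_genes - 1, k)
        scores = Z @ V                     # (cells, k)
        gamma, *_ = np.linalg.lstsq(scores, X[:, j], rcond=None)
        # Map PC coefficients back to per-gene coefficients.
        W[others, j] = V @ gamma
    return W

# Synthetic example: gene 1 follows gene 0; genes 2-4 are independent noise.
rng = np.random.default_rng(7)
n_cells = 500
gene0 = 3.0 * rng.normal(size=n_cells)
gene1 = gene0 + 0.5 * rng.normal(size=n_cells)
noise = rng.normal(size=(n_cells, 3))
counts = np.column_stack([gene0, gene1, noise])
W = pc_regression_network(counts, n_components=2)
```

Two such matrices, one per sample, could then be compared entry by entry to flag genes whose regulatory weights shift between conditions, which is the comparison the workflow formalizes with its later steps.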

