Biological Process Activity Transformation of Single Cell Gene Expression for Cross-Species Alignment

2019 ◽  
Author(s):  
Hongxu Ding ◽  
Andrew Blair ◽  
Ying Yang ◽  
Joshua M. Stuart

Abstract The maintenance and transition of cellular states are controlled by biological processes. Here we present a gene set-based transformation of single cell RNA-Seq data into biological process activities that provides a robust description of cellular states. Moreover, as these activities represent species-independent descriptors, they facilitate the alignment of single cell states across different organisms.

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Hongxu Ding ◽  
Andrew Blair ◽  
Ying Yang ◽  
Joshua M. Stuart

Abstract The maintenance and transition of cellular states are controlled by biological processes. Here we present a gene set-based transformation of single cell RNA-Seq data into biological process activities that provides a robust description of cellular states. Moreover, as these activities represent species-independent descriptors, they facilitate the alignment of single cell states across different organisms.
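The abstract does not say how gene sets are scored, so the following is only a minimal sketch of the general idea of a gene set-based transformation, under stated assumptions: each gene is z-scored across cells, and a process activity is taken to be the mean z-score of the genes in that set. The function names and this scoring rule are illustrative stand-ins, not the authors' method.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Z-score each gene across cells.

    expr maps gene name -> list of expression values, one per cell.
    Genes with zero variance are assigned a z-score of 0 in every cell.
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def process_activities(expr, gene_sets):
    """Hypothetical activity score: mean z-score of a set's genes per cell.

    gene_sets maps process name -> list of gene names. Sets with no
    measured genes are skipped. This averaging rule is an assumption
    made for illustration only.
    """
    z = zscore_genes(expr)
    n_cells = len(next(iter(expr.values())))
    activities = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        if not present:
            continue
        activities[name] = [
            mean(z[g][c] for g in present) for c in range(n_cells)
        ]
    return activities
```

Because the output is indexed by process rather than by gene, two species whose gene sets describe the same processes yield activity vectors over the same features, which is the property the abstract relies on for cross-species alignment.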


2016 ◽  
Vol 64 (4) ◽  
pp. 977.2-977
Author(s):  
Z Jin ◽  
MA Jensen ◽  
JM Dorschner ◽  
DM Vsetecka ◽  
S Amin ◽  
...  

Background: Our previous studies have shown that different cell types from the same blood sample demonstrate diverse gene expression parameters. Follow-up work suggests that this diversity extends to cells of the same type from the same blood sample. In this study, we examine single cell gene expression in SLE patient monocytes and determine correlations with clinical features.
Methods: CD14++CD16− classical monocytes (CLs) and CD14dimCD16+ non-classical monocytes (NCLs) from SLE patients were purified by magnetic separation. The Fluidigm single cell capture and pre-amplification system was used for single cell capture and target gene pre-amplification. The Fluidigm Biomark system (an RT-PCR system) was used to quantify expression of 87 monocyte-related genes. IFN-induced genes in monocytes were identified by culturing monocytes isolated from whole blood of healthy controls with or without IFN-α. Genes significantly up-regulated by IFN were defined as IFN-induced genes in the current study. Each cell was given an IFN score equal to the sum of its expression of the IFN-induced genes.
Results: Both CLs and NCLs demonstrated a wide range of expression of IFN-induced genes, and NCL monocytes had higher IFN scores than CL monocytes. Using unsupervised hierarchical clustering, we found four gene sets that clustered monocytes functionally. These included an IFN-induced gene set, two inflammatory gene sets, and one immunosuppressive gene set. Interestingly, we could define a large subset of NCL monocytes in which suppressive transcripts (including TGF-β and PDL1) and IFN-induced transcripts were upregulated, while the two inflammatory gene sets were down-regulated. These cells were highly over-represented in a patient with inactive disease who was on immunosuppressants at the time of blood draw. The proportion of NCLs expressing the anti-inflammatory gene set was inversely correlated with anti-dsDNA titers (rho=−0.77, p=0.0051) and positively correlated with C3 complement (rho=0.68, p=0.030) in the SLE patient group, suggesting that these cells are also associated with serological quiescence.
Conclusion: Using single cell gene expression, we have identified a unique population of NCL monocytes in SLE patients with upregulation of a combination of anti-inflammatory and IFN-induced transcripts. These cells correspond with clinical and serological quiescence.
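The two computations named in this abstract, a per-cell IFN score (the sum of expression over IFN-induced genes) and a rank correlation reported as rho, can be sketched as follows. The abstract does not name the correlation test, so treating rho as Spearman's coefficient is an assumption; the gene and cell names in the usage note are hypothetical placeholders.

```python
def ifn_scores(expression, ifn_genes):
    """Per-cell IFN score: sum of expression over the IFN-induced genes.

    expression maps cell id -> {gene name: expression value}.
    Genes not measured in a cell contribute 0.
    """
    return {
        cell: sum(genes.get(g, 0.0) for g in ifn_genes)
        for cell, genes in expression.items()
    }


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For example, `ifn_scores({"cell_1": {"ifn_a": 1.0, "ifn_b": 2.0}}, ["ifn_a", "ifn_b"])` sums the two placeholder genes for `cell_1`, and `spearman_rho` could then relate a per-patient cell proportion to a serological measurement.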


2020 ◽  
Author(s):  
Tianyi Sun ◽  
Dongyuan Song ◽  
Wei Vivian Li ◽  
Jingyi Jessica Li

Abstract In the burgeoning field of single-cell transcriptomics, a pressing challenge is to benchmark various experimental protocols and numerous computational methods in an unbiased manner. Although dozens of simulators have been developed for single-cell RNA-seq (scRNA-seq) data, they lack the capacity to simultaneously achieve all three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill this gap, here we propose scDesign2, an interpretable simulator that achieves all three goals and generates high-fidelity synthetic data for multiple scRNA-seq protocols and other single-cell gene expression count-based technologies. Compared with existing simulators, scDesign2 is advantageous in its transparent use of probabilistic models and is unique in its ability to capture gene correlations via copula. We verify that scDesign2 generates more realistic synthetic data for four scRNA-seq protocols (10x Genomics, CEL-Seq2, Fluidigm C1, and Smart-Seq2) and two single-cell spatial transcriptomics protocols (MERFISH and pciSeq) than existing simulators do. Under two typical computational tasks, cell clustering and rare cell type detection, we demonstrate that scDesign2 provides informative guidance on deciding the optimal sequencing depth and cell number in single-cell RNA-seq experimental design, and that scDesign2 can effectively benchmark computational methods under varying sequencing depths and cell numbers. With these advantages, scDesign2 is a powerful tool for single-cell researchers to design experiments, develop computational methods, and choose appropriate methods for specific data analysis needs.
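To illustrate what capturing gene correlations via a copula can mean in practice, here is a minimal Gaussian-copula sketch. It is not the scDesign2 implementation: the abstract does not specify the copula family or the marginal models, so this sketch assumes a Gaussian copula and uses each gene's empirical count distribution as its marginal. All function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def normal_scores(values):
    """Map observations to normal scores via ranks (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cholesky(matrix):
    """Lower-triangular Cholesky factor; assumes a positive definite input."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (matrix[i][i] - s) ** 0.5
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def simulate_counts(counts, n_cells, seed=0):
    """Simulate n_cells cells; counts is a list of per-gene observed counts.

    Marginals are empirical (an assumption of this sketch), and the
    dependence between genes comes from a Gaussian copula fitted to the
    normal scores of the observed data.
    """
    rng = random.Random(seed)
    n_genes = len(counts)
    scores = [normal_scores(g) for g in counts]
    corr = [
        [1.0 if i == j else correlation(scores[i], scores[j])
         for j in range(n_genes)]
        for i in range(n_genes)
    ]
    low = cholesky(corr)
    sorted_counts = [sorted(g) for g in counts]
    simulated = [[] for _ in range(n_genes)]
    for _ in range(n_cells):
        indep = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for i in range(n_genes):
            z = sum(low[i][k] * indep[k] for k in range(i + 1))  # correlated normal
            u = STD_NORMAL.cdf(z)  # uniform on (0, 1)
            idx = min(int(u * len(sorted_counts[i])), len(sorted_counts[i]) - 1)
            simulated[i].append(sorted_counts[i][idx])  # empirical quantile
    return simulated
```

Separating the marginals from the dependence structure is what lets such a simulator generate any number of cells while keeping gene-gene correlations; a fitted parametric count model could replace the empirical quantile step without changing the rest of the sketch.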


2021 ◽  
Author(s):  
Kun Qian ◽  
Shiwei Fu ◽  
Hongwei Li ◽  
Wei Vivian Li

The increasing number of scRNA-seq data emphasizes the need for integrative analysis to interpret similarities and differences between single-cell samples. Even though different batch effect removal methods have been developed, none of the existing methods is suitable for heterogeneous single-cell samples coming from multiple biological conditions. To address this challenge, we propose a method named scINSIGHT to learn coordinated gene expression patterns that are common among or specific to different biological conditions, offering a unique chance to identify cellular identities and key biological processes across single-cell samples. We have evaluated scINSIGHT in comparison with state-of-the-art methods using simulated and real data, which consistently demonstrate its improved performance. In addition, our results show the applicability of scINSIGHT in diverse biomedical and clinical problems.


Author(s):  
Bin Yu ◽  
Chen Chen ◽  
Ren Qi ◽  
Ruiqing Zheng ◽  
Patrick J Skillman-Lawrence ◽  
...  

Abstract The rapid development of single-cell RNA sequencing (scRNA-Seq) technology provides strong technical support for accurately and efficiently analyzing single-cell gene expression data. However, the analysis of scRNA-Seq data is accompanied by many obstacles, including dropout events and the curse of dimensionality. Here, we propose scGMAI, a new single-cell Gaussian mixture clustering method based on autoencoder networks and fast independent component analysis (FastICA). Specifically, scGMAI utilizes autoencoder networks to reconstruct gene expression values from scRNA-Seq data, and FastICA is used to reduce the dimensions of the reconstructed data. The integration of these computational techniques allows scGMAI to outperform existing tools, including Seurat, in clustering cells from 17 public scRNA-Seq datasets. In summary, scGMAI is an effective tool for accurately clustering and identifying cell types from scRNA-Seq data and shows great potential for scRNA-Seq data analysis. The source code is available at https://github.com/QUST-AIBBDRC/scGMAI/.


2019 ◽  
Author(s):  
Nigatu A. Adossa ◽  
Leif Schauser ◽  
Vivi G. Gregersen ◽  
Laura L. Elo

Abstract
Background: Recent advances in single-cell gene expression profiling technology have revolutionized the understanding of molecular processes underlying developmental cell and tissue differentiation, enabling the discovery of novel cell-types and molecular markers that characterize developmental trajectories. Common approaches for identifying marker genes are based on pairwise statistical testing for differential gene expression between cell-types in heterogeneous cell populations, which is challenging due to unequal sample sizes and variance between groups, which result in low statistical power and inflated type I errors.
Results: We developed an alternative feature extraction method, Marker gene Identification for Cell-type Identity (MICTI), that encodes the cell-type specific expression information to each gene in every single cell. This approach identifies features (genes) that are cell-type specific for a given cell-type in a heterogeneous cell population. To validate this approach, we used (i) simulated single cell RNA-seq data, (ii) human pancreatic islet single-cell RNA-seq data and (iii) a simulated mixture of human single-cell RNA-seq data related to immune cells, particularly B cells, CD4+ memory cells, CD8+ memory cells, dendritic cells, fibroblast cells, and lymphoblast cells. For all cases, we were able to identify established cell-type-specific markers.
Conclusions: Our approach represents a highly efficient and fast method as an alternative to differential expression analysis for molecular marker identification in heterogeneous single-cell RNA-seq data.


2017 ◽  
Author(s):  
Diego Calderon ◽  
Anand Bhaskar ◽  
David A. Knowles ◽  
David Golan ◽  
Towfique Raj ◽  
...  

Abstract Previous studies have prioritized trait-relevant cell types by looking for an enrichment of GWAS signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet, hardly any work exists linking single-cell RNA-seq to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and single-cell RNA-seq. We demonstrate RolyPoly’s accuracy through simulation and validate previously known tissue-trait associations. We discover a significant association between microglia and late-onset Alzheimer’s disease, and an association between oligodendrocytes and replicating fetal cortical cells with schizophrenia. Additionally, RolyPoly computes a trait-relevance score for each gene which reflects the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of Alzheimer’s patients were significantly enriched for highly ranked genes by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.


2017 ◽  
Author(s):  
Tao Peng ◽  
Qing Nie

Abstract Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the cellular state transition relationship) are not directly evident from the measurement. Classifying the cellular states, identifying the transitions among those states, and extracting the pseudotime ordering of cells are challenging due to the noise in the data and the high dimensionality in the number of genes. In this paper we adapt the classical self-organizing-map (SOM) approach for single-cell gene expression data (SOMSC), such as those based on single cell qPCR and single cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify cellular states inherited in the population of the measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g. differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC could estimate the cellular state replication probability and transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development and two single-cell RNA-seq data sets, SOMSC shows effectiveness in capturing cellular states and their transitions presented in the high-dimensional single-cell data. This approach will have broader applications to analyzing cellular fate specification and cell lineages using single cell gene expression data.
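For readers unfamiliar with the classical self-organizing map that SOMSC adapts, a minimal one-dimensional SOM can be sketched as below. This shows only the generic SOM training loop; the cellular state map, its basins and barriers, and the probability estimates described in the abstract are not reproduced. The function names, the linear decay schedules, and the default parameters are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(units, cell):
    """Index of the map unit closest to the cell (squared Euclidean distance)."""
    return min(
        range(len(units)),
        key=lambda u: sum((a - b) ** 2 for a, b in zip(units[u], cell)),
    )


def train_som(cells, n_units, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D self-organizing map on a list of expression vectors.

    Units are initialised from randomly chosen cells. The learning rate
    decays linearly to 0, and the neighbourhood width decays linearly
    with a floor of 0.5 (both schedules are assumptions of this sketch).
    """
    rng = random.Random(seed)
    dim = len(cells[0])
    if sigma0 is None:
        sigma0 = max(n_units / 2, 1.0)
    units = [list(rng.choice(cells)) for _ in range(n_units)]
    total_steps = epochs * len(cells)
    step = 0
    for _ in range(epochs):
        order = list(range(len(cells)))
        rng.shuffle(order)
        for idx in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            bmu = best_matching_unit(units, cells[idx])
            for u in range(n_units):
                # Gaussian neighbourhood on the 1-D grid of unit indices
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    units[u][d] += lr * h * (cells[idx][d] - units[u][d])
            step += 1
    return units
```

After training, `best_matching_unit(units, cell)` assigns each cell to a position on the map, and neighbouring units hold similar expression profiles; this topology-preserving property is what a map of cellular states and their transitions builds on.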


2021 ◽  
Author(s):  
Elvis Han Cui ◽  
Weng Kee Wong ◽  
Dongyuan Song ◽  
Jingyi Jessica Li

Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often provide trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since model interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves the interpretability and largely maintains the flexibility of nonparametric regression models. Here we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend, which may be monotone, hill-shaped, or valley-shaped, along cell pseudotime. The scGTM has three advantages: (1) it can capture non-monotonic trends that are still easy to interpret, (2) its parameters are biologically interpretable and trend informative, and (3) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates for the scGTM parameters. As an application, we analyze several single-cell gene expression data sets using the scGTM and show that it can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying the biological processes. We also provide an open-access Python package for fitting the scGTM at https://github.com/ElvisCuiHan/scGTM.

