scDD: A statistical approach for identifying differential distributions in single-cell RNA-seq experiments

Mapping Intimacies ◽

10.1101/035501 ◽

2015 ◽

Cited By ~ 4

Author(s):

Keegan D. Korthauer ◽

Li-Fang Chu ◽

Michael A. Newton ◽

Yuan Li ◽

James Thomson ◽

...

Keyword(s):

Single Cell ◽

Expression Patterns ◽

Mean Shift ◽

R Package ◽

Study Data ◽

Cellular Heterogeneity ◽

Modeling Framework ◽

Wide Range ◽

Higher Power ◽

Novel Method

The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. Although understanding such heterogeneity is of primary interest in a number of studies, for convenience, statistical methods often treat cellular heterogeneity as a nuisance factor. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. Using simulated and case study data, we demonstrate that the modeling framework is able to detect differential expression patterns of interest under a wide range of settings. Compared to existing approaches, scDD has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and is able to characterize those differences. The freely available R package scDD implements the approach.
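
To make the phrase "more complex than a mean shift" concrete, the following minimal sketch compares two made-up expression samples that share the same mean but differ in shape. It is not the scDD method and does not use the scDD package; the data values and the helper `ks_statistic` are assumptions introduced only for illustration.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(xs, ys):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of xs <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of ys <= v
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical log-expression values for one gene in two conditions.
unimodal = [4.0, 5.0, 6.0] * 20  # a single expression state around 5
bimodal = [1.0, 9.0] * 30        # two expression states with the same overall mean

mean_shift = mean(bimodal) - mean(unimodal)   # a mean comparison sees no difference
shape_gap = ks_statistic(unimodal, bimodal)   # a distributional comparison does

print(f"mean shift: {mean_shift}, KS statistic: {shape_gap}")
```

A test based only on the means would report nothing here, while the gap between the empirical distributions is large. That is the kind of difference the abstract says mean-based approaches can miss.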

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes ◽

10.3390/genes12020311 ◽

2021 ◽

Vol 12 (2) ◽

pp. 311

Author(s):

Zhenqiu Liu

Keyword(s):

Single Cell ◽

Free Parameter ◽

Graphical Model ◽

Expression Patterns ◽

Information Criterion ◽

Log P ◽

Rna Seq ◽

Clustering Methods ◽

Wide Range ◽

Free Parameters

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Bioinformatics ◽

10.1093/bioinformatics/btab257 ◽

2021 ◽

Author(s):

Yixuan Qiu ◽

Jiebiao Wang ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Expression Patterns ◽

R Package ◽

Supplementary Information ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Correlation Pattern ◽

Tissue Samples ◽

Bulk Data

Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation: We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information: Supplementary data are available at Bioinformatics online.
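
The claim that marker genes are highly correlated in bulk data can be illustrated with a small, fully made-up example. This sketch is not the markerpen algorithm and does not use its API. The proportions, expression scales, offsets, and the helper `pearson` are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical proportion of one cell type in six bulk tissue samples.
proportion = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]

# Two hypothetical marker genes: bulk expression scales with the cell-type
# proportion; the small fixed offsets stand in for measurement noise.
marker_a = [100 * p + e for p, e in zip(proportion, [1, -2, 0, 2, -1, 1])]
marker_b = [60 * p + e for p, e in zip(proportion, [-1, 1, 2, -2, 0, 1])]

# A hypothetical gene whose expression does not track the proportion.
background = [20, 35, 18, 30, 22, 33]

r_markers = pearson(marker_a, marker_b)        # close to 1
r_background = pearson(marker_a, background)   # much weaker

print(f"marker vs marker: {r_markers:.3f}, marker vs background: {r_background:.3f}")
```

Because both marker genes are driven by the same underlying proportion, they move together across samples, which is the correlation pattern the abstract proposes to exploit.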

Single-Cell Sequencing: Biological Insight and Potential Clinical Implications in Pediatric Leukemia

Cancers ◽

10.3390/cancers13225658 ◽

2021 ◽

Vol 13 (22) ◽

pp. 5658

Author(s):

Donát Alpár ◽

Bálint Egyed ◽

Csaba Bödör ◽

Gábor T. Kovács

Keyword(s):

High Resolution ◽

Single Cell ◽

Childhood Leukemia ◽

Clinical Decision Making ◽

Cellular Heterogeneity ◽

Therapeutic Interventions ◽

Cell Populations ◽

Pediatric Leukemia ◽

Single Cell Sequencing ◽

Wide Range

Single-cell sequencing (SCS) provides high-resolution insight into the genomic, epigenomic, and transcriptomic landscape of oncohematological malignancies including pediatric leukemia, the most common type of childhood cancer. Besides broadening our biological understanding of cellular heterogeneity, sub-clonal architecture, and regulatory networks of tumor cell populations, SCS can offer clinically relevant, detailed characterization of distinct compartments affected by leukemia and identify therapeutically exploitable vulnerabilities. In this review, we provide an overview of SCS studies focused on the high-resolution genomic and transcriptomic scrutiny of pediatric leukemia. Our aim is to investigate and summarize how different layers of single-cell omics approaches are expected to support clinical decision making in the future. Although the clinical management of pediatric leukemia has improved spectacularly over the past decades, resistant disease is a major cause of therapy failure. Currently, only a small proportion of childhood leukemia patients benefit from genomics-driven therapy, as 15–20% of them meet the indication criteria of on-label targeted agents, and their overall response rate falls in a relatively wide range (40–85%). The in-depth scrutiny of various cell populations influencing the development, progression, and treatment resistance of different disease subtypes can potentially uncover a wider range of driver mechanisms for innovative therapeutic interventions.

Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

Low Cost ◽

Expression Patterns ◽

Cell Types ◽

Cellular Heterogeneity ◽

Supervised Machine Learning ◽

Melanoma Metastasis ◽

Immunologic Research

Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of 100s of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and identifies novel cellular heterogeneity in the lungs of melanoma metastasis bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.

Beyond comparisons of means: understanding changes in gene expression at the single-cell level

10.1101/035949 ◽

2016 ◽

Author(s):

Catalina A Vallejos ◽

Sylvia Richardson ◽

John C Marioni

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Probabilistic Approach ◽

Embryonic Stem ◽

Cellular Heterogeneity ◽

Bayesian Hierarchical ◽

False Discovery ◽

Rich Information ◽

The Rich

Single-cell RNA sequencing (scRNA-seq) can be used to characterise differences in gene expression patterns between pre-specified populations of cells. Traditionally, differential expression tools are restricted to the study of changes in overall expression between cell populations. However, such analyses do not take full advantage of the rich information provided by scRNA-seq. In this article, we present a Bayesian hierarchical model which can be used to study changes in expression that lie beyond comparisons of means. In particular, our method can highlight genes that undergo changes in cell-to-cell heterogeneity between the populations but whose overall expression is preserved. Evidence supporting these changes is quantified using a probabilistic approach based on tail posterior probabilities, where a probability cut-off is calibrated through the expected false discovery rate. Our method incorporates a built-in normalisation strategy and quantifies technical artefacts by borrowing information from technical spike-in genes. Control experiments validate the performance of our approach. Finally, we compare expression patterns of mouse embryonic stem cells between different stages of the cell cycle, revealing substantial differences in cellular heterogeneity.
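
The calibration of a probability cut-off through the expected false discovery rate can be sketched as follows. This assumes the common posterior-probability estimator (the average of one minus the posterior probability over the genes that are called); the abstract does not give a formula, so this is not necessarily the authors' exact procedure. The function names and the probability values are hypothetical.

```python
def expected_fdr(post_probs, cutoff):
    """Expected FDR among genes whose posterior probability reaches the cutoff."""
    called = [p for p in post_probs if p >= cutoff]
    if not called:
        return 0.0  # nothing is called, so no false discoveries are expected
    return sum(1.0 - p for p in called) / len(called)


def calibrate_cutoff(post_probs, target=0.05):
    """Most permissive observed cutoff whose expected FDR does not exceed target."""
    for cutoff in sorted(set(post_probs)):
        if expected_fdr(post_probs, cutoff) <= target:
            return cutoff
    return None  # no observed cutoff meets the target


# Hypothetical tail posterior probabilities for eight genes.
probs = [0.99, 0.97, 0.95, 0.90, 0.80, 0.60, 0.30, 0.10]
cutoff = calibrate_cutoff(probs, target=0.05)

print(f"calibrated cutoff: {cutoff}, expected FDR: {expected_fdr(probs, cutoff):.4f}")
```

Raising the cutoff drops the least certain genes first, so the expected FDR under this estimator cannot increase as the cutoff grows. The first cutoff that meets the target is therefore the one that calls the most genes.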

A systematic dissection of human primary osteoblasts in vivo at single-cell resolution

10.1101/2020.05.12.091975 ◽

2020 ◽

Author(s):

Yun Gong ◽

Junxiao Yang ◽

Xiaohua Li ◽

Cui Zhou ◽

Yu Chen ◽

...

Keyword(s):

Bone Formation ◽

Single Cell ◽

Bone Cells ◽

Expression Patterns ◽

Cellular Heterogeneity ◽

Human Osteoblasts ◽

Primary Osteoblasts ◽

Human Primary Osteoblasts

Osteoblasts are multifunctional bone cells, which play essential roles in bone formation, angiogenesis regulation, as well as maintenance of hematopoiesis. Although both in vivo and in vitro studies on mice have identified several potential osteoblast subtypes based on their different transition stages or biological responses to external stimuli, the categorization of primary osteoblast subtypes in vivo in humans has not yet been achieved. Here, we used single-cell RNA sequencing (scRNA-seq) to perform a systematic cellular taxonomy dissection of freshly isolated human osteoblasts. Based on the gene expression patterns and cell lineage reconstruction, we identified three distinct cell clusters including preosteoblasts, mature osteoblasts, and an undetermined rare osteoblast subpopulation. This novel subtype was mainly characterized by the nuclear receptor subfamily 4 group A member 1 and 2 (NR4A1 and NR4A2), and its existence was confirmed by immunofluorescence staining. Trajectory inference analysis suggested that the undetermined cluster, together with the preosteoblasts, is involved in the regulation of osteoblastogenesis and also gives rise to mature osteoblasts. Investigation of the biological processes and signaling pathways enriched in each subpopulation revealed that in addition to bone formation, preosteoblasts and undetermined osteoblasts may also regulate both angiogenesis and hemopoiesis. Finally, we demonstrated that there are systematic differences between the transcriptional profiles of human osteoblasts in vivo and mouse osteoblasts both in vivo and in vitro, highlighting the necessity for studying bone physiological processes in humans rather than solely relying on mouse models. Our findings provide novel insights into the cellular heterogeneity and potential biological functions of human primary osteoblasts at the single-cell level, which is an important and necessary step to further dissect the biological roles of osteoblasts in bone metabolism under various (patho-) physiological conditions.

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

10.1101/2020.11.07.373043 ◽

2020 ◽

Author(s):

Yixuan Qiu ◽

Jiebiao Wang ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Expression Patterns ◽

R Package ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Correlation Pattern ◽

Tissue Samples ◽

Bulk Data ◽

Tissue Marker

Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation: We implement this method as an R package markerpen, hosted on https://github.com/yixuan/[email protected]

Single-cell RNA-seq reveals the transcriptional landscape in ischemic stroke

Journal of Cerebral Blood Flow & Metabolism ◽

10.1177/0271678x211026770 ◽

2021 ◽

pp. 0271678X2110267

Author(s):

Kai Zheng ◽

Lingmin Lin ◽

Wei Jiang ◽

Lin Chen ◽

Xiyue Zhang ◽

...

Keyword(s):

Ischemic Stroke ◽

Single Cell ◽

Expression Patterns ◽

Cellular Heterogeneity ◽

Ependymal Cells ◽

Specific Gene ◽

Specific Cell ◽

Cell Type ◽

Cns Inflammation ◽

Cell Type Specific

Ischemic stroke (IS) is a detrimental neurological disease with limited treatment options. It has been challenging to define the roles of brain cell subsets in IS onset and progression due to cellular heterogeneity in the CNS. Here, we employed single-cell RNA sequencing (scRNA-seq) to comprehensively map the cell populations in a mouse model of MCAO (middle cerebral artery occlusion). We identified 17 principal brain clusters with cell-type specific gene expression patterns as well as specific cell subpopulations and their functions in various pathways. CNS inflammation triggered upregulation of key cell type-specific genes that had not been reported previously. Notably, microglia displayed diverse differentiation states after stroke across their five distinct subtypes. Importantly, we found potential trajectory branches among the monocyte/macrophage subsets. Finally, we also identified distinct subclusters among brain vasculature cells, ependymal cells and other glial cells. Overall, scRNA-seq revealed the precise transcriptional changes during neuroinflammation at the single-cell level, opening up a new field for exploration of the disease mechanisms and drug discovery in stroke based on the cell-subtype specific molecules.

Reference-based annotation of single cell transcriptomes identifies a profibrotic macrophage niche after tissue injury

10.1101/284604 ◽

2018 ◽

Cited By ~ 4

Author(s):

Dvir Aran ◽

Agnieszka P. Looney ◽

Leqian Liu ◽

Valerie Fong ◽

Austin Hsu ◽

...

Keyword(s):

Single Cell ◽

Lung Fibrosis ◽

Tissue Injury ◽

Granular Cell ◽

Cellular Heterogeneity ◽

Trophic Effect ◽

Peripheral Tissues ◽

Wide Range

Myeloid cells localize to peripheral tissues in a wide range of pathologic contexts. However, appreciation of distinct myeloid subtypes has been limited by the signal averaging inherent to bulk sequencing approaches. Here we applied single-cell RNA sequencing (scRNA-seq) to map cellular heterogeneity in lung fibrosis induced by bleomycin injury in mice. We first developed a computational framework that enables unbiased, granular cell-type annotation of scRNA-seq data. This approach identified a macrophage subpopulation that was specific to injured lung and notable for high expression of Cx3cr1 and MHCII genes. We found that these macrophages, which bear a gene expression profile consistent with monocytic origin, progressively acquire alveolar macrophage identity and localize to sites of fibroblast accumulation. Probing their functional role, in vitro studies showed a trophic effect of these cells on fibroblast activation, and ablation of Cx3cr1-expressing cells suppressed fibrosis in vivo. We also found by gene set analysis and immunofluorescence that markers of these macrophages were upregulated in samples from patients with lung fibrosis compared with healthy controls. Taken together, our results uncover a specific pathologic subgroup of macrophages with markers that could enable their therapeutic targeting for fibrosis.
