scDD: A statistical approach for identifying differential distributions in single-cell RNA-seq experiments

2015
Author(s): Keegan D. Korthauer, Li-Fang Chu, Michael A. Newton, Yuan Li, James Thomson, ...

Abstract: The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. Although understanding such heterogeneity is of primary interest in a number of studies, for convenience, statistical methods often treat cellular heterogeneity as a nuisance factor. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. Using simulated and case study data, we demonstrate that the modeling framework is able to detect differential expression patterns of interest under a wide range of settings. Compared to existing approaches, scDD has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and is able to characterize those differences. The freely available R package scDD implements the approach.
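To make "more complex than a mean shift" concrete, here is a minimal sketch in Python. It is not the scDD model: it is a generic illustration that assumes simulated log-expression values, uses a plain two-sample Kolmogorov-Smirnov statistic, and introduces hypothetical names (`simulate`, `ks_statistic`). One condition has a single expression state and the other has two states with nearly the same overall mean, so a comparison of means sees little while a distributional statistic sees a clear gap.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def simulate(seed=0, n_cells=500):
    """Simulate one gene in two conditions (hypothetical values, not real data)."""
    rng = random.Random(seed)
    # Condition A: a single expression state centred at 5.
    cond_a = [rng.gauss(5.0, 1.0) for _ in range(n_cells)]
    # Condition B: two states centred at 3 and 7 in equal proportion,
    # so the overall mean is still close to 5.
    cond_b = [
        rng.gauss(3.0 if rng.random() < 0.5 else 7.0, 1.0)
        for _ in range(n_cells)
    ]
    return cond_a, cond_b


cond_a, cond_b = simulate()
mean_gap = abs(sum(cond_a) / len(cond_a) - sum(cond_b) / len(cond_b))
ks = ks_statistic(cond_a, cond_b)
print(f"difference in means: {mean_gap:.3f}")
print(f"KS statistic:        {ks:.3f}")
```

A test based only on the difference in means would have little to work with here, whereas the KS statistic reflects the change from one expression state to two.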

Genes, 2021, Vol 12 (2), pp. 311
Author(s): Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


2021, Vol 12 (1)
Author(s): Rongxin Fang, Sebastian Preissl, Yang Li, Xiaomeng Hou, Jacinta Lucero, ...

Abstract: Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays that map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.


Author(s): Yixuan Qiu, Jiebiao Wang, Jing Lei, Kathryn Roeder

Abstract
Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern.
Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list.
Availability and implementation: We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen).
Supplementary information: Supplementary data are available at Bioinformatics online.
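The correlation argument in the Motivation can be illustrated with a small simulation. This sketch is not the markerpen algorithm. It only assumes, hypothetically, that a marker gene's bulk expression is proportional to the cell-type proportion in each sample plus noise; every number and name in it (`pearson`, `simulate_bulk`, the scaling factors) is made up for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_bulk(seed=1, n_samples=200):
    """Simulate bulk expression for two marker genes and one unrelated gene."""
    rng = random.Random(seed)
    # Proportion of the cell type of interest in each bulk sample (assumed range).
    proportions = [rng.uniform(0.1, 0.6) for _ in range(n_samples)]
    # Marker genes: expression driven mainly by the cell-type proportion.
    marker_1 = [10.0 * p + rng.gauss(0.0, 0.3) for p in proportions]
    marker_2 = [8.0 * p + rng.gauss(0.0, 0.3) for p in proportions]
    # Unrelated gene: expression independent of that proportion.
    other = [rng.gauss(4.0, 1.0) for _ in range(n_samples)]
    return marker_1, marker_2, other


marker_1, marker_2, other = simulate_bulk()
print(f"marker vs marker:    {pearson(marker_1, marker_2):.2f}")
print(f"marker vs unrelated: {pearson(marker_1, other):.2f}")
```

Because both marker genes track the same underlying proportion, they correlate strongly across samples, while the unrelated gene does not.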


Cancers, 2021, Vol 13 (22), pp. 5658
Author(s): Donát Alpár, Bálint Egyed, Csaba Bödör, Gábor T. Kovács

Single-cell sequencing (SCS) provides high-resolution insight into the genomic, epigenomic, and transcriptomic landscape of oncohematological malignancies including pediatric leukemia, the most common type of childhood cancer. Besides broadening our biological understanding of the cellular heterogeneity, sub-clonal architecture, and regulatory networks of tumor cell populations, SCS can offer clinically relevant, detailed characterization of distinct compartments affected by leukemia and identify therapeutically exploitable vulnerabilities. In this review, we provide an overview of SCS studies focused on the high-resolution genomic and transcriptomic scrutiny of pediatric leukemia. Our aim is to investigate and summarize how different layers of single-cell omics approaches are expected to support clinical decision making in the future. Although the clinical management of pediatric leukemia has improved spectacularly over the past decades, resistant disease remains a major cause of therapy failure. Currently, only a small proportion of childhood leukemia patients benefit from genomics-driven therapy, as 15–20% of them meet the indication criteria of on-label targeted agents, and their overall response rate falls in a relatively wide range (40–85%). The in-depth scrutiny of various cell populations influencing the development, progression, and treatment resistance of different disease subtypes can potentially uncover a wider range of driver mechanisms for innovative therapeutic interventions.


2020
Author(s): Etienne Becht, Daniel Tolstrup, Charles-Antoine Dutertre, Florent Ginhoux, Evan W. Newell, ...

Abstract: Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of hundreds of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and identifies novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.


2016
Author(s): Catalina A Vallejos, Sylvia Richardson, John C Marioni

Single-cell RNA sequencing (scRNA-seq) can be used to characterise differences in gene expression patterns between pre-specified populations of cells. Traditionally, differential expression tools are restricted to the study of changes in overall expression between cell populations. However, such analyses do not take full advantage of the rich information provided by scRNA-seq. In this article, we present a Bayesian hierarchical model which can be used to study changes in expression that lie beyond comparisons of means. In particular, our method can highlight genes that undergo changes in cell-to-cell heterogeneity between the populations but whose overall expression is preserved. Evidence supporting these changes is quantified using a probabilistic approach based on tail posterior probabilities, where a probability cut-off is calibrated through the expected false discovery rate. Our method incorporates a built-in normalisation strategy and quantifies technical artefacts by borrowing information from technical spike-in genes. Control experiments validate the performance of our approach. Finally, we compare expression patterns of mouse embryonic stem cells between different stages of the cell cycle, revealing substantial differences in cellular heterogeneity.
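As a rough illustration of calibrating a probability cut-off through the expected false discovery rate, the sketch below uses a common definition of the expected FDR for a set of calls: the average of (1 - posterior probability) over the genes whose probability exceeds the cut-off. This is an assumed, generic formulation and not the authors' implementation; the function names and the example probabilities are hypothetical.

```python
def expected_fdr(probs, cutoff):
    """Average of (1 - p) over genes called at this cutoff; 0.0 if none are called."""
    called = [p for p in probs if p > cutoff]
    if not called:
        return 0.0
    return sum(1.0 - p for p in called) / len(called)


def calibrate_cutoff(probs, target=0.05, grid_size=101):
    """Return the smallest cutoff on an even grid over [0, 1] whose expected FDR <= target.

    The expected FDR cannot increase as the cutoff rises (the least certain
    calls are dropped first), so the first grid point that meets the target
    also keeps the largest number of calls.
    """
    for k in range(grid_size):
        cutoff = k / (grid_size - 1)
        if expected_fdr(probs, cutoff) <= target:
            return cutoff
    return 1.0


# Hypothetical posterior probabilities that each gene changes between populations.
posterior_probs = [0.99, 0.98, 0.95, 0.90, 0.70, 0.40, 0.10]

cutoff = calibrate_cutoff(posterior_probs, target=0.05)
called = [p for p in posterior_probs if p > cutoff]
print(f"calibrated cutoff: {cutoff:.2f}")
print(f"genes called: {len(called)}")
print(f"expected FDR at cutoff: {expected_fdr(posterior_probs, cutoff):.3f}")
```

The grid search is a deliberate simplification; any monotone search over the cut-off would serve the same purpose.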


2020
Author(s): Yun Gong, Junxiao Yang, Xiaohua Li, Cui Zhou, Yu Chen, ...

Abstract: Osteoblasts are multifunctional bone cells, which play essential roles in bone formation, angiogenesis regulation, as well as maintenance of hematopoiesis. Although both in vivo and in vitro studies on mice have identified several potential osteoblast subtypes based on their different transition stages or biological responses to external stimuli, the categorization of primary osteoblast subtypes in vivo in humans has not yet been achieved. Here, we used single-cell RNA sequencing (scRNA-seq) to perform a systematic cellular taxonomy dissection of freshly isolated human osteoblasts. Based on the gene expression patterns and cell lineage reconstruction, we identified three distinct cell clusters including preosteoblasts, mature osteoblasts, and an undetermined rare osteoblast subpopulation. This novel subtype was mainly characterized by the nuclear receptor subfamily 4 group A member 1 and 2 (NR4A1 and NR4A2), and its existence was confirmed by immunofluorescence staining. Trajectory inference analysis suggested that the undetermined cluster, together with the preosteoblasts, are involved in the regulation of osteoblastogenesis and also give rise to mature osteoblasts. Investigation of the biological processes and signaling pathways enriched in each subpopulation revealed that in addition to bone formation, preosteoblasts and undetermined osteoblasts may also regulate both angiogenesis and hemopoiesis. Finally, we demonstrated that there are systematic differences between the transcriptional profiles of human osteoblasts in vivo and mouse osteoblasts both in vivo and in vitro, highlighting the necessity for studying bone physiological processes in humans rather than solely relying on mouse models. Our findings provide novel insights into the cellular heterogeneity and potential biological functions of human primary osteoblasts at the single-cell level, which is an important and necessary step to further dissect the biological roles of osteoblasts in bone metabolism under various (patho-) physiological conditions.


2020
Author(s): Yixuan Qiu, Jiebiao Wang, Jing Lei, Kathryn Roeder

Abstract
Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern.
Results: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list.
Availability and implementation: We implement this method as an R package markerpen, hosted on https://github.com/yixuan/[email protected]


2021, pp. 0271678X2110267
Author(s): Kai Zheng, Lingmin Lin, Wei Jiang, Lin Chen, Xiyue Zhang, ...

Ischemic stroke (IS) is a detrimental neurological disease with limited treatment options. It has been challenging to define the roles of brain cell subsets in IS onset and progression due to cellular heterogeneity in the CNS. Here, we employed single-cell RNA sequencing (scRNA-seq) to comprehensively map the cell populations in the mouse model of MCAO (middle cerebral artery occlusion). We identified 17 principal brain clusters with cell-type specific gene expression patterns as well as specific cell subpopulations and their functions in various pathways. CNS inflammation triggered upregulation of key cell type-specific genes that had not been reported before. Notably, microglia displayed diverse differentiation states after stroke across five distinct subtypes. Importantly, we found potential trajectory branches of the monocyte/macrophage subsets. Finally, we also identified distinct subclusters among brain vasculature cells, ependymal cells and other glial cells. Overall, scRNA-seq revealed the precise transcriptional changes during neuroinflammation at the single-cell level, opening up a new field for exploration of the disease mechanisms and drug discovery in stroke based on cell-subtype specific molecules.


2018
Author(s): Dvir Aran, Agnieszka P. Looney, Leqian Liu, Valerie Fong, Austin Hsu, ...

Abstract: Myeloid cells localize to peripheral tissues in a wide range of pathologic contexts. However, appreciation of distinct myeloid subtypes has been limited by the signal averaging inherent to bulk sequencing approaches. Here we applied single-cell RNA sequencing (scRNA-seq) to map cellular heterogeneity in lung fibrosis induced by bleomycin injury in mice. We first developed a computational framework that enables unbiased, granular cell-type annotation of scRNA-seq. This approach identified a macrophage subpopulation that was specific to injured lung and notable for high expression of Cx3cr1+ and MHCII genes. We found that these macrophages, which bear a gene expression profile consistent with monocytic origin, progressively acquire alveolar macrophage identity and localize to sites of fibroblast accumulation. Probing their functional role, in vitro studies showed a trophic effect of these cells on fibroblast activation, and ablation of Cx3cr1-expressing cells suppressed fibrosis in vivo. We also found by gene set analysis and immunofluorescence that markers of these macrophages were upregulated in samples from patients with lung fibrosis compared with healthy controls. Taken together, our results uncover a specific pathologic subgroup of macrophages with markers that could enable their therapeutic targeting for fibrosis.

