VPAC: Variational projection for accurate clustering of single-cell transcriptomic data

2019 ◽  
Author(s):  
Shengquan Chen ◽  
Kui Hua ◽  
Hongfei Cui ◽  
Rui Jiang

Abstract Background Single-cell RNA-sequencing (scRNA-seq) technologies have advanced rapidly in recent years and enabled the quantitative characterization at a microscopic resolution. With the exponential growth of the number of cells profiled in individual scRNA-seq experiments, the demand for identifying putative cell types from the data has become a great challenge that appeals for novel computational methods. Although a variety of algorithms have recently been proposed for single-cell clustering, such limitations as low accuracy, inferior robustness, and inadequate stability greatly impede the scope of applications of these methods. Results We propose a novel model-based algorithm, named VPAC, for accurate clustering of single-cell transcriptomic data through variational projection, which assumes that single-cell samples follow a Gaussian mixture distribution in a latent space. Through comprehensive validation experiments, we demonstrate that VPAC can not only be applied to datasets of discrete counts and normalized continuous data, but also scale up well to various data dimensionality, different dataset size and different data sparsity. We further illustrate the ability of VPAC to detect genes with strong unique signatures of a specific cell type, which may shed light on the studies in system biology. We have released a user-friendly python package of VPAC in Github (https://github.com/ShengquanChen/VPAC). Users can directly import our VPAC class and conduct clustering without tedious installation of dependency packages. Conclusions VPAC enables highly accurate clustering of single-cell transcriptomic data via a statistical model. We expect to see wide applications of our method to not only transcriptome studies for fully understanding the cell identity and functionality, but also the clustering of more general data.
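
The Gaussian mixture assumption in a latent space can be illustrated with a toy example. The sketch below is not VPAC's implementation and does not use the API of its package: it fits a two-component, one-dimensional Gaussian mixture by plain EM to synthetic values that stand in for latent coordinates. Every name in it (`fit_gmm_1d`, `assign_clusters`, the synthetic `latent` list) is hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(points, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; return (weights, means, variances)."""
    ordered = sorted(points)
    # Start the means at evenly spaced quantiles so components begin apart.
    means = [ordered[int((k + 0.5) * len(ordered) / n_components)]
             for k in range(n_components)]
    overall_mean = sum(points) / len(points)
    overall_var = sum((x - overall_mean) ** 2 for x in points) / len(points)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in points:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(points)
            means[k] = sum(r[k] * x for r, x in zip(resp, points)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, points)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def assign_clusters(points, weights, means, variances):
    """Assign each point to the component with the largest weighted density."""
    labels = []
    for x in points:
        dens = [w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        labels.append(dens.index(max(dens)))
    return labels


# Synthetic stand-in for latent coordinates of two cell populations.
rng = random.Random(0)
latent = ([rng.gauss(-3.0, 0.5) for _ in range(100)]
          + [rng.gauss(3.0, 0.5) for _ in range(100)])
weights, means, variances = fit_gmm_1d(latent)
labels = assign_clusters(latent, weights, means, variances)
```

In this toy setting the cluster label of a cell is simply the mixture component with the highest posterior weight; how VPAC learns the latent projection itself is not covered here.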

2019 ◽  
Author(s):  
Xiaoyang Chen ◽  
Shengquan Chen ◽  
Rui Jiang

Abstract Background In recent years, the rapid development of single-cell RNA-sequencing (scRNA-seq) techniques enables the quantitative characterization of cell types at a single-cell resolution. With the explosive growth of the number of cells profiled in individual scRNA-seq experiments, there is a demand for novel computational methods for classifying newly-generated scRNA-seq data onto annotated labels. Although several methods have recently been proposed for the cell-type classification of single-cell transcriptomic data, such limitations as inadequate accuracy, inferior robustness, and low stability greatly limit their wide applications. Results We propose a novel ensemble approach, named EnClaSC, for accurate and robust cell-type classification of single-cell transcriptomic data. Through comprehensive validation experiments, we demonstrate that EnClaSC can not only be applied to the self-projection within a specific dataset and the cell-type classification across different datasets, but also scale up well to various data dimensionality and different data sparsity. We further illustrate the ability of EnClaSC to effectively make cross-species classification, which may shed light on the studies in correlation of different species. EnClaSC is freely available at https://github.com/xy-chen16/EnClaSC. Conclusions EnClaSC enables highly accurate and robust cell-type classification of single-cell transcriptomic data via an ensemble learning method. We expect to see wide applications of our method to not only transcriptome studies, but also the classification of more general data.


2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Xiaoyang Chen ◽  
Shengquan Chen ◽  
Rui Jiang

Abstract Background In recent years, the rapid development of single-cell RNA-sequencing (scRNA-seq) techniques enables the quantitative characterization of cell types at a single-cell resolution. With the explosive growth of the number of cells profiled in individual scRNA-seq experiments, there is a demand for novel computational methods for classifying newly-generated scRNA-seq data onto annotated labels. Although several methods have recently been proposed for the cell-type classification of single-cell transcriptomic data, such limitations as inadequate accuracy, inferior robustness, and low stability greatly limit their wide applications. Results We propose a novel ensemble approach, named EnClaSC, for accurate and robust cell-type classification of single-cell transcriptomic data. Through comprehensive validation experiments, we demonstrate that EnClaSC can not only be applied to the self-projection within a specific dataset and the cell-type classification across different datasets, but also scale up well to various data dimensionality and different data sparsity. We further illustrate the ability of EnClaSC to effectively make cross-species classification, which may shed light on the studies in correlation of different species. EnClaSC is freely available at https://github.com/xy-chen16/EnClaSC. Conclusions EnClaSC enables highly accurate and robust cell-type classification of single-cell transcriptomic data via an ensemble learning method. We expect to see wide applications of our method to not only transcriptome studies, but also the classification of more general data.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Deepa Bhartiya

Abstract Life-long tissue homeostasis of adult tissues is supposedly maintained by the resident stem cells. These stem cells are quiescent in nature and rarely divide to self-renew and give rise to tissue-specific “progenitors” (lineage-restricted and tissue-committed) which divide rapidly and differentiate into tissue-specific cell types. However, it has proved difficult to isolate these quiescent stem cells as a physical entity. Recent single-cell RNAseq studies on several adult tissues including ovary, prostate, and cardiac tissues have not been able to detect stem cells. Thus, it has been postulated that adult cells dedifferentiate to stem-like state to ensure regeneration and can be defined as cells capable to replace lost cells through mitosis. This idea challenges basic paradigm of development biology regarding plasticity that a cell enters point of no return once it initiates differentiation. The underlying reason for this dilemma is that we are putting stem cells and somatic cells together while processing for various studies. Stem cells and adult mature cell types are distinct entities; stem cells are quiescent, small in size, and with minimal organelles whereas the mature cells are metabolically active and have multiple organelles lying in abundant cytoplasm. As a result, they do not pellet down together when centrifuged at 100–350g. At this speed, mature cells get collected but stem cells remain buoyant and can be pelleted by centrifuging at 1000g. Thus, inability to detect stem cells in recently published single-cell RNAseq studies is because the stem cells were unknowingly discarded while processing and were never subjected to RNAseq. This needs to be kept in mind before proposing to redefine adult stem cells.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kip D. Zimmerman ◽  
Mark A. Espeland ◽  
Carl D. Langefeld

Abstract Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
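
The effect of pseudoreplication on type 1 error can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the authors' analysis, and it does not fit a generalized linear mixed model. It simulates two groups with no true difference, gives each individual a shared random shift (the hypothetical parameters `sd_individual` and `sd_cell` are invented for the example), and compares a test that treats every cell as independent with a pseudo-bulk test on per-individual means. The critical values are hard-coded approximations.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch t statistic for two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def simulate_null(rng, n_individuals=5, n_cells=50, sd_individual=1.0, sd_cell=1.0):
    """Simulate two groups with NO true group effect.

    Returns two groups, each a list of per-individual lists of cell values.
    """
    groups = []
    for _ in range(2):
        individuals = []
        for _ in range(n_individuals):
            shift = rng.gauss(0.0, sd_individual)  # shared by all cells of one individual
            individuals.append([shift + rng.gauss(0.0, sd_cell) for _ in range(n_cells)])
        groups.append(individuals)
    return groups


def false_positive_rates(n_sim=300, seed=1):
    """Return (cell-level rate, pseudo-bulk rate) of rejections under the null."""
    rng = random.Random(seed)
    # Two-sided 5% cutoffs (assumed approximations): about 1.96 with hundreds
    # of cells per group; 2.776 (t with 4 df) as a conservative cutoff when
    # comparing 5 individual means against 5.
    crit_cells, crit_means = 1.96, 2.776
    hits_cells = hits_means = 0
    for _ in range(n_sim):
        g0, g1 = simulate_null(rng)
        # Naive analysis: pool all cells and ignore which individual they came from.
        cells0 = [x for ind in g0 for x in ind]
        cells1 = [x for ind in g1 for x in ind]
        if abs(welch_t(cells0, cells1)) > crit_cells:
            hits_cells += 1
        # Pseudo-bulk analysis: one mean per individual.
        means0 = [statistics.mean(ind) for ind in g0]
        means1 = [statistics.mean(ind) for ind in g1]
        if abs(welch_t(means0, means1)) > crit_means:
            hits_means += 1
    return hits_cells / n_sim, hits_means / n_sim


cell_rate, bulk_rate = false_positive_rates()
```

Because cells from one individual share the same shift, the pooled test underestimates its standard error and rejects far more often than the nominal 5%, while the per-individual test stays at or below the nominal rate.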


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ryan B. Patterson-Cross ◽  
Ariel J. Levine ◽  
Vilas Menon

Abstract Background Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Whereas users often develop their own intuition as to the optimal range of parameters for clustering on each data set, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems. Results Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple “robustness score” for each of these clusters, facilitating the assessment of cluster quality. Conclusion chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.
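
The idea of scoring cluster robustness by repeated clustering of subsamples can be sketched in a few lines. This is not the chooseR code and does not call Seurat or scVI; it is a minimal stand-in that uses a tiny one-dimensional k-means on synthetic data, records how often each pair of points lands in the same cluster across subsamples, and averages that frequency within each final cluster. All function names and parameters (`kmeans_1d`, `co_clustering_frequency`, `cluster_robustness`, `fraction`, `n_rounds`) are hypothetical.

```python
import random


def kmeans_1d(values, k, n_iter=20):
    """Tiny 1-D k-means; returns one cluster label per value."""
    ordered = sorted(values)
    # Start the centers at evenly spaced quantiles of the data.
    centers = [ordered[int((j + 0.5) * len(ordered) / k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def co_clustering_frequency(values, k, n_rounds=30, fraction=0.8, seed=0):
    """For each pair of points, the share of subsamples containing both
    in which the two points were assigned to the same cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        idx = rng.sample(range(n), int(fraction * n))
        labels = kmeans_1d([values[i] for i in idx], k)
        for a, la in zip(idx, labels):
            for b, lb in zip(idx, labels):
                sampled[a][b] += 1
                if la == lb:
                    together[a][b] += 1
    return [[together[a][b] / sampled[a][b] if sampled[a][b] else 0.0
             for b in range(n)] for a in range(n)]


def cluster_robustness(freq, labels):
    """Mean within-cluster co-clustering frequency for each cluster label."""
    scores = {}
    for lab in set(labels):
        members = [i for i, l in enumerate(labels) if l == lab]
        pairs = [freq[a][b] for a in members for b in members if a != b]
        scores[lab] = sum(pairs) / len(pairs) if pairs else 1.0
    return scores


# Synthetic data: two well-separated groups of 30 points each.
rng = random.Random(42)
data = ([rng.gauss(0.0, 0.3) for _ in range(30)]
        + [rng.gauss(5.0, 0.3) for _ in range(30)])
final_labels = kmeans_1d(data, k=2)
scores_k2 = cluster_robustness(co_clustering_frequency(data, k=2), final_labels)
```

Repeating the same calculation over a range of parameter values (here, different `k`) and comparing the per-cluster scores is one way to see which settings give stable clusters; the selection rule used by chooseR itself is not reproduced here.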


2019 ◽  
Vol 2 (1) ◽  
pp. 97-109 ◽  
Author(s):  
Jinchu Vijay ◽  
Marie-Frédérique Gauthier ◽  
Rebecca L. Biswell ◽  
Daniel A. Louiselle ◽  
Jeffrey J. Johnston ◽  
...  

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Prashant Rajbhandari ◽  
Douglas Arneson ◽  
Sydney K Hart ◽  
In Sook Ahn ◽  
Graciel Diamante ◽  
...  

Immune cells are vital constituents of the adipose microenvironment that influence both local and systemic lipid metabolism. Mice lacking IL10 have enhanced thermogenesis, but the roles of specific cell types in the metabolic response to IL10 remain to be defined. We demonstrate here that selective loss of IL10 receptor α in adipocytes recapitulates the beneficial effects of global IL10 deletion, and that local crosstalk between IL10-producing immune cells and adipocytes is a determinant of thermogenesis and systemic energy balance. Single Nuclei Adipocyte RNA-sequencing (SNAP-seq) of subcutaneous adipose tissue defined a metabolically-active mature adipocyte subtype characterized by robust expression of genes involved in thermogenesis whose transcriptome was selectively responsive to IL10Rα deletion. Furthermore, single-cell transcriptomic analysis of adipose stromal populations identified lymphocytes as a key source of IL10 production in response to thermogenic stimuli. These findings implicate adaptive immune cell-adipocyte communication in the maintenance of adipose subtype identity and function.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Chunxiang Wang ◽  
Xin Gao ◽  
Juntao Liu

Abstract Background Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Yehuda Schlesinger ◽  
Oshri Yosefov-Levi ◽  
Dror Kolodkin-Gal ◽  
Roy Zvi Granit ◽  
Luriano Peters ◽  
...  

Abstract Acinar metaplasia is an initial step in a series of events that can lead to pancreatic cancer. Here we perform single-cell RNA-sequencing of mouse pancreas during the progression from preinvasive stages to tumor formation. Using a reporter gene, we identify metaplastic cells that originated from acinar cells and express two transcription factors, Onecut2 and Foxq1. Further analyses of metaplastic acinar cell heterogeneity define six acinar metaplastic cell types and states, including stomach-specific cell types. Localization of metaplastic cell types and mixture of different metaplastic cell types in the same pre-malignant lesion is shown. Finally, single-cell transcriptome analyses of tumor-associated stromal, immune, endothelial and fibroblast cells identify signals that may support tumor development, as well as the recruitment and education of immune cells. Our findings are consistent with the early, premalignant formation of an immunosuppressive environment mediated by interactions between acinar metaplastic cells and other cells in the microenvironment.


2019 ◽  
Vol 30 (11) ◽  
pp. 2159-2176 ◽  
Author(s):  
Zhenyuan Yu ◽  
Jinling Liao ◽  
Yang Chen ◽  
Chunlin Zou ◽  
Haiying Zhang ◽  
...  

Background Having a comprehensive map of the cellular anatomy of the normal human bladder is vital to understanding the cellular origins of benign bladder disease and bladder cancer. Methods We used single-cell RNA sequencing (scRNA-seq) of 12,423 cells from healthy human bladder tissue samples taken from patients with bladder cancer and 12,884 cells from mouse bladders to classify bladder cell types and their underlying functions. Results We created a single-cell transcriptomic map of human and mouse bladders, including 16 clusters of human bladder cells and 15 clusters of mouse bladder cells. The homology and heterogeneity of human and mouse bladder cell types were compared and both conservative and heterogeneous aspects of human and mouse bladder evolution were identified. We also discovered two novel types of human bladder cells. One type is ADRA2A+ and HRH2+ interstitial cells which may be associated with nerve conduction and allergic reactions. The other type is TNNT1+ epithelial cells that may be involved with bladder emptying. We verify these TNNT1+ epithelial cells also occur in rat and mouse bladders. Conclusions This transcriptomic map provides a resource for studying bladder cell types, specific cell markers, signaling receptors, and genes that will help us to learn more about the relationship between bladder cell types and diseases.

