Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF

2019
Author(s): Meenakshi Venkatasubramanian, Kashish Chetal, Gowtham Atluri, Nathan Salomonis

ABSTRACT
The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most existing algorithms require significant user-tuning, are heavily reliant on dimensionality reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. Here, we describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster "fitness", SVM) to resolve rare and common cell states, while minimizing differences due to donor or batch effects. Using data from the Human Cell Atlas, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets without losing extremely rare or transcriptionally similar yet distinct cell types, while recovering novel transcriptionally unique cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.
Highlights:
ICGS2 outperforms alternative approaches in small and ultra-large benchmark datasets.
Integrates multiple solutions for cell-type detection with supervised refinement.
Scales effectively to resolve rare cell states from ultra-large datasets using PageRank sampling with a low memory footprint.
Integrated into AltAnalyze to enable sophisticated and automated downstream analysis.
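
Sparse NMF is listed above as one of the complementary subtype detection methods. As a rough illustration of how NMF can partition cells, the sketch below factors a small non-negative genes x cells matrix with the Lee-Seung multiplicative updates for the Frobenius loss and assigns each cell to the factor carrying its largest loading. This is an assumption-laden toy, not the ICGS2 implementation: it uses plain NMF (the sparsity penalty is omitted), and the function names, rank, iteration count and toy matrix are hypothetical.

```python
import random


def _matmul(A, B):
    """Multiply two matrices stored as lists of row lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def _transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative genes x cells matrix V into W (genes x rank) and
    H (rank x cells) with Lee-Seung multiplicative updates (no sparsity penalty)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = _transpose(W)
        num, den = _matmul(Wt, V), _matmul(_matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = _transpose(H)
        num, den = _matmul(V, Ht), _matmul(W, _matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


def assign_cells(H):
    """Label each cell (column of H) with the factor carrying its largest loading."""
    return [max(range(len(H)), key=lambda f: H[f][j]) for j in range(len(H[0]))]


# Toy data: genes 0-1 are expressed in cells 0-1, genes 2-3 in cells 2-3.
V = [[5, 5, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
W, H = nmf(V, rank=2)
labels = assign_cells(H)  # cells 0-1 and cells 2-3 should fall into different factors
```

Which factor index each group receives depends on the random initialization; only the grouping is meaningful.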

2020, Vol 36 (12), pp. 3773-3780
Author(s): Meenakshi Venkatasubramanian, Kashish Chetal, Daniel J Schnell, Gowtham Atluri, Nathan Salomonis

Abstract
Motivation: The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.
Results: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster 'fitness', support vector machine) to resolve rare and common cell states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets without losing extremely rare or transcriptionally similar yet distinct cell types, while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.
Availability and implementation: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
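
The PageRank downsampling described above can be illustrated, under stated assumptions, by scoring cells on a k-nearest-neighbour graph and keeping the highest-scoring ones. The abstract does not specify how ICGS2 builds its graph or picks cells, so the Euclidean kNN graph, the damping factor of 0.85, the power-iteration stopping rule and the helper names below are all hypothetical choices for illustration, not the ICGS2 procedure.

```python
import math


def knn_graph(cells, k):
    """Adjacency list linking each cell to its k nearest neighbours (Euclidean)."""
    graph = []
    for i, ci in enumerate(cells):
        dists = sorted((math.dist(ci, cj), j) for j, cj in enumerate(cells) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def pagerank(graph, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank; every node has out-degree k, so no dangling nodes."""
    n = len(graph)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, neighbours in enumerate(graph):
            share = damping * rank[i] / len(neighbours)
            for j in neighbours:
                new[j] += share
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


def downsample(cells, n_keep, k=3):
    """Return the sorted indices of the n_keep cells with the highest PageRank."""
    scores = pagerank(knn_graph(cells, k))
    order = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])


cells = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
kept = downsample(cells, n_keep=2, k=2)
```

Keeping only the globally top-ranked cells can favour large, densely connected groups, so a practical sampler would likely need extra safeguards (for example, sampling within communities) to retain rare populations; the abstract does not say how ICGS2 handles this.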


2018
Author(s): Tamim Abdelaal, Vincent van Unen, Thomas Höllt, Frits Koning, Marcel J.T. Reinders, ...

Abstract
Motivation: Mass cytometry (CyTOF) is a valuable technology for high-dimensional analysis at the single-cell level. Identification of different cell populations is an important task during data analysis. Many clustering tools can perform this task; however, they are time-consuming, often involve a manual step, and lack reproducibility when new data are included in the analysis. Learning cell types from an annotated set of cells solves these problems. However, currently available mass cytometry classifiers are either complex, dependent on prior knowledge of the cell type markers during the learning process, or can only identify canonical cell types.
Results: We propose to use a Linear Discriminant Analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA shows results comparable to two state-of-the-art algorithms on four benchmark datasets and also outperforms a non-linear classifier, the k-nearest neighbour classifier. To illustrate its scalability to large datasets with deeply annotated cell subtypes, we apply LDA to a dataset of ~3.5 million cells representing 57 cell types. LDA has high performance on abundant cell types as well as the majority of rare cell types, and provides accurate estimates of cell type frequencies. Further incorporating a rejection option, based on the estimated posterior probabilities, allows LDA to identify cell types that were not encountered during training. Altogether, reproducible prediction of cell type compositions using LDA opens up possibilities to analyse large cohort studies based on mass cytometry data.
Availability: Implementation is available on GitHub (https://github.com/tabdelaal/CyTOF-Linear-Classifier).
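
The LDA classifier with a posterior-based rejection option can be sketched in a few dozen lines. The version below is a minimal, dependency-free illustration, not the authors' GitHub code: it fits class means, class priors and a pooled within-class covariance, scores each class with the linear discriminant x^T Σ^-1 μ_k - ½ μ_k^T Σ^-1 μ_k + log π_k, converts the scores to posteriors with a softmax, and returns "unknown" when the top posterior falls below a threshold. The threshold of 0.7, the small ridge term and the function names are hypothetical.

```python
import math


def _inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a square list-of-lists matrix."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_lda(X, y, ridge=1e-6):
    """Estimate class means, priors and the inverse pooled within-class covariance."""
    labels = sorted(set(y))
    d = len(X[0])
    means, priors = {}, {}
    for c in labels:
        rows = [x for x, t in zip(X, y) if t == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / len(X)
    cov = [[0.0] * d for _ in range(d)]
    for x, t in zip(X, y):
        diff = [a - b for a, b in zip(x, means[t])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    dof = len(X) - len(labels)
    cov = [[v / dof + (ridge if i == j else 0.0) for j, v in enumerate(row)]
           for i, row in enumerate(cov)]
    return labels, means, priors, _inverse(cov)


def predict(model, x, threshold=0.7):
    """Return (label, posterior), or ("unknown", posterior) below the threshold."""
    labels, means, priors, inv = model
    scores = []
    for c in labels:
        mu = means[c]
        w = [sum(inv[i][j] * mu[j] for j in range(len(mu))) for i in range(len(mu))]
        scores.append(sum(a * b for a, b in zip(w, x))
                      - 0.5 * sum(a * b for a, b in zip(w, mu))
                      + math.log(priors[c]))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    post = [e / sum(exps) for e in exps]
    best = max(range(len(labels)), key=lambda i: post[i])
    if post[best] < threshold:
        return "unknown", post[best]
    return labels[best], post[best]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["A"] * 4 + ["B"] * 4
model = fit_lda(X, y)
```

With this toy model, a point near either cluster is assigned to that class, while a point midway between the two, such as (3, 3), has posteriors of about 0.5 each and is rejected as "unknown".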


2021, Vol 7 (10), pp. eabc5464
Author(s): Kiya W. Govek, Emma C. Troisi, Zhen Miao, Rachael G. Aubin, Steven Woodhouse, ...

Highly multiplexed immunohistochemistry (mIHC) enables the staining and quantification of dozens of antigens in a tissue section with single-cell resolution. However, annotating cell populations that differ little in the profiled antigens or for which the antibody panel does not include specific markers is challenging. To overcome this obstacle, we have developed an approach for enriching mIHC images with single-cell RNA sequencing data, building upon recent experimental procedures for augmenting single-cell transcriptomes with concurrent antigen measurements. Spatially-resolved Transcriptomics via Epitope Anchoring (STvEA) performs transcriptome-guided annotation of highly multiplexed cytometry datasets. It increases the level of detail in histological analyses by enabling the systematic annotation of nuanced cell populations, spatial patterns of transcription, and interactions between cell types. We demonstrate the utility of STvEA by uncovering the architecture of poorly characterized cell types in the murine spleen using published cytometry and mIHC data of this organ.


2020
Author(s): Feng Tian, Fan Zhou, Xiang Li, Wenping Ma, Honggui Wu, ...

Summary
By circumventing cellular heterogeneity, single-cell omics have now been widely used for cell typing in human tissues, culminating in the undertaking of the Human Cell Atlas, which aims to characterize all human cell types. However, probing the gene regulatory networks, underlying chromatin architecture and critical transcription factors of each cell type is even more important. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses these needs, with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in the human mid-gestation fetus. In particular, we observed correlated gene modules (CGMs) and found them to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states and identified the key transcription factors for representative CGMs.
Highlights:
Genomic Architecture of Cells in Tissues (GeACT) data for the human mid-gestation fetus.
Determining correlated gene modules (CGMs) in different cell types by MALBAC-DT.
Measuring chromatin open regions in single cells with high detectability by METATAC.
Integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM.


2021
Author(s): Tallulah S Andrews, Jawairia Atif, Jeff C Liu, Catia T Perciani, Xue-Zhong Ma, ...

The critical functions of the human liver are coordinated through the interactions of hepatic parenchymal and non-parenchymal cells. Recent advances in single-cell transcriptional approaches have enabled an examination of the human liver with unprecedented resolution. However, dissociation-related cell perturbation can prevent full capture of the liver's parenchymal cell fraction, which hinders comprehensive profiling of this organ. Here, we report the transcriptional landscape of 73,295 cells from the human liver using matched single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq). The addition of snRNA-seq enabled the characterization of interzonal hepatocytes at single-cell resolution, revealed rare subtypes of hepatic stellate cells previously seen only in disease, and detected cholangiocyte progenitors that had only been observed during in vitro differentiation experiments. However, T and B lymphocytes and NK cells were distinguishable only by scRNA-seq, highlighting the importance of applying both technologies to obtain a complete map of tissue-resident cell types. We validated the distinct spatial distributions of the hepatocyte, cholangiocyte and stellate cell populations using an independent spatial transcriptomics dataset and immunohistochemistry. Our study provides a systematic comparison of the transcriptomes captured by scRNA-seq and snRNA-seq and delivers a high-resolution map of the parenchymal cell populations in the healthy human liver.


2021, Vol 12
Author(s): Lixing Huang, Ying Qiao, Wei Xu, Linfeng Gong, Rongchao He, ...

Fish are considered a premier model for clarifying the evolution and regulatory mechanisms of vertebrate immunity. However, knowledge of distinct immune cell populations in fish remains limited, and techniques that advance the identification of fish immune cell populations and their functions are still needed. Single-cell RNA-seq (scRNA-seq) has provided a new approach for effective in-depth identification and characterization of cell subpopulations. Current approaches for scRNA-seq data analysis usually rely on comparison with a reference genome and hence are not suited to species that lack one, a situation that is currently very common in fish research. Here, we present an alternative, i.e. scRNA-seq data analysis with a full-length transcriptome as a reference, and evaluate this approach on samples from Epinephelus coioides, a teleost without any published genome. We show that it reconstructs most of the transcripts present in the scRNA-seq data well, achieving sensitivity equivalent to approaches relying on genome alignments of related species. Based on cell heterogeneity and known markers, we characterized four cell types: T cells, B cells, monocytes/macrophages (Mo/MΦ) and NCC (non-specific cytotoxic cells). Further analysis indicated the presence of two subsets of Mo/MΦ, the M1 and M2 types, as well as four subsets of B cells, i.e. mature B cells, immature B cells, pre-B cells and early pre-B cells. Our research will provide new clues for understanding the biological characteristics, development and function of teleost immune cell populations. Furthermore, our approach provides a reliable alternative for scRNA-seq data analysis in teleosts for which no reference genome is currently available.


F1000Research, 2019, Vol 7, pp. 1306
Author(s): Clarence K. Mah, Alexander T. Wenzel, Edwin F. Juarez, Thorin Tabor, Michael M. Reich, ...

Single-cell RNA sequencing (scRNA-seq) has emerged as a popular method to profile gene expression at the resolution of individual cells. While methods and software have been developed specifically to analyze scRNA-seq data, they are most accessible to users who can program. We have created an scRNA-seq clustering analysis GenePattern Notebook that provides an interactive, easy-to-use interface for the analysis and exploration of scRNA-seq data, without the need to write or view any code. The notebook provides a standard scRNA-seq analysis workflow for pre-processing data, identifying sub-populations of cells by clustering, and exploring biomarkers to characterize heterogeneous cell populations and delineate cell types.
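
The first two stages of the standard workflow described above (pre-processing, then clustering) can be illustrated with a toy, dependency-free sketch: library-size normalization followed by log1p, then k-means clustering. These specific choices (a scale factor of 10,000, k-means initialized from the first k cells, Euclidean distance) and the function names are our own assumptions for illustration; they are not taken from the GenePattern Notebook, whose internals the abstract does not describe.

```python
import math


def normalize_log(counts, scale=10_000):
    """Scale each cell (row) to `scale` total counts, then apply log1p."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([math.log1p(v * scale / total) if total else 0.0 for v in cell])
    return out


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means, deterministically initialized from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: move each non-empty centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy count matrix: 4 cells x 3 genes; cells 0 and 2 express gene 0, cells 1 and 3 gene 2.
counts = [[90, 10, 0], [0, 10, 90], [85, 15, 0], [5, 5, 90]]
labels = kmeans(normalize_log(counts), k=2)
```

A real workflow would typically add quality filtering, feature selection and dimensionality reduction between these steps, then look for genes that distinguish the resulting clusters.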


2021
Author(s): Jinyue Liao, Hoi Ching Suen, Shitao Rao, Alfred Chun Shui Luk, Ruoyu Zhang, ...

Abstract
Spermatogenesis depends on an orchestrated series of developmental events in germ cells and full maturation of the somatic microenvironment. To date, most efforts to study cellular heterogeneity in the testis have focused on single-cell gene expression rather than on the chromatin landscape that shapes gene expression. To advance our understanding of the regulatory programs underlying testicular cell types, we analyzed single-cell chromatin accessibility profiles in more than 25,000 cells from the developing mouse testis. We showed that scATAC-Seq allowed us to deconvolve distinct cell populations and identify cis-regulatory elements (CREs) underlying cell type specification. We identified sets of transcription factors associated with cell type-specific accessibility, revealing novel regulators of cell fate specification and maintenance. Pseudotime reconstruction revealed detailed regulatory dynamics coordinating the sequential developmental progression of germ cells and somatic cells. These high-resolution data also revealed putative stem cells within the Sertoli and Leydig cell populations. Further, we defined candidate target cell types and genes of several GWAS signals, including those associated with testosterone levels and coronary artery disease. Collectively, our data provide a blueprint of the 'regulon' of the mouse male germline and supporting somatic cells.


Author(s): Congcong Cao, Qian Ma, Shaomei Mo, Ge Shu, Qunlong Liu, ...

Androgen receptor (AR) signaling is essential for maintaining spermatogenesis and male fertility. However, the molecular mechanisms by which AR acts between male germ cells and somatic cells during spermatogenesis have only recently begun to be revealed. With advances from the use of transgenic mice lacking AR in Sertoli cells (SCARKO) and single-cell transcriptomic sequencing (scRNA-seq), the cell-specific targets of AR action, as well as the genes and signaling pathways regulated by AR, are being identified. In this study, we collected scRNA-seq data from wild-type (WT) and SCARKO mouse testes at postnatal day 20 (P20) and identified four somatic cell populations and two male germ cell populations. Further analysis showed that the distribution of Sertoli cells was completely different between WT and SCARKO testes and uncovered the cellular heterogeneity and transcriptional changes between WT and SCARKO Sertoli cells. In addition, we identified several differentially expressed genes (DEGs) in SCARKO Sertoli cells, many of which have previously been implicated in the cell cycle, apoptosis and male infertility. Together, our research offers a novel perspective on transcriptional changes in various cell types between WT and SCARKO mouse testes, providing new insights for investigating the molecular and cellular processes regulated by AR signaling in Sertoli cells.


2019
Author(s): Andrea J De Micheli, Jacob B Swanson, Nathaniel P Disser, Leandro M Martinez, Nicholas R Walker, ...

Abstract
Tendon is a connective tissue that transmits forces between muscles and bones. Cellular heterogeneity is increasingly recognized as an important factor in the biological basis of tissue homeostasis and disease, but little is known about the diversity of cells that populate tendon. Our objective was to explore the heterogeneity of cells in mouse Achilles tendons using single-cell RNA sequencing. We assembled a transcriptomic atlas and identified 11 distinct cell types in tendons, including 3 previously undescribed populations of fibroblasts. Using trajectory inference analysis, we provide additional support for the notion that pericytes are progenitor cells for the fibroblasts that compose adult tendons. We also modeled cell-cell interactions and identified ligand-receptor pairs involved in tendon homeostasis. Our findings highlight notable heterogeneity between and within tendon cell populations, which may contribute to our understanding of tendon extracellular matrix assembly and maintenance, and inform the design of therapies to treat tendinopathies.

