Mapping Human Hematopoietic Hierarchy at Single Cell Resolution by Microwell-seq

Mapping Intimacies ◽

10.1101/127217 ◽

2017 ◽

Author(s):

Shujing Lai ◽

Yang Xu ◽

Wentao Huang ◽

Mengmeng Jiang ◽

Haide Chen ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Low Cost ◽

Single Cells ◽

Cell Types ◽

Hematopoietic Stem ◽

Adult Human ◽

Type Definition ◽

Cellular Hierarchy ◽

Differentiation Pathways

Summary: The classical hematopoietic hierarchy, which was built mainly with fluorescence-activated cell sorting (FACS) technology, has proven inaccurate in recent studies. Single-cell RNA-seq (scRNA-seq) analysis offers a way to overcome the limits of the FACS-based cell type definition system when dissecting complex cellular hierarchies. However, large-scale scRNA-seq is constrained by the throughput and cost of traditional methods. Here, we developed Microwell-seq, a high-throughput and low-cost scRNA-seq platform using extremely simple devices. Using Microwell-seq, we constructed a single-cell resolution transcriptome atlas of the human hematopoietic differentiation hierarchy by profiling more than 50,000 single cells throughout the adult human hematopoietic system. We found that the adult human hematopoietic stem and progenitor cell (HSPC) compartment is dominated by progenitors primed with lineage-specific regulators. Our analysis revealed differentiation pathways for each cell type, through which HSPCs directly progress to lineage-biased progenitors before differentiation. We propose a revised adult human hematopoietic hierarchy independent of oligopotent progenitors. Our study also demonstrates the broad applicability of Microwell-seq technology.

Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

10.1101/243113 ◽

2018 ◽

Cited By ~ 2

Author(s):

Nikos Konstantinides ◽

Katarina Kapuralin ◽

Chaimaa Fadil ◽

Luendreo Barboza ◽

Rahul Satija ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Deep Understanding ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Functional Specification ◽

Phenotypic Convergence

Summary: Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz063 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1209-1223 ◽

Cited By ~ 13

Author(s):

Raphael Petegrosso ◽

Zhuliu Li ◽

Rui Kuang

Keyword(s):

Machine Learning ◽

Single Cell ◽

Statistical Methods ◽

Large Scale ◽

Time Series Data ◽

Single Cells ◽

Transcriptome Profiling ◽

Cell Types ◽

Series Data ◽

Sequencing Data

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies have enabled large-scale whole-transcriptome profiling of each individual cell in a cell population. A core analysis of scRNA-seq transcriptome profiles is to cluster the single cells to reveal cell subtypes and infer cell lineages based on the relations among the cells. This article reviews the machine learning and statistical methods for clustering scRNA-seq transcriptomes developed in the past few years. The review focuses on how conventional clustering techniques such as hierarchical clustering, graph-based clustering, mixture models, $k$-means, ensemble learning, neural networks and density-based clustering are modified or customized to tackle the unique challenges in scRNA-seq data analysis, such as the dropout of low-expression genes, low and uneven read coverage of transcripts, highly variable total mRNAs from single cells and ambiguous cell markers in the presence of technical biases and irrelevant confounding biological variations. We review how cell-specific normalization, the imputation of dropouts and dimension reduction methods can be applied with new statistical or optimization strategies to improve the clustering of single cells. We also introduce more advanced approaches to cluster scRNA-seq transcriptomes in time series data and multiple cell populations and to detect rare cell types. Several software packages developed to support the cluster analysis of scRNA-seq data are also reviewed and experimentally compared to evaluate their performance and efficiency. Finally, we conclude with useful observations and possible future directions in scRNA-seq data analytics. Availability: All the source code and data are available at https://github.com/kuanglab/single-cell-review.
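
To make the "cell-specific normalization" mentioned in this abstract concrete, here is a minimal sketch of one common form of it: scaling each cell's counts to a shared library size and then applying a log transform. This is a generic illustration, not a method taken from the review. The function name `normalize_cell`, the `target_sum` of 10,000 and the example counts are all assumptions.

```python
import math

def normalize_cell(counts, target_sum=10_000.0):
    """Scale one cell's gene counts to target_sum, then apply log(1 + x).

    `counts` maps gene name -> raw count for a single cell (hypothetical input format).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell carries no information; return zeros instead of dividing by zero.
        return {gene: 0.0 for gene in counts}
    scale = target_sum / total
    return {gene: math.log1p(count * scale) for gene, count in counts.items()}

# Two hypothetical cells with the same composition but a 10x difference in depth.
shallow = {"GATA1": 1, "CD34": 3}
deep = {"GATA1": 10, "CD34": 30}
print(normalize_cell(shallow))
print(normalize_cell(deep))
```

After normalization the two cells have matching profiles, which is the point of the step: differences in sequencing depth should not drive the clustering.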

Saturating Single-Cell atlas Datasets

10.1101/218370 ◽

2017 ◽

Cited By ~ 2

Author(s):

Aparna Bhaduri ◽

Tomasz J. Nowakowski ◽

Alex A. Pollen ◽

Arnold R. Kriegstein

Keyword(s):

Population Structure ◽

Single Cell ◽

Mouse Brain ◽

Large Scale ◽

Single Cells ◽

Cost Effective ◽

Cell Types ◽

Cell Number ◽

Cell Type ◽

The Relationship

Abstract: High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. Efficient generation of such an atlas will depend on sufficient sampling of the diverse cell types while remaining cost-effective to enable a comprehensive examination of organs, developmental stages, and individuals. To examine the relationship between cell number and transcriptional heterogeneity in the context of unbiased cell type classification, we explicitly explored the population structure of a publicly available 1.3 million cell dataset from the E18.5 mouse brain. We propose a computational framework for inferring the saturation point of cluster discovery in a single cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a “complexity index”, which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited complexity dataset, we explored whether biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells (20,000). Together, these findings suggest that most of the biologically interpretable insights from the 1.3 million cells can be recapitulated by analyzing 50,000 randomly selected cells, indicating that instead of profiling few individuals at high “cellular coverage”, the much anticipated cell atlasing studies may instead benefit from profiling more individuals, or many time points at lower cellular coverage.

Recent efforts seek to create a comprehensive cell atlas of the human body [1,2]. Current technology, however, makes it prohibitively expensive to perform analysis of every cell. Therefore, designing effective sampling strategies will be critical to generate a working atlas in an efficient, cost-effective, and streamlined manner. The advent of single cell and single nucleus mRNA sequencing (RNAseq) in droplet format [3,4] now enables large-scale sampling of cells from any tissue, and a recently released publicly available dataset of 1.3 million single cells from the E18.5 mouse brain generated with the 10X Chromium [5] provides an opportunity to explore the relationship between population structure and the number of sampled cells necessary to reveal the underlying diversity of cell types. Here, we present a framework for how researchers can evaluate whether a dataset has reached saturation, and we estimate how many cells would be required to generate an atlas of the sample analyzed here. This framework can be applied to any organ or cell type specific atlas for any organism.
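
The idea of checking cluster preservation in downsampled datasets can be illustrated with a deliberately simplified sketch. It assumes cluster labels are already known and only asks how many clusters still contain a minimum number of cells after random downsampling; the framework described in the abstract is not reproduced here, and a fuller analysis would re-cluster each downsampled set. The function name `clusters_preserved`, the `min_cells` threshold and the toy labels are assumptions.

```python
import random
from collections import Counter

def clusters_preserved(labels, fraction, min_cells=5, seed=0):
    """Count clusters that keep at least min_cells cells after random downsampling.

    `labels` is a list with one (precomputed) cluster label per cell.
    """
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    n_keep = int(len(labels) * fraction)   # number of cells retained
    sample = rng.sample(labels, n_keep)    # sampling without replacement
    counts = Counter(sample)
    return sum(1 for n in counts.values() if n >= min_cells)

# Hypothetical dataset: one abundant cluster and one rare cluster.
labels = ["common"] * 1000 + ["rare"] * 10
for fraction in (1.0, 0.5, 0.1, 0.01):
    print(fraction, clusters_preserved(labels, fraction))
```

Plotting the preserved-cluster count against the sampled fraction gives a rough saturation curve: once the count stops rising as more cells are added, additional cells contribute little to cluster discovery under this proxy.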

A universal approach for integrating super large-scale single-cell transcriptomes by exploring gene rankings

10.1101/2021.08.23.457305 ◽

2021 ◽

Author(s):

Hongru Shen ◽

Xilin Shen ◽

Mengyao Feng ◽

Dan Wu ◽

Chao Zhang ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Single Cells ◽

Gene Interaction ◽

Cell Types ◽

Specific Cell ◽

Expression Data ◽

Gene Interaction Networks ◽

Universal Approach ◽

Cell Expression

Advances in single-cell RNA sequencing have led to an exponential accumulation of single-cell expression data. However, there is still a lack of tools that can integrate this ever-growing body of single-cell expression data. Here, we present a universal approach, iSEEEK, for integrating super large-scale single-cell expression data by exploring the expression rankings of top-expressing genes. We developed iSEEEK with 13.7 million single cells. We demonstrate the efficiency of iSEEEK with canonical single-cell downstream tasks on five heterogeneous datasets encompassing human and mouse samples. iSEEEK achieved good clustering performance when benchmarked against well-annotated cell labels. In addition, iSEEEK could transfer the knowledge learned from large-scale expression data to a new dataset that was not involved in its development. iSEEEK enables identification of gene-gene interaction networks that are characteristic of specific cell types. Our study presents a simple yet effective method to integrate super large-scale single-cell transcriptomes and would facilitate translational single-cell research from bench to bedside.
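
The phrase "expression rankings of top-expressing genes" suggests representing each cell by an ordered list of its most highly expressed genes. The sketch below shows only that representation step. It is not the iSEEEK implementation, and the function name `top_gene_ranking`, the cutoff `k` and the example values are assumptions.

```python
def top_gene_ranking(expression, k=3):
    """Return the k most highly expressed genes, highest first.

    `expression` maps gene name -> expression value for one cell.
    Genes with zero expression are dropped; ties are broken by gene name.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in expressed[:k]]

# Hypothetical expression values for a single cell.
cell = {"CD3E": 12.0, "MS4A1": 0.0, "LYZ": 40.0, "NKG7": 12.0, "ACTB": 95.0}
print(top_gene_ranking(cell))
```

One reason a rank-based representation is attractive for integration is that it does not change when all values in a cell are multiplied by a positive constant, so datasets measured at different depths or scales yield comparable inputs.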

Emergent Statistical Laws in Single-Cell Transcriptomic Data

10.1101/2021.06.16.448706 ◽

2021 ◽

Author(s):

Silvia Lazzardi ◽

Filippo Valle ◽

Andrea Mazzolini ◽

Antonio Scialdone ◽

Michele Caselle ◽

...

Keyword(s):

Single Cell ◽

Messenger Rna ◽

Large Scale ◽

Single Cells ◽

Building Blocks ◽

Cell Types ◽

Ecological Niches ◽

Mathematical Framework ◽

Transcriptomic Data ◽

Statistical Laws

Large-scale data on single-cell gene expression have the potential to unravel the specific transcriptional programs of different cell types. The structure of these expression datasets suggests a similarity with several other complex systems that can be analogously described through the statistics of their basic building blocks. Transcriptomes of single cells are collections of messenger RNA abundances transcribed from a common set of genes, just as books are different collections of words from a shared vocabulary, genomes of different species are specific compositions of genes belonging to evolutionary families, and ecological niches can be described by their species abundances. Following this analogy, we identify several emergent statistical laws in single-cell transcriptomic data that closely resemble regularities found in linguistics, ecology or genomics. A simple mathematical framework can be used to analyze the relations between different laws and the possible mechanisms behind their ubiquity. Importantly, tractable statistical models can be useful tools in transcriptomics to disentangle the actual biological variability from general statistical effects present in most component systems and from the consequences of the sampling process inherent to the experimental technique.
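
The abstract does not name the specific laws it identifies. As an assumed illustration of the book-and-vocabulary analogy, the sketch below builds a rank-abundance table for one cell, which is the quantity examined when testing a Zipf-like relation between a word's rank and its frequency. The function name `rank_abundance`, the gene names and the counts are hypothetical.

```python
def rank_abundance(counts):
    """Return (rank, gene, relative abundance) tuples, most abundant gene first.

    `counts` maps gene name -> mRNA count in one cell; ties are broken by gene name.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, gene, count / total)
            for rank, (gene, count) in enumerate(ordered, start=1)]

# Hypothetical counts for a single cell.
cell = {"geneA": 60, "geneB": 30, "geneC": 20, "geneD": 10}
for rank, gene, freq in rank_abundance(cell):
    print(rank, gene, round(freq, 3))
```

Plotting relative abundance against rank on log-log axes is the usual way to inspect whether such a relation is approximately a power law.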

Comparative Transcriptomic Analysis of the Hematopoietic System between Human and Mouse by Single Cell RNA Sequencing

Cells ◽

10.3390/cells10050973 ◽

2021 ◽

Vol 10 (5) ◽

pp. 973

Author(s):

Shouguo Gao ◽

Zhijie Wu ◽

Jeerthi Kannan ◽

Liza Mathews ◽

Xingmin Feng ◽

...

Keyword(s):

Single Cell ◽

Gene Networks ◽

Single Cells ◽

Cell Types ◽

Regulatory Elements ◽

Regulatory Sequence ◽

Sequence Motifs ◽

Hematopoietic Stem ◽

Stem And Progenitor Cells ◽

Human And Mouse

(1) Background: mouse models are fundamental to the study of hematopoiesis, but comparisons between mouse and human in single cells have been limited in depth. (2) Methods: we constructed a single-cell resolution transcriptomic atlas of hematopoietic stem and progenitor cells (HSPCs) of human and mouse, from a total of 32,805 single cells. We used Monocle to examine the trajectories of hematopoietic differentiation, and SCENIC to analyze gene networks underlying hematopoiesis. (3) Results: after alignment with Seurat 2, the cells of mouse and human could be separated by the same cell type categories. Cells were grouped into 17 subpopulations; cluster-specific genes were species-conserved and shared functional themes. The clustering dendrogram indicated that cell types were highly conserved between human and mouse. A visualization of the Monocle results provided an intuitive representation of HSPC differentiation to three dominant branches (Erythroid/megakaryocytic, Myeloid, and Lymphoid), derived directly from the hematopoietic stem cell and the long-term hematopoietic stem cells in both human and mouse. Gene regulation was similarly conserved, reflected by comparable transcription factors and regulatory sequence motifs in subpopulations of cells. (4) Conclusions: our analysis has confirmed evolutionary conservation in the hematopoietic systems of mouse and human, extending to cell types, gene expression and regulatory elements.

Scalable pooled CRISPR screens with single-cell chromatin accessibility profiling

10.1101/2020.11.20.390971 ◽

2020 ◽

Author(s):

Noa Liscovitch-Brauer ◽

Antonino Montalbano ◽

Jiale Deng ◽

Alejandro Méndez-Mancilla ◽

Hans-Hermann Wessels ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Expression Profiles ◽

Low Cost ◽

Single Cells ◽

Gene Expression Profiles ◽

Chromatin Accessibility ◽

Myelogenous Leukemia ◽

Chromatin Remodelers ◽

Genetic Perturbations

Abstract: Pooled CRISPR screens have been used to identify genes responsible for specific phenotypes and diseases, and, more recently, to connect genetic perturbations with multi-dimensional gene expression profiles. Here, we describe a method to link genome-wide chromatin accessibility to genetic perturbations in single cells. This scalable, cost-effective method combines pooled CRISPR perturbations with a single-cell combinatorial indexing assay for transposase-accessible chromatin (CRISPR-sciATAC). Using a human and mouse species-mixing experiment, we show that CRISPR-sciATAC separates single cells with a low doublet rate. Then, in human myelogenous leukemia cells, we apply CRISPR-sciATAC to target 21 chromatin-related genes that are frequently mutated in cancer and 84 subunits and cofactors of chromatin remodeling complexes, generating chromatin accessibility data for ~30,000 single cells. Using this large-scale atlas, we correlate loss of specific chromatin remodelers with changes in accessibility, both globally and at the binding sites of individual transcription factors. For example, we show that loss of the H3K27 methyltransferase EZH2 leads to increased accessibility at heterochromatic regions involved in embryonic development and triggers expression of multiple genes in the HOXA and HOXD clusters. At a subset of regulatory sites, we also analyze dynamic changes in nucleosome spacing upon loss of chromatin remodelers. CRISPR-sciATAC is a high-throughput, low-cost single-cell method that can be applied broadly to study the role of genetic perturbations on chromatin in normal and disease states.

One Cell At a Time: A Unified Framework to Integrate and Analyze Single-cell RNA-seq Data

10.1101/2021.05.12.443814 ◽

2021 ◽

Author(s):

Chloe Xueqi Wang ◽

Lin Zhang ◽

Bo Wang

Keyword(s):

Single Cell ◽

Large Scale ◽

Gene Selection ◽

De Novo ◽

Single Cells ◽

Cell Types ◽

Biological Information ◽

Rna Seq ◽

Unified Framework ◽

Gene Expressions

The surge of single-cell RNA sequencing technologies gives access to large single-cell RNA-seq datasets at the scale of hundreds of thousands of single cells. Integrative analysis of large-scale scRNA-seq datasets has the potential to reveal de novo cell types as well as to aggregate biological information. However, most existing methods fail to integrate multiple large-scale scRNA-seq datasets in a computationally efficient and memory-efficient way. We hereby propose OCAT, One Cell At a Time, a graph-based method that sparsely encodes single-cell gene expressions to integrate data from multiple sources without selecting the most variable genes or applying explicit batch effect correction. We demonstrate that OCAT efficiently integrates multiple scRNA-seq datasets and achieves state-of-the-art performance in cell-type clustering, especially in challenging scenarios of non-overlapping cell types. In addition, OCAT facilitates a variety of downstream analyses, such as gene prioritization, trajectory inference, pseudotime inference and cell inference. OCAT is a unifying tool to simplify and expedite single-cell data analysis.

Molecular characteristics and spatial distribution of adult human corneal cell subtypes

Scientific Reports ◽

10.1038/s41598-021-94933-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ann J. Ligocki ◽

Wen Fury ◽

Christian Gutierrez ◽

Christina Adler ◽

Tao Yang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cross Sections ◽

Cell Types ◽

Marker Genes ◽

Molecular Characteristics ◽

Transcriptional Level ◽

Human Cornea ◽

Adult Human ◽

And Migration

Abstract: Bulk RNA sequencing of a tissue captures the gene expression profile from all cell types combined. Single-cell RNA sequencing identifies discrete cell-signatures based on transcriptomic identities. Six adult human corneas were processed for single-cell RNAseq and 16 cell clusters were bioinformatically identified. Based on their transcriptomic signatures and RNAscope results using representative cluster marker genes on human cornea cross-sections, these clusters were confirmed to be stromal keratocytes, endothelium, several subtypes of corneal epithelium, conjunctival epithelium, and supportive cells in the limbal stem cell niche. The complexity of the epithelial cell layer was captured by eight distinct corneal clusters and three conjunctival clusters. These were further characterized by enriched biological pathways and molecular characteristics which revealed novel groupings related to development, function, and location within the epithelial layer. Moreover, epithelial subtypes were found to reflect their initial generation in the limbal region, differentiation, and migration through to mature epithelial cells. The single-cell map of the human cornea deepens the knowledge of the cellular subsets of the cornea on a whole genome transcriptional level. This information can be applied to better understand normal corneal biology, serve as a reference to understand corneal disease pathology, and provide potential insights into therapeutic approaches.

HiCImpute: A Bayesian Hierarchical Model for Identifying Structural Zeros and Enhancing Single Cell Hi-C Data.

10.1101/2021.09.01.458575 ◽

2021 ◽

Author(s):

Qing Xie ◽

Chengong Han ◽

Victor Jin ◽

Shili Lin

Keyword(s):

Quality Improvement ◽

Data Quality ◽

Single Cell ◽

Single Cells ◽

High Sensitivity ◽

Real Data ◽

Cell Types ◽

Sequencing Depth ◽

2D Data ◽

Structural Zeros

Single cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
