Addressing the looming identity crisis in single cell RNA-seq

10.1101/150524 ◽

2017 ◽

Cited By ~ 3

Author(s):

Megan Crow ◽

Anirban Paul ◽

Sara Ballouz ◽

Z. Josh Huang ◽

Jesse Gillis

Keyword(s):

Single Cell ◽

Large Scale ◽

Ad Hoc ◽

Meta Analysis ◽

High Specificity ◽

Cell Types ◽

Marker Genes ◽

Rna Seq ◽

Cortical Interneuron ◽

Large Sets

Abstract: Single cell RNA-sequencing technology (scRNA-seq) provides a new avenue to discover and characterize cell types, but the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine the replicability of these studies. Meta-analysis of rapidly accumulating data is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that allows researchers to quantify the degree to which cell types replicate across datasets, and to rapidly identify clusters with high similarity for further testing. We first measure the replicability of neuronal identity by comparing more than 13 thousand individual scRNA-seq transcriptomes, sampling with high specificity from within the data to define a range of robust practices. We then assess cross-dataset evidence for novel cortical interneuron subtypes identified by scRNA-seq and find that 24/45 cortical interneuron subtypes have evidence of replication in at least one other study. Identifying these putative replicates allows us to re-analyze the data for differential expression and provide lists of robust candidate marker genes. Across tasks we find that large sets of variably expressed genes can identify replicable cell types and subtypes with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.

Meta-Analysis of cortical inhibitory interneurons markers landscape and their performances in scRNA-seq studies.

10.1101/2021.11.03.467049 ◽

2021 ◽

Author(s):

Lorenzo Martini ◽

Roberta Bardini ◽

Stefano Di Carlo

Keyword(s):

Single Cell ◽

Meta Analysis ◽

Cell Types ◽

Cellular Heterogeneity ◽

Marker Genes ◽

Inhibitory Interneurons ◽

Rna Seq ◽

Circuit Function ◽

The Brain

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. Single-cell RNA-seq analysis is therefore crucial for studying this cellular heterogeneity. To identify and analyze rare cell types, cells must be reliably labeled using known markers, so all related studies depend on the quality of the marker genes employed. In this work, we investigate how a chosen set of inhibitory interneuron markers performs. The gene set consists of both immunohistochemistry-derived genes and genes from single-cell RNA-seq taxonomies. We employed various human and mouse datasets of the brain cortex, which were then processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised clustering results and marker expression. Specifically, we calculated the specificity, the fraction of cells expressing each marker, and some metrics derived from decision tree analysis, such as entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, a correlation emerges between the overall performance of the gene set and the experimental quality of the datasets. The proposed method therefore allows the quality of a dataset to be evaluated in terms of its reliability for studying the cellular heterogeneity of inhibitory interneurons.
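
The marker metrics named in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact definitions, so everything here is an assumption: a cell counts as "expressing" when its value exceeds a hypothetical `threshold`, specificity is taken as the share of expressing cells that fall in the target cluster, and entropy gain is the standard decision-tree information gain for the split "expressing vs. not expressing". The function names are illustrative and do not come from Monocle3 or from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fraction_expressing(expr, clusters, target, threshold=0.0):
    """Fraction of cells in cluster `target` whose expression exceeds `threshold`."""
    in_cluster = [e for e, c in zip(expr, clusters) if c == target]
    if not in_cluster:
        return 0.0
    return sum(e > threshold for e in in_cluster) / len(in_cluster)


def specificity(expr, clusters, target, threshold=0.0):
    """Assumed definition: among expressing cells, the fraction in cluster `target`."""
    expressing = [c for e, c in zip(expr, clusters) if e > threshold]
    if not expressing:
        return 0.0
    return sum(c == target for c in expressing) / len(expressing)


def entropy_gain(expr, clusters, threshold=0.0):
    """Information gain of splitting cells into expressing / non-expressing groups."""
    on = [c for e, c in zip(expr, clusters) if e > threshold]
    off = [c for e, c in zip(expr, clusters) if e <= threshold]
    n = len(clusters)
    # Weighted entropy of the cluster labels after the split.
    conditional = sum(len(part) / n * entropy(part) for part in (on, off) if part)
    return entropy(clusters) - conditional


# Toy data (invented for illustration): one marker measured in six cells.
expr = [5.0, 3.0, 0.0, 0.0, 0.0, 2.0]
clusters = ["A", "A", "A", "B", "B", "B"]
print(fraction_expressing(expr, clusters, "A"))
print(specificity(expr, clusters, "A"))
print(entropy_gain(expr, clusters))
```

Under these assumed definitions, a marker expressed in all cells of one cluster and in no others scores 1.0 on the first two metrics and recovers the full entropy of the cluster labels as gain, while a marker expressed evenly across clusters yields a gain of zero.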

scQuery: a web server for comparative analysis of single-cell RNA-seq data

10.1101/323238 ◽

2018 ◽

Author(s):

Amir Alavi ◽

Matthew Ruffalo ◽

Aiyappa Parvangada ◽

Zhilin Huang ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Large Scale ◽

Web Server ◽

Cell Types ◽

Marker Genes ◽

Heterogeneous Environments ◽

Rna Seq ◽

Cell Type ◽

Small Set ◽

Unique Cell

Summary: Single cell RNA-Seq (scRNA-seq) studies often profile upward of thousands of cells in heterogeneous environments. Current methods for characterizing cells perform unsupervised analysis followed by assignment using a small set of known marker genes. Such approaches are limited to a few, well characterized cell types. To enable large scale supervised characterization we developed an automated pipeline to download, process, and annotate publicly available scRNA-seq datasets. We extended supervised neural networks to obtain efficient and accurate representations for scRNA-seq data. We applied our pipeline to analyze data from over 500 different studies with over 300 unique cell types and show that supervised methods greatly outperform unsupervised methods for cell type identification. A case study of neural degeneration data highlights the ability of these methods to identify differences between cell type distributions in healthy and diseased mice. We implemented a web server that compares new datasets to collected data employing fast matching methods in order to determine cell types, key genes, similar prior studies, and more.

An overview of algorithms and associated applications for single cell RNA-Seq data imputation

Current Genomics ◽

10.2174/1389202921999200716104916 ◽

2020 ◽

Vol 21 ◽

Author(s):

Zarrin Basharat ◽

Sania Majeed ◽

Humaira Saleem ◽

Ishtiaq Ahmad Khan ◽

Azra Yasmin

Keyword(s):

Single Cell ◽

Large Scale ◽

Missing Values ◽

Ad Hoc ◽

Cell Types ◽

Learning Approaches ◽

Data Imputation ◽

Rna Seq ◽

Accurate Analysis ◽

Heterogeneous Datasets

Single cell RNA-Seq technology enables assessment of RNA expression in individual cells. This makes it popular in experimental biology for characterizing novel cell types as well as inferring heterogeneity. Experimental data conventionally contains zero counts, or dropout events, for many single cell transcripts. Such missing data hampers accurate analysis with standard workflows, which were designed for bulk RNA-Seq datasets. Imputation for single cell datasets is done to infer the missing values. This was traditionally done with ad hoc code, but customized pipelines, workflows and specialized software later appeared for the purpose, making it easier to benchmark and cluster data in an organized manner. In this review, we have assembled a catalog of available single cell RNA-Seq imputation algorithms and workflows, and the associated software, for the scientific community performing single-cell RNA-Seq data analysis. Continued development of imputation methods, especially those using deep learning approaches, will be necessary to overcome the associated pitfalls and to address the challenges posed by future large scale and heterogeneous datasets.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.

Single-nuclei RNA-seq on human retinal tissue provides improved transcriptome profiling

Nature Communications ◽

10.1038/s41467-019-12917-9 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 16

Author(s):

Qingnan Liang ◽

Rachayata Dharmat ◽

Leah Owen ◽

Akbar Shakoor ◽

Yumei Li ◽

...

Keyword(s):

Single Cell ◽

Transcriptome Profiling ◽

Cell Types ◽

Retinal Cell ◽

Peripheral Retina ◽

Marker Genes ◽

Rna Seq ◽

Cell Type ◽

Retinal Tissue ◽

The Individual

Abstract: Single-cell RNA-seq is a powerful tool in decoding the heterogeneity in complex tissues by generating transcriptomic profiles of the individual cell. Here, we report a single-nuclei RNA-seq (snRNA-seq) transcriptomic study on human retinal tissue, which is composed of multiple cell types with distinct functions. Six samples from three healthy donors are profiled and high-quality RNA-seq data is obtained for 5873 single nuclei. All major retinal cell types are observed and marker genes for each cell type are identified. The gene expression of the macular and peripheral retina is compared to each other at cell-type level. Furthermore, our dataset shows an improved power for prioritizing genes associated with human retinal diseases compared to both mouse single-cell RNA-seq and human bulk RNA-seq results. In conclusion, we demonstrate that obtaining single cell transcriptomes from human frozen tissues can provide insight missed by either human bulk RNA-seq or animal models.

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract: Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.

Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

10.1101/2020.06.15.151910 ◽

2020 ◽

Author(s):

Van Hoan Do ◽

Francisca Rojas Ringeling ◽

Stefan Canzar

Keyword(s):

Single Cell ◽

Large Scale ◽

Linear Time ◽

Cell Types ◽

Substantial Improvement ◽

Rna Seq ◽

Sampling Step ◽

Protein Marker ◽

Cluster Ensembles ◽

Sequencing Technologies

Abstract: A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.

LRcell: detecting the source of differential expression at the sub-cell type level from bulk RNA-seq data

10.1101/2021.08.10.455821 ◽

2021 ◽

Author(s):

Wenjing Ma ◽

Sumeet Sharma ◽

Peng Jin ◽

Shannon L Gourley ◽

Zhaohui Qin

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Bioconductor Package ◽

Rna Seq ◽

Cell Type ◽

Reference Dataset ◽

Cell Type Composition ◽

Type Composition ◽

Differential Gene

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets has revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments to reveal cell type contributions. However, these methods lack power in identifying the accurate cell type composition when the reference dataset contains a considerable number of sub-cell types. Here, we present LRcell, an R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) that aims to identify the specific sub-cell type(s) that drive the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes computed from putative single-cell RNA-seq experiments as options to execute the analyses.

Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

10.1101/243113 ◽

2018 ◽

Cited By ~ 2

Author(s):

Nikos Konstantinides ◽

Katarina Kapuralin ◽

Chaimaa Fadil ◽

Luendreo Barboza ◽

Rahul Satija ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Deep Understanding ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Functional Specification ◽

Phenotypic Convergence

Summary: Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.

IKAP - Identifying K mAjor cell Population groups in single-cell RNA-seq analysis

10.1101/596817 ◽

2019 ◽

Author(s):

Yun-Ching Chen ◽

Abhilash Suresh ◽

Chingiz Underbayev ◽

Clare Sun ◽

Komudi Singh ◽

...

Keyword(s):

Single Cell ◽

Cell Population ◽

Cell Types ◽

Marker Genes ◽

Rna Seq ◽

Population Groups ◽

Tuning Parameters ◽

Multiple Datasets ◽

Cell Groups ◽

Cell Ontology

Abstract: In single-cell RNA-seq analysis, clustering cells into groups and differentiating cell groups by marker genes are two separate steps for investigating cell identity. However, clustering results greatly affect the ability to differentiate between cell groups. We develop IKAP, an algorithm that identifies major cell groups and improves differentiation between them by tuning the parameters used for clustering. Using multiple datasets, we demonstrate that IKAP improves identification of major cell types and facilitates cell ontology curation.
