Missing Data and Technical Variability in Single-Cell RNA-Sequencing Experiments

10.1101/025528 ◽

2015 ◽

Cited By ~ 32

Author(s):

Stephanie C Hicks ◽

F. William Townes ◽

Mingxiang Teng ◽

Rafael A Irizarry

Keyword(s):

Gene Expression ◽

Missing Data ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cells ◽

Systematic Errors ◽

Gene Expression Measurement ◽

Rna Seq ◽

Batch Effects

Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on data produced with it. However, RNA-Seq and scRNA-seq data are markedly different. In particular, unlike RNA-Seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a sufficient level to be detected by the sequencing technology). Another difference is that the proportion of genes reporting an expression level of zero varies substantially more across single cells than across RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lowly expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies from cell to cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem.
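The quantity this abstract centers on, the proportion of genes reported as zero in each cell, is simple to compute from a count matrix. The following is a minimal sketch, not the authors' analysis code: the function name `zero_fraction_per_cell`, the genes-by-cells layout, and the toy values are all assumptions made for illustration.

```python
import numpy as np

def zero_fraction_per_cell(counts):
    """Return the fraction of genes with a zero count in each cell.

    counts: 2-D array of shape (genes, cells) holding raw counts.
    (The genes-by-cells orientation is an assumption of this sketch.)
    """
    counts = np.asarray(counts)
    # Comparing with 0 gives a boolean matrix; averaging over the gene
    # axis turns it into a per-cell proportion of zeros.
    return (counts == 0).mean(axis=0)

# Toy matrix: 4 genes (rows) x 3 cells (columns); the values are made up.
toy_counts = np.array([
    [5, 0, 0],
    [0, 0, 3],
    [2, 1, 0],
    [0, 0, 0],
])
zero_frac = zero_fraction_per_cell(toy_counts)
```

Comparing this per-cell proportion across batches or processing dates is one way to start checking whether it tracks technical rather than biological factors.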


Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

10.1101/052225 ◽

2016 ◽

Cited By ~ 7

Author(s):

Bo Wang ◽

Junjie Zhu ◽

Emma Pierson ◽

Daniele Ramazzotti ◽

Serafim Batzoglou

Keyword(s):

Gene Expression ◽

Single Cell ◽

High Throughput ◽

Cell Populations ◽

Gene Expression Measurement ◽

Data Sets ◽

Similarity Learning ◽

Rna Seq ◽

High Level ◽

Cell Data

Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. Here, we propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications. Benchmarking against state-of-the-art methods for these applications, we used SIMLR to re-analyse seven representative single-cell data sets, including high-throughput droplet-based data sets with tens of thousands of cells. We show that SIMLR greatly improves clustering sensitivity and accuracy, as well as the visualization and interpretability of the data.
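To make the multi-kernel idea concrete, the sketch below builds several Gaussian kernels with different bandwidths from the same cell profiles and averages them into one similarity matrix. This is not SIMLR itself: SIMLR learns the kernel weights and the similarity jointly, whereas this sketch assumes fixed uniform weights. The function names, the bandwidth values, and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_kernels(X, bandwidths):
    """Build one Gaussian kernel matrix per bandwidth.

    X: array of shape (cells, genes); bandwidths: positive floats.
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance between every pair of cells.
    sq_norms = (X ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against negative round-off
    return [np.exp(-sq_dist / (2.0 * s ** 2)) for s in bandwidths]

def combine_kernels(kernels, weights=None):
    """Weighted average of kernel matrices (uniform weights by default)."""
    stacked = np.stack(kernels)
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    # Contract the weight vector against the kernel axis.
    return np.tensordot(weights, stacked, axes=1)

# Two simulated groups of 5 cells x 20 genes with different mean expression.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(5, 20))
group_b = rng.normal(5.0, 1.0, size=(5, 20))
cells = np.vstack([group_a, group_b])

similarity = combine_kernels(gaussian_kernels(cells, [1.0, 5.0, 10.0]))
```

Using several bandwidths at once avoids committing to a single distance scale, which is the motivation for combining kernels rather than picking one.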


CCSN: Single Cell RNA Sequencing Data Analysis by Conditional Cell-specific Network

10.1101/2020.01.25.919829 ◽

2020 ◽

Author(s):

Lin Li ◽

Hao Dai ◽

Zhaoyuan Fang ◽

Luonan Chen

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Network Flow ◽

Single Cells ◽

Cellular Heterogeneity ◽

Rna Seq ◽

Sequencing Data ◽

Cell Clustering ◽

A Cell

The rapid advancement of single cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) is able to quantify the overall associations between genes for each cell, yet it suffers from a problem of overestimation related to indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which can measure the direct associations between genes by eliminating the indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: (1) one direct association network for one cell; (2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.


Cell Dissociation from Butterfly Pupal Wing Tissues for Single-Cell RNA Sequencing

Methods and Protocols ◽

10.3390/mps3040072 ◽

2020 ◽

Vol 3 (4) ◽

pp. 72

Author(s):

Anupama Prakash ◽

Antónia Monteiro

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Expression Patterns ◽

Cell Types ◽

Rna Seq ◽

Bicyclus Anynana ◽

Single Cell Sequencing ◽

Pupal Wing

Butterflies are well known for their beautiful wings and have been great systems to understand the ecology, evolution, genetics, and development of patterning and coloration. These color patterns are mosaics on the wing created by the tiling of individual units called scales, which develop from single cells. Traditionally, bulk RNA sequencing (RNA-seq) has been used extensively to identify the loci involved in wing color development and pattern formation. RNA-seq provides an averaged gene expression landscape of the entire wing tissue or of small dissected wing regions under consideration. However, to understand the gene expression patterns of the units of color, which are the scales, and to identify different scale cell types within a wing that produce different colors and scale structures, it is necessary to study single cells. This has recently been facilitated by the advent of single-cell sequencing. Here, we provide a detailed protocol for the dissociation of cells from Bicyclus anynana pupal wings to obtain a viable single-cell suspension for downstream single-cell sequencing. We outline our experimental design and the use of fluorescence-activated cell sorting (FACS) to obtain putative scale-building and socket cells based on size. Finally, we discuss some of the current challenges of this technique in studying single-cell scale development and suggest future avenues to address these challenges.


Cryopreservation of human cancers conserves tumour heterogeneity for single-cell multi-omics analysis

Genome Medicine ◽

10.1186/s13073-021-00885-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Sunny Z. Wu ◽

Daniel L. Roden ◽

Ghamdan Al-Eryani ◽

Nenad Bartonicek ◽

Kate Harvey ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cells ◽

Cellular Heterogeneity ◽

Tumour Heterogeneity ◽

Fresh Tissue ◽

Human Cancers ◽

Cryopreserved Cell ◽

Single Cell Rna Sequencing

Background: High throughput single-cell RNA sequencing (scRNA-Seq) has emerged as a powerful tool for exploring cellular heterogeneity among complex human cancers. scRNA-Seq studies using fresh human surgical tissue are logistically difficult, preclude histopathological triage of samples, and limit the ability to perform batch processing. This hindrance can often introduce technical biases when integrating patient datasets and increase experimental costs. Although tissue preservation methods have been previously explored to address such issues, they have yet to be examined on complex human tissues, such as solid cancers, and on high throughput scRNA-Seq platforms. Methods: Using the Chromium 10X platform, we sequenced a total of ~120,000 cells from fresh and cryopreserved replicates across three primary breast cancers, two primary prostate cancers and a cutaneous melanoma. We performed detailed analyses between cells from each condition to assess the effects of cryopreservation on cellular heterogeneity, cell quality, clustering and the identification of gene ontologies. In addition, we performed single-cell immunophenotyping using CITE-Seq on a single breast cancer sample cryopreserved as solid tissue fragments. Results: Tumour heterogeneity identified from fresh tissues was largely conserved in cryopreserved replicates. We show that sequencing of single cells prepared from cryopreserved tissue fragments or from cryopreserved cell suspensions is comparable to sequenced cells prepared from fresh tissue, with cryopreserved cell suspensions displaying higher correlations with fresh tissue in gene expression. We showed that cryopreservation had minimal impacts on the results of downstream analyses such as biological pathway enrichment. For some tumours, cryopreservation modestly increased cell stress signatures compared to freshly analysed tissue. Further, we demonstrate the advantage of cryopreserving whole cells for detecting cell-surface proteins using CITE-Seq, which is impossible using other preservation methods such as single-nuclei sequencing. Conclusions: We show that the viable cryopreservation of human cancers provides high-quality single cells for multi-omics analysis. Our study guides new experimental designs for tissue biobanking for future clinical single-cell RNA sequencing studies.


P02.10 FocuSCOPE: a single cell, multi-omics solution to simultaneously analyze tumor variants and microenvironment

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-itoc8.22 ◽

2021 ◽

Vol 9 (Suppl 1) ◽

pp. A12.1-A12

Author(s):

Y Arjmand Abbassi ◽

N Fang ◽

W Zhu ◽

Y Zhou ◽

Y Chen ◽

...

Keyword(s):

Gene Expression ◽

Tumor Microenvironment ◽

Single Cell ◽

High Throughput ◽

Immune Cells ◽

Genetic Variants ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Single Cell Sequencing

Recent advances in high-throughput single cell sequencing technologies have greatly improved our understanding of complex biological systems. Heterogeneous samples such as tumor tissues commonly harbor cancer cell-specific genetic variants and gene expression profiles, both of which have been shown to be related to the mechanisms of disease development, progression, and responses to treatment. Furthermore, stromal and immune cells within the tumor microenvironment interact with cancer cells to play important roles in tumor responses to systemic therapy such as immunotherapy or cell therapy. However, most current high-throughput single cell sequencing methods detect only gene expression levels or epigenetic events such as chromatin conformation. The information on important genetic variants, including mutations or fusions, is not captured. To better understand the mechanisms of tumor responses to systemic therapy, it is essential to decipher the connection between genotype and gene expression patterns of both tumor cells and cells in the tumor microenvironment. We developed FocuSCOPE, a high-throughput multi-omics sequencing solution that can detect both genetic variants and the transcriptome from the same single cells. FocuSCOPE has been used to successfully perform single cell analysis of both gene expression profiles and point mutations, fusion genes, or intracellular viral sequences from thousands of cells simultaneously, delivering comprehensive insights into tumor and immune cells in the tumor microenvironment at single cell resolution. Disclosure Information: Y. Arjmand Abbassi: None. N. Fang: None. W. Zhu: None. Y. Zhou: None. Y. Chen: None. U. Deutsch: None.


Microbial single-cell RNA sequencing by split-pool barcoding

Science ◽

10.1126/science.aba5257 ◽

2020 ◽

Vol 371 (6531) ◽

pp. eaba5257 ◽

Cited By ~ 2

Author(s):

Anna Kuchina ◽

Leandra M. Brettner ◽

Luana Paleologu ◽

Charles M. Roco ◽

Alexander B. Rosenberg ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cell Analysis ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Growth Stages ◽

High Throughput Analysis ◽

Single Cell Rna Sequencing

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


SHERRY2: A method for rapid and sensitive single cell RNA-seq

10.1101/2021.12.25.474161 ◽

2021 ◽

Author(s):

Lin Di ◽

Bo Liu ◽

Yuzhu Lyu ◽

Shihui Zhao ◽

Yuhong Pang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Dynamic Range ◽

Single Cells ◽

Rna Seq ◽

Wide Dynamic Range ◽

Uniform Coverage ◽

Optimized Protocol ◽

Tn5 Transposase ◽

Higher Sensitivity

Many single cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most of them still struggle to accurately quantify low-abundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage in comparison with prevalent scRNA-seq methods. With throughput of a few thousand cells per batch, SHERRY2 can reveal the subtle transcriptomic differences between cells and facilitate important biological discoveries.


RNA splicing programs define tissue compartments and cell types at single cell resolution

10.1101/2021.05.01.442281 ◽

2021 ◽

Author(s):

Julia Eve Olivieri ◽

Roozbeh Dehghannasiri ◽

Peter Wang ◽

SoRi Jang ◽

Antoine de Morree ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

High Throughput ◽

Rna Splicing ◽

Single Cells ◽

Cell Types ◽

Mouse Lemur ◽

Cell Type ◽

Multiple Organs ◽

Single Cell Pcr

More than 95% of human genes are alternatively spliced. Yet, the extent to which splicing is regulated at single-cell resolution has remained controversial because of limitations in both the available data and the methods to interpret them. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in > 110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type specifically spliced. These results are validated with RNA FISH, single cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable based on gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an un-annotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra high throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.


Bayesian inference of the gene expression states of single cells from scRNA-seq data

10.1101/2019.12.28.889956 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jérémie Breda ◽

Mihaela Zavolan ◽

Erik van Nimwegen

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Downstream Processing ◽

Noise Removal ◽

Rna Seq ◽

Expression Of Genes ◽

Normalization Methods ◽

Quantify Gene Expression ◽

Selection Of

In spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.
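The point about Poisson noise can be illustrated with a small simulation. For a Poisson-distributed count with mean λ the variance is also λ, so the squared coefficient of variation is 1/λ and grows as expression falls, even when there is no biological variability at all. The sketch below is not the Sanity procedure; it only demonstrates this sampling effect, and the gene means, cell number, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 20000
true_means = np.array([0.5, 5.0, 50.0])  # hypothetical per-gene expected counts

# Every cell shares the same expected count for each gene, so all spread
# in the sampled counts comes from Poisson sampling noise alone.
counts = rng.poisson(lam=true_means, size=(n_cells, true_means.size))

sample_mean = counts.mean(axis=0)
sample_var = counts.var(axis=0, ddof=1)

# Squared coefficient of variation: relative variability of each gene.
cv2 = sample_var / sample_mean ** 2

# Theory for pure Poisson noise: variance = mean, hence CV^2 = 1 / mean.
expected_cv2 = 1.0 / true_means
```

A method that ranks genes by relative variability without subtracting this expected noise floor would place the lowest expressed gene first, although none of the three simulated genes varies biologically.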


Single-cell RNA-seq data reveals TNBC tumor heterogeneity through characterizing subclone compositions and proportions

10.1101/858290 ◽

2019 ◽

Author(s):

Weida Wang ◽

Jinyuan Xu ◽

Shuyuan Wang ◽

Peng Xia ◽

Li Zhang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Single Cells ◽

Rna Seq ◽

Biological Functions ◽

Gene Markers ◽

Gene Expression Matrix ◽

Deconvolution Algorithm ◽

Expression Matrix

Understanding subclonal architecture and its biological functions poses one of the key challenges in deeply portraying and investigating the cause of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity through characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389) we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, we found that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we also found that high C5 proportions lead to poor survival outcomes: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636]. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with the subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease. Highlights: We applied a deconvolution algorithm to the subclonal basis gene expression matrix to link single cells and bulk tissue together.
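The abstract does not say which deconvolution algorithm was applied, so the sketch below uses non-negative least squares (NNLS) purely as one common stand-in, not as the authors' method. It assumes a signature matrix of per-subclone mean expression and a bulk profile that is a weighted mixture of those signatures. All values are simulated, and the names `signatures`, `true_props`, and `est_props` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
n_genes, n_subclones = 50, 5

# Hypothetical signature matrix: mean expression of each gene in each
# subclone (rows are genes, columns are subclones).
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_subclones))

# Simulated bulk sample: a noise-free mixture with known proportions.
true_props = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
bulk = signatures @ true_props

# Solve min ||signatures @ x - bulk|| subject to x >= 0.
coef, residual_norm = nnls(signatures, bulk)

# Rescale the coefficients so they can be read as proportions.
est_props = coef / coef.sum()
```

With real bulk data the fit is never exact, so the residual norm and the stability of the estimated proportions across gene subsets are worth inspecting before relating them to survival.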
