Assessing the reliability of spike-in normalization for analyses of single-cell RNA sequencing data

Mapping Intimacies ◽

10.1101/119784 ◽

2017 ◽

Author(s):

Aaron T. L. Lun ◽

Fernando J. Calero-Nieto ◽

Liora Haim-Vilmovsky ◽

Berthold Göttgens ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Capture Efficiency ◽

Cellular Heterogeneity ◽

Sequencing Data ◽

Constant Amount ◽

Technical Noise ◽

Data Analyses ◽

Single Cell Rna Sequencing ◽

The Cost

Abstract By profiling the transcriptomes of individual cells, single-cell RNA sequencing provides unparalleled resolution to study cellular heterogeneity. However, this comes at the cost of high technical noise, including cell-specific biases in capture efficiency and library generation. One strategy for removing these biases is to add a constant amount of spike-in RNA to each cell, and to scale the observed expression values so that the coverage of spike-in RNA is constant across cells. This approach has previously been criticized as its accuracy depends on the precise addition of spike-in RNA to each sample, and on similarities in behaviour (e.g., capture efficiency) between the spike-in and endogenous transcripts. Here, we perform mixture experiments using two different sets of spike-in RNA to quantify the variance in the amount of spike-in RNA added to each well in a plate-based protocol. We also obtain an upper bound on the variance due to differences in behaviour between the two spike-in sets. We demonstrate that both factors are small contributors to the total technical variance and have only minor effects on downstream analyses such as detection of highly variable genes and clustering. Our results suggest that spike-in normalization is reliable enough for routine use in single-cell RNA sequencing data analyses.
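
The scaling step described in this abstract (dividing each cell's counts so that spike-in coverage is constant across cells) can be illustrated with a minimal sketch. This is not the authors' code: the function names `spike_in_size_factors` and `normalize`, the genes-by-cells matrix layout, and the choice to centre the size factors at a mean of 1 are all assumptions made for illustration.

```python
import numpy as np


def spike_in_size_factors(spike_counts):
    """Return one scaling factor per cell from spike-in counts.

    spike_counts: array of shape (n_spike_transcripts, n_cells).
    Assumption: the same amount of spike-in RNA was added to every cell,
    so differences in total spike-in coverage reflect technical bias.
    """
    totals = np.asarray(spike_counts, dtype=float).sum(axis=0)
    if np.any(totals <= 0):
        raise ValueError("every cell needs a positive spike-in total")
    # Centre the factors at 1 so normalized values stay on the count scale.
    return totals / totals.mean()


def normalize(counts, size_factors):
    """Divide each cell (column) of a genes-by-cells matrix by its factor."""
    return np.asarray(counts, dtype=float) / size_factors


# Hypothetical toy data: 2 spike-in transcripts and 2 genes across 3 cells.
spikes = np.array([[10, 20, 40],
                   [10, 20, 40]])
genes = np.array([[5, 10, 20],
                  [0, 8, 8]])

factors = spike_in_size_factors(spikes)
normalized_genes = normalize(genes, factors)
normalized_spikes = normalize(spikes, factors)
```

After scaling, the column sums of `normalized_spikes` are equal across cells, which is the property the abstract describes. The same factors are applied to the endogenous genes, so any error in the amount of spike-in RNA added to a well propagates directly into the normalized expression values.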

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing

Millefy: visualizing cell-to-cell heterogeneity in read coverage of single-cell RNA sequencing datasets

10.1101/537936 ◽

2019 ◽

Cited By ~ 1

Author(s):

Haruka Ozaki ◽

Tetsutaro Hayashi ◽

Mana Umeda ◽

Itoshi Nikaido

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cellular Heterogeneity ◽

Specific Cell ◽

Read Coverage ◽

Cell Heterogeneity ◽

Sequencing Data ◽

Rna Transcription ◽

Link Type ◽

Single Cell Rna Sequencing

Abstract Background Read coverage of RNA sequencing data reflects gene expression and RNA processing events. Single-cell RNA sequencing (scRNA-seq) methods, particularly “full-length” ones, provide read coverage of many individual cells and have the potential to reveal cellular heterogeneity in RNA transcription and processing. However, visualization tools suited to highlighting cell-to-cell heterogeneity in read coverage are still lacking. Results Here, we have developed Millefy, a tool for visualizing read coverage of scRNA-seq data in genomic contexts. Millefy is designed to show read coverage of all individual cells at once in genomic contexts and to highlight cell-to-cell heterogeneity in read coverage. By visualizing read coverage of all cells as a heat map and dynamically reordering cells based on diffusion maps, Millefy facilitates discovery of “local” region-specific, cell-to-cell heterogeneity in read coverage, including variability of transcribed regions. Conclusions Millefy simplifies the examination of cellular heterogeneity in RNA transcription and processing events using scRNA-seq data. Millefy is available as an R package (https://github.com/yuifu/millefy) and a Docker image to help use Millefy on the Jupyter notebook (https://hub.docker.com/r/yuifu/datascience-notebook-millefy).

Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data

Genome Biology ◽

10.1186/s13059-019-1863-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 12

Author(s):

Fenglin Liu ◽

Yuanyuan Zhang ◽

Lei Zhang ◽

Ziyi Li ◽

Qiao Fang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Allele Frequencies ◽

Cellular Heterogeneity ◽

Variant Allele ◽

Detection Methods ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Single Cell Rna Sequencing

Abstract Background Systematic interrogation of single-nucleotide variants (SNVs) is one of the most promising approaches to delineate the cellular heterogeneity and phylogenetic relationships at the single-cell level. While SNV detection from abundant single-cell RNA sequencing (scRNA-seq) data is applicable and cost-effective in identifying expressed variants, inferring sub-clones, and deciphering genotype-phenotype linkages, there is a lack of computational methods specifically developed for SNV calling in scRNA-seq. Although variant callers for bulk RNA-seq have been sporadically used in scRNA-seq, the performances of different tools have not been assessed. Results Here, we perform a systematic comparison of seven tools including SAMtools, the GATK pipeline, CTAT, FreeBayes, MuTect2, Strelka2, and VarScan2, using both simulation and scRNA-seq datasets, and identify multiple elements influencing their performance. While the specificities are generally high, with sensitivities exceeding 90% for most tools when calling homozygous SNVs in high-confident coding regions with sufficient read depths, such sensitivities dramatically decrease when calling SNVs with low read depths, low variant allele frequencies, or in specific genomic contexts. SAMtools shows the highest sensitivity in most cases especially with low supporting reads, despite the relatively low specificity in introns or high-identity regions. Strelka2 shows consistently good performance when sufficient supporting reads are provided, while FreeBayes shows good performance in the cases of high variant allele frequencies. Conclusions We recommend SAMtools, Strelka2, FreeBayes, or CTAT, depending on the specific conditions of usage. Our study provides the first benchmarking to evaluate the performances of different SNV detection tools for scRNA-seq data.

A monotonicity-based gene clustering algorithm for enhancing clarity in single-cell RNA sequencing data

10.1101/2020.12.20.423308 ◽

2020 ◽

Author(s):

Victor Wang ◽

Pietro Antonio Cicalese ◽

Chandra Mohan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Expression Patterns ◽

Gene Clustering ◽

Sequencing Data ◽

Technical Noise ◽

Cell Clustering ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) technologies and analysis tools have allowed for meaningful insight into the roles and relationships of cells. However, high dimensionality, frequent dropout values, and technical noise remain prevalent challenges for scRNA-seq data, obscuring the already complex expression patterns. To address several shortcomings in commonly used distance metrics, we present a monotonicity-based distance metric designed to enhance the clarity of scRNA-seq data. We apply our metric in a gene clustering algorithm, which we run on several biological datasets. We compare our results to those generated by popular clustering algorithms to demonstrate that our algorithm has substantial ability to improve the accuracy of subsequent cell clustering.

Screen technical noise in single cell RNA sequencing data

Genomics ◽

10.1016/j.ygeno.2019.02.014 ◽

2020 ◽

Vol 112 (1) ◽

pp. 346-355

Author(s):

Yu-Long Bai ◽

Melody Baddoo ◽

Erik K. Flemington ◽

Hani N. Nakhoul ◽

Yao-Zhong Liu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell Rna Sequencing

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences Computational Life Sciences ◽

10.1007/s12539-021-00427-6 ◽

2021 ◽

Author(s):

Min Wu ◽

Junhua Xu ◽

Tao Ding ◽

Jie Gao

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Sequencing Data ◽

Distribution Models ◽

Mixed Distribution ◽

Single Cell Rna Sequencing

IMMU-27. SINGLE CELL RNA-SEQUENCING IDENTIFIES NOVEL BONE MARROW DERIVED MYELOID CELLS IN GLIOBLASTOMA ASSOCIATED WITH TUMOR AGGRESSION

Neuro-Oncology ◽

10.1093/neuonc/noaa215.457 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii110-ii110

Author(s):

Christina Jackson ◽

Christopher Cherry ◽

Sadhana Bom ◽

Hao Zhang ◽

John Choi ◽

...

Keyword(s):

Bone Marrow ◽

Single Cell ◽

Tumor Cells ◽

Rna Sequencing ◽

Metabolic Pathways ◽

Myeloid Cells ◽

Tumor Grade ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Two Populations

Abstract BACKGROUND Glioma associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can lead to inhibition of anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential to modulating the myeloid compartment as a therapeutic adjunct to improve anti-tumor immune response. METHODS We performed single-cell RNA-sequencing (sc-RNAseq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS sc-RNAseq revealed a diverse landscape of myeloid-lineage cells in gliomas with an increase in preponderance of bone marrow derived myeloid cells (BMDMs) with increasing tumor grade. We identified two populations of BMDMs unique to GBMs: Mac-1 and Mac-2. Mac-1 demonstrates upregulation of immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of scavenger receptor MARCO. Pseudotime and RNA velocity analysis revealed the ability of Mac-1 to transition and differentiate to Mac-2 and other GAM subtypes. We further found that the presence of these two populations of BMDMs is associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data demonstrate that gene signatures of these populations are associated with worse survival in GBM. CONCLUSION We used sc-RNAseq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and stem-like potential to differentiate into other GAM populations, including GAMs with upregulation of immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drive GBM progression and the more aggressive mesenchymal subtype. Our discovery of these novel BMDMs has implications for new therapeutic targets to improve the efficacy of immune-based therapies in GBM.

Cryopreservation of human cancers conserves tumour heterogeneity for single-cell multi-omics analysis

Genome Medicine ◽

10.1186/s13073-021-00885-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Sunny Z. Wu ◽

Daniel L. Roden ◽

Ghamdan Al-Eryani ◽

Nenad Bartonicek ◽

Kate Harvey ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cells ◽

Cellular Heterogeneity ◽

Tumour Heterogeneity ◽

Fresh Tissue ◽

Human Cancers ◽

Cryopreserved Cell ◽

Single Cell Rna Sequencing

Abstract Background High throughput single-cell RNA sequencing (scRNA-Seq) has emerged as a powerful tool for exploring cellular heterogeneity among complex human cancers. scRNA-Seq studies using fresh human surgical tissue are logistically difficult, preclude histopathological triage of samples, and limit the ability to perform batch processing. This hindrance can often introduce technical biases when integrating patient datasets and increase experimental costs. Although tissue preservation methods have been previously explored to address such issues, they are yet to be examined on complex human tissues, such as solid cancers, and on high throughput scRNA-Seq platforms. Methods Using the Chromium 10X platform, we sequenced a total of ~ 120,000 cells from fresh and cryopreserved replicates across three primary breast cancers, two primary prostate cancers and a cutaneous melanoma. We performed detailed analyses between cells from each condition to assess the effects of cryopreservation on cellular heterogeneity, cell quality, clustering and the identification of gene ontologies. In addition, we performed single-cell immunophenotyping using CITE-Seq on a single breast cancer sample cryopreserved as solid tissue fragments. Results Tumour heterogeneity identified from fresh tissues was largely conserved in cryopreserved replicates. We show that sequencing of single cells prepared from cryopreserved tissue fragments or from cryopreserved cell suspensions is comparable to sequenced cells prepared from fresh tissue, with cryopreserved cell suspensions displaying higher correlations with fresh tissue in gene expression. We showed that cryopreservation had minimal impacts on the results of downstream analyses such as biological pathway enrichment. For some tumours, cryopreservation modestly increased cell stress signatures compared to freshly analysed tissue. Further, we demonstrate the advantage of cryopreserving whole cells for detecting cell-surface proteins using CITE-Seq, which is impossible using other preservation methods such as single-nuclei sequencing. Conclusions We show that the viable cryopreservation of human cancers provides high-quality single cells for multi-omics analysis. Our study guides new experimental designs for tissue biobanking for future clinical single-cell RNA sequencing studies.

Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research ◽

10.3390/microbiolres12020022 ◽

2021 ◽

Vol 12 (2) ◽

pp. 317-334

Author(s):

Omar Alaqeeli ◽

Li Xing ◽

Xuekui Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Classification Tree ◽

Area Under The Curve ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Tree Algorithms ◽

R Packages

The classification tree is a widely used machine learning method. It has multiple implementations as R packages: rpart, ctree, evtree, tree and C5.0. The details of these implementations are not the same, and hence their performances differ from one application to another. We are interested in their performance in the classification of cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compare the packages’ prediction performances based on their Precision, Recall, F1-score and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than others, although its complexity is often higher than others.
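
Three of the metrics named in this abstract (Precision, Recall and F1-score) have standard definitions that can be written out as a short sketch. The code below is not taken from the benchmark: the function name `precision_recall_f1`, the one-class-versus-rest treatment of a chosen `positive` label, and the toy cell-type labels are assumptions for illustration only.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive.

    y_true, y_pred: equal-length sequences of class labels.
    positive: the label counted as the positive class (one-vs-rest).
    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for four cells: true type versus a tree's prediction.
true_types = ["T cell", "T cell", "B cell", "B cell"]
predicted = ["T cell", "B cell", "B cell", "T cell"]

precision, recall, f1 = precision_recall_f1(true_types, predicted, "T cell")
```

With these toy labels there is one true positive, one false positive and one false negative for the "T cell" class, so all three metrics equal 0.5. In a cross-validated comparison such values would be computed on each held-out fold and then averaged.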

MBRS-46. CHARTING NEOPLASTIC AND IMMUNE CELL HETEROGENEITY IN HUMAN AND GEM MODELS OF MEDULLOBLASTOMA USING scRNAseq

Neuro-Oncology ◽

10.1093/neuonc/noaa222.555 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii406-iii406

Author(s):

Andrew Donson ◽

Kent Riemondy ◽

Sujatha Venkataraman ◽

Ahmed Gilani ◽

Bridget Sanford ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Immune Cell ◽

Genetically Engineered ◽

Cellular Heterogeneity ◽

Cell Heterogeneity ◽

Transcriptomic Data ◽

Single Cell Rna Sequencing ◽

Transcript Profiles

Abstract We explored cellular heterogeneity in medulloblastoma using single-cell RNA sequencing (scRNAseq), immunohistochemistry and deconvolution of bulk transcriptomic data. Over 45,000 cells from 31 patients from all main subgroups of medulloblastoma (2 WNT, 10 SHH, 9 GP3, 11 GP4 and 1 GP3/4) were clustered using Harmony alignment to identify conserved subpopulations. Each subgroup contained subpopulations exhibiting mitotic, undifferentiated and neuronal differentiated transcript profiles, corroborating other recent medulloblastoma scRNAseq studies. The magnitude of our present study builds on the findings of existing studies, providing further characterization of conserved neoplastic subpopulations, including identification of a photoreceptor-differentiated subpopulation that was predominantly, but not exclusively, found in GP3 medulloblastoma. Deconvolution of MAGIC transcriptomic cohort data showed that neoplastic subpopulations are associated with major and minor subgroup subdivisions; for example, photoreceptor subpopulation cells are more abundant in GP3-alpha. In both GP3 and GP4, higher proportions of undifferentiated subpopulations are associated with shorter survival and, conversely, differentiated subpopulations are associated with longer survival. This scRNAseq dataset also afforded unique insights into the immune landscape of medulloblastoma, and revealed an M2-polarized myeloid subpopulation that was restricted to SHH medulloblastoma. Additionally, we performed scRNAseq on 16,000 cells from genetically engineered mouse (GEM) models of GP3 and SHH medulloblastoma. These models showed a level of fidelity with corresponding human subgroup-specific neoplastic and immune subpopulations. Collectively, our findings advance our understanding of the neoplastic and immune landscape of the main medulloblastoma subgroups in both humans and GEM models.
