Assessment of Single Cell RNA-Seq Normalization Methods

G3: Genes|Genomes|Genetics ◽

10.1534/g3.117.040683 ◽

2017 ◽

Vol 7 (7) ◽

pp. 2039-2045 ◽

Cited By ~ 6

Author(s):

Bo Ding ◽

Lina Zheng ◽

Wei Wang

Keyword(s):

Single Cell ◽

RNA Seq ◽

Technical Noise ◽

RNA Molecules ◽

Normalization Methods ◽

Using Data

Abstract: We have assessed the performance of seven normalization methods for single-cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods that use spike-in External RNA Controls Consortium (ERCC) RNA molecules significantly outperformed those that do not. This work provides guidance for selecting normalization methods that remove technical noise from single-cell RNA-seq data.
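The idea behind spike-in normalization can be illustrated with a minimal sketch. It assumes a genes-by-cells count matrix, a hypothetical `is_spike` flag marking the ERCC rows, and that the same amount of spike-in RNA was added to every cell, so that differences in total spike-in counts between cells reflect technical effects only. It is a generic spike-in scaling normalization, not a reproduction of any of the seven methods assessed in the paper.

```python
def spike_in_size_factors(counts, is_spike):
    """Return one size factor per cell, scaled so the factors average to 1.

    counts   : genes-by-cells matrix as a list of lists (assumed layout)
    is_spike : one boolean per row, True for ERCC spike-in rows (hypothetical flag)
    """
    n_cells = len(counts[0])
    # Total spike-in counts per cell. Because every cell is assumed to receive
    # the same spike-in amount, these totals track technical capture/depth effects.
    totals = [
        sum(row[j] for row, spike in zip(counts, is_spike) if spike)
        for j in range(n_cells)
    ]
    mean_total = sum(totals) / n_cells
    return [t / mean_total for t in totals]


def normalize(counts, size_factors):
    """Divide every count by the size factor of its cell."""
    return [[x / sf for x, sf in zip(row, size_factors)] for row in counts]


if __name__ == "__main__":
    counts = [
        [10, 20, 40],  # endogenous gene
        [5, 10, 20],   # ERCC spike-in
        [15, 30, 60],  # ERCC spike-in
    ]
    sf = spike_in_size_factors(counts, [False, True, True])
    print(sf)
    print(normalize(counts, sf))
```

In this toy input the endogenous gene's raw counts differ fourfold across cells, but they become equal after normalization because the spike-in totals differ by the same factor.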

sc-REnF:An entropy guided robust feature selection for clustering of single-cell rna-seq data

10.1101/2020.10.10.334573 ◽

2020 ◽

Author(s):

Snehalika Lall ◽

Abhik Ghosh ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

RNA Seq ◽

Technical Noise ◽

Marker Selection ◽

Cell Clustering ◽

Typing Methods ◽

Original Application ◽

Downstream Analysis ◽

Cell Typing

Abstract: Many single-cell typing methods require a pure clustering of cells, which is susceptible to technical noise and depends heavily on the high-quality informative genes selected in the preliminary steps of downstream analysis. Gene selection techniques for single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that brings to single-cell clustering the advantages that Rényi and Tsallis entropies have shown in their original applications. As a result, gene selection is robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells and classifying independent, unknown samples with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
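For reference, the Rényi entropy of order alpha and the Tsallis entropy of index q of a probability vector p are H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha) and S_q(p) = (1 - sum_i p_i^q) / (q - 1); both tend to the Shannon entropy as alpha and q approach 1. The sketch below computes both for a gene's expression profile across cells and ranks genes from lowest to highest entropy. Treating a normalized expression profile as a probability distribution and preferring low-entropy (cell-specific) genes are assumptions made for illustration only; sc-REnF's actual selection criterion is in the linked repository and may differ.

```python
import math


def _as_probabilities(values):
    """Normalize non-negative expression values to probabilities, dropping zeros."""
    total = sum(values)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    # Zero entries contribute 0 to sum(p_i ** a) for a > 0, so they can be dropped.
    return [v / total for v in values if v > 0]


def renyi_entropy(values, alpha):
    """H_alpha(p) = log(sum p_i^alpha) / (1 - alpha), for alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p = _as_probabilities(values)
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)


def tsallis_entropy(values, q):
    """S_q(p) = (1 - sum p_i^q) / (q - 1), for q > 0, q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    p = _as_probabilities(values)
    return (1 - sum(pi ** q for pi in p)) / (q - 1)


def rank_genes_by_entropy(expression, entropy_fn, param):
    """Return gene names sorted from lowest to highest entropy (hypothetical criterion)."""
    scores = {gene: entropy_fn(profile, param) for gene, profile in expression.items()}
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    expression = {"uniform_gene": [5, 5, 5, 5], "marker_like_gene": [20, 0, 0, 0]}
    print(rank_genes_by_entropy(expression, renyi_entropy, 2))
    print(rank_genes_by_entropy(expression, tsallis_entropy, 2))
```

A gene expressed evenly in all cells reaches the maximum entropy (log 4 for Rényi with four cells), while a gene confined to one cell has entropy 0.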

Accounting for technical noise in single-cell RNA-seq experiments

Nature Methods ◽

10.1038/nmeth.2645 ◽

2013 ◽

Vol 10 (11) ◽

pp. 1093-1095 ◽

Cited By ~ 576

Author(s):

Philip Brennecke ◽

Simon Anders ◽

Jong Kyoung Kim ◽

Aleksandra A Kołodziejczyk ◽

Xiuwei Zhang ◽

...

Keyword(s):

Single Cell ◽

RNA Seq ◽

Technical Noise

Bayesian inference of the gene expression states of single cells from scRNA-seq data

10.1101/2019.12.28.889956 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jérémie Breda ◽

Mihaela Zavolan ◽

Erik van Nimwegen

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Downstream Processing ◽

Noise Removal ◽

RNA Seq ◽

Expression Of Genes ◽

Normalization Methods ◽

Quantify Gene Expression ◽

Selection Of

Abstract: In spite of a large investment in the development of methodologies for analyzing single-cell RNA-seq data, there is still little agreement on how best to normalize such data, i.e. how to quantify the gene expression states of single cells from such data. Starting from a few basic requirements, such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured as fold-changes rather than as changes in absolute levels, we derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters.

Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identifying differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest-expressed genes to be the most variable, whereas in reality these genes provide the least evidence of true biological variability. In addition, by confounding noise removal with a lower-dimensional representation of the data, many methods introduce strong spurious correlations between expression levels and the total UMI count of each cell, as well as spurious co-expression of genes.
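The point about Poisson noise can be checked with a small simulation. For Poisson counts with mean mu the variance equals mu, so the squared coefficient of variation is CV^2 = 1/mu: even with no biological variability at all, the lowest-expressed genes look the most variable. The sketch below, using only the Python standard library, simulates genes whose rate is identical in every cell and compares the observed CV^2 with 1/mu. Subtracting 1/mu is a crude correction chosen for illustration, not Sanity's Bayesian procedure; the gene means, number of cells and seed are arbitrary assumptions.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates used here; it gets slow for large lam.
    """
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def cv2(values):
    """Squared coefficient of variation: sample variance / mean^2."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var / mean ** 2


def simulate(means, n_cells=2000, seed=0):
    """Simulate genes with NO biological variability (same rate mu in every cell).

    Returns tuples (mu, observed CV^2, Poisson expectation 1/mu, excess CV^2).
    """
    rng = random.Random(seed)
    rows = []
    for mu in means:
        counts = [poisson_sample(mu, rng) for _ in range(n_cells)]
        observed = cv2(counts)
        expected = 1.0 / mu
        rows.append((mu, observed, expected, observed - expected))
    return rows


if __name__ == "__main__":
    for mu, observed, expected, excess in simulate([0.1, 1.0, 10.0]):
        print(f"mu={mu:5.1f}  CV2={observed:7.3f}  1/mu={expected:7.3f}  excess={excess:+.3f}")
```

The raw CV^2 falls roughly a hundredfold from mu = 0.1 to mu = 10, while the excess over the Poisson expectation stays near zero for every gene, which is the correct answer for data that contain no biological variability.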

A step-by-step workflow for low-level analysis of single-cell RNA-seq data

F1000Research ◽

10.12688/f1000research.9501.1 ◽

2016 ◽

Vol 5 ◽

pp. 2122 ◽

Cited By ~ 9

Author(s):

Aaron T.L. Lun ◽

Davis J. McCarthy ◽

John C. Marioni

Keyword(s):

Stem Cells ◽

Single Cell ◽

RNA Sequencing ◽

Marker Gene ◽

Embryonic Stem ◽

Cycle Phase ◽

Data Sets ◽

RNA Seq ◽

Technical Noise ◽

Low Level

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available data sets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
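The quality-control step can be sketched as outlier detection on library size (total counts per cell). The sketch flags cells whose log-transformed library size lies more than three median absolute deviations (MADs) below the median. The log10 scale, the threshold of 3 and the 1.4826 MAD scaling are assumptions following common practice, not necessarily the exact settings of the workflow, which itself is built on Bioconductor packages in R rather than Python.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def low_library_size_outliers(library_sizes, nmads=3.0):
    """Flag cells whose log10 library size is more than `nmads` scaled MADs
    below the median. Returns one boolean per cell (True = candidate for removal).
    All library sizes must be positive.
    """
    logs = [math.log10(size) for size in library_sizes]
    center = statistics.median(logs)
    # 1.4826 rescales the MAD so that it estimates the SD of normally distributed data.
    spread = 1.4826 * mad(logs)
    cutoff = center - nmads * spread
    return [x < cutoff for x in logs]


if __name__ == "__main__":
    sizes = [10000, 12000, 9000, 11000, 10500, 9500, 200]  # hypothetical totals per cell
    print(low_library_size_outliers(sizes))
    # [False, False, False, False, False, False, True]
```

Working on the log scale keeps the cutoff from being dominated by a few very large libraries, and a MAD-based spread is itself robust to the outliers being detected.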

Erratum: Corrigendum: Accounting for technical noise in single-cell RNA-seq experiments

Nature Methods ◽

10.1038/nmeth0214-210b ◽

2014 ◽

Vol 11 (2) ◽

pp. 210-210 ◽

Cited By ~ 2

Author(s):

Philip Brennecke ◽

Simon Anders ◽

Jong Kyoung Kim ◽

Aleksandra A Kołodziejczyk ◽

Xiuwei Zhang ◽

...

Keyword(s):

Single Cell ◽

RNA Seq ◽

Technical Noise

SCnorm: A quantile-regression based approach for robust normalization of single-cell RNA-seq data

10.1101/090167 ◽

2016 ◽

Cited By ~ 1

Author(s):

Rhonda Bacher ◽

Li-Fang Chu ◽

Ning Leng ◽

Audrey P. Gasch ◽

James A. Thomson ◽

...

Keyword(s):

Quantile Regression ◽

Single Cell ◽

RNA Sequencing ◽

RNA Seq ◽

Sequencing Data ◽

Normalization Methods

Summary: Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data.
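SCnorm is described as quantile-regression based. One way to see why such a fit is relevant is the count-depth relationship: a single global scale factor per cell implicitly assumes that every gene's counts grow in proportion to sequencing depth, i.e. a slope of about 1 when log count is regressed on log depth. The minimal sketch below estimates that slope for one gene with a brute-force median (tau = 0.5) regression. It is a generic quantile-regression illustration under these assumptions, not SCnorm's actual procedure, and the function names are hypothetical.

```python
import itertools
import math


def check_loss(residual, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_regression_line(xs, ys, tau=0.5):
    """Fit y = a + b * x at quantile tau by exhaustive search; returns (a, b).

    With one predictor, an optimal quantile-regression line passes through at
    least two observations, so trying every pair finds an optimum.
    O(n^3): only suitable for small toy inputs.
    """
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a function of x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def count_depth_slope(gene_counts, depths, tau=0.5):
    """Slope of log(count) on log(depth) for one gene; cells with zero counts are skipped."""
    xs = [math.log(d) for c, d in zip(gene_counts, depths) if c > 0]
    ys = [math.log(c) for c in gene_counts if c > 0]
    return quantile_regression_line(xs, ys, tau)[1]


if __name__ == "__main__":
    depths = [1000, 2000, 4000, 8000]  # hypothetical per-cell sequencing depths
    print(count_depth_slope([10, 20, 40, 80], depths))  # counts proportional to depth
    print(count_depth_slope([10, 10, 10, 10], depths))  # counts independent of depth
```

A slope near 1 is what global scaling assumes; a gene with a slope near 0 would be over-corrected by a single per-cell factor. Exhaustive search keeps the example dependency-free and exact for tiny inputs; for real data a linear-programming solver would be the usual choice.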

Optimal Gene Filtering for Single-Cell data (OGFSC)—a gene filtering algorithm for single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/bty1016 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2602-2609 ◽

Cited By ~ 3

Author(s):

Jie Hao ◽

Wei Cao ◽

Jian Huang ◽

Xin Zou ◽

Ze-Guang Han

Keyword(s):

Single Cell ◽

Supplementary Information ◽

RNA Seq ◽

Aging Research ◽

Technical Noise ◽

Transcriptomic Data ◽

Knowledge Based ◽

Gene Filtering ◽

Cell Data ◽

Gene Expression Levels

Abstract

Motivation: Single-cell transcriptomic data are commonly accompanied by extremely high technical noise because of the low RNA concentrations in individual cells. Precise identification of differentially expressed genes and cell populations depends heavily on the effective reduction of technical noise, e.g. by gene filtering. However, there is still no well-established standard among current gene-filtering approaches. Investigators usually filter out genes based on a single fixed threshold, which commonly leads to both over- and under-stringent errors.

Results: In this study, we propose a novel algorithm, termed Optimal Gene Filtering for Single-Cell data (OGFSC), that constructs a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results on seven experimental datasets demonstrate that cells of the same annotated type are more sharply clustered using our method. Interestingly, when we re-analyzed a dataset from an aging study recently published in Science, we found a list of regulated genes that differs from the one reported in the original study because of the different filtering methods used. However, our findings better match the progression of immunosenescence. In summary, we provide an alternative way to probe the true level of technical noise in single-cell transcriptomic data.

Availability and implementation: https://github.com/XZouProjects/OGFSC.git

Supplementary information: Supplementary data are available at Bioinformatics online.
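The contrast between a single fixed threshold and a threshold that follows expression level can be illustrated with a minimal sketch. It is hypothetical and not the OGFSC algorithm: genes are sorted by mean expression and split into bins of roughly equal size, and within each bin a gene is kept only if its log-variance exceeds the bin's median log-variance by a chosen margin. The binning scheme, the median and the `margin` parameter are all illustrative assumptions.

```python
import math
import statistics


def mean_dependent_filter(means, variances, n_bins=4, margin=1.0):
    """Keep genes whose variance is high relative to genes of similar mean.

    Hypothetical sketch, not OGFSC: the cutoff is the median log-variance of
    each mean-expression bin plus `margin` (in natural-log units), so it
    follows the mean-variance trend instead of being one fixed number.
    Variances must be positive. Returns one boolean per gene (True = keep).
    """
    order = sorted(range(len(means)), key=lambda g: means[g])
    keep = [False] * len(means)
    bin_size = max(1, math.ceil(len(order) / n_bins))
    for start in range(0, len(order), bin_size):
        genes = order[start:start + bin_size]
        log_vars = {g: math.log(variances[g]) for g in genes}
        cutoff = statistics.median(list(log_vars.values())) + margin
        for g in genes:
            keep[g] = log_vars[g] > cutoff
    return keep


if __name__ == "__main__":
    means = [1, 1, 1, 1, 10, 10, 10, 10]           # hypothetical gene means
    variances = [1, 1, 1, 20, 10, 10, 10, 200]     # hypothetical gene variances
    print(mean_dependent_filter(means, variances, n_bins=2))
```

In this toy input the two retained genes have raw variances that differ tenfold; each stands out only relative to genes of similar mean expression, which a single fixed variance cutoff cannot express.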

Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey

Frontiers in Genetics ◽

10.3389/fgene.2020.00041 ◽

2020 ◽

Vol 11 ◽

Cited By ~ 1

Author(s):

Nicholas Lytal ◽

Di Ran ◽

Lingling An

Keyword(s):

Single Cell ◽

RNA Seq ◽

Empirical Survey ◽

Normalization Methods

Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-seq

10.1101/235382 ◽

2017 ◽

Cited By ~ 16

Author(s):

Michael B. Cole ◽

Davide Risso ◽

Allon Wagner ◽

David DeTomaso ◽

John Ngai ◽

...

Keyword(s):

Single Cell ◽

RNA Seq ◽

R Software ◽

Validation Data ◽

Trade Offs ◽

Large Numbers ◽

Normalization Methods ◽

Flexible Framework ◽

Measurement Biases ◽

Selection Of

Abstract: Systematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. The assessment of normalization performance may involve multiple competing considerations, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods.

We have developed scone, a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets generated with different protocols, including the Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.
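Ranking methods by aggregate panel performance can be sketched with rank averaging: rank every method on each metric, then sort methods by their mean rank. Mean-rank aggregation, the "higher is better" orientation of every metric, and the method and metric names in the example are assumptions for illustration; this is not scone's code or API, which is an R/Bioconductor package.

```python
def rank_methods(scores):
    """Rank normalization methods by their mean rank across metrics.

    `scores` maps method name -> {metric name -> value}; every metric is
    assumed to be oriented so that higher is better. Rank 1 is best.
    Ties are broken by input order rather than averaged (a simplification).
    Returns (methods sorted best-first, {method: mean rank}).
    """
    methods = list(scores)
    metrics = list(next(iter(scores.values())))
    mean_rank = dict.fromkeys(methods, 0.0)
    for metric in metrics:
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=True)
        for rank, method in enumerate(ordered, start=1):
            mean_rank[method] += rank / len(metrics)
    return sorted(methods, key=mean_rank.get), mean_rank


if __name__ == "__main__":
    scores = {  # hypothetical metric values
        "method_A": {"metric_1": 0.9, "metric_2": 0.5, "metric_3": 0.8},
        "method_B": {"metric_1": 0.5, "metric_2": 0.6, "metric_3": 0.3},
        "method_C": {"metric_1": 0.1, "metric_2": 0.2, "metric_3": 0.1},
    }
    order, mean_rank = rank_methods(scores)
    print(order, mean_rank)
```

Averaging ranks rather than raw values keeps metrics on different scales from dominating the aggregate, at the cost of ignoring how large the gaps between methods are.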
