Assessment of single cell RNA-seq normalization methods

2016 ◽  
Author(s):  
Bo Ding ◽  
Lina Zheng ◽  
Wei Wang

Abstract We have assessed the performance of seven normalization methods for single cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods considering spike-in ERCC RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance for selecting normalization methods to remove technical noise in single cell RNA-seq data.

2017 ◽  
Vol 7 (7) ◽  
pp. 2039-2045 ◽  
Author(s):  
Bo Ding ◽  
Lina Zheng ◽  
Wei Wang

Abstract We have assessed the performance of seven normalization methods for single cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods considering spike-in External RNA Control Consortium (ERCC) RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance for selecting normalization methods to remove technical noise in single cell RNA-seq data.
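
As a rough illustration of what "considering spike-in ERCC molecules" can look like, the sketch below scales each cell by its total spike-in count. It is a generic size-factor approach written for this note, and whether it matches any of the seven assessed methods is an assumption. The function names and the toy matrices are hypothetical.

```python
import numpy as np


def spike_in_size_factors(ercc_counts):
    """Per-cell size factors from total spike-in counts.

    ercc_counts: array of shape (n_spike_ins, n_cells).
    The factors are centred so that their mean is 1.
    """
    totals = np.asarray(ercc_counts, dtype=float).sum(axis=0)
    if np.any(totals <= 0):
        raise ValueError("every cell needs at least one spike-in read")
    return totals / totals.mean()


def normalize_by_spike_ins(gene_counts, ercc_counts):
    """Divide endogenous gene counts by the spike-in size factor of each cell."""
    factors = spike_in_size_factors(ercc_counts)
    return np.asarray(gene_counts, dtype=float) / factors


# Toy data (made up): 2 spike-ins x 2 cells and 2 genes x 2 cells.
# The second cell was sequenced twice as deeply as the first.
ercc = np.array([[10, 20],
                 [30, 60]])
genes = np.array([[5, 10],
                  [0, 4]])

normalized = normalize_by_spike_ins(genes, ercc)
print(normalized)
```

Because the same amount of spike-in RNA is added to every cell, differences in spike-in totals are attributed to technical effects only, which is the premise such methods rely on.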


2020 ◽  
Author(s):  
Snehalika Lall ◽  
Abhik Ghosh ◽  
Sumanta Ray ◽  
Sanghamitra Bandyopadhyay

Abstract Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and heavily dependent on high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method, which applies to single-cell clustering the advantages that 'Renyi' and 'Tsallis' entropy showed in their original applications. As a result, gene selection is robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells as well as classifying independent and unknown samples with very high accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
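
For readers unfamiliar with the two entropies named above, here is a minimal sketch of their standard textbook definitions for a discrete distribution. It does not reproduce the sc-REnF selection procedure, which the abstract does not describe. The function names and example distributions are hypothetical.

```python
import math


def renyi_entropy(probabilities, alpha):
    """Renyi entropy: log(sum_i p_i**alpha) / (1 - alpha), for alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    power_sum = sum(p ** alpha for p in probabilities if p > 0)
    return math.log(power_sum) / (1.0 - alpha)


def tsallis_entropy(probabilities, q):
    """Tsallis entropy: (1 - sum_i p_i**q) / (q - 1), for q > 0, q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    power_sum = sum(p ** q for p in probabilities if p > 0)
    return (1.0 - power_sum) / (q - 1.0)


# Made-up distributions: a flat one and a strongly peaked one.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]

print(renyi_entropy(uniform, 2), renyi_entropy(skewed, 2))
print(tsallis_entropy(uniform, 2), tsallis_entropy(skewed, 2))
```

Both measures are largest for the flat distribution and shrink as the distribution becomes more peaked. Both approach the Shannon entropy as the order parameter tends to 1.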


2013 ◽  
Vol 10 (11) ◽  
pp. 1093-1095 ◽  
Author(s):  
Philip Brennecke ◽  
Simon Anders ◽  
Jong Kyoung Kim ◽  
Aleksandra A Kołodziejczyk ◽  
Xiuwei Zhang ◽  
...  
Keyword(s):  
RNA-seq ◽  

Author(s):  
Jérémie Breda ◽  
Mihaela Zavolan ◽  
Erik van Nimwegen

Abstract In spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements, such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.
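
The point about Poisson noise can be illustrated with a small simulation. This is not the Sanity procedure, only a sketch under the assumption that counts are pure Poisson draws around a fixed mean. The three genes below have no biological variability at all, yet the lowest-expressed one shows the largest relative variability. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true mean counts per cell for three genes (low, medium, high).
# Each gene has the same true expression in every cell.
true_means = np.array([0.5, 5.0, 50.0])
n_cells = 5000

# Sampling noise only: Poisson counts around the fixed true means.
counts = rng.poisson(lam=true_means[:, None], size=(true_means.size, n_cells))

observed_mean = counts.mean(axis=1)
observed_var = counts.var(axis=1, ddof=1)

# Squared coefficient of variation; for a Poisson variable this is 1 / mean.
cv2 = observed_var / observed_mean ** 2

for mean, value in zip(true_means, cv2):
    print(f"true mean {mean:5.1f}  observed CV^2 {value:.3f}  Poisson expectation {1 / mean:.3f}")
```

A method that ranks genes by this kind of relative variability, without subtracting the expected sampling noise, would therefore put low-count genes at the top even when nothing biological is happening.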


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2122 ◽  
Author(s):  
Aaron T.L. Lun ◽  
Davis J. McCarthy ◽  
John C. Marioni

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available data sets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.


2014 ◽  
Vol 11 (2) ◽  
pp. 210-210 ◽  
Author(s):  
Philip Brennecke ◽  
Simon Anders ◽  
Jong Kyoung Kim ◽  
Aleksandra A Kołodziejczyk ◽  
Xiuwei Zhang ◽  
...  
Keyword(s):  
RNA-seq ◽  

2016 ◽  
Author(s):  
Rhonda Bacher ◽  
Li-Fang Chu ◽  
Ning Leng ◽  
Audrey P. Gasch ◽  
James A. Thomson ◽  
...  

Summary Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data.


2018 ◽  
Vol 35 (15) ◽  
pp. 2602-2609 ◽  
Author(s):  
Jie Hao ◽  
Wei Cao ◽  
Jian Huang ◽  
Xin Zou ◽  
Ze-Guang Han

Abstract Motivation: Single-cell transcriptomic data are commonly accompanied by extremely high technical noise due to the low RNA concentrations from individual cells. Precise identification of differentially expressed genes and cell populations is heavily dependent on the effective reduction of technical noise, e.g. by gene filtering. However, there is still no well-established standard among current approaches to gene filtering. Investigators usually filter out genes based on a single fixed threshold, which commonly leads to both over- and under-stringent errors. Results: In this study, we propose a novel algorithm, termed Optimal Gene Filtering for Single-Cell data, to construct a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that the known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results of seven experimental datasets demonstrate that cells of the same annotated type are more sharply clustered using our method. Interestingly, when we re-analyze the dataset from an aging study recently published in Science, we find a list of regulated genes that differs from the one reported in the original study, because a different filtering method was used. However, the knowledge based on our findings better matches the progression of immunosenescence. In summary, we here provide an alternative opportunity to probe into the true level of technical noise in single-cell transcriptomic data. Availability and implementation: https://github.com/XZouProjects/OGFSC.git Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Michael B. Cole ◽  
Davide Risso ◽  
Allon Wagner ◽  
David DeTomaso ◽  
John Ngai ◽  
...  

Abstract Systematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. There may be multiple, competing considerations behind the assessment of normalization performance, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods. We have developed scone, a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets, generated with different protocols, including Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.

