Accounting for technical noise in single-cell RNA-seq experiments

Philip Brennecke; Simon Anders; Jong Kyoung Kim; Aleksandra A Kołodziejczyk; Xiuwei Zhang; Valentina Proserpio; Bianka Baying; Vladimir Benes; Sarah A Teichmann; John C Marioni; Marcus G Heisler

doi:10.1038/nmeth.2645

sc-REnF: An entropy guided robust feature selection for clustering of single-cell RNA-seq data

doi:10.1101/2020.10.10.334573 · 2020

Author(s): Snehalika Lall; Abhik Ghosh; Sumanta Ray; Sanghamitra Bandyopadhyay

Keyword(s): Single Cell; Gene Selection; RNA-Seq; Technical Noise; Marker Selection; Cell Clustering; Typing Methods; Original Application; Downstream Analysis; Cell Typing

Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and depends heavily on high-quality, informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that brings to single-cell clustering the advantages that Rényi and Tsallis entropies have shown in their original applications. As a result, gene selection is robust and less sensitive to the technical noise present in the data, producing pure clusters of cells and classifying independent, unknown samples with high accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
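
For readers unfamiliar with the two entropy families named above, here is a minimal Python sketch of their standard definitions, applied to the empirical distribution of one gene's discretized expression levels. It is not the sc-REnF implementation: the discretization, the toy data and the function names are illustrative assumptions, and how sc-REnF turns these quantities into a gene ranking is described in the paper and repository, not here. For order alpha != 1 the Rényi entropy is H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), and the Tsallis entropy of order q is S_q(p) = (1 - sum_i p_i^q) / (q - 1); both reduce to the Shannon entropy as the order tends to 1.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probabilities of the distinct (already discretized) values."""
    counts = Counter(values)
    n = len(values)
    return [c / n for c in counts.values()]


def shannon_entropy(p):
    """Shannon entropy in nats; the order -> 1 limit of both families."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (natural log)."""
    if alpha == 1:
        return shannon_entropy(p)
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


def tsallis_entropy(p, q):
    """Tsallis entropy of order q."""
    if q == 1:
        return shannon_entropy(p)
    return (1 - sum(pi ** q for pi in p if pi > 0)) / (q - 1)


# Hypothetical discretized expression levels of one gene across eight cells.
gene_levels = [0, 0, 1, 1, 2, 2, 3, 3]
p = distribution(gene_levels)
print(renyi_entropy(p, 2))    # about 1.386, i.e. log(4): uniform over 4 levels
print(tsallis_entropy(p, 2))  # 0.75 = 1 - 4 * (1/4)**2
```

Orders below 1 weight rare expression levels more heavily and orders above 1 emphasize the dominant ones; this tunable order is what both families add over the Shannon entropy.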

A step-by-step workflow for low-level analysis of single-cell RNA-seq data

F1000Research · Vol 5 · pp. 2122 · 2016 · doi:10.12688/f1000research.9501.1 · Cited by ~9

Author(s): Aaron T.L. Lun; Davis J. McCarthy; John C. Marioni

Keyword(s): Stem Cells; Single Cell; RNA Sequencing; Marker Gene; Embryonic Stem; Cycle Phase; Data Sets; RNA-Seq; Technical Noise; Low Level

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available data sets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
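
The quality-control step listed above is often done by flagging cells whose library size is an outlier. The sketch below is a plain-Python illustration, not the Bioconductor code used in the workflow: it assumes a cell is flagged when its log library size lies more than three scaled median absolute deviations (MADs) below the median. The function name, the threshold of three and the toy counts are illustrative choices.

```python
import math
import statistics


def low_library_outliers(library_sizes, nmads=3.0):
    """Return indices of cells whose log library size is more than `nmads`
    scaled MADs below the median (a one-sided, lower-tail filter).

    All library sizes must be positive, because the filter works on log values.
    """
    logs = [math.log(s) for s in library_sizes]
    med = statistics.median(logs)
    # Scale the MAD by 1.4826 so it estimates the standard deviation under normality.
    mad = 1.4826 * statistics.median([abs(x - med) for x in logs])
    threshold = med - nmads * mad
    return [i for i, x in enumerate(logs) if x < threshold]


# Hypothetical total counts per cell; the last cell is nearly empty.
sizes = [1000, 1100, 900, 1050, 950, 10]
print(low_library_outliers(sizes))  # [5]
```

Working on the log scale and using the median and MAD keeps the threshold from being dragged around by the very outliers it is meant to catch.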

Erratum: Corrigendum: Accounting for technical noise in single-cell RNA-seq experiments

Nature Methods · Vol 11 (2) · p. 210 · 2014 · doi:10.1038/nmeth0214-210b · Cited by ~2

Author(s): Philip Brennecke; Simon Anders; Jong Kyoung Kim; Aleksandra A Kołodziejczyk; Xiuwei Zhang; ...

Keyword(s): Single Cell; RNA-Seq; Technical Noise

Optimal Gene Filtering for Single-Cell data (OGFSC)—a gene filtering algorithm for single-cell RNA-seq data

Bioinformatics · Vol 35 (15) · pp. 2602-2609 · 2018 · doi:10.1093/bioinformatics/bty1016 · Cited by ~3

Author(s): Jie Hao; Wei Cao; Jian Huang; Xin Zou; Ze-Guang Han

Keyword(s): Single Cell; Supplementary Information; RNA-Seq; Aging Research; Technical Noise; Transcriptomic Data; Knowledge Based; Gene Filtering; Cell Data; Gene Expression Levels

Motivation: Single-cell transcriptomic data are commonly accompanied by extremely high technical noise due to the low RNA concentrations in individual cells. Precise identification of differentially expressed genes and cell populations depends heavily on effective reduction of technical noise, e.g. by gene filtering. However, there is still no well-established standard among current gene-filtering approaches. Investigators usually filter out genes based on a single fixed threshold, which commonly leads to both over- and under-stringent errors.

Results: In this study, we propose a novel algorithm, termed Optimal Gene Filtering for Single-Cell data (OGFSC), that constructs a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results on seven experimental datasets demonstrate that cells of the same annotated type are more sharply clustered using our method. Interestingly, when we re-analyzed a dataset from a recently published aging study in Science, we found a list of regulated genes that differs from the one reported in the original study because of the different filtering methods used. Our findings, however, better match the known progression of immunosenescence. In summary, we provide an alternative way to probe the true level of technical noise in single-cell transcriptomic data.

Availability and implementation: https://github.com/XZouProjects/OGFSC.git

Supplementary information: Supplementary data are available at Bioinformatics online.
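
To make the idea of an expression-dependent thresholding curve concrete, here is a deliberately simplified Python sketch. It is not the OGFSC algorithm: it assumes a straight-line fit of log squared coefficient of variation (CV²) against log mean expression across genes, and keeps genes lying above that curve by more than a margin. The linear log-log fit, the `margin` parameter, the function names and the toy data are all illustrative assumptions; OGFSC's own curve construction is described in the paper.

```python
import math


def mean_and_var(values):
    """Mean and sample variance of one gene's expression across cells."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, var


def filter_genes(expr, margin=0.0):
    """Keep genes whose log CV^2 exceeds a log-log linear trend by more than `margin`.

    expr maps gene name -> list of expression values across cells (at least 2 cells).
    Genes with zero mean or zero variance are skipped.
    """
    points = {}
    for gene, values in expr.items():
        mu, var = mean_and_var(values)
        if mu > 0 and var > 0:
            points[gene] = (math.log(mu), math.log(var / mu ** 2))
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares fit of log CV^2 = a + b * log mean (the thresholding curve).
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return sorted(g for g, (x, y) in points.items() if y > a + b * x + margin)


# Hypothetical expression of four genes in four cells; gene D is far noisier
# than gene B at a comparable mean level.
expr = {
    "A": [1, 2, 1, 2],
    "B": [10, 11, 10, 11],
    "C": [100, 101, 100, 101],
    "D": [0, 20, 0, 20],
}
print(filter_genes(expr))  # ['D']
```

Because the cutoff follows the fitted curve, each gene is judged against the variability expected at its own expression level rather than against a single fixed threshold.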

Assessment of single cell RNA-seq normalization methods

doi:10.1101/064329 · 2016

Author(s): Bo Ding; Lina Zheng; Wei Wang

Keyword(s): Single Cell; RNA-Seq; Technical Noise; RNA Molecules; Normalization Methods; Using Data

We have assessed the performance of seven normalization methods for single-cell RNA-seq using data generated from dilutions of RNA samples. Our analyses showed that methods considering spike-in ERCC RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance on selecting normalization methods to remove technical noise in single-cell RNA-seq data.
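
As an illustration of what using spike-in molecules for normalization can look like, the Python sketch below computes one size factor per cell from its total spike-in counts, scaled so the factors average to 1, and divides the endogenous counts by it. This is a generic spike-in scaling scheme chosen for illustration, not necessarily one of the seven methods compared in the study; the function names and toy data are hypothetical.

```python
def spike_in_size_factors(spike_counts):
    """One size factor per cell from its total spike-in counts, scaled to mean 1.

    spike_counts: list of cells, each a list of counts for the spike-in transcripts.
    """
    totals = [sum(cell) for cell in spike_counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def normalize(gene_counts, size_factors):
    """Divide each cell's endogenous gene counts by that cell's size factor."""
    return [[c / sf for c in cell] for cell, sf in zip(gene_counts, size_factors)]


# Hypothetical data: the same spike-in mix was added to both cells, but cell 2
# yielded twice as many spike-in reads, so its counts are scaled down relative to cell 1.
spikes = [[50, 100, 150], [100, 200, 300]]
genes = [[10, 0, 40], [20, 0, 80]]
sf = spike_in_size_factors(spikes)
print(sf)                    # roughly [0.667, 1.333]
print(normalize(genes, sf))  # both cells end up at about [15.0, 0.0, 60.0]
```

The scheme assumes the same amount of spike-in RNA was added to every cell; under that assumption, differences in spike-in totals reflect technical capture and sequencing efficiency rather than biology.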

A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

F1000Research · Vol 5 · pp. 2122 · 2016 · doi:10.12688/f1000research.9501.2 · Cited by ~104

Author(s): Aaron T.L. Lun; Davis J. McCarthy; John C. Marioni

Keyword(s): Stem Cells; Single Cell; RNA Sequencing; Marker Gene; Embryonic Stem; Cycle Phase; RNA-Seq; Technical Noise; Low Level; Bioconductor Project

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
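
The normalization step in workflows like this one typically produces log-transformed expression values from size-factor-scaled counts. The sketch below is a hedged Python illustration under stated assumptions: size factors proportional to library size and centred at 1, and a log2 transform with a pseudo-count of 1. The Bioconductor packages used in the workflow provide their own, more robust implementations, and the function name here is hypothetical.

```python
import math


def log_normalize(counts):
    """log2(count / size_factor + 1), with size factors proportional to library
    size and centred so that their mean is 1.

    counts: list of cells, each a list of gene counts (every cell needs a nonzero total).
    """
    lib_sizes = [sum(cell) for cell in counts]
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    size_factors = [s / mean_lib for s in lib_sizes]
    return [
        [math.log2(c / sf + 1) for c in cell]
        for cell, sf in zip(counts, size_factors)
    ]


# Hypothetical counts: cell 2 was sequenced four times as deeply as cell 1.
counts = [[1, 3, 0], [4, 12, 0]]
for row in log_normalize(counts):
    print([round(v, 3) for v in row])
```

Centring the size factors at 1 keeps normalized values on roughly the same scale as the raw counts, and the pseudo-count keeps zero counts at zero after the log transform.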

Assessment of Single Cell RNA-Seq Normalization Methods

G3: Genes|Genomes|Genetics · Vol 7 (7) · pp. 2039-2045 · 2017 · doi:10.1534/g3.117.040683 · Cited by ~6

Author(s): Bo Ding; Lina Zheng; Wei Wang

Keyword(s): Single Cell; RNA-Seq; Technical Noise; RNA Molecules; Normalization Methods; Using Data

We have assessed the performance of seven normalization methods for single-cell RNA-seq using data generated from dilutions of RNA samples. Our analyses showed that methods considering spike-in External RNA Controls Consortium (ERCC) RNA molecules significantly outperformed those not considering ERCCs. This work provides guidance on selecting normalization methods to remove technical noise in single-cell RNA-seq data.

Transferable representations of single-cell transcriptomic data

doi:10.1101/2021.04.13.439707 · 2021

Author(s): Ethan Weinberger; Su-In Lee

Keyword(s): Single Cell; Large Scale; Future Research; RNA-Seq; Cell Type; Experimental Conditions; Technical Noise; Transcriptomic Data; Low Dimensional; Computational Resources

Advances in single-cell RNA-seq (scRNA-seq) technologies are enabling the construction of large-scale, human-annotated reference cell atlases, creating unprecedented opportunities to accelerate future research. However, effectively leveraging information from these atlases, such as clustering labels or cell type annotations, remains challenging due to substantial technical noise and sparsity in scRNA-seq measurements. To address this problem, we present HD-AE, a deep autoencoder designed to extract integrated low-dimensional representations of scRNA-seq measurements across datasets from different labs and experimental conditions. Unlike previous approaches, HD-AE's representations successfully transfer to new query datasets without needing to retrain the model. Researchers without substantial computational resources or machine learning expertise can thus leverage the robust representations learned by pretrained HD-AE models to compare embeddings of their own data with previously generated sets of reference embeddings.
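
The reuse pattern described above can be sketched in a few lines of Python: new data are embedded with a frozen, pretrained encoder and then compared with stored reference embeddings. This is not HD-AE. A fixed linear map stands in for the trained (non-linear) encoder, and the weights, reference embeddings, cell-type labels and function names are all hypothetical. The sketch only shows that the query is projected with existing weights, with no retraining, before a nearest-neighbour comparison.

```python
import math


def encode(cell, weights):
    """Project one expression vector with a frozen linear encoder (rows = latent dims).

    A trained autoencoder's encoder would be non-linear; this linear map is a stand-in.
    """
    return [sum(w * x for w, x in zip(row, cell)) for row in weights]


def nearest_label(embedding, reference):
    """Label of the closest reference embedding by Euclidean distance."""
    best_label, best_dist = None, math.inf
    for ref_embedding, label in reference:
        dist = math.dist(embedding, ref_embedding)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical frozen encoder mapping 4 genes to a 2-D latent space.
weights = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
# Hypothetical reference embeddings with annotated cell types.
reference = [([10.0, 0.0], "T cell"), ([0.0, 10.0], "B cell")]

query_cell = [4.0, 5.0, 0.0, 1.0]  # a new cell from another lab, never seen in training
print(nearest_label(encode(query_cell, weights), reference))  # T cell
```

Because the encoder weights are reused as-is, embedding a new query dataset costs only a forward pass, which is what lets users without substantial computational resources compare their data with existing reference embeddings.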

Non-linear Normalization for Non-UMI Single Cell RNA-Seq

Frontiers in Genetics · Vol 12 · 2021 · doi:10.3389/fgene.2021.612670

Author(s): Zhijin Wu; Kenong Su; Hao Wu

Keyword(s): Single Cell; Amplification Efficiency; Size Factor; RNA-Seq; Normalization Factor; Sequencing Technology; Technical Noise; Non Linear; A Cell; Unique Molecular Identifier

Single-cell RNA-seq data, like data from other sequencing technologies, contain systematic technical noise. Such noise results from the combined effect of unequal efficiencies in capturing and counting mRNA molecules, such as extraction/amplification efficiency and sequencing depth. We show that such technical effects are not only cell-specific but also affect genes differently, so a simple cell-wise size factor adjustment may not be sufficient. We present a non-linear normalization approach that provides a cell- and gene-specific normalization factor for each gene in each cell. We show that the proposed normalization method (implemented in the “SC2P” package) removes more technical variation than competing methods, without reducing biological variation. When technical effects such as sequencing depth are not balanced between cell populations, SC2P normalization also removes the bias due to uneven technical noise. This method is applicable to scRNA-seq experiments that do not use unique molecular identifiers (UMIs) and therefore retain amplification biases.
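
To show why a single size factor per cell can be insufficient, the Python sketch below derives a factor for every gene in every cell. It is a simplified stand-in, not the SC2P model. For each cell it fits a straight line relating the cell's log1p expression to a reference profile (the per-gene mean of log1p expression across cells), and uses the fitted deviation from the reference as the normalization factor. A slope of 1 amounts to a constant shift on the log scale, roughly an ordinary cell-wise size factor; any other slope makes the factor depend on each gene's expression level. The reference choice, the linear fit and the function names are assumptions for illustration.

```python
import math


def _ols(xs, ys):
    """Intercept and slope of an ordinary least-squares line y = a + b * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def normalization_factors(counts):
    """Cell- and gene-specific factors on the log1p scale.

    counts: list of cells, each a list of raw counts for the same genes.
    Returns factors[cell][gene] = fitted log1p expression minus the reference.
    """
    n_cells, n_genes = len(counts), len(counts[0])
    logs = [[math.log1p(c) for c in cell] for cell in counts]
    reference = [sum(cell[g] for cell in logs) / n_cells for g in range(n_genes)]
    factors = []
    for cell in logs:
        a, b = _ols(reference, cell)
        factors.append([a + b * r - r for r in reference])
    return factors


def normalize(counts):
    """Subtract the factors from log1p counts, putting every cell on the reference scale."""
    factors = normalization_factors(counts)
    return [
        [math.log1p(c) - f for c, f in zip(cell, cell_factors)]
        for cell, cell_factors in zip(counts, factors)
    ]


# Hypothetical counts for three genes in two cells with different depths.
counts = [[1, 10, 100], [3, 30, 300]]
for row in normalization_factors(counts):
    print([round(f, 3) for f in row])  # factors differ across genes within a cell
```

In this toy example the fitted slope for the deeper cell is slightly above 1, so its three genes receive different factors, which a single cell-wise size factor cannot express.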

ECBN: Ensemble Clustering based on Bayesian Network inference for Single-cell RNA-seq Data

2020 39th Chinese Control Conference (CCC) · 2020 · doi:10.23919/ccc50068.2020.9188589

Author(s): Dexin Zhang; Yuan Zhu

Keyword(s): Bayesian Network; Single Cell; Network Inference; Ensemble Clustering; RNA-Seq; Bayesian Network Inference
