Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-seq

Mapping Intimacies

10.1101/235382

2017

Cited By ~ 16

Author(s): Michael B. Cole, Davide Risso, Allon Wagner, David DeTomaso, John Ngai, ...

Keyword(s): Single Cell, Rna Seq, R Software, Validation Data, Trade Offs, Large Numbers, Normalization Methods, Flexible Framework, Measurement Biases, Selection Of

Abstract: Systematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. There may be multiple, competing considerations behind the assessment of normalization performance, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods. We have developed scone, a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets, generated with different protocols, including Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.
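The phrase "ranks ... by aggregate panel performance" can be illustrated with a simple rank-aggregation sketch. This is not the scone package's code. It assumes that every method has a score on every metric and that higher is better for each metric. The method and metric names below are hypothetical placeholders, not scone's actual metric panel. Each method is ranked per metric, and the methods are then ordered by mean rank.

```python
from statistics import mean


def rank_methods(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Order methods by mean per-metric rank (1 = best).

    `scores` maps method name -> {metric name -> score}; higher scores are
    assumed to be better for every metric. Tied methods share the average
    of the ranks they span.
    """
    methods = list(scores)
    metrics = list(next(iter(scores.values())))
    ranks: dict[str, list[float]] = {m: [] for m in methods}
    for metric in metrics:
        # Best (highest) score first for this metric.
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of methods tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][metric] == scores[ordered[i]][metric]:
                j += 1
            shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranks[ordered[k]].append(shared)
            i = j + 1
    return sorted(((m, mean(r)) for m, r in ranks.items()), key=lambda t: t[1])


# Hypothetical panel: method and metric names are placeholders.
panel = {
    "none":   {"bio_signal": 0.2, "batch_removal": 0.1, "stability": 0.5},
    "tc":     {"bio_signal": 0.5, "batch_removal": 0.4, "stability": 0.6},
    "spikes": {"bio_signal": 0.6, "batch_removal": 0.3, "stability": 0.7},
}
for method, avg_rank in rank_methods(panel):
    print(f"{method}: mean rank {avg_rank:.2f}")
```

Mean rank is only one way to aggregate. It weights every metric equally and ignores how far apart the scores are, so it shows the trade-off idea rather than a recommended scoring rule.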


Bayesian inference of the gene expression states of single cells from scRNA-seq data

10.1101/2019.12.28.889956

2019

Cited By ~ 3

Author(s): Jérémie Breda, Mihaela Zavolan, Erik van Nimwegen

Keyword(s): Gene Expression, Single Cell, Single Cells, Downstream Processing, Noise Removal, Rna Seq, Expression Of Genes, Normalization Methods, Quantify Gene Expression, Selection Of

Abstract: In spite of a large investment in the development of methodologies for the analysis of single-cell RNA-seq data, there is still little agreement on how best to normalize such data, i.e. how to quantify the gene expression states of single cells from such data. We start from a few basic requirements: inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and changes in expression state should be measured as fold-changes rather than changes in absolute levels. From these first principles we derive a unique Bayesian procedure for normalizing single-cell RNA-seq data. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identifying differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be the most variable in expression, whereas in reality these genes provide the least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell, as well as spurious co-expression of genes.
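Poisson sampling noise can make lowly expressed genes look highly variable. A method-of-moments sketch shows why. This is not Sanity's Bayesian procedure. The sketch assumes that counts follow n_c ~ Poisson(N_c λ_c), where N_c is the cell's total UMI count. Under that assumption, Var(n_c / N_c) = Var(λ_c) + E[λ_c / N_c]. The second term is pure sampling noise, and relative to λ² it grows as λ shrinks, so the least expressed genes are affected most. The helper names below are hypothetical.

```python
import math
import random
from statistics import mean, variance


def poisson_corrected_variance(counts, totals):
    """Return (naive, corrected) across-cell variance of n_c / N_c for one gene.

    Assumes n_c ~ Poisson(N_c * lam_c). Then
    Var(n_c / N_c) = Var(lam_c) + E[lam_c / N_c], and mean(n_c / N_c**2)
    estimates the second (pure sampling-noise) term.
    """
    fractions = [n / N for n, N in zip(counts, totals)]
    naive = variance(fractions)  # sample variance, n - 1 denominator
    sampling_noise = mean(n / N**2 for n, N in zip(counts, totals))
    return naive, naive - sampling_noise


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


# Simulated gene with NO biological variation: identical lam in every cell.
rng = random.Random(0)
totals = [1000] * 5000
counts = [sample_poisson(0.005 * N, rng) for N in totals]
naive, corrected = poisson_corrected_variance(counts, totals)
print(f"naive variance {naive:.2e}, Poisson-corrected {corrected:.2e}")
```

In this simulation the naive variance is driven entirely by sampling noise, while the corrected estimate sits close to zero.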


Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq

Cell Systems

10.1016/j.cels.2019.03.010

2019

Vol 8 (4)

pp. 315-328.e8

Cited By ~ 29

Author(s): Michael B. Cole, Davide Risso, Allon Wagner, David DeTomaso, John Ngai, ...

Keyword(s): Performance Assessment, Single Cell, Rna Seq, Selection Of


The Impact of Normalization Methods on RNA-Seq Data Analysis

BioMed Research International

10.1155/2015/621690

2015

Vol 2015

pp. 1-10

Cited By ~ 44

Author(s): J. Zyprych-Walczak, A. Szabelska, L. Handschuh, K. Górczak, K. Klamecka, ...

Keyword(s): High Throughput Sequencing, Data Sets, Complex Data, Rna Seq, Medical Problems, Data Set, Normalization Methods, Wide Range, The Impact, Selection Of

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by sequencers create a need for statistical and computational methods that can handle the analysis and management of these data. Data normalization is one of the most crucial data-processing steps and must be carefully considered, as it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and of their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the data set under study and determines which methods can be used interchangeably.
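The bias-and-variance step of the workflow can be sketched for two samples. The sketch uses total-count (TC) and upper-quartile (UQ) scaling purely as examples of depth normalizations; the abstract does not name its five methods. It assumes that control genes do not change between the samples, so their normalized log2 ratios should sit near 0. Bias is taken as the mean of those ratios and variance as their sample variance. The function names and toy counts are hypothetical.

```python
import math
from statistics import mean, variance


def tc_factor(sample):
    """Total-count (library size) scaling factor."""
    return sum(sample)


def uq_factor(sample):
    """Upper-quartile scaling factor: nearest-rank 75th percentile of non-zero counts."""
    nonzero = sorted(c for c in sample if c > 0)
    return nonzero[math.ceil(0.75 * len(nonzero)) - 1]


def control_gene_bias_variance(a, b, controls, factor):
    """Mean and variance of normalized log2(a/b) over control genes.

    Control genes are assumed unchanged between samples, so a good
    normalization gives log2 ratios near 0 (low bias) with little spread.
    Control genes must have non-zero counts in both samples.
    """
    fa, fb = factor(a), factor(b)
    ratios = [math.log2((a[g] / fa) / (b[g] / fb)) for g in controls]
    return mean(ratios), variance(ratios)


# Toy data: genes 0-3 are controls; gene 4 is strongly up in sample a.
a = [10, 20, 30, 40, 900]
b = [20, 40, 60, 80, 100]
controls = [0, 1, 2, 3]
for name, f in [("TC", tc_factor), ("UQ", uq_factor)]:
    bias, var = control_gene_bias_variance(a, b, controls, f)
    print(f"{name}: bias={bias:.3f} variance={var:.3g}")
```

In this toy case a single dominant gene inflates sample a's total count. TC therefore shifts every control-gene log ratio to about log2(0.15) ≈ -2.74, while UQ leaves them at 0.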


Identification of transcriptional signatures for cell types from single-cell RNA-Seq

10.1101/258566

2018

Cited By ~ 7

Author(s): Vasilis Ntranos, Lynn Yi, Páll Melsted, Lior Pachter

Keyword(s): Single Cell, Cell Types, Accurate Method, Marker Genes, Differential Analysis, Rna Seq, Large Numbers

Abstract: Single-cell RNA-Seq makes it possible to characterize the transcriptomes of cell types and identify their transcriptional signatures via differential analysis. We present a fast and accurate method for discriminating cell types that takes advantage of the large numbers of cells that are assayed. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3’ single-cell RNA-Seq that can identify previously undetectable marker genes.
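One plausible way to frame "discriminating cell types" is to ask how well each count feature predicts the cell-type label. Such a feature could be a gene or a transcript compatibility count. This framing, the helper names and the toy data below are assumptions for illustration, not the authors' implementation. The sketch fits a one-feature logistic regression by gradient ascent on standardized log1p counts and ranks features by the magnitude of the fitted slope.

```python
import math
from statistics import mean, pstdev


def fit_logistic_1d(x, y, lr=0.1, steps=2000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def rank_features(features, labels):
    """Rank features by |slope| of a logistic fit on standardized log1p counts."""
    scores = {}
    for name, counts in features.items():
        logged = [math.log1p(c) for c in counts]
        mu, sd = mean(logged), pstdev(logged)
        z = [(v - mu) / sd if sd > 0 else 0.0 for v in logged]
        _, slope = fit_logistic_1d(z, labels)
        scores[name] = abs(slope)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 6 cells, labels 0 / 1 for two cell types.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "housekeeping": [5, 6, 5, 6, 5, 6],
    "marker": [0, 1, 0, 9, 12, 10],
}
print(rank_features(features, labels))
```

For perfectly separable features the slope keeps growing with more steps. The fixed step count keeps it finite, so the scores here are only useful for ranking.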


Dissecting the Invasion-Associated Long Non-coding RNAs Using Single-Cell RNA-Seq Data of Glioblastoma

Frontiers in Genetics

10.3389/fgene.2020.633455

2021

Vol 11

Author(s): Bo Pang, Fei Quan, Yanyan Ping, Jing Hu, Yujia Lan, ...

Keyword(s): Tumor Cell, Single Cell, Cell Invasion, Molecular Mechanisms, Single Cells, Tumor Cell Invasion, Rna Seq, Validation Data, Invasion And Migration, Non Coding Rnas

Glioblastoma (GBM) is characterized by rapid and lethal infiltration of brain tissue, which is the primary cause of treatment failure and death in GBM. Understanding the molecular mechanisms of tumor cell invasion is therefore crucial for the treatment of GBM. In this study, we dissected single-cell RNA-seq data of 3345 cells from four patients and identified dysregulated genes, including long non-coding RNAs (lncRNAs), that were involved in the development and progression of GBM. Based on co-expression network analysis, we identified a module (M1) that showed significant overlap with the largest number of dysregulated genes. M1 was confirmed to be associated with GBM invasion by integrating an EMT signature, experimentally validated invasive markers, and pseudotime trajectory analysis. Further, we defined invasion-associated lncRNAs, such as VIM-AS1, WWTR1-AS1, and NEAT1, that were significantly correlated with M1 and whose expression levels increased gradually along the tumor cell invasion trajectory. We also observed that higher expression of these lncRNAs contributed to poorer survival of GBM patients. These results were largely recapitulated in an independent validation dataset of 7930 single cells from 28 GBM patients. Our findings identified lncRNAs that play critical roles in regulating cell invasion and migration in GBM and provide new insights into the molecular mechanisms underlying GBM invasion, as well as potential targets for GBM treatment.
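The abstract says module M1 "significantly overlapped" with the dysregulated genes but does not name the test. A common choice for this kind of gene-set overlap is a one-sided hypergeometric test. The sketch below assumes that test. The function name and the numbers in the example are made up for illustration and are not values from the study.

```python
from math import comb


def overlap_pvalue(universe, module_size, n_dysregulated, overlap):
    """One-sided hypergeometric P(X >= overlap).

    X counts dysregulated genes among `module_size` genes drawn without
    replacement from `universe` genes, of which `n_dysregulated` are dysregulated.
    """
    upper = min(module_size, n_dysregulated)
    tail = sum(
        comb(n_dysregulated, k) * comb(universe - n_dysregulated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, module_size)


# Illustrative numbers only (not from the study): 10,000 genes, a 200-gene
# module, 500 dysregulated genes, 40 shared. About 10 would be expected by chance.
print(f"p = {overlap_pvalue(10_000, 200, 500, 40):.3g}")
```

Exact integer arithmetic from `math.comb` avoids the underflow that factorial-based formulas can hit with gene-scale numbers.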


Tn5Prime, a Tn5 based 5’ Capture Method for Single Cell RNA-seq

10.1101/217117

2017

Cited By ~ 1

Author(s): Charles Cole, Ashley Byrne, Anna E. Beaudin, E. Camilla Forsberg, Christopher Vollmers

Keyword(s): Single Cell, Single Cells, Adaptive Immune System, Cost Effective, Rna Seq, Transcription Start, Capture Process, Transcription Start Sites, Large Numbers, Basic And Applied Research

Abstract: RNA-seq is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase-based Smart-seq2 protocol to create RNA-seq libraries that capture the 5’ end of transcripts. The Tn5Prime method dramatically streamlines the 5’ capture process and is both cost-effective and reliable. By applying Tn5Prime to bulk RNA and single-cell samples, we were able to define transcription start sites as well as quantify transcriptomes with high accuracy and reproducibility. Additionally, similar to 3’ end-based high-throughput methods like Drop-Seq and 10X Genomics Chromium, the 5’ capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3’ end-based methods, Tn5Prime also enables the assembly of the variable 5’ ends of antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.
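When a cellular identifier is introduced during reverse transcription, each read carries a barcode that has to be mapped back to its cell. The abstract does not describe Tn5Prime's read structure. The sketch therefore assumes a hypothetical layout in which the first 8 bases of a read are the barcode, and a known whitelist of barcodes. Reads are grouped by cell with up to one substitution tolerated. The constant, helper names and sequences are all illustrative.

```python
from collections import defaultdict

BARCODE_LEN = 8  # hypothetical: the cell barcode is the first 8 bases of the read


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, whitelist, max_mismatch=1):
    """Assign reads to cells by barcode, allowing up to `max_mismatch` substitutions.

    Reads that are too short, match no whitelisted barcode, or lie within
    `max_mismatch` of more than one whitelisted barcode are left unassigned.
    """
    cells = defaultdict(list)
    unassigned = []
    for read in reads:
        if len(read) <= BARCODE_LEN:
            unassigned.append(read)
            continue
        bc, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        if bc in whitelist:
            cells[bc].append(insert)
            continue
        hits = [w for w in whitelist if hamming(bc, w) <= max_mismatch]
        if len(hits) == 1:
            cells[hits[0]].append(insert)
        else:
            unassigned.append(read)
    return dict(cells), unassigned


whitelist = {"AAAAAAAA", "CCCCCCCC"}  # hypothetical barcodes
reads = ["AAAAAAAAGATTACA", "AAAAAAATTTTT", "GGGGGGGGACGT", "CCCCCCCCTT"]
cells, unassigned = demultiplex(reads, whitelist)
print(cells, unassigned)
```

Rejecting reads that are ambiguous between two barcodes trades a little yield for avoiding cross-cell contamination. The linear scan over the whitelist is fine for a sketch, but large barcode sets would call for a precomputed neighbour index.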


Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

10.1101/206573

2017

Cited By ~ 2

Author(s): Luke Zappia, Belinda Phipson, Alicia Oshlack

Keyword(s): Single Cell, Open Source, Rapid Development, Analysis Tool, Rna Seq, Link Type, Analysis Tools, Rapid Pace, Cell Data, Selection Of

Abstract: As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. To better facilitate the selection of appropriate analysis tools, we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises each tool according to the analysis tasks it performs. Exploration of this database gives insight into the areas where analysis methods for scRNA-seq data are developing most rapidly. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and a record of the growth of the field over time.

Author summary: In recent years, single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing, instead of considering an average across all cells in a sample as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities, it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets, but the field is moving at such a rapid pace that it is difficult to keep up with what is currently available. To make this easier, we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from, and the publications that describe how they work. By looking at this database, we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code, and preprint publications.
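At its core, categorising tools by the tasks they perform is a many-to-many mapping from tools to tasks. A minimal sketch follows. The records, the licence set and the field names are hypothetical placeholders, not entries or schema from the scRNA-tools database. The sketch tallies tools per task and computes the share of tools under an open-source licence.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical set of licence identifiers treated as open source.
OPEN_SOURCE = {"GPL-3", "MIT", "Apache-2.0", "BSD-3-Clause", "Artistic-2.0"}


@dataclass
class Tool:
    name: str
    licence: str
    tasks: set = field(default_factory=set)  # analysis tasks, e.g. "Clustering"


def tools_per_task(tools):
    """Count tools per task; a tool performing several tasks counts toward each."""
    return Counter(task for tool in tools for task in tool.tasks)


def open_source_fraction(tools):
    """Share of tools whose licence is in OPEN_SOURCE."""
    return sum(tool.licence in OPEN_SOURCE for tool in tools) / len(tools)


catalogue = [  # placeholder records, not real database entries
    Tool("toolA", "GPL-3", {"Clustering", "Normalisation"}),
    Tool("toolB", "MIT", {"Ordering"}),
    Tool("toolC", "Proprietary", {"Clustering"}),
]
print(tools_per_task(catalogue).most_common())
print(f"open source: {open_source_fraction(catalogue):.0%}")
```

Storing tasks as a set per tool keeps multi-task tools in a single record, while the per-task counts are derived on demand.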


SCnorm: A quantile-regression based approach for robust normalization of single-cell RNA-seq data

10.1101/090167

2016

Cited By ~ 1

Author(s): Rhonda Bacher, Li-Fang Chu, Ning Leng, Audrey P. Gasch, James A. Thomson, ...

Keyword(s): Quantile Regression, Single Cell, Rna Sequencing, Rna Seq, Sequencing Data, Normalization Methods

Summary: Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data.
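The summary does not say which assumption fails. The published SCnorm method centres on the count-depth relationship: how strongly a gene's counts scale with sequencing depth, which can differ between genes. The sketch below only estimates that per-gene slope. It substitutes ordinary least squares for SCnorm's quantile regression for brevity, and it does not perform SCnorm's gene grouping or rescaling steps. The helper names and numbers are hypothetical.

```python
import math
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def count_depth_slope(counts, depths):
    """Slope of log(count) on log(depth), using cells where the gene is detected.

    A slope near 1 means the gene's counts scale proportionally with depth,
    which is what a single global scale factor implicitly assumes.
    """
    pairs = [(math.log(d), math.log(c)) for c, d in zip(counts, depths) if c > 0]
    xs, ys = zip(*pairs)
    return ols_slope(xs, ys)


depths = [10_000, 20_000, 40_000, 80_000]  # hypothetical per-cell sequencing depths
proportional = [10, 20, 40, 80]            # doubles with depth
saturating = [30, 33, 36, 40]              # barely grows with depth
print(count_depth_slope(proportional, depths), count_depth_slope(saturating, depths))
```

Dividing both toy genes by the same depth-based factor would over-correct the saturating gene, which is the kind of artifact the summary attributes to existing methods.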


Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey

Frontiers in Genetics

10.3389/fgene.2020.00041

2020

Vol 11

Cited By ~ 1

Author(s): Nicholas Lytal, Di Ran, Lingling An

Keyword(s): Single Cell, Rna Seq, Empirical Survey, Normalization Methods


Assessment of single cell RNA-seq normalization methods

10.1101/064329

2016

Author(s): Bo Ding, Lina Zheng, Wei Wang

Keyword(s): Single Cell, Rna Seq, Technical Noise, Rna Molecules, Normalization Methods, Using Data

Abstract: We have assessed the performance of seven normalization methods for single-cell RNA-seq using data generated from dilutions of RNA samples. Our analyses showed that methods that take spike-in ERCC RNA molecules into account significantly outperformed those that do not. This work provides guidance for selecting normalization methods to remove technical noise in single-cell RNA-seq data.
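Spike-in normalization assumes that the same amount of ERCC RNA was added to every cell, so differences in spike-in totals between cells reflect technical factors rather than biology. The sketch below is a generic spike-in size-factor scheme chosen for illustration. It is not a specific method from this comparison, and the function names and counts are hypothetical.

```python
from statistics import mean


def spike_in_size_factors(spike_counts):
    """One size factor per cell: total spike-in counts divided by their mean across cells."""
    totals = [sum(cell) for cell in spike_counts]
    avg = mean(totals)
    return [t / avg for t in totals]


def normalize(gene_counts, factors):
    """Divide each cell's endogenous gene counts by that cell's size factor."""
    return [[c / f for c in cell] for cell, f in zip(gene_counts, factors)]


# Hypothetical data: 3 cells, 2 spike-ins and 2 endogenous genes per cell.
spikes = [[50, 50], [100, 100], [150, 150]]
genes = [[5, 10], [10, 20], [15, 30]]
factors = spike_in_size_factors(spikes)
print(factors)
print(normalize(genes, factors))
```

Unlike total-count scaling, these factors do not depend on the cell's own mRNA content. That is why spike-ins can preserve real differences in total expression between cells, provided the spike-in amounts really were equal.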
