scholarly journals Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-seq

2017 ◽  
Author(s):  
Michael B. Cole ◽  
Davide Risso ◽  
Allon Wagner ◽  
David DeTomaso ◽  
John Ngai ◽  
...  

AbstractSystematic measurement biases make data normalization an essential preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis. There may be multiple, competing considerations behind the assessment of normalization performance, some of them study-specific. Because normalization can have a large impact on downstream results (e.g., clustering and differential expression), it is critically important that practitioners assess the performance of competing methods.We have developed scone — a flexible framework for assessing normalization performance based on a comprehensive panel of data-driven metrics. Through graphical summaries and quantitative reports, scone summarizes performance trade-offs and ranks large numbers of normalization methods by aggregate panel performance. The method is implemented in the open-source Bioconductor R software package scone. We demonstrate the effectiveness of scone on a collection of scRNA-seq datasets, generated with different protocols, including Fluidigm C1 and 10x platforms. We show that top-performing normalization methods lead to better agreement with independent validation data.

Author(s):  
Jérémie Breda ◽  
Mihaela Zavolan ◽  
Erik van Nimwegen

AbstractIn spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated errors bars directly from raw UMI counts without any tunable parameters.Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.


Cell Systems ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 315-328.e8 ◽  
Author(s):  
Michael B. Cole ◽  
Davide Risso ◽  
Allon Wagner ◽  
David DeTomaso ◽  
John Ngai ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
J. Zyprych-Walczak ◽  
A. Szabelska ◽  
L. Handschuh ◽  
K. Górczak ◽  
K. Klamecka ◽  
...  

High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.


2018 ◽  
Author(s):  
Vasilis Ntranos ◽  
Lynn Yi ◽  
Páll Melsted ◽  
Lior Pachter

AbstractSingle-cell RNA-Seq makes it possible to characterize the transcriptomes of cell types and identify their transcriptional signatures via differential analysis. We present a fast and accurate method for discriminating cell types that takes advantage of the large numbers of cells that are assayed. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3’ single-cell RNA-Seq that can identify previously undetectable marker genes.


2021 ◽  
Vol 11 ◽  
Author(s):  
Bo Pang ◽  
Fei Quan ◽  
Yanyan Ping ◽  
Jing Hu ◽  
Yujia Lan ◽  
...  

Glioblastoma (GBM) is characterized by rapid and lethal infiltration of brain tissue, which is the primary cause of treatment failure and deaths for GBM. Therefore, understanding the molecular mechanisms of tumor cell invasion is crucial for the treatment of GBM. In this study, we dissected the single-cell RNA-seq data of 3345 cells from four patients and identified dysregulated genes including long non-coding RNAs (lncRNAs), which were involved in the development and progression of GBM. Based on co-expression network analysis, we identified a module (M1) that significantly overlapped with the largest number of dysregulated genes and was confirmed to be associated with GBM invasion by integrating EMT signature, experiment-validated invasive marker and pseudotime trajectory analysis. Further, we denoted invasion-associated lncRNAs which showed significant correlations with M1 and revealed their gradually increased expression levels along the tumor cell invasion trajectory, such as VIM-AS1, WWTR1-AS1, and NEAT1. We also observed the contribution of higher expression of these lncRNAs to poorer survival of GBM patients. These results were mostly recaptured in another validation data of 7930 single cells from 28 GBM patients. Our findings identified lncRNAs that played critical roles in regulating or controlling cell invasion and migration of GBM and provided new insights into the molecular mechanisms underlying GBM invasion as well as potential targets for the treatment of GBM.


2017 ◽  
Author(s):  
Charles Cole ◽  
Ashley Byrne ◽  
Anna E. Beaudin ◽  
E. Camilla Forsberg ◽  
Christopher Vollmers

AbstractRNA-seq is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase based Smartseq2 protocol to create RNA-seq libraries that capture the 5’ end of transcripts. The Tn5Prime method dramatically streamlines the 5’ capture process and is both cost effective and reliable. By applying Tn5Prime to bulk RNA and single cell samples we were able to define transcription start sites as well as quantify transcriptomes at high accuracy and reproducibility. Additionally, similar to 3’ end based high-throughput methods like Drop-Seq and 10X Genomics Chromium, the 5’ capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3’ end based methods, Tn5Prime also enables the assembly of the variable 5’ ends of antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.


2017 ◽  
Author(s):  
Luke Zappia ◽  
Belinda Phipson ◽  
Alicia Oshlack

AbstractAs single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time.Author summaryIn recent years single-cell RNA-sequeing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets but the field is moving at such a rapid pace it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focued on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.


2016 ◽  
Author(s):  
Rhonda Bacher ◽  
Li-Fang Chu ◽  
Ning Leng ◽  
Audrey P. Gasch ◽  
James A. Thomson ◽  
...  

SummaryNormalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data.


2016 ◽  
Author(s):  
Bo Ding ◽  
Lina Zheng ◽  
Wei Wang

AbstractWe have assessed the performance of seven normalization methods for single cell RNA-seq using data generated from dilution of RNA samples. Our analyses showed that methods considering spike-in ERCC RNA molecules significantly outperformed those not considering ERCCs. This work provides a guidance of selecting normalization methods to remove technical noise in single cell RNA-seq data.


Sign in / Sign up

Export Citation Format

Share Document