Valid post-clustering differential analysis for single-cell RNA-Seq

Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis

10.1101/2021.05.21.445165 ◽

2021 ◽

Author(s):

Marine Gauthier ◽

Denis Agniel ◽

Rodolphe Thiébaut ◽

Boris P. Hejblum

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Permutation Test ◽

Differential Expression Analysis ◽

Cumulative Distribution ◽

Rna Seq ◽

Distribution Free ◽

Art Methods

State-of-the-art methods for single-cell RNA-seq (scRNA-seq) Differential Expression Analysis (DEA) often rely on strong distributional assumptions that are difficult to verify in practice. Furthermore, while the increasing complexity of clinical and biological single-cell studies calls for more versatile tools, most existing methods only handle the comparison between two conditions. We propose a novel, distribution-free, and flexible approach to DEA for scRNA-seq data. This new method, called ccdf, tests the association of each gene's expression with one or several variables of interest (either continuous or discrete), while potentially adjusting for additional covariates. To test such complex hypotheses, ccdf uses a conditional independence test relying on the conditional cumulative distribution function, estimated through multiple regressions. We provide the asymptotic distribution of the ccdf test statistic, as well as a permutation test for when the number of observed cells is not sufficiently large. ccdf substantially expands the possibilities for scRNA-seq DEA studies: it obtains good statistical performance in various simulation scenarios considering complex experimental designs (i.e., beyond the two-condition comparison), while retaining competitive performance with state-of-the-art methods in a two-condition benchmark.
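
To make the idea of a distribution-free, CDF-based association test more concrete, here is a minimal Python sketch. It is not the ccdf statistic: ccdf estimates the conditional CDF through multiple regressions and can adjust for covariates, whereas this simplified version handles a single variable `x` and no covariates. The functions `cdf_association_stat` and `permutation_pvalue` are hypothetical names. For each observed threshold t, the sketch measures the covariance between the indicator 1{y <= t} and `x`, sums the squares, and calibrates that sum by permuting `x`, in the spirit of the permutation test described above for small numbers of cells.

```python
import random
from statistics import fmean


def cdf_association_stat(y, x):
    """CDF-based association statistic between expression y and a variable x.

    For each observed threshold t (except the largest, whose indicator is
    constant), compute the empirical covariance between 1{y <= t} and x,
    then return n times the sum of squared covariances. The statistic is 0
    when x is constant and grows when the distribution of y shifts with x.
    """
    n = len(y)
    x_bar = fmean(x)
    x_centered = [xi - x_bar for xi in x]
    total = 0.0
    for t in sorted(set(y))[:-1]:
        # Because x_centered sums to zero, this sum divided by n is the
        # empirical covariance between the indicator 1{y <= t} and x.
        cov = sum(xc for yi, xc in zip(y, x_centered) if yi <= t) / n
        total += cov * cov
    return n * total


def permutation_pvalue(y, x, n_perm=999, seed=0):
    """Permutation p-value: shuffling x breaks any link between x and y."""
    rng = random.Random(seed)
    observed = cdf_association_stat(y, x)
    x_perm = list(x)
    n_exceed = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if cdf_association_stat(y, x_perm) >= observed:
            n_exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (n_exceed + 1) / (n_perm + 1)


# Usage: 30 cells per condition; expression in condition 1 has a larger mean.
rng = random.Random(42)
x = [0] * 30 + [1] * 30
y = [rng.expovariate(1.0) for _ in range(30)] + \
    [rng.expovariate(0.25) for _ in range(30)]
p_value = permutation_pvalue(y, x, n_perm=199)
print(f"permutation p-value: {p_value:.3f}")
```

Because only ranks of `y` enter through the indicators, the test makes no assumption about the shape of the expression distribution; the price is the cost of recomputing the statistic for every permutation.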

Two-phase differential expression analysis for single cell RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/bty329 ◽

2018 ◽

Vol 34 (19) ◽

pp. 3340-3348 ◽

Cited By ~ 11

Author(s):

Zhijin Wu ◽

Yi Zhang ◽

Michael L Stitzel ◽

Hao Wu

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Rna Seq ◽

Two Phase

Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

10.1101/143289 ◽

2017 ◽

Cited By ~ 16

Author(s):

Charlotte Soneson ◽

Mark D. Robinson

Keyword(s):

Single Cell ◽

Differential Expression ◽

Statistical Methods ◽

Expression Analysis ◽

Method Development ◽

Differential Expression Analysis ◽

Data Sets ◽

Rna Seq ◽

Data Set ◽

Extensive Evaluation

Background: As single-cell RNA-seq (scRNA-seq) becomes increasingly common, the amount of publicly available data grows rapidly, providing a useful resource for computational method development and for extending published results. Although processed data matrices are typically made available in public repositories, the procedures used to obtain them vary widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for differential expression analysis of scRNA-seq data are becoming available, their relative merits and their performance compared with methods developed for bulk RNA-seq data are not sufficiently well understood.

Results: We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data.

Conclusions: Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.
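
The abstract notes that pre-filtering lowly expressed genes can strongly affect the results. The filter used in the evaluation is not stated here, so the sketch below assumes a common rule of thumb: keep a gene if it has at least `min_count` reads in at least `min_cells` cells. The function name `filter_lowly_expressed` and its default thresholds are hypothetical.

```python
def filter_lowly_expressed(counts, min_count=1, min_cells=3):
    """Keep genes detected (count >= min_count) in at least min_cells cells.

    counts maps a gene name to its list of per-cell counts.
    """
    return {
        gene: cells
        for gene, cells in counts.items()
        if sum(c >= min_count for c in cells) >= min_cells
    }


counts = {
    "geneA": [0, 0, 5, 2, 1],  # detected in 3 cells: kept
    "geneB": [0, 0, 0, 0, 7],  # detected in 1 cell: removed
    "geneC": [3, 3, 3, 0, 0],  # detected in 3 cells: kept
}
kept = filter_lowly_expressed(counts)
print(sorted(kept))  # ['geneA', 'geneC']
```

Raising `min_count` or `min_cells` changes which genes ever reach the test, which is one reason the same DE method can report different gene lists under different filtering choices.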

A tool for the comparison of transcript differential expression analysis pipelines

10.7287/peerj.preprints.2212 ◽

2016 ◽

Author(s):

Stefano Beretta ◽

Yuri Pirola ◽

Valeria Ranzani ◽

Grazisa Rossetti ◽

Raoul Bonnal ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

A Priori ◽

Differential Expression Analysis ◽

Workflow Management ◽

Transcript Level ◽

Rna Seq ◽

Art Methods ◽

Transcript Assembly

MOTIVATION Long non-coding RNAs (lncRNAs) have recently gained interest, especially for their involvement in controlling several cell processes, but a full understanding of their role is still lacking. Differential Expression (DE) analysis is one of the most important tasks in the analysis of RNA-seq data, since it can point to genes involved in regulating the condition under study. However, a classical gene-level analysis may overlook the role of Alternative Splicing (AS) in regulating cell conditions. This is the case, for example, when a gene is expressed in all conditions but the expressed isoform differs significantly between them (an isoform switch). A transcript-level analysis may better shed light on such cases, especially in studies aimed, for example, at a better understanding of the behavior of lncRNAs in T lymphocytes, which are fundamental in studies of specific diseases such as cancer. Following Cufflinks/Cuffdiff, several approaches for DE analysis at the isoform/transcript level have been proposed. However, their results are often sensitive to upstream analysis steps such as read mapping, transcript reconstruction and quantification, and it is often hard to choose "a priori" the most appropriate combination of tools. This work presents a tool for assisting the user in this choice, and lays the groundwork for a study devoted to the characterization of lncRNAs and the identification of isoform switch events. Our tool includes a framework for describing and executing a set of DE pipelines on the same input dataset, as well as a set of tools for reconciling and comparing the results.

METHOD We designed an automated and easily customizable tool that executes a set of existing pipelines for transcript-level DE analysis starting from RNA-seq data. Our method is built upon Snakemake, a workflow management system, with the specific goal of reducing the complexity of creating workflows. This approach guarantees that the experimentation is fully replicable and easy to customize. Each pipeline is structured in three steps: (i) transcript assembly, (ii) quantification, and (iii) DE analysis. By default, our tool builds and compares 9 different pipelines, obtained by combining state-of-the-art methods for transcript assembly (TA step) with state-of-the-art methods for quantification and differential expression analysis (Q+DE step), each pipeline taking as input the same set of RNA-seq reads. More precisely, the 9 pipelines combine two tools (Cufflinks and StringTie) and a Reference Annotation (Ensembl annotated transcripts) for the TA step with three tools (Cuffquant+Cuffdiff, StringTie-B+Ballgown and Kallisto+Sleuth) for the Q+DE step. (Abstract truncated at 3,000 characters.)
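
The count of 9 default pipelines follows from crossing the three TA options with the three Q+DE options (3 × 3 = 9). The short sketch below only enumerates those combinations with `itertools.product`; it does not reproduce the tool's Snakemake rules, and the function name `enumerate_pipelines` is hypothetical.

```python
from itertools import product

# Options listed in the abstract for each stage.
TA_OPTIONS = ["Cufflinks", "StringTie", "Reference Annotation (Ensembl)"]
QDE_OPTIONS = ["Cuffquant+Cuffdiff", "StringTie-B+Ballgown", "Kallisto+Sleuth"]


def enumerate_pipelines(ta_options, qde_options):
    """Return one (TA, Q+DE) pair per pipeline: the Cartesian product of both stages."""
    return list(product(ta_options, qde_options))


pipelines = enumerate_pipelines(TA_OPTIONS, QDE_OPTIONS)
for ta, qde in pipelines:
    print(f"{ta} -> {qde}")
print(len(pipelines), "pipelines")  # 9 pipelines
```

Adding one more tool to either stage grows the set multiplicatively, which is why a workflow manager that generates the combinations automatically is attractive here.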

A discriminative learning approach to differential expression analysis for single-cell RNA-seq

Nature Methods ◽

10.1038/s41592-018-0303-9 ◽

2019 ◽

Vol 16 (2) ◽

pp. 163-166 ◽

Cited By ~ 26

Author(s):

Vasilis Ntranos ◽

Lynn Yi ◽

Páll Melsted ◽

Lior Pachter

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Discriminative Learning ◽

Learning Approach ◽

Rna Seq

SwarnSeq: An improved statistical approach for differential expression analysis of single-cell RNA-seq data

Genomics ◽

10.1016/j.ygeno.2021.02.014 ◽

2021 ◽

Vol 113 (3) ◽

pp. 1308-1324

Author(s):

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Statistical Approach ◽

Differential Expression Analysis ◽

Rna Seq

Confronting false discoveries in single-cell differential expression

10.1101/2021.03.12.435024 ◽

2021 ◽

Author(s):

Jordan W. Squair ◽

Matthieu Gautier ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Nicholas D. James ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

Differentially Expressed Genes ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Experimental Manipulation ◽

Differentially Expressed ◽

Cell Type Specific ◽

False Discoveries ◽

Cell Data

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulation. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. Our results suggest an urgent need for a paradigm shift in the methods used to perform differential expression analysis in single-cell data.
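
One widely used way to account for variation between biological replicates is pseudobulk aggregation: counts from all cells of the same replicate (within a given cell type) are summed, so that the downstream test compares replicates rather than individual cells. The abstract does not name a specific remedy, so treat this as an illustrative sketch; the function `pseudobulk` and its input layout are assumptions.

```python
def pseudobulk(cell_counts, replicates):
    """Sum per-cell count vectors within each biological replicate.

    cell_counts: list of per-cell count lists (same gene order in every cell).
    replicates:  replicate label of each cell, aligned with cell_counts.
    Returns a dict mapping each replicate to its summed count vector.
    """
    if len(cell_counts) != len(replicates):
        raise ValueError("one replicate label is needed per cell")
    sums = {}
    for counts, rep in zip(cell_counts, replicates):
        acc = sums.setdefault(rep, [0] * len(counts))
        for j, c in enumerate(counts):
            acc[j] += c
    return sums


# Four cells, two genes, two biological replicates.
cells = [[1, 0], [2, 3], [0, 1], [4, 4]]
labels = ["rep1", "rep1", "rep2", "rep2"]
print(pseudobulk(cells, labels))  # {'rep1': [3, 3], 'rep2': [4, 5]}
```

The replicate-level profiles can then be analysed with a bulk RNA-seq tool such as edgeR, DESeq2 or limma, with one sample per replicate, so that the variance estimate reflects differences between replicates rather than between cells.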

zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

10.1101/157982 ◽

2017 ◽

Cited By ~ 7

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Michael I. Love ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Negative Binomial Model ◽

Binomial Model ◽

Rna Seq ◽

Zero Inflation ◽

Zero Counts

Dropout in single-cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared with traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero inflation, boosting performance in scRNA-seq differential expression analysis.
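
Under a zero-inflated negative binomial (ZINB) model, an observation weight can be read as the posterior probability that a count comes from the negative binomial (NB) component rather than from the excess-zero component. For a zero count this is w = (1 - pi) f_NB(0) / (pi + (1 - pi) f_NB(0)), with f_NB(0) = (1 + phi * mu)^(-1/phi); positive counts get w = 1. The sketch below computes that weight for a single count, assuming the zero-inflation probability `pi`, NB mean `mu` and dispersion `phi` (Var = mu + phi * mu^2) are already estimated. zingeR estimates these from the data, which is not shown, and the function names are hypothetical.

```python
def nb_zero_prob(mu, phi):
    """P(Y = 0) for an NB with mean mu and dispersion phi > 0 (Var = mu + phi * mu**2)."""
    return (1.0 + phi * mu) ** (-1.0 / phi)


def zinb_weight(y, pi, mu, phi):
    """Posterior probability that count y belongs to the NB (non-excess) component.

    Positive counts cannot be excess zeros, so their weight is 1. For a zero,
    Bayes' rule compares the NB zero mass with the excess-zero mass pi.
    """
    if y > 0:
        return 1.0
    p_nb_zero = (1.0 - pi) * nb_zero_prob(mu, phi)
    return p_nb_zero / (pi + p_nb_zero)


# A zero where the NB itself predicts few zeros (large mu) gets a low weight,
# so it contributes little when the weights are passed to a bulk DE pipeline.
for mu in (0.5, 2.0, 20.0):
    print(mu, f"{zinb_weight(0, pi=0.5, mu=mu, phi=1.0):.3f}")
```

The weights down-weight zeros that are unlikely under the count model, which is what lets weight-aware bulk RNA-seq tools handle zero inflation without changing their core model.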

DEAR-O: Differential Expression Analysis based on RNA-seq data - Online

10.1101/069807 ◽

2016 ◽

Author(s):

Zong-Hong Zhang ◽

Naomi R. Wray ◽

Qiong-Yi Zhao

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Online Discussion ◽

Differential Expression Analysis ◽

Rna Seq ◽

Timely Manner ◽

Link Type ◽

Online Discussion Forum ◽

User Friendly ◽

Transcriptomic Studies

Summary: Differential expression analysis using high-throughput RNA sequencing (RNA-seq) data is widely applied in transcriptomic studies, and many software tools have been developed for this purpose. Active development of existing popular tools, together with the emergence of new tools, means that studies comparing the performance of differential expression analysis methods quickly become out of date. To enable researchers to evaluate new and updated software in a timely manner, we developed DEAR-O, a user-friendly platform for evaluating the performance of differential expression analysis based on RNA-seq data. The platform currently includes four of the most popular tools: DESeq, DESeq2, edgeR and Cuffdiff2. With DEAR-O, researchers can evaluate the performance of different tools, or of different versions of the same tool, with a customised number of biological replicates, using already curated RNA-seq datasets. We also initiated an online forum for discussing RNA-seq differential expression analysis, through which new useful tools and benchmarking datasets can be introduced. Our platform will be actively maintained to ensure that new major versions of existing tools and new popular tools are included. DEAR-O will serve the community by providing timely evaluations of tools, versions and numbers of replicates for RNA-seq differential expression analysis.

Availability and implementation: The DEAR-O platform is available at http://cnsgenomics.com/software/dear-o; the online discussion forum is https://groups.google.com/d/forum/[email protected] and [email protected]
