Are dropout imputation methods for scRNA-seq effective for scHi-C data?

Briefings in Bioinformatics ◽

10.1093/bib/bbaa289 ◽

2020 ◽

Author(s):

Chenggong Han ◽

Qing Xie ◽

Shili Lin

Keyword(s):

Single Cell ◽

Real Data ◽

Considerable Improvement ◽

Rna Seq ◽

Imputation Methods ◽

Single Cell Rna Sequencing ◽

Data Coverage ◽

Downstream Analysis ◽

Structural Zeros ◽

Made In

Abstract The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structural zeros due to underlying properties, leading to observed zeros being a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single cell RNA-sequencing (RNA-seq) data, little has been done in identifying structural zeros and imputing dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we have shown that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values showed considerable improvement for clustering cells of the same types together over clustering results before imputation.

Download Full-text

scRecover: Discriminating true and false zeros in single-cell RNA-seq data for imputation

10.1101/665323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Zhun Miao ◽

Jiaqi Li ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

High Throughput ◽

Real Data ◽

Imputation Method ◽

Rna Seq ◽

Imputation Methods ◽

Zero Values ◽

Downstream Analysis

AbstractHigh-throughput single-cell RNA-seq (scRNA-seq) data contains excess zero values, including those of genes not expressed in the cell, and those produced due to dropout events. Existing imputation methods do not distinguish these two types of zeros. We present a modest imputation method scRecover to only impute the dropout zeros. It estimates the zero dropout probability of each gene in each cell, and predicts the number of truly expressed genes in the cell. scRecover is combined with other imputation methods like scImpute, SAVER and MAGIC to fulfil the imputation. Down-sampling experiments show that it recovers dropout zeros with higher accuracy and avoids over-imputing true zero values. Experiments on real data illustrate scRecover improves downstream analysis and visualization.

Download Full-text

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

AbstractAs single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available.Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.

Download Full-text

Comparison of computational methods for imputing single-cell RNA-sequencing data

10.1101/241190 ◽

2017 ◽

Cited By ~ 10

Author(s):

Lihua Zhang ◽

Shihua Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Real Data ◽

Cell Types ◽

Biological Functions ◽

Sequencing Data ◽

Imputation Methods ◽

Future Studies ◽

Single Cell Rna Sequencing

AbstractSingle-cell RNA-sequencing (scRNA-seq) is a recent breakthrough technology, which paves the way for measuring RNA levels at single cell resolution to study precise biological functions. One of the main challenges when analyzing scRNA-seq data is the presence of zeros or dropout events, which may mislead downstream analyses. To compensate the dropout effect, several methods have been developed to impute gene expression since the first Bayesian-based method being proposed in 2016. However, these methods have shown very diverse characteristics in terms of model hypothesis and imputation performance. Thus, large-scale comparison and evaluation of these methods is urgently needed now. To this end, we compared eight imputation methods, evaluated their power in recovering original real data, and performed broad analyses to explore their effects on clustering cell types, detecting differentially expressed genes, and reconstructing lineage trajectories in the context of both simulated and real data. Simulated datasets and case studies highlight that there are no one method performs the best in all the situations. Some defects of these methods such as scalability, robustness and unavailability in some situations need to be addressed in future studies.

Download Full-text

A Systematic Evaluation of Single-cell RNA-sequencing Imputation Methods

10.1101/2020.01.29.925974 ◽

2020 ◽

Cited By ~ 9

Author(s):

Wenpin Hou ◽

Zhicheng Ji ◽

Hongkai Ji ◽

Stephanie C. Hicks

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Rapid Development ◽

Systematic Evaluation ◽

Rna Seq ◽

Experimental Protocol ◽

Improve Performance ◽

Imputation Methods ◽

Single Cell Rna Sequencing ◽

Experimental Protocols

ABSTRACTThe rapid development of single-cell RNA-sequencing (scRNA-seq) technology, with increased sparsity compared to bulk RNA-sequencing (RNA-seq), has led to the emergence of many methods for preprocessing, including imputation methods. Here, we systematically evaluate the performance of 18 state-of-the-art scRNA-seq imputation methods using cell line and tissue data measured across experimental protocols. Specifically, we assess the similarity of imputed cell profiles to bulk samples as well as investigate whether methods recover relevant biological signals or introduce spurious noise in three downstream analyses: differential expression, unsupervised clustering, and inferring pseudotemporal trajectories. Broadly, we found significant variability in the performance of the methods across evaluation settings. While most scRNA-seq imputation methods recover biological expression observed in bulk RNA-seq data, the majority of the methods do not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. Furthermore, we find that the performance of scRNA-seq imputation methods depends on many factors including the experimental protocol, the sparsity of the data, the number of cells in the dataset, and the magnitude of the effect sizes. We summarize our results and provide a key set of recommendations for users and investigators to navigate the current space of scRNA-seq imputation methods.

Download Full-text

Benchmarking UMI-based single-cell RNA-seq preprocessing workflows

Genome Biology ◽

10.1186/s13059-021-02552-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yue You ◽

Luyi Tian ◽

Shian Su ◽

Xueyi Dong ◽

Jafar S. Jabbari ◽

...

Keyword(s):

Single Cell ◽

Ground Truth ◽

Rna Seq ◽

Clustering Methods ◽

Biological Complexity ◽

Cell Type ◽

Analysis Process ◽

Single Cell Rna Sequencing ◽

Downstream Analysis ◽

Almost All

Abstract Background Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. Results Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. Conclusions In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.

Download Full-text

PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data

10.1101/2020.11.17.387779 ◽

2020 ◽

Cited By ~ 1

Author(s):

Dongyuan Song ◽

Jingyi Jessica Li

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Molecular Mechanisms ◽

Real Data ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Ill Posed ◽

Differential Gene ◽

Cell Trajectory ◽

Downstream Analysis

AbstractIn the investigation of molecular mechanisms underlying cell state changes, a crucial analysis is to identify differentially expressed (DE) genes along a continuous cell trajectory, which can be estimated by pseudotime inference from single-cell RNA-sequencing (scRNA-seq) data. However, existing methods that identify DE genes based on inferred pseudotime do not account for the uncertainty in pseudotime inference. Also, they either have ill-posed p-values that hinder the control of false discovery rate (FDR) or have restrictive models that reduce the power of DE gene identification. To overcome these drawbacks, we propose PseudotimeDE, a robust method that accounts for the uncertainty in pseudotime inference and thus identifies DE genes along cell pseudotime with well-calibrated p-values. PseudotimeDE is flexible in allowing users to specify the pseudotime inference method and to choose the appropriate model for scRNA-seq data. Comprehensive simulations and real-data applications verify that PseudotimeDE provides well-calibrated p-values essential for controlling FDR and downstream analysis and that PseudotimeDE is more powerful than existing methods to identify DE genes.

Download Full-text

2DImpute: imputation in single-cell RNA-seq data from correlations in two dimensions

Bioinformatics ◽

10.1093/bioinformatics/btaa148 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3588-3589 ◽

Cited By ~ 1

Author(s):

Kaiyi Zhu ◽

Dimitris Anastassiou

Keyword(s):

Single Cell ◽

R Package ◽

Two Dimensions ◽

Imputation Method ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Imputation Methods ◽

Single Cell Rna Sequencing ◽

Expression Matrix

Abstract Summary We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It features preventing excessive correction by predicting the false zeros and imputing their values by making use of the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it on datasets from various scRNA-seq protocols. Availability and implementation The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks

10.1101/2020.02.05.935296 ◽

2020 ◽

Cited By ~ 3

Author(s):

Jiahua Rao ◽

Xiang Zhou ◽

Yutong Lu ◽

Huiying Zhao ◽

Yuedong Yang

Keyword(s):

Single Cell ◽

Clustering Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Gene Interactions ◽

Rna Seq ◽

Sequencing Technology ◽

Imputation Methods ◽

Downstream Analysis ◽

Low Dimensional

AbstractSingle-cell RNA sequencing technology promotes the profiling of single-cell transcriptomes at an unprecedented throughput and resolution. However, in scRNA-seq studies, only a low amount of sequenced mRNA in each cell leads to missing detection for a portion of mRNA molecules, i.e. the dropout problem. The dropout event hinders various downstream analysis, such as clustering analysis, differential expression analysis, and inference of gene-to-gene relationships. Therefore, it is necessary to develop robust and effective imputation methods for the increasing scRNA-seq data. In this study, we have developed an imputation method (GraphSCI) to impute the dropout events in scRNA-seq data based on the graph convolution networks. The method takes advantage of low-dimensional representations of similar cells and gene-gene interactions to impute the dropouts. Extensive experiments demonstrated that GraphSCI outperforms other state-of-the-art methods for imputation on both simulated and real scRNA-seq data. Meanwhile, GraphSCI is able to accurately infer gene-to-gene relationships by utilizing the imputed matrix that are concealed by dropout events in raw data.

Download Full-text

Critical downstream analysis steps for single-cell RNA sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbab105 ◽

2021 ◽

Author(s):

Zilong Zhang ◽

Feifei Cui ◽

Chen Lin ◽

Lingling Zhao ◽

Chunyu Wang ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Noisy Data ◽

Single Cell Level ◽

Cell Type ◽

Sequencing Data ◽

Cell Level ◽

Bioinformatics Tool ◽

Single Cell Rna Sequencing ◽

Downstream Analysis

Abstract Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.

Download Full-text

sc-REnF:An entropy guided robust feature selection for clustering of single-cell rna-seq data

10.1101/2020.10.10.334573 ◽

2020 ◽

Author(s):

Snehalika Lall ◽

Abhik Ghosh ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Rna Seq ◽

Technical Noise ◽

Marker Selection ◽

Cell Clustering ◽

Typing Methods ◽

Original Application ◽

Downstream Analysis ◽

Cell Typing

ABSTRACTMany single-cell typing methods require pure clustering of cells, which is susceptible towards the technical noise, and heavily dependent on high quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple which casts problems with respect to the resolution of (sub-)types detection, marker selection and ultimately impacts towards cell annotation. We introduce sc-REnF, a novel and robust entropy based feature (gene) selection method, which leverages the landmark advantage of ‘Renyi’ and ‘Tsallis’ entropy achieved in their original application, in single cell clustering. Thereby, gene selection is robust and less sensitive towards the technical noise present in the data, producing a pure clustering of cells, beyond classifying independent and unknown sample with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF

Download Full-text