scRecover: Discriminating true and false zeros in single-cell RNA-seq data for imputation

10.1101/665323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Zhun Miao ◽

Jiaqi Li ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

High Throughput ◽

Real Data ◽

Imputation Method ◽

Rna Seq ◽

Imputation Methods ◽

Zero Values ◽

Downstream Analysis

Abstract High-throughput single-cell RNA-seq (scRNA-seq) data contain excess zero values, including those of genes not expressed in the cell and those produced by dropout events. Existing imputation methods do not distinguish between these two types of zeros. We present a modest imputation method, scRecover, that imputes only the dropout zeros. It estimates the zero dropout probability of each gene in each cell and predicts the number of truly expressed genes in the cell. scRecover is combined with other imputation methods, such as scImpute, SAVER and MAGIC, to carry out the imputation. Down-sampling experiments show that it recovers dropout zeros with higher accuracy and avoids over-imputing true zero values. Experiments on real data illustrate that scRecover improves downstream analysis and visualization.
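The down-sampling evaluation mentioned in this abstract can be illustrated with a minimal sketch. It assumes binomial thinning of a count matrix as the down-sampling model, which the abstract does not specify. The names `downsample_counts`, `label_zeros` and `keep_fraction` are hypothetical and are not part of scRecover.

```python
import numpy as np


def downsample_counts(counts, keep_fraction, seed=0):
    """Thin a count matrix: each molecule is kept with probability keep_fraction.

    Binomial thinning is an assumed down-sampling model, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    return rng.binomial(counts, keep_fraction)


def label_zeros(original, downsampled):
    """Split the zeros of the down-sampled matrix into two groups.

    dropout_zero: nonzero in the original matrix, zero after down-sampling.
    true_zero:    already zero in the original matrix.
    """
    original = np.asarray(original)
    downsampled = np.asarray(downsampled)
    dropout_zero = (original > 0) & (downsampled == 0)
    true_zero = original == 0
    return dropout_zero, true_zero


# Toy genes-by-cells matrix, for demonstration only.
toy_counts = np.array([[0, 5, 10],
                       [3, 0, 0]])
toy_down = downsample_counts(toy_counts, keep_fraction=0.1)
toy_dropout, toy_true = label_zeros(toy_counts, toy_down)
print("dropout zeros:", int(toy_dropout.sum()), "true zeros:", int(toy_true.sum()))
```

With labels like these, an imputation method can be scored separately on how many dropout zeros it recovers and how many true zeros it wrongly fills in.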

Are dropout imputation methods for scRNA-seq effective for scHi-C data?

Briefings in Bioinformatics ◽

10.1093/bib/bbaa289 ◽

2020 ◽

Author(s):

Chenggong Han ◽

Qing Xie ◽

Shili Lin

Keyword(s):

Single Cell ◽

Real Data ◽

Considerable Improvement ◽

Rna Seq ◽

Imputation Methods ◽

Single Cell Rna Sequencing ◽

Data Coverage ◽

Downstream Analysis ◽

Structural Zeros ◽

Made In

Abstract The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structural zeros due to underlying properties, leading to observed zeros being a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single cell RNA-sequencing (RNA-seq) data, little has been done in identifying structural zeros and imputing dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we have shown that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values showed considerable improvement for clustering cells of the same types together over clustering results before imputation.

scIGANs: single-cell RNA-seq imputation using generative adversarial networks

10.1101/2020.01.20.913384 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yungang Xu ◽

Zhigang Zhang ◽

Lei You ◽

Jiajia Liu ◽

Zhiwei Fan ◽

...

Keyword(s):

Single Cell ◽

Generative Adversarial Networks ◽

Rna Seq ◽

Adversarial Networks ◽

Zero Values ◽

Downstream Analysis ◽

Cell Variance ◽

Many Sources ◽

Natural Cell

Abstract Single-cell RNA-sequencing (scRNA-seq) enables the characterization of transcriptomic profiles at single-cell resolution with increasingly high throughput. However, it suffers from many sources of technical noise, including insufficient mRNA molecules that lead to excess false zero values, termed dropouts. Computational approaches have been proposed to recover the biologically meaningful expression by borrowing information from similar cells in the observed dataset. However, these methods suffer from oversmoothing and removal of natural cell-to-cell stochasticity in gene expression. Here, we propose generative adversarial networks (GANs) for scRNA-seq imputation (scIGANs), which uses generated cells rather than observed cells to avoid these limitations and balances the performance between major and rare cell populations. Evaluations based on a variety of simulated and real scRNA-seq datasets show that scIGANs is effective for dropout imputation and enhances various downstream analyses. scIGANs is robust to small datasets that have very few genes with low expression and/or cell-to-cell variance. scIGANs works equally well on datasets from different scRNA-seq protocols and is scalable to datasets with over 100,000 cells. We demonstrated in many ways with compelling evidence that scIGANs is not only an application of GANs in omics data but also represents a competing imputation method for scRNA-seq data.

AdImpute: An Imputation Method for Single-Cell RNA-Seq Data Based on Semi-Supervised Autoencoders

Frontiers in Genetics ◽

10.3389/fgene.2021.739677 ◽

2021 ◽

Vol 12 ◽

Author(s):

Li Xu ◽

Yin Xu ◽

Tong Xue ◽

Xinyu Zhang ◽

Jin Li

Keyword(s):

Single Cell ◽

Missing Values ◽

Simulated Data ◽

Real Data ◽

Imputation Method ◽

Data Sets ◽

Silent Genes ◽

Downstream Analysis ◽

The Cost ◽

Simulated Data Sets

Motivation: The emergence of single-cell RNA sequencing (scRNA-seq) technology has paved the way for measuring RNA levels at single-cell resolution to study precise biological functions. However, the large number of missing values in scRNA-seq data affects downstream analysis. This paper presents AdImpute, an imputation method based on semi-supervised autoencoders. The method uses the output of another imputation method (DrImpute is used as an example) as imputation weights for the autoencoder, and applies a cost function with these imputation weights to learn the latent information in the data and achieve more accurate imputation. Results: As shown in clustering experiments with simulated and real data sets, AdImpute is more accurate than four other publicly available scRNA-seq imputation methods, and minimally modifies the biologically silent genes. Overall, AdImpute is an accurate and robust imputation method.

scIGANs: single-cell RNA-seq imputation using generative adversarial networks

Nucleic Acids Research ◽

10.1093/nar/gkaa506 ◽

2020 ◽

Vol 48 (15) ◽

pp. e85-e85 ◽

Cited By ~ 2

Author(s):

Yungang Xu ◽

Zhigang Zhang ◽

Lei You ◽

Jiajia Liu ◽

Zhiwei Fan ◽

...

Keyword(s):

Single Cell ◽

Generative Adversarial Networks ◽

Rna Seq ◽

Adversarial Networks ◽

Zero Values ◽

Downstream Analysis ◽

Cell Variance ◽

Many Sources ◽

Natural Cell

Abstract Single-cell RNA-sequencing (scRNA-seq) enables the characterization of transcriptomic profiles at single-cell resolution with increasingly high throughput. However, it suffers from many sources of technical noise, including insufficient mRNA molecules that lead to excess false zero values, termed dropouts. Computational approaches have been proposed to recover the biologically meaningful expression by borrowing information from similar cells in the observed dataset. However, these methods suffer from oversmoothing and removal of natural cell-to-cell stochasticity in gene expression. Here, we propose generative adversarial networks (GANs) for scRNA-seq imputation (scIGANs), which uses generated cells rather than observed cells to avoid these limitations and balances the performance between major and rare cell populations. Evaluations based on a variety of simulated and real scRNA-seq datasets show that scIGANs is effective for dropout imputation and enhances various downstream analyses. scIGANs is robust to small datasets that have very few genes with low expression and/or cell-to-cell variance. scIGANs works equally well on datasets from different scRNA-seq protocols and is scalable to datasets with over 100 000 cells. We demonstrated in many ways with compelling evidence that scIGANs is not only an application of GANs in omics data but also represents a competing imputation method for scRNA-seq data.

2DImpute: imputation in single-cell RNA-seq data from correlations in two dimensions

Bioinformatics ◽

10.1093/bioinformatics/btaa148 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3588-3589 ◽

Cited By ~ 1

Author(s):

Kaiyi Zhu ◽

Dimitris Anastassiou

Keyword(s):

Single Cell ◽

R Package ◽

Two Dimensions ◽

Imputation Method ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Imputation Methods ◽

Single Cell Rna Sequencing ◽

Expression Matrix

Abstract Summary: We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It prevents excessive correction by first predicting which zeros are false and then imputing their values using the interrelationships between both genes and cells in the expression matrix. By applying it to datasets from various scRNA-seq protocols, we showed that 2DImpute outperforms several leading imputation methods. Availability and implementation: The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks

10.1101/2020.02.05.935296 ◽

2020 ◽

Cited By ~ 3

Author(s):

Jiahua Rao ◽

Xiang Zhou ◽

Yutong Lu ◽

Huiying Zhao ◽

Yuedong Yang

Keyword(s):

Single Cell ◽

Clustering Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Gene Interactions ◽

Rna Seq ◽

Sequencing Technology ◽

Imputation Methods ◽

Downstream Analysis ◽

Low Dimensional

Abstract Single-cell RNA sequencing technology promotes the profiling of single-cell transcriptomes at an unprecedented throughput and resolution. However, in scRNA-seq studies, the low amount of sequenced mRNA in each cell means that a portion of mRNA molecules goes undetected, i.e. the dropout problem. Dropout events hinder various downstream analyses, such as clustering analysis, differential expression analysis, and inference of gene-to-gene relationships. Therefore, it is necessary to develop robust and effective imputation methods for the growing volume of scRNA-seq data. In this study, we have developed an imputation method (GraphSCI) to impute the dropout events in scRNA-seq data based on graph convolution networks. The method takes advantage of low-dimensional representations of similar cells and gene-gene interactions to impute the dropouts. Extensive experiments demonstrated that GraphSCI outperforms other state-of-the-art methods for imputation on both simulated and real scRNA-seq data. Meanwhile, by utilizing the imputed matrix, GraphSCI is able to accurately infer gene-to-gene relationships that are concealed by dropout events in the raw data.

Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling

BMC Genomics ◽

10.1186/s12864-020-07358-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tracy M. Yamawaki ◽

Daniel R. Lu ◽

Daniel C. Ellwanger ◽

Dev Bhatt ◽

Paolo Manzanillo ◽

...

Keyword(s):

Single Cell ◽

High Throughput ◽

Immune Cell ◽

Cell Types ◽

Data Interpretation ◽

Detection Sensitivity ◽

Rna Seq ◽

Cell Recovery

Abstract Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluated methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5′ v1 and 3′ v3 methods. We demonstrate that these methods have fewer dropout events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.

High-Throughput Single-Cell RNA-Seq of Large Cells and Nuclei

Genetic Engineering & Biotechnology News ◽

10.1089/gen.37.17.06 ◽

2017 ◽

Vol 37 (17) ◽

pp. 12-13

Author(s):

Jennifer Chew ◽

Adam Bemis ◽

Ronald Lebofsky ◽

Anna Quinlan ◽

Kelly Kaihara

Keyword(s):

Single Cell ◽

High Throughput ◽

Rna Seq

sc-REnF: An entropy guided robust feature selection for clustering of single-cell rna-seq data

10.1101/2020.10.10.334573 ◽

2020 ◽

Author(s):

Snehalika Lall ◽

Abhik Ghosh ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Rna Seq ◽

Technical Noise ◽

Marker Selection ◽

Cell Clustering ◽

Typing Methods ◽

Original Application ◽

Downstream Analysis ◽

Cell Typing

Abstract Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and heavily dependent on the high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method, which applies to single-cell clustering the landmark advantage that ‘Renyi’ and ‘Tsallis’ entropy achieved in their original application. As a result, gene selection is robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells and also classifying independent and unknown samples with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
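For readers unfamiliar with the two entropies named in this abstract, the sketch below gives their standard textbook definitions for a discrete probability distribution. It is not the sc-REnF selection procedure, which the abstract does not describe. The function names and the `alpha` and `q` parameters are illustrative choices.

```python
import numpy as np


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log.

    H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha)
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes do not contribute
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))


def tsallis_entropy(p, q):
    """Tsallis entropy of order q (q != 1).

    S_q(p) = (1 - sum_i p_i**q) / (q - 1)
    """
    if q == 1:
        raise ValueError("q must be different from 1")
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


# Toy distribution, for example a gene's expression values binned into four levels.
toy_p = [0.25, 0.25, 0.25, 0.25]
print(renyi_entropy(toy_p, alpha=2.0), tsallis_entropy(toy_p, q=2.0))
```

Both quantities reduce to the Shannon entropy in the limit where the order approaches 1, which is why that value is excluded from the formulas above.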

RgCop-A regularized copula based method for gene selection in single cell rna-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009464 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009464

Author(s):

Snehalika Lall ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Real Life ◽

Classification Performance ◽

Rna Seq ◽

Scale Invariant ◽

Dependence Measure ◽

Highly Expressed Genes ◽

The Stability ◽

Downstream Analysis

Gene selection in large unannotated single-cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. Existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set because of the technical noise present in the data. Here, we propose RgCop, a novel regularized copula-based method for gene selection from large single-cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust equitable dependence measure that captures multivariate dependency among a set of genes in single-cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective set of features/genes. Results show a significant improvement in the clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data owing to the scale-invariant property of copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate unknown cells with high accuracy.
