scDoc: Correcting Drop-out Events in Single-cell RNA-seq Data

2019 ◽  
Author(s):  
Di Ran ◽  
Shanshan Zhang ◽  
Nicholas Lytal ◽  
Lingling An

Abstract: Single-cell RNA sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of “drop-out” events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this paper, we present a novel Single-Cell RNA-seq Drop-Out Correction (scDoc) method, which imputes drop-out events by borrowing information for the same gene from highly similar cells. scDoc is the first method that incorporates drop-out information into cell-to-cell similarity estimation, a step that is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc can impute drop-out events more accurately and robustly; specifically, it outperforms all available imputation methods with respect to data visualization, cell subpopulation identification, and differential expression detection in scRNA-seq data.
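The idea of borrowing information for the same gene from highly similar cells can be illustrated with a small sketch. This is not the scDoc algorithm: it assumes every zero is a candidate drop-out, uses plain cosine similarity (without the drop-out-aware similarity the abstract describes), and the function names `cosine` and `impute_dropouts` and the parameters `k` and `min_sim` are hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two expression vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def impute_dropouts(counts, k=2, min_sim=0.8):
    """Replace zeros in each cell with the mean of the same gene in similar cells.

    counts: list of cells, each a list of per-gene expression values.
    k, min_sim: illustrative assumptions; at most k neighbours are used,
    and only those whose similarity reaches min_sim.
    """
    imputed = [cell[:] for cell in counts]  # leave the input untouched
    for i, cell in enumerate(counts):
        # Score every other cell, most similar first.
        scored = sorted(
            ((cosine(cell, other), j) for j, other in enumerate(counts) if j != i),
            reverse=True,
        )
        neighbours = [j for sim, j in scored[:k] if sim >= min_sim]
        if not neighbours:
            continue  # no sufficiently similar cell, keep observed values
        for g, value in enumerate(cell):
            if value == 0:
                imputed[i][g] = sum(counts[j][g] for j in neighbours) / len(neighbours)
    return imputed


cells = [
    [5, 0, 3, 0],  # gene 1 looks like a drop-out relative to the next two cells
    [4, 2, 3, 0],
    [6, 2, 4, 0],
    [0, 0, 0, 9],
    [0, 1, 0, 8],
]
result = impute_dropouts(cells)
```

In this toy matrix the first cell borrows its second gene from the two cells with a similar profile, while its fourth gene stays at zero because its neighbours also show no expression there.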

2020 ◽  
Vol 36 (15) ◽  
pp. 4233-4239
Author(s):  
Di Ran ◽  
Shanshan Zhang ◽  
Nicholas Lytal ◽  
Lingling An

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of ‘drop-out’ events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this article, we present a novel single-cell RNA-seq drop-out correction (scDoc) method, which imputes drop-out events by borrowing information for the same gene from highly similar cells. Results: scDoc is the first method that directly incorporates drop-out information into cell-to-cell similarity estimation, a step that is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods with respect to data visualization, cell subpopulation identification and differential expression detection in scRNA-seq data. Availability and implementation: R code is available at https://github.com/anlingUA/scDoc. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Jixing Zhong ◽  
Gen Tang ◽  
Jiacheng Zhu ◽  
Xin Qiu ◽  
Weiying Wu ◽  
...  

Abstract: Parkinson’s disease (PD) is a neurodegenerative disease that impairs the execution of movement. PD pathogenesis has been widely investigated, but studies have been restricted either to the bulk level or to certain cell types, and so have failed to capture cellular heterogeneity and the intrinsic interplay among distinct cell types. To overcome this, we applied single-nucleus RNA-seq and single-cell ATAC-seq to the cerebellum, midbrain and striatum of PD and matched control mice. With 74,493 cells in total, we comprehensively depicted the dysfunctions under PD pathology, covering proteostasis, neuroinflammation, calcium homeostasis and extracellular neurotransmitter homeostasis. In addition, using a multi-omics approach, we identified putative biomarkers for the early stage of PD, based on the relationships between transcriptomic and epigenetic profiles. We located certain cell types that primarily contribute to early PD pathology, narrowing the gap between genotypes and phenotypes. Taken together, our study provides a valuable resource for dissecting the molecular mechanism of PD pathogenesis at the single-cell level, which could facilitate the development of novel methods for diagnosis, monitoring and practical therapies against PD at an early stage.


2018 ◽  
Author(s):  
Xuran Wang ◽  
Jihwan Park ◽  
Katalin Susztak ◽  
Nancy R. Zhang ◽  
Mingyao Li

Abstract: We present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rat, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.
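As a rough illustration of what estimating cell type composition from bulk data involves, the sketch below models bulk expression as a non-negative mixture of cell-type signatures and solves for the mixing proportions. It is a generic non-negative least-squares fit using multiplicative updates, not MuSiC's method; the function name `deconvolve`, the signature matrix and the bulk profile are all invented for illustration.

```python
def deconvolve(signature, bulk, iterations=500):
    """Estimate cell-type proportions p such that signature @ p approximates bulk.

    signature[g][t]: mean expression of gene g in cell type t (non-negative).
    bulk[g]: bulk expression of gene g (non-negative).
    Uses multiplicative updates, which keep every proportion non-negative.
    """
    n_types = len(signature[0])
    p = [1.0 / n_types] * n_types  # start from a uniform mixture
    # The numerator S^T b does not change between iterations.
    numerators = [
        sum(row[t] * b for row, b in zip(signature, bulk)) for t in range(n_types)
    ]
    for _ in range(iterations):
        fitted = [sum(s * w for s, w in zip(row, p)) for row in signature]
        denominators = [
            sum(row[t] * f for row, f in zip(signature, fitted))
            for t in range(n_types)
        ]
        p = [
            w * num / den if den > 0 else w
            for w, num, den in zip(p, numerators, denominators)
        ]
    total = sum(p)
    return [w / total for w in p]  # normalise so proportions sum to 1


# Hypothetical signatures for three cell types across four genes.
signature = [
    [10, 1, 1],
    [1, 10, 1],
    [1, 1, 10],
    [5, 5, 1],
]
true_proportions = [0.5, 0.3, 0.2]
# Noise-free bulk profile built from the known mixture.
bulk = [sum(s * w for s, w in zip(row, true_proportions)) for row in signature]
estimate = deconvolve(signature, bulk)
```

With a noise-free bulk profile and well-separated signatures, the estimate recovers the mixing proportions closely; closely related cell types make the signature columns nearly collinear, which is the harder case the abstract highlights.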


2021 ◽  
Author(s):  
Lorenzo Martini ◽  
Roberta Bardini ◽  
Stefano Di Carlo

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial to study cellular heterogeneity. To identify and analyze rare cell types, it is necessary to reliably label cells through known markers. All related studies therefore depend on the quality of the marker genes employed. In this work, we investigate how a chosen set of inhibitory interneuron markers performs. The gene set consists of both immunohistochemistry-derived genes and genes from single-cell RNA-seq taxonomies. We employed various human and mouse datasets of the brain cortex, subsequently processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised clustering results and marker expression. Specifically, we calculated the specificity, the fraction of cells expressing, and some metrics derived from decision tree analysis, such as entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, a correlation emerges between the general performance of the gene set and the experimental quality of the datasets. Therefore, the proposed method allows the quality of a dataset to be evaluated in terms of its reliability for studying the cellular heterogeneity of inhibitory interneurons.
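The metrics named above can be sketched for a single marker gene and a set of cluster labels. The abstract does not give exact formulas, so the definitions here are assumptions: "fraction expressing" is the share of target-cluster cells with expression above a threshold, "specificity" is the share of expressing cells that fall in the target cluster, and "entropy gain" is the information gain from splitting cells by whether they express the marker. The function names and example values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def marker_metrics(expression, clusters, target, threshold=0.0):
    """Score one marker gene against unsupervised cluster labels.

    expression[i]: marker expression in cell i; clusters[i]: its cluster label.
    All three definitions are assumptions made for this sketch.
    """
    expressing = [value > threshold for value in expression]
    in_target = [label == target for label in clusters]
    n_target = sum(in_target)
    n_expressing = sum(expressing)
    both = sum(e and t for e, t in zip(expressing, in_target))

    # Split the cluster labels by marker status, as a decision stump would.
    positive = [c for c, e in zip(clusters, expressing) if e]
    negative = [c for c, e in zip(clusters, expressing) if not e]
    n = len(clusters)
    gain = (
        entropy(clusters)
        - (len(positive) / n) * entropy(positive)
        - (len(negative) / n) * entropy(negative)
    )
    return {
        "fraction_expressing": both / n_target if n_target else 0.0,
        "specificity": both / n_expressing if n_expressing else 0.0,
        "entropy_gain": gain,
    }


clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
noisy_marker = [3, 2, 5, 0, 0, 0, 1, 0]
perfect_marker = [1, 1, 1, 1, 0, 0, 0, 0]
noisy = marker_metrics(noisy_marker, clusters, target="A")
perfect = marker_metrics(perfect_marker, clusters, target="A")
```

A marker that separates the target cluster perfectly reaches an entropy gain equal to the entropy of the labels themselves, while one that is missed in some target cells and leaks into other clusters scores lower on all three metrics.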


2018 ◽  
Author(s):  
Marmar Moussa ◽  
Ion I. Măndoiu

Abstract: One of the most notable challenges in single-cell RNA-Seq data analysis is the so-called drop-out effect, where only a fraction of the transcriptome of each cell is captured. The random nature of drop-outs, however, makes it possible to consider imputation methods as a means of correcting for drop-outs. In this paper we study some existing scRNA-Seq imputation methods and propose a novel iterative imputation approach based on efficiently computing highly similar cells. We then present the results of a comprehensive assessment of existing and proposed methods on real scRNA-Seq datasets with varying per-cell sequencing depth.


2020 ◽  
Author(s):  
Chi-Ming Kevin Li ◽  
Tracy M Yamawaki ◽  
Daniel R Lu ◽  
Daniel C Ellwanger ◽  
Dev Bhatt ◽  
...  

Abstract Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluate methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5’ v1 and 3’ v3 methods. We demonstrate that these methods have fewer drop-out events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1740 ◽  
Author(s):  
Tallulah S. Andrews ◽  
Martin Hemberg

Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of this data is the large number of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and are limited by the information contained therein and the validity of their assumptions. Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false positives introduced by imputation varied considerably by method. Data-smoothing methods (MAGIC, knn-smooth and dca) generated many false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on the diversity of cell types in the sample. All imputation methods decreased the reproducibility of cell-type specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results, and thus should be favoured over alternatives if imputation is necessary.


2020 ◽  
Author(s):  
Tracy M Yamawaki ◽  
Daniel R Lu ◽  
Daniel C Ellwanger ◽  
Dev Bhatt ◽  
Paolo Manzanillo ◽  
...  

Abstract Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation. Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluate methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5’ v1 and 3’ v3 methods. We demonstrate that these methods have fewer drop-out events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures. Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.


Author(s):  
Hananeh Aliee ◽  
Fabian Theis

Abstract: Tissues are complex systems of interacting cell types. Knowing cell-type proportions in a tissue is very important for identifying which cells or cell types are targeted by a disease or perturbation. When such responses are measured using RNA-seq, bulk RNA-seq masks cellular heterogeneity. Hence, several computational methods have been proposed to infer cell-type proportions from bulk RNA samples. Their performance with noisy reference profiles depends strongly on the set of genes undergoing deconvolution. These genes are often selected based on prior knowledge or a single-criterion test that might not be useful for dissecting closely correlated cell types. In this work, we introduce AutoGeneS, a tool that automatically extracts informative genes and reveals the cellular heterogeneity of bulk RNA samples. AutoGeneS requires no prior knowledge about marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types. It can be applied to reference profiles from various sources, such as single-cell experiments or sorted cell populations. Results from human samples of peripheral blood illustrate that AutoGeneS outperforms other methods. Our results also highlight the impact of our approach on analyzing bulk RNA samples with noisy single-cell reference profiles and closely correlated cell types. Ground truth cell proportions analyzed by flow cytometry confirmed the accuracy of the predictions of AutoGeneS in identifying cell-type proportions. AutoGeneS is available for use via a standalone Python package (https://github.com/theislab/AutoGeneS).

