Accuracy, Robustness and Scalability of Dimensionality Reduction Methods for Single Cell RNAseq Analysis

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Shiquan Sun ◽  
Jiaqiang Zhu ◽  
Ying Ma ◽  
Xiang Zhou

Abstract Background Dimensionality reduction is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNA-seq) data analysis. Proper dimensionality reduction can allow for effective noise removal and facilitate many downstream analyses that include cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of dimensionality reduction in scRNA-seq analysis and the vast number of dimensionality reduction methods developed for scRNA-seq studies, few comprehensive comparison studies have been performed to evaluate the effectiveness of different dimensionality reduction methods in scRNA-seq. Results We aim to fill this critical knowledge gap by providing a comparative evaluation of a variety of commonly used dimensionality reduction methods for scRNA-seq studies. Specifically, we compare 18 different dimensionality reduction methods on 30 publicly available scRNA-seq datasets that cover a range of sequencing techniques and sample sizes. We evaluate the performance of different dimensionality reduction methods for neighborhood preserving in terms of their ability to recover features of the original expression matrix, and for cell clustering and lineage reconstruction in terms of their accuracy and robustness. We also evaluate the computational scalability of different dimensionality reduction methods by recording their computational cost. Conclusions Based on the comprehensive evaluation results, we provide important guidelines for choosing dimensionality reduction methods for scRNA-seq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html.
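Benchmarks of this kind typically score an embedding by how well it preserves neighborhoods from the original expression matrix. As a minimal illustration (not the authors' pipeline; the toy data, method choice, and parameters below are all assumptions), one can reduce a log-transformed expression matrix with PCA and score neighborhood preservation with scikit-learn's trustworthiness metric:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
# Toy "expression matrix": 200 cells x 500 genes with two crude populations.
X = rng.poisson(lam=1.0, size=(200, 500)).astype(float)
X[:100, :50] += rng.poisson(lam=3.0, size=(100, 50))
X = np.log1p(X)  # standard log transform before dimensionality reduction

# Reduce to 10 dimensions with PCA.
Z = PCA(n_components=10, random_state=0).fit_transform(X)

# Neighborhood preservation: does each cell keep roughly the same nearest
# neighbors in the embedding as in the original space? (1.0 is perfect.)
score = trustworthiness(X, Z, n_neighbors=15)
print(round(score, 3))
```

The same pattern extends to any of the 18 compared methods: swap the PCA call for another transformer and re-score.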


2019 ◽  
Author(s):  
Cody N. Heiser ◽  
Ken S. Lau

Summary High-dimensional data, such as those generated using single-cell RNA sequencing, present challenges in interpretation and visualization. Numerical and computational methods for dimensionality reduction allow for low-dimensional representation of genome-scale expression data for downstream clustering, trajectory reconstruction, and biological interpretation. However, a comprehensive and quantitative evaluation of the performance of these techniques has not been established. We present an unbiased framework that defines metrics of global and local structure preservation in dimensionality reduction transformations. Using discrete and continuous scRNA-seq datasets, we find that input cell distribution and method parameters largely determine how well global, local, and organizational data structure is preserved by eleven published dimensionality reduction methods. Code available at github.com/KenLauLab/DR-structure-preservation allows for rapid evaluation of further datasets and methods.
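A common way to quantify local structure preservation, in the spirit of the metrics described above, is the overlap between each cell's k-nearest-neighbor set before and after the transformation. The sketch below is an illustrative stand-in (not the framework's actual metric definitions) that computes a mean Jaccard overlap:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def knn_preservation(X_high, X_low, k=10):
    """Mean Jaccard overlap between each point's k nearest neighbors
    in the original space and in the embedding (1.0 = local structure
    perfectly preserved)."""
    nn_h = NearestNeighbors(n_neighbors=k).fit(X_high).kneighbors(return_distance=False)
    nn_l = NearestNeighbors(n_neighbors=k).fit(X_low).kneighbors(return_distance=False)
    jac = [len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in zip(nn_h, nn_l)]
    return float(np.mean(jac))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))            # hypothetical high-dimensional data
Z = PCA(n_components=2).fit_transform(X)  # any embedding to be evaluated
print(round(knn_preservation(X, Z), 3))
```

A global analogue would compare pairwise distance matrices (e.g. by correlation) rather than neighbor sets.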


2012 ◽  
Vol 51 (04) ◽  
pp. 341-347 ◽  
Author(s):  
F. Mulas ◽  
L. Zagar ◽  
B. Zupan ◽  
R. Bellazzi

Summary Objective: The assessment of the developmental potential of stem cells is a crucial step towards their clinical application in regenerative medicine. It has been demonstrated that genome-wide expression profiles can predict the cellular differentiation stage by means of dimensionality reduction methods. Here we show that these techniques can be further strengthened to support decision making with i) a novel strategy for gene selection; ii) methods for combining the evidence from multiple data sets. Methods: We propose to exploit dimensionality reduction methods for the selection of genes specifically activated in different stages of differentiation. To obtain an integrated predictive model, the expression values of the selected genes from multiple data sets are combined. We investigated distinct approaches that either aggregate data sets or use learning ensembles. Results: We analyzed the performance of the proposed methods on six publicly available data sets. The selection procedure identified a reduced subset of genes whose expression values gave rise to an accurate stage prediction. The assessment of predictive accuracy demonstrated a high quality of predictions for most of the data integration methods presented. Conclusion: The experimental results highlighted the main strengths of the proposed approaches: the ability to predict the true staging by combining multiple training data sets when it could not be inferred from a single data source, and the ability to focus the analysis on a reduced list of genes with similar predictive performance.
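One simple way to use dimensionality reduction for gene selection, along the general lines of the Methods above, is to rank genes by the magnitude of their loadings on the leading principal components. The toy example below (synthetic data; not the authors' exact selection procedure) plants a stage-discriminating signal in the first 20 genes and recovers it from the PCA loadings:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))   # samples x genes, synthetic
X[:30, :20] += 2.0                # genes 0-19 separate two "stages"

pca = PCA(n_components=2).fit(X)
# Rank genes by their largest absolute loading on the leading components.
loading = np.abs(pca.components_).max(axis=0)
top_genes = np.argsort(loading)[::-1][:20]
print(sorted(top_genes.tolist()))
```

In the integrated setting described above, the selected gene lists from several data sets would then be combined before fitting the stage predictor.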


2020 ◽  
Vol 21 (S16) ◽  
Author(s):  
Ruiyu Xiao ◽  
Guoshan Lu ◽  
Wanqian Guo ◽  
Shuilin Jin

Abstract Background Single-cell RNA sequencing can be used to fairly determine cell types, which is beneficial to the medical field, especially the many recent studies on COVID-19. Generally, single-cell RNA data analysis pipelines include data normalization, dimensionality reduction, and unsupervised clustering. However, different normalization and dimensionality reduction methods will significantly affect the results of clustering and cell type enrichment analysis. The choice of preprocessing path is crucial in scRNA-seq data mining, because a proper preprocessing path can extract more important information from complex raw data and lead to more accurate clustering results. Results We propose a method called NDRindex (Normalization and Dimensionality Reduction index) to evaluate the data quality of the outcomes of normalization and dimensionality reduction methods. The method includes a function to calculate the degree of data aggregation, which is the key to measuring data quality before clustering. For the five single-cell RNA sequencing datasets we tested, the results demonstrate the efficacy and accuracy of our index. Conclusions The method we introduce fills a gap in the selection of preprocessing paths, and the results demonstrate its effectiveness and accuracy. Our research provides useful indicators for the evaluation of RNA-seq data.
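The "degree of data aggregation" idea can be illustrated with a crude proxy: the mean nearest-neighbor distance relative to the overall spread of the reduced data, which is smaller when the output is tightly clustered. This is only a sketch of the concept, not the published NDRindex formula:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def aggregation_score(Z):
    """Crude aggregation proxy: mean nearest-neighbor distance divided by
    the mean distance to the centroid. Smaller values indicate tighter,
    more clusterable structure. (Illustrative only; not the published
    NDRindex formula.)"""
    d_nn = NearestNeighbors(n_neighbors=2).fit(Z).kneighbors(Z)[0][:, 1]
    spread = np.linalg.norm(Z - Z.mean(axis=0), axis=1).mean()
    return float(d_nn.mean() / spread)

rng = np.random.default_rng(0)
# Two preprocessing outcomes: one tightly clustered, one diffuse.
tight = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 5.0)])
diffuse = rng.normal(0.0, 2.0, size=(100, 2))
print(aggregation_score(tight) < aggregation_score(diffuse))  # True
```

In a pipeline-selection setting, each candidate normalization/reduction path would be scored this way and the most aggregated outcome carried forward to clustering.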


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1158 ◽  
Author(s):  
Fanny Perraudeau ◽  
Davide Risso ◽  
Kelly Street ◽  
Elizabeth Purdom ◽  
Sandrine Dudoit

Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer the unprecedented opportunity to investigate, at the molecular level, fundamental biological questions, such as stem cell differentiation or the discovery and characterization of rare cell types. However, such assays raise challenging statistical and computational questions and require the development of novel methodology and software. Using stem cell differentiation in the mouse olfactory epithelium as a case study, this integrated workflow provides a step-by-step tutorial to the methodology and associated software for the following four main tasks: (1) dimensionality reduction accounting for zero inflation and overdispersion and adjusting for gene- and cell-level covariates; (2) cell clustering using resampling-based sequential ensemble clustering; (3) inference of cell lineages and pseudotimes; and (4) differential expression analysis along lineages.


2021 ◽  
Author(s):  
Wolfgang Kopp ◽  
Altuna Akalin ◽  
Uwe Ohler

Advances in single-cell technologies enable the routine interrogation of chromatin accessibility for tens of thousands of single cells, shedding light on gene regulatory processes at an unprecedented resolution. Meanwhile, the size, sparsity, and high dimensionality of the resulting data continue to pose challenges for computational analysis, and specifically for the integration of data from different sources. We have developed a dedicated computational approach: a variational auto-encoder using a noise model specifically designed for single-cell ATAC-seq data, which facilitates simultaneous dimensionality reduction and batch correction via an adversarial learning strategy. We showcase both its individual advantages on carefully chosen real and simulated data sets and the benefits for detailed cell type characterization via integrating multiple complex datasets.


F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 2120
Author(s):  
Miroslav Kratochvíl ◽  
Abhishek Koladiya ◽  
Jiří Vondrášek

EmbedSOM is a simple and fast dimensionality reduction algorithm, originally developed for applications in single-cell cytometry data analysis. We present an updated version of EmbedSOM, viewed as an algorithm for landmark-directed embedding enrichment, and demonstrate that it works well even with manifold-learning techniques other than self-organizing maps. Using this generalization, we introduce an inwards-growing variant of self-organizing maps that is designed to mitigate some previously identified deficiencies of EmbedSOM output. Finally, we measure the performance of the generalized EmbedSOM, compare several variants of the algorithm that utilize different landmark-generating functions, and showcase the functionality on single-cell cytometry datasets from recent studies.
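The landmark-directed idea can be sketched as: choose landmarks in the high-dimensional space, embed only the landmarks, then place every cell relative to its nearest landmarks. The toy version below uses k-means centroids in place of a trained self-organizing map and a simple inverse-distance weighting in place of EmbedSOM's actual projection, so it illustrates the scheme rather than reproduces it:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # hypothetical high-dimensional cytometry data

# 1) Landmarks in the high-dimensional space; k-means centroids stand in
#    for the nodes of a trained self-organizing map.
landmarks_hi = KMeans(n_clusters=25, n_init=10, random_state=0).fit(X).cluster_centers_

# 2) Embed only the landmarks into 2-D with any dimensionality reduction method.
landmarks_lo = PCA(n_components=2).fit_transform(landmarks_hi)

# 3) Place each cell by an inverse-distance-weighted average of the
#    low-dimensional positions of the landmarks.
d = np.linalg.norm(X[:, None, :] - landmarks_hi[None, :, :], axis=2)
w = 1.0 / (d + 1e-9)
w /= w.sum(axis=1, keepdims=True)
embedding = w @ landmarks_lo
print(embedding.shape)  # (500, 2)
```

Because only the landmarks pass through the expensive manifold-learning step, the per-cell placement stays cheap even for very large datasets, which is the source of the algorithm's speed.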


2020 ◽  
Author(s):  
Felix Raimundo ◽  
Celine Vallot ◽  
Jean Philippe Vert

Abstract Background Many computational methods have been developed recently to analyze single-cell RNA-seq (scRNA-seq) data. Several benchmark studies have compared these methods on their ability to perform dimensionality reduction, clustering, or differential analysis, often relying on default parameters. Yet given the biological diversity of scRNA-seq datasets, parameter tuning might be essential for the optimal usage of methods, and determining how to tune parameters remains an unmet need. Results Here, we propose a benchmark to assess the performance of five methods, systematically varying their tunable parameters, for dimension reduction of scRNA-seq data, a common first step in many downstream applications such as cell type identification or trajectory inference. We run a total of 1.5 million experiments to assess the influence of parameter changes on the performance of each method, and propose two strategies to automatically tune parameters for methods that need it. Conclusions We find that principal component analysis (PCA)-based methods like scran and Seurat are competitive with default parameters but do not benefit much from parameter tuning, while more complex models like ZINB-WaVE, DCA, and scVI can reach better performance, but only after parameter tuning.
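A minimal version of such a tuning benchmark sweeps one tunable parameter and scores each setting by how well clustering in the reduced space recovers known labels, here via the adjusted Rand index. Everything below (the synthetic data, the parameter grid, and the scoring choice) is an illustrative assumption, not the paper's protocol:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for a labelled scRNA-seq dataset.
X, labels = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)

# Sweep one tunable parameter (number of PCs); score each setting by how
# well clustering on the reduced data recovers the known labels.
results = {}
for n_pcs in (2, 5, 10, 20):
    Z = PCA(n_components=n_pcs).fit_transform(X)
    pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
    results[n_pcs] = adjusted_rand_score(labels, pred)

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

Scaled up across methods, parameters, and datasets, this loop is the basic unit behind the 1.5 million experiments described above.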


2020 ◽  
Vol 12 (7) ◽  
pp. 1104
Author(s):  
Jiansi Ren ◽  
Ruoxiang Wang ◽  
Gang Liu ◽  
Ruyi Feng ◽  
Yuanni Wang ◽  
...  

The classification of hyperspectral remote sensing images is difficult due to the curse of dimensionality. Therefore, it is necessary to find an effective way to reduce the dimensions of such images. The Relief-F method has been introduced for supervised dimensionality reduction, but the band subset obtained by this method has a large number of continuous bands, resulting in a reduction in the classification accuracy. In this paper, an improved method, called Partitioned Relief-F, is presented to mitigate the influence of continuous bands on classification accuracy while retaining important information. Firstly, the importance scores of each band are obtained using the original Relief-F method. Secondly, the whole band interval is divided in an orderly manner, using a partitioning strategy according to the correlation between the bands. Finally, the band with the highest importance score is selected in each sub-interval. To verify the effectiveness of the proposed Partitioned Relief-F method, a classification experiment is performed on three publicly available data sets. The dimensionality reduction methods Principal Component Analysis (PCA) and original Relief-F are selected for comparison. Furthermore, K-Means and Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) are selected for comparison in terms of partitioning strategy. This paper mainly measures the effectiveness of each method indirectly, using the overall accuracy of the final classification. The experimental results indicate that the addition of the proposed partitioning strategy increases the overall accuracy of the three data sets by 1.55%, 3.14%, and 0.83%, respectively. In general, the proposed Partitioned Relief-F method can achieve significantly superior dimensionality reduction effects.
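The partitioning step can be sketched as: split the ordered band axis into contiguous sub-intervals and keep only the top-scoring band in each, so that the selected bands are spread across the spectrum rather than clumped together. The toy version below uses equal-width sub-intervals for simplicity, whereas the paper partitions according to inter-band correlation:

```python
import numpy as np

def pick_per_partition(scores, n_partitions):
    """Split the ordered band axis into contiguous sub-intervals and keep
    the highest-scoring band from each, so selected bands cannot all be
    adjacent. (Equal-width intervals for illustration; the paper
    partitions by inter-band correlation.)"""
    bands = np.arange(len(scores))
    selected = []
    for chunk_scores, chunk_bands in zip(np.array_split(scores, n_partitions),
                                         np.array_split(bands, n_partitions)):
        selected.append(int(chunk_bands[np.argmax(chunk_scores)]))
    return selected

# Hypothetical Relief-F importance scores for 10 bands.
scores = np.array([0.1, 0.9, 0.8, 0.2, 0.3, 0.7, 0.4, 0.6, 0.95, 0.5])
print(pick_per_partition(scores, 3))  # → [1, 5, 8], one band per third
```

Plain top-k selection on the same scores would pick bands 8, 1, 2 — two of them adjacent — which is exactly the clumping the partitioning strategy avoids.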

