A gene filter for comparative analysis of single-cell RNA-sequencing trajectory datasets

Mapping Intimacies ◽

10.1101/637488 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yutong Wang ◽

Tasha Thong ◽

Venkatesh Saligrama ◽

Justin Colacino ◽

Laura Balzano ◽

...

Keyword(s):

Comparative Analysis ◽

Single Cell ◽

Rna Sequencing ◽

Data Sets ◽

Preimplantation Embryo Development ◽

Multiple Datasets ◽

Gene Filtering ◽

Single Cell Rna Sequencing ◽

Gene Filter ◽

Or Gene

Abstract Unsupervised feature selection, or gene filtering, is a common preprocessing step to reduce the dimensionality of single-cell RNA sequencing (scRNAseq) data sets. Existing gene filters operate on scRNAseq datasets in isolation from other datasets. When jointly analyzing multiple datasets, however, there is a need for gene filters that are tailored to comparative analysis. In this work, we present a method for ranking the relevance of genes for comparing trajectory datasets. Our method is unsupervised, i.e., the cell metadata are not assumed to be known. Using the top-ranking genes significantly improves performance compared to methods not tailored to comparative analysis. We demonstrate the effectiveness of our algorithm on previously published datasets from studies on preimplantation embryo development, neurogenesis and cardiogenesis.
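For orientation, the kind of gene filter that operates on a single dataset in isolation can be sketched as a plain variance ranking. This is not the authors' comparative method, only a generic baseline of the sort the abstract contrasts against; the function name `top_variance_genes` and the cells-by-genes list layout are assumptions made for illustration.

```python
from statistics import pvariance


def top_variance_genes(expression, gene_names, k):
    """Return the k gene names with the highest variance across cells.

    expression: list of rows, one row per cell, one column per gene
                (hypothetical layout chosen for this sketch).
    gene_names: list of gene identifiers, one per column.
    """
    # Transpose so that each tuple holds one gene's values across all cells.
    variances = [pvariance(column) for column in zip(*expression)]
    # Rank gene indices by variance, highest first.
    ranked = sorted(range(len(gene_names)), key=lambda j: variances[j], reverse=True)
    return [gene_names[j] for j in ranked[:k]]


# Toy example: three cells, three genes.
toy_expression = [
    [0, 5, 1],
    [0, 1, 1],
    [0, 9, 2],
]
toy_genes = ["A", "B", "C"]
selected = top_variance_genes(toy_expression, toy_genes, 2)
```

A filter like this scores each gene using one dataset only, which is the limitation the abstract raises for joint analysis of several datasets.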

Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research ◽

10.3390/microbiolres12020022 ◽

2021 ◽

Vol 12 (2) ◽

pp. 317-334

Author(s):

Omar Alaqeeli ◽

Li Xing ◽

Xuekui Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Classification Tree ◽

Area Under The Curve ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Tree Algorithms ◽

R Packages

The classification tree is a widely used machine learning method. It has multiple implementations as R packages: rpart, ctree, evtree, tree and C5.0. The details of these implementations are not the same, and hence their performances differ from one application to another. We are interested in their performance in the classification of cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compare the packages’ prediction performances based on their Precision, Recall, F1-score and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than the others, although its complexity is often higher.
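As a reminder of what the four prediction metrics named in this abstract measure, here is a minimal sketch for the binary case. It is not the benchmark's code (the study used R packages); the function names and the toy labels are assumptions made for illustration, and AUC is computed through its pairwise-ranking interpretation.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive cell receives the higher score; ties
    count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: five cells, hard predictions and classifier scores.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
y_score = [0.9, 0.4, 0.3, 0.6, 0.8]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, y_score)
```

In a cross-validated benchmark these quantities would be computed on each held-out fold and then averaged per package.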

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
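The "simple ensemble voting" mentioned in this abstract can be illustrated with a per-cell plurality vote over the labels returned by several classifiers. The abstract does not specify the voting rule, so the tie handling below (ties become "unassigned") and the function name `majority_vote` are assumptions, and the label lists are toy data rather than output of the evaluated tools.

```python
from collections import Counter


def majority_vote(predictions, unassigned="unassigned"):
    """Combine per-cell labels from several classifiers by plurality vote.

    predictions: list of label lists, one list per classifier, all of the
                 same length (one label per cell).
    A tie for first place yields the `unassigned` label (an assumption of
    this sketch, not a rule taken from the paper).
    """
    if not predictions:
        return []
    n_cells = len(predictions[0])
    if any(len(p) != n_cells for p in predictions):
        raise ValueError("every classifier must label the same number of cells")
    consensus = []
    for votes in zip(*predictions):
        counts = Counter(votes).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append(unassigned)
        else:
            consensus.append(counts[0][0])
    return consensus


# Toy example: three hypothetical classifiers labelling three cells.
tool_labels = [
    ["T cell", "B cell", "NK cell"],
    ["T cell", "B cell", "B cell"],
    ["T cell", "NK cell", "Monocyte"],
]
consensus_labels = majority_vote(tool_labels)
```

Here the third cell receives three different labels, so the sketch leaves it unassigned rather than picking one arbitrarily.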

Comparative analysis of single-cell RNA sequencing data from mouse spermatogonial and mesenchymal stem cells to identify differentially expressed genes and transcriptional regulators of germline cells

Journal of Cellular Physiology ◽

10.1002/jcp.26303 ◽

2018 ◽

Vol 233 (7) ◽

pp. 5231-5242 ◽

Cited By ~ 6

Author(s):

Sajjad Sisakhtnezhad ◽

Parvin Heshmati

Keyword(s):

Stem Cells ◽

Mesenchymal Stem Cells ◽

Comparative Analysis ◽

Single Cell ◽

Rna Sequencing ◽

Differentially Expressed Genes ◽

Transcriptional Regulators ◽

Differentially Expressed ◽

Sequencing Data ◽

Single Cell Rna Sequencing

DrivAER: Identification of driving transcriptional programs in single-cell RNA sequencing data

10.1101/864165 ◽

2019 ◽

Author(s):

Lukas M. Simon ◽

Fangfang Yan ◽

Zhongming Zhao

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Disease Status ◽

Data Sets ◽

Sequencing Data ◽

Functional Interpretation ◽

Recent Success ◽

Gene Sets ◽

Single Cell Rna Sequencing ◽

Cellular Maps

Abstract Single cell RNA sequencing (scRNA-seq) unfolds complex transcriptomic data sets into detailed cellular maps. Despite recent success, there is a pressing need for specialized methods tailored towards the functional interpretation of these cellular maps. Here, we present DrivAER, a machine learning approach that scores annotated gene sets based on their relevance to user-specified outcomes such as pseudotemporal ordering or disease status. We demonstrate that DrivAER extracts the key driving pathways and transcription factors that regulate complex biological processes from scRNA-seq data.

Controlling for confounding effects in single cell RNA sequencing studies using both control and target genes

10.1101/045070 ◽

2016 ◽

Author(s):

Mengjie Chen ◽

Xiang Zhou

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Target Genes ◽

Expectation Maximization Algorithm ◽

Data Sets ◽

Single Cell Rna Sequencing ◽

Sequencing Studies ◽

Order Of Magnitude ◽

The Rich ◽

Downstream Analysis

The single cell RNA sequencing (scRNAseq) technique is becoming increasingly popular for unbiased and high-resolution transcriptome analysis of heterogeneous cell populations. Despite its many advantages, scRNAseq, like any other genomic sequencing technique, is susceptible to the influence of confounding effects. Controlling for confounding effects in scRNAseq data is thus a crucial step for proper data normalization and accurate downstream analysis. Several recent methodological studies have demonstrated the use of control genes for controlling for confounding effects in scRNAseq studies; the control genes are used to infer the confounding effects, which are then used to normalize target genes of primary interest. However, these methods can be suboptimal as they ignore the rich information contained in the target genes. Here, we develop an alternative statistical method, which we refer to as scPLS, for more accurate inference of confounding effects. Our method is based on partial least squares and models control and target genes jointly to better infer and control for confounding effects. To accompany our method, we develop a novel expectation maximization algorithm for scalable inference. Our algorithm is an order of magnitude faster than standard ones, making scPLS applicable to hundreds of cells and hundreds of thousands of genes. With extensive simulations and comparisons with other methods, we demonstrate the effectiveness of scPLS. Finally, we apply scPLS to analyze two scRNAseq data sets to illustrate its benefits in removing technical confounding effects as well as cell cycle effects.

Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects; Systematic comparative analysis of single cell RNA-sequencing methods

10.1242/prelights.10803 ◽

2019 ◽

Author(s):

Rob Hynds

Keyword(s):

Comparative Analysis ◽

Single Cell ◽

Rna Sequencing ◽

Single Cell Rna Sequencing

PscB: A Browser to Explore Plant Single Cell RNA-Sequencing Data Sets

Plant Physiology ◽

10.1104/pp.20.00250 ◽

2020 ◽

Vol 183 (2) ◽

pp. 464-467

Author(s):

Xiaoli Ma ◽

Tom Denyer ◽

Marja C.P. Timmermans

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

A Systematic Evaluation of Methods for Cell Phenotype Classification Using Single-Cell RNA Sequencing Data

10.21203/rs.3.rs-596075/v1 ◽

2021 ◽

Author(s):

Xiaowen Cao ◽

Li Xing ◽

Elham Majd ◽

Hua He ◽

Junhua Gu ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Simulated Data ◽

Supervised Machine Learning ◽

Cell Phenotype ◽

Data Sets ◽

Phenotype Classification ◽

Single Cell Rna Sequencing ◽

Cell Phenotypes

Abstract Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about complex tissue cellular composition. In the analysis of single-cell RNA sequencing, the annotations of cell subtypes are often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, there is no existing study that systematically investigates the performance of those supervised algorithms on scRNA-seq data sets of various sizes. Methods and Results: This study evaluates 13 popular supervised machine learning algorithms to classify cell phenotypes, using published real and simulated data sets with diverse cell sizes. The benchmark contained two parts. In the first part, we used real data sets to assess the popular supervised algorithms’ computing speed and cell phenotype classification performance. The classification performances were evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of real genes. Conclusion: The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets. NB was another appropriate method for medium data sets. In large data sets, XGB performed excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant in small data sets.

Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-019-2599-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 50

Author(s):

Tianyu Wang ◽

Boyang Li ◽

Craig E. Nelson ◽

Sheida Nabavi

Keyword(s):

Gene Expression ◽

Comparative Analysis ◽

Single Cell ◽

Rna Sequencing ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Sequencing Data ◽

Differential Gene Expression Analysis ◽

Single Cell Rna Sequencing ◽

Differential Gene

Comparative Analysis of Single-Cell RNA Sequencing Methods

Molecular Cell ◽

10.1016/j.molcel.2017.01.023 ◽

2017 ◽

Vol 65 (4) ◽

pp. 631-643.e4 ◽

Cited By ~ 529

Author(s):

Christoph Ziegenhain ◽

Beate Vieth ◽

Swati Parekh ◽

Björn Reinius ◽

Amy Guillaumet-Adkins ◽

...

Keyword(s):

Comparative Analysis ◽

Single Cell ◽

Rna Sequencing ◽

Single Cell Rna Sequencing
