psupertime: supervised pseudotime inference for single cell RNA-seq data with sequential labels

Mapping Intimacies ◽

10.1101/622001 ◽

2019 ◽

Cited By ~ 3

Author(s):

Will Macnair ◽

Manfred Claassen

Keyword(s):

Single Cell ◽

Biological Processes ◽

Rna Seq ◽

Batch Effects ◽

Substantial Variation ◽

Wide Range ◽

Time Series Studies ◽

Development And Differentiation ◽

Inference Methods ◽

Simple Regression

Abstract: Single cell RNA-seq has been successfully combined with pseudotime inference methods to investigate biological processes which have sequential labels, such as time series studies of development and differentiation. Pseudotime methods developed to date ignore the labels, and where there is substantial variation in the data not associated with the labels (such as cell cycle variation or batch effects), they can fail to find relevant genes. We introduce psupertime, a supervised pseudotime approach which outperforms benchmark pseudotime methods by explicitly using the sequential labels as input. psupertime uses a simple, regression-based model which, by acknowledging the labels, ensures that genes relevant to the process, rather than to major drivers of variation, are found. psupertime is applicable to the wide range of single cell RNA-seq datasets with sequential labels, derived from either experimental design or user-selected cell cluster sequences, and provides a tool for targeted identification of genes regulated along biological processes.
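The abstract does not spell out the regression model. As an illustration only, one standard regression model for ordered labels is the proportional-odds (cumulative logit) model sketched below. The symbols are assumptions introduced here, not taken from the abstract: x_i is the expression vector of cell i, y_i its label among K ordered labels, beta the gene weights, and theta_k the thresholds between consecutive labels.

```latex
\[
P(y_i \le k \mid \mathbf{x}_i)
  = \frac{1}{1 + \exp\!\left(-\left(\theta_k - \boldsymbol{\beta}^{\top}\mathbf{x}_i\right)\right)},
  \qquad k = 1, \dots, K-1,
\]
\[
\theta_1 \le \theta_2 \le \dots \le \theta_{K-1}.
\]
```

Under this assumed formulation, the scalar score beta^T x_i orders the cells along the labelled process, and genes with non-zero weights are the ones associated with the labels rather than with unrelated sources of variation.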

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes ◽

10.3390/genes12020311 ◽

2021 ◽

Vol 12 (2) ◽

pp. 311

Author(s):

Zhenqiu Liu

Keyword(s):

Single Cell ◽

Free Parameter ◽

Graphical Model ◽

Expression Patterns ◽

Information Criterion ◽

Log P ◽

Rna Seq ◽

Clustering Methods ◽

Wide Range ◽

Free Parameters

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.
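The partial correlations mentioned in this abstract can be read directly off a precision (inverse covariance) matrix. The sketch below shows only that general identity and is not the RGGC estimator; the function name and the 3x3 matrix are hypothetical values chosen for illustration.

```python
import math


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    Uses the standard identity rho_ij = -theta_ij / sqrt(theta_ii * theta_jj).
    """
    n = len(precision)
    result = [[1.0] * n for _ in range(n)]  # diagonal stays at 1.0
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


# Hypothetical precision matrix for three variables (assumption, not real data).
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
pcor = partial_correlations(theta)
```

A zero entry in the precision matrix gives a zero partial correlation, which is why sparse (regularized) precision estimates are a natural basis for building a graph before community detection.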

Comparative analysis of antibody- and lipid-based multiplexing methods for single-cell RNA-seq

10.1101/2020.11.16.384222 ◽

2020 ◽

Author(s):

Viacheslav Mylka ◽

Jeroen Aerts ◽

Irina Matetovici ◽

Suresh Poovathingal ◽

Niels Vandamme ◽

...

Keyword(s):

Genetic Variation ◽

Comparative Analysis ◽

Single Cell ◽

Cell Lines ◽

Clinical Studies ◽

Clinical Samples ◽

Rna Seq ◽

Batch Effects ◽

Single Cell Sequencing ◽

Single Nucleus

Abstract: Multiplexing of samples in single-cell RNA-seq studies allows significant reduction of experimental costs, straightforward identification of doublets, increased cell throughput, and reduction of sample-specific batch effects. Recently published multiplexing techniques using oligo-conjugated antibodies or lipids allow barcoding sample-specific cells, a process called ‘hashing’. Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom-synthesized lipids and MULTI-seq lipid hashes in four cell lines, both for single-cell RNA-seq and single-nucleus RNA-seq. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines. Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy and SARS-CoV-2 infected patients, where we demonstrate a more affordable approach for large single-cell sequencing clinical studies, while simultaneously reducing batch effects.
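To make the idea of hash-based demultiplexing concrete, here is a deliberately simplified sketch. It is not the algorithm used by MULTISeqDemux or any other published function: it applies one fixed count threshold, and the hash names, counts and the `min_count` value are all hypothetical.

```python
def demultiplex(hash_counts, min_count=50):
    """Assign a cell to a sample from its per-hash barcode counts.

    Simplified rule (an assumption for illustration):
      - no hash reaches min_count        -> "Negative"
      - exactly one hash reaches it      -> that hash (a singlet)
      - two or more hashes reach it      -> "Doublet"
    """
    positive = sorted(tag for tag, count in hash_counts.items() if count >= min_count)
    if not positive:
        return "Negative"
    if len(positive) == 1:
        return positive[0]
    return "Doublet"


# Hypothetical cells with counts for three sample hashes.
singlet = demultiplex({"hash1": 320, "hash2": 4, "hash3": 7})
doublet = demultiplex({"hash1": 210, "hash2": 180, "hash3": 3})
negative = demultiplex({"hash1": 2, "hash2": 5, "hash3": 1})
```

Real demultiplexing functions typically derive thresholds per hash from the data rather than using a single fixed cutoff, which is one reason the choice of function affects the results.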

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
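The rejection step described in this abstract can be pictured as a classifier with a per-class reject option. The sketch below is not JIND's code or API: the function name, the probabilities and the thresholds are hypothetical, and the thresholds are fixed by hand here, whereas the abstract states that JIND learns them.

```python
def classify_with_rejection(probabilities, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject_label if it is not confident enough.

    probabilities: dict mapping cell type -> predicted probability for one cell
    thresholds:    dict mapping cell type -> minimum probability required to accept
    """
    best_type = max(probabilities, key=probabilities.get)
    if probabilities[best_type] >= thresholds[best_type]:
        return best_type
    return reject_label


# Hypothetical per-type thresholds (assumed values for illustration).
thresholds = {"T cell": 0.8, "NK cell": 0.7}

confident = classify_with_rejection({"T cell": 0.92, "NK cell": 0.08}, thresholds)
ambiguous = classify_with_rejection({"T cell": 0.55, "NK cell": 0.45}, thresholds)
```

Using a separate threshold per cell type lets closely related types demand more confidence than well-separated ones, at the cost of leaving some cells unlabelled.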

Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench

10.1101/2020.05.22.111211 ◽

2020 ◽

Author(s):

Ruben Chazarra-Gil ◽

Stijn van Dongen ◽

Vladimir Yu Kiselev ◽

Martin Hemberg

Keyword(s):

Single Cell ◽

Computational Methods ◽

Rna Seq ◽

Batch Effects ◽

Systematic Comparison ◽

Batch Correction ◽

Link Type ◽

Biological Signals ◽

The Cost

Abstract: As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

SANTA-SIM: simulating viral sequence evolution dynamics under selection and recombination

Virus Evolution ◽

10.1093/ve/vez003 ◽

2019 ◽

Vol 5 (1) ◽

Cited By ~ 5

Author(s):

Abbas Jariani ◽

Christopher Warth ◽

Koen Deforche ◽

Pieter Libin ◽

Alexei J Drummond ◽

...

Keyword(s):

Software Package ◽

Point Mutations ◽

Sequence Evolution ◽

Biological Processes ◽

Viral Sequence ◽

Recombination Point ◽

Wide Range ◽

Cross Platform ◽

Evolution Dynamics ◽

Inference Methods

Abstract: Simulations are widely used to provide expectations and predictive distributions under known conditions against which to compare empirical data. Such simulations are also invaluable for testing and comparing the behaviour and power of inference methods. We describe SANTA-SIM, a software package to simulate the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication, recombination, point mutations, insertion–deletions, and selection under various fitness models and population size dynamics. The software is designed to be intuitive to work with for a wide range of users and executable in a cross-platform manner.

STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa755 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Source Code ◽

Cell Types ◽

R Package ◽

Computational Method ◽

Biological Variability ◽

Rna Seq ◽

Batch Effects ◽

Guide Trees

Abstract: Summary: STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. Availability and implementation: Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.

Interpretable factor models of single-cell RNA-seq via variational autoencoders

10.1101/737601 ◽

2019 ◽

Cited By ~ 2

Author(s):

Valentine Svensson ◽

Lior Pachter

Keyword(s):

Gene Expression ◽

Single Cell ◽

Statistical Inference ◽

Factor Models ◽

Rna Seq ◽

Cell Type ◽

Massive Datasets ◽

Domain Specific ◽

Variational Autoencoder ◽

Inference Methods

Single cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be of interest for other applications.

SCOUT: Single-cell outlier analysis in cancer

10.1101/2020.03.25.007518 ◽

2020 ◽

Author(s):

Giovana Ravizzoni Onzi ◽

Juliano Luiz Faccioni ◽

Alvaro G. Alvarado ◽

Paula Andreghetto Bracco ◽

Harley I. Kornblum ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Biological Markers ◽

Rna Seq ◽

Outlier Analysis ◽

Mass Cytometry ◽

Wide Range ◽

Cell Data

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer the capacity to invade, metastasize, or resist therapy. Here we present the Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application to systematically apply SCOUT on a dataset over a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, outlier cells for the expression of proteins or RNAs were identified and compared to their non-outlier counterparts among different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in the analysis of the whole population.

Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization

10.7287/peerj.preprints.1839 ◽

2016 ◽

Author(s):

Xun Zhu ◽

Travers Ching ◽

Xinghua Pan ◽

Sherman Weissman ◽

Lana Garmire

Keyword(s):

Single Cell ◽

Hierarchical Clustering ◽

Matrix Factorization ◽

New Technology ◽

Data Sets ◽

Biological Processes ◽

Rna Seq ◽

Clustering Methods ◽

Hematopoietic Stem ◽

Non Negative Matrix Factorization

Single-cell RNA-Sequencing (scRNA-Seq) is a cutting-edge technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated from this new technology are still lacking. Here we have investigated the performance of the non-negative matrix factorization (NMF) method in analyzing a wide variety of scRNA-Seq data sets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods, including K-means and hierarchical clustering, NMF has higher accuracy even when the clustering results of K-means and hierarchical clustering are enhanced by t-SNE. Moreover, NMF successfully detects subpopulations, such as those in a single glioblastoma patient. Furthermore, in conjunction with the modularity detection method FEM, it reveals unique modules that are indicative of clinical subtypes. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data, and the NMFEM pipeline is suitable for modularity detection among single-cell RNA-Seq data.
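For readers unfamiliar with NMF, the following is a minimal sketch of the classic multiplicative-update scheme (Lee and Seung) for the squared-error objective, written in plain Python. It is not the NMFEM pipeline: the tiny genes-by-cells matrix, the rank, the iteration count and the helper names are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x cells) as W (genes x rank) times H (rank x cells)."""
    rng = random.Random(seed)
    n_genes, n_cells = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_cells)] for _ in range(rank)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer, denom = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(n_cells)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer, denom = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h


# Hypothetical expression matrix: 4 genes (rows) x 4 cells (columns), two cell groups.
expression = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]
w_start, h_start = nmf(expression, rank=2, iterations=0)
w_fit, h_fit = nmf(expression, rank=2, iterations=200)
error_before = squared_error(expression, w_start, h_start)
error_after = squared_error(expression, w_fit, h_fit)
```

In a clustering setting, each cell is commonly assigned to the factor with the largest value in its column of H. Because the factors stay non-negative, they can be read as additive gene programs.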

A novel single-cell based method for breast cancer prognosis

10.1101/2020.04.26.062794 ◽

2020 ◽

Author(s):

Xiaomei Li ◽

Lin Liu ◽

Greg Goodall ◽

Andreas Schreiber ◽

Taosheng Xu ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Breast Cancer Prognosis ◽

Cancer Prognosis ◽

Biological Processes ◽

Expression Data ◽

Rna Seq ◽

Novel Method

Abstract: Breast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performance or ambiguous biological relevance, as a result of neglecting intra-tumor heterogeneity. Recently, single cell RNA-sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at the cellular level. In this paper, we propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses the scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It first infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, we use them as the features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of biological processes other than EMT.

Author summary: Various computational methods have been developed for breast cancer prognosis. However, those methods mainly use the gene expression data generated by bulk RNA sequencing techniques, which average the expression level of a gene across different cell types. As breast cancer is a heterogeneous disease, bulk gene expression may not be the ideal resource for cancer prognosis. In this study, we propose a novel method to improve breast cancer prognosis using scRNA-seq data. The proposed method has been applied to the EMT scRNA-seq dataset for identifying breast cancer signatures for prognosis. In comparison with existing methods for breast cancer prognosis based on bulk expression data, our method shows better performance. Our single-cell-based signatures provide clues to the relation between EMT and clinical outcomes of breast cancer. In addition, the proposed method can also be useful when applied to scRNA-seq datasets of biological processes other than EMT.
