TooManyCells identifies and visualizes relationships of single-cell clades

Mapping Intimacies

10.1101/519660

2019

Cited By ~ 2

Author(s):

Gregory W. Schwartz

Jelena Petrovic

Maria Fasolino

Yeqiao Zhou

Stanley Cai

...

Keyword(s):

Single Cell

Clustering Algorithm

Single Cell Analysis

Data Sets

Reduction Methods

Simultaneous Comparisons

Spectral Clustering Algorithm

The Relationship

Matrix Free

Cell Data

Abstract

Transcriptional programs contribute to phenotypic and functional cell states. While the study of single-cell-level measurements has advanced our understanding of cell state heterogeneity and its role in biology and pathobiology, the underlying assumptions of current analytical methods limit the identification and exploration of cell clades. Other methods produce a single uni-layer partition of cells that ignores echelons of cell states. In contrast, we present TooManyCells, a software suite of graph-based tools for efficient, global, and unbiased identification and visualization of cell clades that maintains and presents the relationships between cell states. TooManyCells is built on a matrix-free, efficient, divisive hierarchical spectral clustering algorithm that is wholly different from the prevalent Louvain-based methods. BirchBeer, the visualization component of TooManyCells, introduces a new approach for single-cell analysis based on a concept intentionally orthogonal to the widely used dimensionality reduction methods. Together, this suite of tools provides a paradigm shift in the analysis and interpretation of single-cell data by enabling simultaneous comparisons of cell states at context- and application-dependent scales. A byproduct of this shift is the immediate detection and visualization of rare populations. When applied to existing single-cell RNA-seq data sets from various mouse organs, this detection outperforms previous algorithms.
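The abstract names a divisive hierarchical spectral clustering algorithm. As a rough illustration only, the sketch below recursively bisects a cell-by-cell similarity graph by the sign of the second-largest eigenvector of the normalized adjacency matrix, and it keeps a split only if the split's Newman-Girvan modularity is positive. Several choices are assumptions, not taken from the abstract: cosine similarity, the zeroed diagonal, and the modularity stopping rule. The sketch also builds the dense similarity matrix, so unlike TooManyCells it is not matrix-free.

```python
import numpy as np


def cosine_graph(x):
    """Cell-by-cell cosine similarity graph with self-loops removed."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return sim


def modularity(adj, side):
    """Newman-Girvan modularity Q of the two-way split given by a boolean mask."""
    two_m = adj.sum()
    deg = adj.sum(axis=1)
    q = 0.0
    for mask in (side, ~side):
        e_in = adj[np.ix_(mask, mask)].sum()
        q += e_in / two_m - (deg[mask].sum() / two_m) ** 2
    return q


def spectral_bisect(adj):
    """Split by the sign of the second-largest eigenvector of D^-1/2 A D^-1/2."""
    deg = np.maximum(adj.sum(axis=1), 1e-12)  # guard against isolated cells
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm)  # eigenvalues come back in ascending order
    # Rescaling by D^-1/2 would not change the signs, so split on vecs directly.
    return vecs[:, -2] > 0


def divisive_clusters(adj, idx=None):
    """Recursively bisect cells, keeping a split only if its modularity is positive."""
    if idx is None:
        idx = np.arange(adj.shape[0])
    sub = adj[np.ix_(idx, idx)]
    if idx.size < 2 or sub.sum() == 0:
        return [idx]
    side = spectral_bisect(sub)
    if side.all() or not side.any() or modularity(sub, side) <= 0:
        return [idx]  # leaf: no split with positive modularity
    return divisive_clusters(adj, idx[side]) + divisive_clusters(adj, idx[~side])


# Toy data: two groups of cells expressing different genes (cells x genes).
cells = np.array([[5, 1, 0], [4, 1, 0], [5, 2, 0],
                  [0, 1, 5], [0, 1, 4], [0, 2, 5]], dtype=float)
leaves = divisive_clusters(cosine_graph(cells))
print(sorted(sorted(leaf.tolist()) for leaf in leaves))
```

The recursion returns the leaves of the division tree. A tree-based view such as BirchBeer would also keep the internal nodes, which this sketch discards.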

CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures

10.1101/699041

2019

Cited By ~ 3

Author(s):

Thomas D. Sherman

Tiger Gao

Elana J. Fertig

Keyword(s):

Single Cell

Data Structures

Computational Efficiency

Matrix Factorization

Single Cell Analysis

Sparse Data

Data Sets

Cell Analysis

Gradient Based

Cell Data

Abstract

Motivation: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts, and these costs are often prohibitive for the analysis of large single-cell datasets. Many such methods can be run in parallel, so this limitation can be overcome by running them on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit how far parallelization can improve its computational efficiency for single-cell analysis.

Results: We upgraded CoGAPS in Version 3 to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This version includes a new parallelization framework that is designed around the sequential updating steps of the algorithm to enhance computational efficiency. We coupled these algorithmic advances with a new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Altogether, these updates make CoGAPS efficient enough to analyze 1000 times more cells, enabling factorization of large single-cell data sets.

Availability: CoGAPS is available as a Bioconductor package, and the source code is provided at github.com/FertigLab/CoGAPS. All efficiency updates to enable single-cell analysis are available as of version […].
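The abstract credits sparse data structures with reducing memory overhead. The sketch below illustrates why sparsity helps, under an assumption: it uses a plain unweighted squared error ||D - AP||² rather than CoGAPS's actual Bayesian likelihood. With D stored as coordinate (COO) triplets, the terms that involve the data touch only the nonzero entries. The ||AP||² term is built from two small K × K Gram matrices, so the dense genes × cells product AP is never formed. The function name is hypothetical.

```python
import numpy as np


def sparse_sq_error(rows, cols, vals, a, p):
    """||D - A P||_F^2 with D stored as COO triplets (rows, cols, vals)."""
    # ||D||_F^2 only involves the stored nonzeros.
    d_sq = np.dot(vals, vals)
    # <D, A P> only needs (A P)_ij at the nonzero positions of D.
    ap_at_nz = np.einsum("nk,kn->n", a[rows], p[:, cols])
    cross = np.dot(vals, ap_at_nz)
    # ||A P||_F^2 = trace((A^T A)(P P^T)), built from two K x K Gram matrices.
    ap_sq = np.trace((a.T @ a) @ (p @ p.T))
    return d_sq - 2.0 * cross + ap_sq


rng = np.random.default_rng(0)
d = rng.poisson(0.3, size=(50, 40)).astype(float)  # mostly zeros, like count data
rows, cols = np.nonzero(d)
vals = d[rows, cols]
a = rng.random((50, 3))  # genes x patterns
p = rng.random((3, 40))  # patterns x cells
dense = ((d - a @ p) ** 2).sum()
print(np.isclose(sparse_sq_error(rows, cols, vals, a, p), dense))
```

The data-dependent work scales with the number of nonzeros, not with genes × cells. The remaining cost is linear in the number of genes and cells for a fixed number of patterns K.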

CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures

BMC Bioinformatics

10.1186/s12859-020-03796-9

2020

Vol 21 (1)

Author(s):

Thomas D. Sherman

Tiger Gao

Elana J. Fertig

Keyword(s):

Single Cell

Data Structures

Computational Efficiency

Matrix Factorization

Single Cell Analysis

Sparse Data

Data Sets

Cell Analysis

Gradient Based

Cell Data

Abstract

Background: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts, and these costs are often prohibitive for the analysis of large single-cell datasets. Many such methods can be run in parallel, so this limitation can be overcome by running them on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit how far parallelization can improve its computational efficiency for single-cell analysis.

Results: We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This parallelization framework provides asynchronous updates for the sequential updating steps of the algorithm to enhance computational efficiency. We coupled these algorithmic advances with a new software architecture and sparse data structures to reduce the memory overhead for single-cell data.

Conclusions: Altogether, our new software makes the CoGAPS Bayesian matrix factorization algorithm efficient enough to analyze 1000 times more cells, enabling factorization of large single-cell data sets.
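Asynchronous updates are possible when some sequential steps do not depend on each other. The sketch below shows one such independence for factorizations of the form D ≈ AP. It assumes a plain Gaussian likelihood with a single noise level σ; this is not CoGAPS's actual sampler or prior. Changing A[i, k] alters only row i of AP, so the resulting change in log-likelihood can be computed from row i alone. Moves on different rows can therefore be scored concurrently, and together they give the same total change as a full recomputation. The function names are illustrative.

```python
import numpy as np


def gaussian_loglik(d, a, p, sigma=1.0):
    """Gaussian log-likelihood of D given A P, up to an additive constant."""
    r = d - a @ p
    return -(r * r).sum() / (2.0 * sigma ** 2)


def delta_loglik_row(d, a, p, i, k, delta, sigma=1.0):
    """Change in log-likelihood when A[i, k] += delta, computed from row i only."""
    resid = d[i] - a[i] @ p           # residual of gene i across all cells
    new_resid = resid - delta * p[k]  # the move shifts row i of A P by delta * P[k]
    return -(new_resid @ new_resid - resid @ resid) / (2.0 * sigma ** 2)


rng = np.random.default_rng(1)
d, a, p = rng.random((4, 6)), rng.random((4, 2)), rng.random((2, 6))

# Two moves on different rows of A, scored independently (e.g. by two threads)...
moves = [(0, 1, 0.2), (3, 0, -0.1)]
separate = sum(delta_loglik_row(d, a, p, i, k, dl) for i, k, dl in moves)

# ...give the same total change as applying both and recomputing everything.
a_new = a.copy()
for i, k, dl in moves:
    a_new[i, k] += dl
joint = gaussian_loglik(d, a_new, p) - gaussian_loglik(d, a, p)
print(np.isclose(separate, joint))
```

Two moves on the same row would not be independent in this way, because they share one residual vector.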

CIM-seq

10.21203/rs.3.pex-1365/v1

2021

Author(s):

Nathanael Andrews

Martin Enge

Keyword(s):

Single Cell

Single Cells

Likelihood Estimation

Cell Types

Data Sets

Target Tissue

Data Set

Rnaseq Data

The Given

Cell Data

Abstract

CIM-seq is a tool that deconvolutes RNA-seq data from cell multiplets (clusters of two or more cells) to identify physically interacting cells in a given tissue. The method requires two RNA-seq data sets from the same tissue: one of single cells, used as a reference, and one of cell multiplets to be deconvoluted. CIM-seq is compatible both with droplet-based sequencing methods, such as the Chromium Single Cell 3′ Kits from 10x Genomics, and with plate-based methods, such as Smart-seq2. The pipeline consists of three parts:

1) Dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq.
2) Feature selection and clustering of cell types in the single-cell data set, generating a blueprint of transcriptional profiles in the given tissue.
3) Computational deconvolution of multiplets through maximum likelihood estimation (MLE) to determine the most likely cell type constituents of each multiplet.
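Step 3 is a maximum likelihood search over candidate cell type combinations. The toy version below makes several assumptions that are not CIM-seq's actual model: Poisson-distributed counts, doublets only, equal contributions from each constituent cell, and exhaustive search over cell type pairs. The reference profiles stand in for the per-cluster blueprint from step 2, and the function names are hypothetical.

```python
import itertools

import numpy as np


def poisson_loglik(y, lam):
    """Poisson log-likelihood of counts y, dropping log(y!) (same for every candidate)."""
    lam = np.maximum(lam, 1e-12)
    return float(np.sum(y * np.log(lam) - lam))


def best_doublet(y, profiles):
    """Most likely pair of cell types for multiplet counts y.

    profiles maps each cell type to a mean expression profile from the single-cell reference.
    """
    total = y.sum()
    best_pair, best_ll = None, -np.inf
    for s, t in itertools.combinations_with_replacement(sorted(profiles), 2):
        # Each constituent cell contributes its normalized profile equally.
        mix = profiles[s] / profiles[s].sum() + profiles[t] / profiles[t].sum()
        lam = total * mix / mix.sum()
        ll = poisson_loglik(y, lam)
        if ll > best_ll:
            best_pair, best_ll = (s, t), ll
    return best_pair


profiles = {
    "A": np.array([10.0, 1.0, 1.0, 1.0]),
    "B": np.array([1.0, 10.0, 1.0, 1.0]),
    "C": np.array([1.0, 1.0, 10.0, 1.0]),
}
print(best_doublet(np.array([50.0, 52.0, 6.0, 4.0]), profiles))
```

The number of combinations grows quickly with multiplet size and the number of cell types, so exhaustive search only suits small cases like this one.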

HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk

10.1101/506717

2018

Cited By ~ 2

Author(s):

Jingtian Zhou

Jianzhu Ma

Yusi Chen

Chuankai Cheng

Bokan Bao

...

Keyword(s):

Random Walk

Single Cell

Clustering Algorithm

Single Cell Analysis

Single Cells

Genome Structure

Real Data

Cell Types

3D Genome

Cell Clustering

3D genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. Studying variation in chromosome structure between cell types requires computational approaches that can handle sparse and heterogeneous single-cell Hi-C data. However, few existing methods can accurately and efficiently cluster such data into constituent cell types. Here, we describe HiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputation using linear convolution and random walk. Using both simulated and real data as benchmarks, we show that HiCluster significantly improves clustering accuracy over existing methods on low-coverage Hi-C datasets. After imputation by HiCluster, structures similar to topologically associating domains (TADs) could be identified within single cells, and their consensus boundaries among cells were enriched at the TAD boundaries observed in bulk samples. In summary, HiCluster facilitates visualization and comparison of single-cell 3D genomes.
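The two imputation steps named above can be sketched as follows. First, a linear convolution smooths each bin with its neighbours. Second, a random walk with restart spreads contacts along the graph defined by the smoothed matrix, iterating Q ← (1 − r)·Q·B + r·I, where B is the row-normalized matrix. The mean (box) kernel, window size, and restart probability below are illustrative placeholders, not HiCluster's published settings, and the later clustering step is omitted.

```python
import numpy as np


def box_convolve(m, pad=1):
    """Average each entry over a (2*pad+1)^2 window, zero-padding the borders."""
    n = m.shape[0]
    padded = np.pad(m, pad)
    out = np.zeros_like(m, dtype=float)
    for di in range(2 * pad + 1):
        for dj in range(2 * pad + 1):
            out += padded[di:di + n, dj:dj + n]
    return out / (2 * pad + 1) ** 2


def random_walk_restart(m, restart=0.5, tol=1e-8, max_iter=1000):
    """Iterate Q <- (1 - r) Q B + r I, with B the row-normalized contact matrix."""
    n = m.shape[0]
    # Give empty bins a self-loop so every row of B sums to 1.
    b = m + np.eye(n) * (m.sum(axis=1) == 0)
    b = b / b.sum(axis=1, keepdims=True)
    q = np.eye(n)
    for _ in range(max_iter):
        q_next = (1 - restart) * q @ b + restart * np.eye(n)
        if np.abs(q_next - q).max() < tol:
            return q_next
        q = q_next
    return q


# A toy 4-bin single-cell contact matrix.
contacts = np.array([[0, 3, 0, 0],
                     [3, 0, 1, 0],
                     [0, 1, 0, 2],
                     [0, 0, 2, 0]], dtype=float)
imputed = random_walk_restart(box_convolve(contacts), restart=0.5)
print(imputed.round(3))
```

Because B is row-stochastic, each row of the imputed matrix remains a probability distribution over bins.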

Accuracy, Robustness and Scalability of Dimensionality Reduction Methods for Single Cell RNAseq Analysis

10.1101/641142

2019

Cited By ~ 2

Author(s):

Shiquan Sun

Jiaqiang Zhu

Ying Ma

Xiang Zhou

Keyword(s):

Data Analysis

Dimensionality Reduction

Single Cell

Comprehensive Evaluation

Computational Cost

Noise Removal

Data Sets

Vast Number

Cell Clustering

Reduction Methods

Abstract

Background: Dimensionality reduction (DR) is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNAseq) data analysis. Proper DR can allow for effective noise removal and facilitate many downstream analyses, including cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of DR in scRNAseq analysis and the vast number of DR methods developed for scRNAseq studies, few comprehensive comparison studies have evaluated the effectiveness of different DR methods in scRNAseq.

Results: Here, we aim to fill this critical knowledge gap with a comparative evaluation of a variety of commonly used DR methods for scRNAseq studies. Specifically, we compared 18 different DR methods on 30 publicly available scRNAseq data sets that cover a range of sequencing techniques and sample sizes. We evaluated neighborhood preservation by each method's ability to recover features of the original expression matrix. We evaluated cell clustering and lineage reconstruction by their accuracy and robustness. We also evaluated the computational scalability of each DR method by recording its computational cost.

Conclusions: Based on the comprehensive evaluation results, we provide important guidelines for choosing DR methods for scRNAseq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html. We hope that our results will serve as an important practical reference for practitioners choosing DR methods in the field of scRNAseq analysis.
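Neighborhood preservation can be scored in several ways. The sketch below uses one common definition, assumed here rather than taken from the study: for each cell, take the Jaccard index between its k nearest neighbors in the original expression space and in the reduced space, then average over cells. A 2-D PCA projection serves as the example DR method, and all names are illustrative.

```python
import numpy as np


def knn_sets(x, k):
    """Index sets of each row's k nearest neighbors under Euclidean distance."""
    dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return [set(np.argsort(row)[:k].tolist()) for row in dist]


def neighborhood_preservation(x_high, x_low, k=10):
    """Mean Jaccard index between k-NN sets in the original and the reduced space."""
    pairs = zip(knn_sets(x_high, k), knn_sets(x_low, k))
    return float(np.mean([len(a & b) / len(a | b) for a, b in pairs]))


rng = np.random.default_rng(2)
expr = rng.random((100, 50))  # cells x genes
centered = expr - expr.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ vt[:2].T  # a 2-D PCA embedding
print(neighborhood_preservation(expr, pcs, k=10))
```

A score of 1 means every cell keeps exactly the same k neighbors after reduction. The choice of k sets the scale at which preservation is judged.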

Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202

2018

Author(s):

Martin Pirkl

Niko Beerenwinkel

Keyword(s):

Single Cell

New Technologies

Single Cells

R Package

Supplementary Information

Data Sets

Cell Network

A Cell

Supplementary Material

Cell Data

Abstract

Motivation: New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intracellular networks in unprecedented detail and to help improve the treatment of diseases such as cancer. However, cell populations can be very heterogeneous.

Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data. It simultaneously identifies cellular subpopulations and their corresponding causal networks, which together explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study. We then apply it to three previously generated data sets of pooled CRISPR screens from two novel experimental techniques, Crop-Seq and Perturb-Seq.

Availability: The mixture Nested Effects Model (M&NEM) is available as the R package mnem at https://github.com/cbgethz/mnem/.

Supplementary information: Supplementary data are available online.
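The soft assignment of cells to networks is the E-step of the Expectation Maximization scheme described above. The sketch below shows that step and the mixture-weight update. It assumes the per-cell log-likelihood under each candidate network is already available as an array. In M&NEM those values come from the nested effects model itself; here `loglik` is hypothetical input, and the M-step that re-optimizes the networks is omitted.

```python
import numpy as np


def e_step(loglik, weights):
    """Responsibilities r[c, k] proportional to weights[k] * exp(loglik[c, k])."""
    log_post = loglik + np.log(weights)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # avoid overflow in exp
    resp = np.exp(log_post)
    return resp / resp.sum(axis=1, keepdims=True)


def update_weights(resp):
    """Mixture weight of each network = mean responsibility over cells."""
    return resp.mean(axis=0)


# Hypothetical per-cell log-likelihoods under two candidate networks.
loglik = np.array([[0.0, -5.0],
                   [-5.0, 0.0],
                   [0.0, 0.0]])
resp = e_step(loglik, np.array([0.5, 0.5]))
print(resp.round(3), update_weights(resp))
```

Each row of `resp` is one cell's probability of belonging to each network. Alternating this step with network re-optimization until the likelihood stops improving gives the full EM loop.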

Scalable Clustering with Supervised Linkage Methods

10.1101/2021.08.01.454697

2021

Author(s):

James Anibal

Alexandre Day

Erol Bahadiroglu

Liam O'Neill

Long Phan

...

Keyword(s):

Single Cell

Clustering Algorithm

Clustering Algorithms

Biomedical Sciences

New Approach

Scalable Clustering

Linkage Methods

Density Clustering

Cell Data

Different Levels

Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g. T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for immense datasets, achieving a considerable improvement in computational efficiency on large datasets compared to existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single cell classification in a scalable, tunable and rapid manner. We make HAL-x publicly available at: https://pypi.org/project/hal-x/
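The abstract does not say how the supervised linkage decides which clusters to join. The hypothetical sketch below assumes one plausible reading. A classifier is trained on the current cluster labels, its confusion between each pair of clusters is measured on held-out cells, and the pair it confuses most is merged. The confusion matrix here is made-up input, and the functions are illustrative, not part of the hal-x package.

```python
import numpy as np


def most_confused_pair(confusion):
    """Pair of clusters a classifier mixes up most often.

    confusion[i, j] is the held-out fraction of cluster-i cells predicted as cluster j.
    """
    sym = confusion + confusion.T
    np.fill_diagonal(sym, -np.inf)  # ignore correct predictions
    i, j = np.unravel_index(np.argmax(sym), sym.shape)
    return int(min(i, j)), int(max(i, j))


def merge_labels(labels, pair):
    """Relabel the second cluster of the pair as the first (one agglomeration step)."""
    keep, drop = pair
    merged = labels.copy()
    merged[merged == drop] = keep
    return merged


confusion = np.array([[0.90, 0.08, 0.02],
                      [0.10, 0.85, 0.05],
                      [0.01, 0.02, 0.97]])
pair = most_confused_pair(confusion)
print(pair, merge_labels(np.array([0, 1, 2, 1, 0]), pair))
```

Repeating the retrain, measure, and merge cycle yields a hierarchy. Cutting that hierarchy at different depths gives the multiple label sets at different specificity levels described above.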

Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913

2021

Author(s):

Xianjie Huang

Yuanhua Huang

Keyword(s):

Single Cell

Single Cells

Basic Research

Substantial Improvement

Data Sets

Sequencing Data

Single Cell Sequencing

Memory Efficiency

Computational Speed

Cell Data

Abstract

Summary: Single-cell sequencing is an increasingly used technology with promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and a simplified user interface. Here we introduce cellsnp-lite, a software tool for genotyping single-cell sequencing data from both droplet- and well-based platforms. It is implemented in C/C++ and built on the well-supported htslib package. On various experimental data sets, it substantially improves computational speed and memory efficiency while producing results highly concordant with those of existing methods. Cellsnp-lite therefore lightens the genetic analysis of increasingly large single-cell data.

Availability: The source code is freely available at https://github.com/single-cell-genetics/[…].
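At its core, genotyping droplet-based data at a known SNP means piling up the reads that cover the site and counting reference and alternative alleles per cell. The sketch below shows only that counting step. It assumes the reads have already been reduced to hypothetical `(cell_barcode, umi, base)` tuples; cellsnp-lite itself reads alignment files through htslib, and the function name is illustrative.

```python
from collections import defaultdict


def count_alleles(reads, ref, alt):
    """Per-cell allele counts at one SNP from (cell_barcode, umi, base) tuples."""
    seen = set()
    counts = defaultdict(lambda: {"ref": 0, "alt": 0, "other": 0})
    for cell, umi, base in reads:
        if (cell, umi) in seen:
            continue  # count each UMI once per cell (first read wins)
        seen.add((cell, umi))
        key = "ref" if base == ref else "alt" if base == alt else "other"
        counts[cell][key] += 1
    return dict(counts)


reads = [("c1", "u1", "A"), ("c1", "u1", "A"), ("c1", "u2", "G"),
         ("c2", "u1", "A"), ("c2", "u3", "T")]
print(count_alleles(reads, ref="A", alt="G"))
```

Collapsing reads by UMI prevents PCR duplicates from inflating one allele. Counts like these per cell and per SNP are the usual input to downstream genotype or donor assignment.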

Comparison Between UMAP and t-SNE for Multiplex-Immunofluorescence Derived Single-Cell Data from Tissue Sections

10.1101/549659

2019

Cited By ~ 1

Author(s):

Duoduo Wu

Joe Yeong Poh Sheng

Grace Tan Su-En

Marion Chevrier

Josh Loh Jie Hua

...

Keyword(s):

Single Cell

Clustering Algorithm

Cell Types

Immune Markers

Tissue Samples

Tissue Sections

Reduced Dimensions

Dimensionality Reduction Technique

Cell Data

Worse Prognosis

Abstract

We stained human hepatocellular carcinoma (HCC) tissue samples with seven immune markers, including one nuclear counterstain. On these samples, we evaluated a new dimensionality reduction technique, Uniform Manifold Approximation and Projection (UMAP), as an alternative to t-Distributed Stochastic Neighbor Embedding (t-SNE) for analysing multiplex-immunofluorescence (mIF)-derived single-cell data. We used an unsupervised clustering algorithm, FlowSOM, to identify eight major cell types present in human HCC tissues. UMAP and t-SNE were run independently on the dataset to qualitatively compare the distribution of clustered cell types in both reduced dimensions. Our comparison shows that UMAP is superior in runtime. Both techniques provide similar arrangements of cell clusters; the key difference is UMAP’s extensive characteristic branching. Most interestingly, UMAP’s branching highlighted biological lineages, especially by identifying potential hybrid tumour cells (HTC). Survival analysis shows that patients with a higher proportion of HTC have a worse prognosis (p-value = 0.019). We conclude that both techniques have similar visualisation capabilities, but UMAP has a clear runtime advantage over t-SNE, making UMAP a highly plausible alternative to t-SNE in mIF data analysis.

Optimal transport analysis reveals trajectories in steady-state systems

PLoS Computational Biology

10.1371/journal.pcbi.1009466

2021

Vol 17 (12)

pp. e1009466

Author(s):

Stephen Zhang

Anton Afanassiev

Laura Greenstreet

Tetsuya Matsumoto

Geoffrey Schiebinger

Keyword(s):

Single Cell

Optimal Transport

Single Cell Analysis

Simulated Data

Unified Approach

Transport Analysis

Time Courses

Cell Trajectories

Cell Data

Natural Way

Understanding how cells change their identity and behaviour in living systems is an important question in many fields of biology. Inferring cell trajectories from single-cell measurements has been a major topic in the single-cell analysis community, with different methods developed for equilibrium and non-equilibrium systems (e.g. haematopoiesis vs. embryonic development). We show that optimal transport analysis, a technique originally designed for analysing time courses, can also infer cellular trajectories from a single snapshot of a population in equilibrium. Optimal transport therefore provides a unified approach to inferring trajectories in both stationary and non-stationary systems. Our method, StationaryOT, follows naturally from the hypothesis of Waddington’s epigenetic landscape. We implement StationaryOT as a software package and demonstrate its efficacy on simulated data and on single-cell data from Arabidopsis thaliana root development.
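Optimal transport problems like the one behind StationaryOT are commonly solved with entropic regularization. As an illustration only, the sketch below shows the generic Sinkhorn iteration for that subproblem: it finds a transport plan between two cell-mass distributions, mu and nu, given a cost matrix. It does not reproduce StationaryOT's own formulation. The regularization strength `eps` and the iteration count are arbitrary choices here.

```python
import numpy as np


def sinkhorn(mu, nu, cost, eps=0.5, n_iter=1000):
    """Entropy-regularized OT plan between histograms mu and nu (generic Sinkhorn)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (kernel.T @ u)  # match column marginals
        u = mu / (kernel @ v)    # match row marginals
    return u[:, None] * kernel * v[None, :]


mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
plan = sinkhorn(mu, nu, cost)
print(plan.round(4), plan.sum(axis=1), plan.sum(axis=0))
```

Smaller `eps` gives a plan closer to unregularized OT but slows convergence. Row i of the plan, normalized, can be read as where mass at state i is transported.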
