CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures

BMC Bioinformatics ◽

10.1186/s12859-020-03796-9 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Thomas D. Sherman ◽

Tiger Gao ◽

Elana J. Fertig

Keyword(s):

Single Cell ◽

Data Structures ◽

Computational Efficiency ◽

Matrix Factorization ◽

Single Cell Analysis ◽

Sparse Data ◽

Data Sets ◽

Cell Analysis ◽

Gradient Based ◽

Cell Data

Abstract

Background: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel, which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis.

Results: We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This parallelization framework provides asynchronous updates for sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data.

Conclusions: Altogether, our new software enhances the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets.


GranatumX: A community engaging and flexible software environment for single-cell analysis

10.1101/385591 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xun Zhu ◽

Breck Yunits ◽

Thomas Wolfgruber ◽

Yu Liu ◽

Qianhui Huang ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Programming Languages ◽

Single Cell Analysis ◽

Cell Analysis ◽

Software Environment ◽

Software Developers ◽

Software Ecosystem ◽

Graphical Environment ◽

Cell Data

Abstract: We present GranatumX, the next-generation software environment for single-cell data analysis. It gives biologists access to the latest single-cell bioinformatics methods in a graphical environment. It also offers software developers the opportunity to rapidly promote their own tools with others in customizable pipelines. The architecture of GranatumX allows for easy inclusion of plugin modules, named “Gboxes”, that wrap around bioinformatics tools written in various programming languages. GranatumX can be run in the cloud or on private servers, and generates reproducible results. It is expected to become a community-engaging, flexible, and evolving software ecosystem for scRNA-Seq analysis, connecting developers with bench scientists. GranatumX is freely accessible at: http://garmiregroup.org/granatumx/app


Multitask learning for Transformers with application to large-scale single-cell transcriptomes

10.1101/2020.02.05.935239 ◽

2020 ◽

Author(s):

Minxing Pang ◽

Jesper Tegnér

Keyword(s):

Single Cell ◽

Large Scale ◽

Single Cell Analysis ◽

Brain Atlas ◽

Biological Knowledge ◽

Data Sets ◽

Cell Analysis ◽

Large Scale Data ◽

Components Analysis ◽

Scale Data

Abstract: Recent progress in machine learning provides competitive methods for bioinformatics in many traditional topics, such as transcriptomes sequence and single-cell analysis. However, discovering biomedical correlation of cells that are present across large-scale data sets remains challenging. Our attention-based neural network module with 300 million parameters is able to capture biological knowledge in a data-driven way. The module contains high-quality embedding, taxonomy analysis and similarity measurement. We tested the model on the Mouse Brain Atlas, which consists of 160,000 cells and 25,000 genes. Our module obtained some interesting findings that have been verified by biologists and achieved better performance when benchmarked against an autoencoder and principal components analysis.


12 Grand Challenges in Single-Cell Data Science

10.7287/peerj.preprints.27885 ◽

2019 ◽

Author(s):

David Laehnemann ◽

Johannes Köster ◽

Ewa Szczurek ◽

Davis J McCarthy ◽

Stephanie C Hicks ◽

...

Keyword(s):

Single Cell ◽

Cell Biology ◽

Data Science ◽

Single Cell Analysis ◽

State Of The Art ◽

Cell Analysis ◽

Open Problems ◽

Single Cell Sequencing ◽

Current State ◽

Cell Data

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands—or even millions—of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single-Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single-Cell Data Science' for the coming years.


12 Grand challenges in single-cell data science

10.7287/peerj.preprints.27885v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

David Laehnemann ◽

Johannes Köster ◽

Ewa Szczurek ◽

Davis McCarthy ◽

Stephanie C Hicks ◽

...

Keyword(s):

Single Cell ◽

Cell Biology ◽

Data Science ◽

Single Cell Analysis ◽

State Of The Art ◽

Cell Analysis ◽

Open Problems ◽

Single Cell Sequencing ◽

Current State ◽

Cell Data

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands—or even millions—of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.


Neural Data Visualization for Scalable and Generalizable Single Cell Analysis

10.1101/289223 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hyunghoon Cho ◽

Bonnie Berger ◽

Jian Peng

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Single Cells ◽

Data Sets ◽

Cell Analysis ◽

Data Set ◽

Unseen Data ◽

Sequencing Experiment ◽

Cell Expression

Summary: Single-cell RNA sequencing is becoming effective and accessible as emerging technologies push its scale to millions of cells and beyond. Visualizing the landscape of single cell expression has been a fundamental tool in single cell analysis. However, standard methods for visualization, such as t-stochastic neighbor embedding (t-SNE), not only lack scalability to data sets with millions of cells, but also are unable to generalize to new cells, an important ability for transferring knowledge across fast-accumulating data sets. We introduce net-SNE, which trains a neural network to learn a high quality visualization of single cells that newly generalizes to unseen data. While matching the visualization quality of t-SNE on 14 benchmark data sets of varying sizes, from hundreds to 1.3 million cells, net-SNE also effectively positions previously unseen cells, even when an entire subtype is missing from the initial data set or when the new cells are from a different sequencing experiment. Furthermore, given a “reference” visualization, net-SNE can vastly reduce the computational burden of visualizing millions of single cells from multiple days to just a few minutes of runtime. Our work provides a general framework for newly bootstrapping single cell analysis from existing data sets.


12 Grand Challenges in Single-Cell Data Science

10.7287/peerj.preprints.27885v3 ◽

2019 ◽

Cited By ~ 1

Author(s):

David Laehnemann ◽

Johannes Köster ◽

Ewa Szczurek ◽

Davis J McCarthy ◽

Stephanie C Hicks ◽

...

Keyword(s):

Single Cell ◽

Cell Biology ◽

Data Science ◽

Single Cell Analysis ◽

State Of The Art ◽

Cell Analysis ◽

Open Problems ◽

Single Cell Sequencing ◽

Current State ◽

Cell Data

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands—or even millions—of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single-Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single-Cell Data Science' for the coming years.


singlecellVR: interactive visualization of single-cell data in virtual reality

10.1101/2020.07.30.229534 ◽

2020 ◽

Author(s):

David F. Stein ◽

Huidong Chen ◽

Michael E. Vinyard ◽

Luca Pinello

Keyword(s):

Virtual Reality ◽

Single Cell ◽

Single Cell Analysis ◽

Data Conversion ◽

Cell Populations ◽

Complex Data ◽

Cell Analysis ◽

Cell Assays ◽

User Friendly ◽

Cell Data

Abstract: Single-cell assays have transformed our ability to model heterogeneity within cell populations and tissues. Virtual Reality (VR) has recently emerged as a powerful technology to dynamically explore complex data. However, expensive hardware or advanced data preprocessing skills are required to adapt such technology to single-cell data. To address current shortcomings, we present singlecellVR, a user-friendly website for visualizing single-cell data, designed for cheap and easily available virtual reality hardware (e.g., Google Cardboard, ∼$8). We provide a companion package, scvr, to streamline data conversion from the most widely-adopted single-cell analysis tools, and a database of pre-analyzed datasets to which users can contribute.


12 Grand challenges in single-cell data science

10.7287/peerj.preprints.27885v2 ◽

2019 ◽

Author(s):

David Laehnemann ◽

Johannes Köster ◽

Ewa Szczurek ◽

Davis J McCarthy ◽

Stephanie C Hicks ◽

...

Keyword(s):

Single Cell ◽

Cell Biology ◽

Data Science ◽

Single Cell Analysis ◽

State Of The Art ◽

Cell Analysis ◽

Open Problems ◽

Single Cell Sequencing ◽

Current State ◽

Cell Data

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands—or even millions—of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.


TooManyCells identifies and visualizes relationships of single-cell clades

10.1101/519660 ◽

2019 ◽

Cited By ~ 2

Author(s):

Gregory W. Schwartz ◽

Jelena Petrovic ◽

Maria Fasolino ◽

Yeqiao Zhou ◽

Stanley Cai ◽

...

Keyword(s):

Single Cell ◽

Clustering Algorithm ◽

Single Cell Analysis ◽

Data Sets ◽

Reduction Methods ◽

Simultaneous Comparisons ◽

Spectral Clustering Algorithm ◽

The Relationship ◽

Matrix Free ◽

Cell Data

Abstract: Transcriptional programs contribute to phenotypic and functional cell states. While elucidation of cell state heterogeneity and its role in biology and pathobiology has been advanced by studying single cell level measurements, the underlying assumptions of current analytical methods limit the identification and exploration of cell clades. Unlike other methods, which produce a single uni-layer partition of cells ignoring echelons of cell states, we present TooManyCells, a software consisting of a suite of graph-based tools for efficient, global, and unbiased identification and visualization of cell clades while maintaining and presenting the relationship between cell states. TooManyCells provides a set of tools based on a matrix-free efficient divisive hierarchical spectral clustering algorithm wholly different from the prevalent Louvain-based methods. BirchBeer, the visualization component of TooManyCells, introduces a new approach for single cell analysis that is built on a concept intentionally orthogonal to the widely used dimensionality reduction methods. Together, this suite of tools provides a paradigm shift in the analysis and interpretation of single cell data by enabling simultaneous comparisons of cell states at context- and application-dependent scales. A byproduct of this shift is the immediate detection and visualization of rare populations that outperforms previous algorithms as demonstrated by applying these tools to existing single cell RNA-seq data sets from various mouse organs.
