CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures

2019
Author(s):
Thomas D. Sherman
Tiger Gao
Elana J. Fertig

Abstract. Motivation: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel, which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. Results: We upgraded CoGAPS in Version 3 to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This software includes a new parallelization framework that is designed around the sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Altogether, these updates to CoGAPS enhance the efficiency of the algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets. Availability: CoGAPS is available as a Bioconductor package and the source code is provided at github.com/FertigLab/CoGAPS. All efficiency updates to enable single-cell analysis are available as of version [email protected]

2020
Vol 21 (1)
Author(s):
Thomas D. Sherman
Tiger Gao
Elana J. Fertig

Abstract. Background: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel, which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. Results: We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This parallelization framework provides asynchronous updates for sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Conclusions: Altogether, our new software enhances the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets.
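
The abstract above credits sparse data structures with reducing memory overhead for single-cell data. As a rough illustration of the general idea only, the sketch below stores a toy count matrix in compressed sparse row (CSR) form using plain Python lists. The names `to_csr` and `csr_row_dot` and the toy matrix are hypothetical and chosen for this example; this is not the data structure used inside CoGAPS, whose internals the abstract does not describe.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix into CSR arrays.

    Returns (values, col_idx, row_ptr): the nonzero values, the column of
    each nonzero, and for each row the offset where its nonzeros begin.
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        # The end of this row is the start of the next one.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_row_dot(values: List[float], col_idx: List[int], row_ptr: List[int],
                i: int, vec: List[float]) -> float:
    """Dot product of sparse row i with a dense vector, visiting only nonzeros."""
    start, end = row_ptr[i], row_ptr[i + 1]
    return sum(values[k] * vec[col_idx[k]] for k in range(start, end))


# Hypothetical genes-by-cells count matrix; most entries are zero,
# as is typical of single-cell count data.
counts = [
    [0, 0, 3, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 5],
]

values, col_idx, row_ptr = to_csr(counts)
dense_entries = sum(len(row) for row in counts)
stored_entries = len(values) + len(col_idx) + len(row_ptr)
print(dense_entries, stored_entries)
```

The saving grows with the fraction of zeros: the dense form stores every entry, while the CSR form stores two numbers per nonzero plus one offset per row.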


2018
Author(s):
Xun Zhu
Breck Yunits
Thomas Wolfgruber
Yu Liu
Qianhui Huang
...

Abstract. We present GranatumX, the next-generation software environment for single-cell data analysis. It gives biologists access to the latest single-cell bioinformatics methods in a graphical environment. It also offers software developers the opportunity to rapidly share their own tools with others in customizable pipelines. The architecture of GranatumX allows for easy inclusion of plugin modules, named “Gboxes”, that wrap around bioinformatics tools written in various programming languages. GranatumX can be run in the cloud or on private servers, and generates reproducible results. It is expected to become a community-engaging, flexible, and evolving software ecosystem for scRNA-Seq analysis, connecting developers with bench scientists. GranatumX is freely accessible at: http://garmiregroup.org/granatumx/app


2020
Author(s):
Minxing Pang
Jesper Tegnér

Abstract. Recent progress in machine learning provides competitive methods for many traditional bioinformatics topics, such as transcriptome sequencing and single-cell analysis. However, discovering biomedical correlations between cells that are present across large-scale data sets remains challenging. Our attention-based neural network module with 300 million parameters is able to capture biological knowledge in a data-driven way. The module provides high-quality embedding, taxonomy analysis and similarity measurement. We tested the model on the Mouse Brain Atlas, which consists of 160,000 cells and 25,000 genes. Our module produced some interesting findings that have been verified by biologists, and it performed better when benchmarked against an autoencoder and principal component analysis.


2019
Author(s):
David Laehnemann
Johannes Köster
Ewa Szczurek
Davis J McCarthy
Stephanie C Hicks
...

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, has turned single-cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single-cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single-cell analysis; together, they give rise to the new realm of 'Single-Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single-Cell Data Science' for the coming years.


Author(s):
David Laehnemann
Johannes Köster
Ewa Szczurek
Davis McCarthy
Stephanie C Hicks
...

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, has turned single-cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single-cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single-cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.


2018
Author(s):
Hyunghoon Cho
Bonnie Berger
Jian Peng

Summary. Single-cell RNA sequencing is becoming effective and accessible as emerging technologies push its scale to millions of cells and beyond. Visualizing the landscape of single-cell expression has been a fundamental tool in single-cell analysis. However, standard methods for visualization, such as t-distributed stochastic neighbor embedding (t-SNE), not only lack scalability to data sets with millions of cells, but also are unable to generalize to new cells, an important ability for transferring knowledge across fast-accumulating data sets. We introduce net-SNE, which trains a neural network to learn a high-quality visualization of single cells that generalizes to unseen data. While matching the visualization quality of t-SNE on 14 benchmark data sets of varying sizes, from hundreds to 1.3 million cells, net-SNE also effectively positions previously unseen cells, even when an entire subtype is missing from the initial data set or when the new cells are from a different sequencing experiment. Furthermore, given a “reference” visualization, net-SNE can vastly reduce the computational burden of visualizing millions of single cells from multiple days to just a few minutes of runtime. Our work provides a general framework for bootstrapping single-cell analysis from existing data sets.


2020
Author(s):
David F. Stein
Huidong Chen
Michael E. Vinyard
Luca Pinello

Abstract. Single-cell assays have transformed our ability to model heterogeneity within cell populations and tissues. Virtual Reality (VR) has recently emerged as a powerful technology to dynamically explore complex data. However, expensive hardware or advanced data preprocessing skills are required to adapt such technology to single-cell data. To address current shortcomings, we present singlecellVR, a user-friendly website for visualizing single-cell data, designed for cheap and easily available virtual reality hardware (e.g., Google Cardboard, ∼$8). We provide a companion package, scvr, to streamline data conversion from the most widely adopted single-cell analysis tools, and a database of pre-analyzed datasets to which users can contribute.


2019
Author(s):
David Laehnemann
Johannes Köster
Ewa Szczurek
Davis J McCarthy
Stephanie C Hicks
...

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, has turned single-cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single-cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single-cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.


2019
Author(s):
Gregory W. Schwartz
Jelena Petrovic
Maria Fasolino
Yeqiao Zhou
Stanley Cai
...

Abstract. Transcriptional programs contribute to phenotypic and functional cell states. While elucidation of cell state heterogeneity and its role in biology and pathobiology has been advanced by studying single-cell-level measurements, the underlying assumptions of current analytical methods limit the identification and exploration of cell clades. Unlike other methods, which produce a single uni-layer partition of cells and ignore echelons of cell states, we present TooManyCells, a software suite of graph-based tools for efficient, global, and unbiased identification and visualization of cell clades while maintaining and presenting the relationship between cell states. TooManyCells provides a set of tools based on a matrix-free, efficient, divisive hierarchical spectral clustering algorithm wholly different from the prevalent Louvain-based methods. BirchBeer, the visualization component of TooManyCells, introduces a new approach for single-cell analysis that is built on a concept intentionally orthogonal to the widely used dimensionality reduction methods. Together, this suite of tools provides a paradigm shift in the analysis and interpretation of single-cell data by enabling simultaneous comparisons of cell states at context- and application-dependent scales. A byproduct of this shift is the immediate detection and visualization of rare populations, which outperforms previous algorithms, as demonstrated by applying these tools to existing single-cell RNA-seq data sets from various mouse organs.

