scMerge: Integration of multiple single-cell transcriptomics datasets leveraging stable expression and pseudo-replication

10.1101/393280 ◽

2018 ◽

Cited By ~ 9

Author(s):

Yingxin Lin ◽

Shila Ghazanfar ◽

Kevin Wang ◽

Johann A. Gagnon-Bartsch ◽

Kitty K. Lo ◽

...

Keyword(s):

Single Cell ◽

Developmental Trajectories ◽

Stable Expression ◽

Rna Seq ◽

Integrative Analyses ◽

Biological Discovery ◽

Multiple Scenarios ◽

Novel Algorithm

Abstract: Concerted examination of multiple collections of single cell RNA-Seq (scRNA-Seq) data promises further biological insights that cannot be uncovered with individual datasets. However, such integrative analyses are challenging and require sophisticated methodologies. To enable effective interrogation of multiple scRNA-Seq datasets, we have developed a novel algorithm, named scMerge, that removes unwanted variation by combining stably expressed genes and utilizing pseudo-replicates across datasets. Analysis of large collections of publicly available datasets demonstrates that scMerge performs well in multiple scenarios and enhances biological discovery, including inferring cell developmental trajectories.

A novel algorithm for the collective integration of single cell RNA-seq during embryogenesis

10.1101/543314 ◽

2019 ◽

Cited By ~ 1

Author(s):

Wuming Gong ◽

Bhairab N. Singh ◽

Pruthvi Shah ◽

Satyabrata Das ◽

Joshua Theisen ◽

...

Keyword(s):

Single Cell ◽

Developmental Stages ◽

Developmental Trajectories ◽

Single Cells ◽

Cardiovascular Development ◽

Rna Seq ◽

Downstream Target ◽

Early Mouse ◽

Endothelial Development ◽

Novel Algorithm

Abstract: Single cell RNA-seq (scRNA-seq) over specified time periods has been widely used to dissect the cell populations during mammalian embryogenesis. Integrating such scRNA-seq data from different developmental stages and from different laboratories is critical to comprehensively define and understand the molecular dynamics and systematically reconstruct the lineage trajectories. Here, we describe a novel algorithm to integrate heterogeneous temporal scRNA-seq datasets and to preserve the global developmental trajectories. We applied this algorithm and approach to integrate 3,387 single cells from seven heterogeneous temporal scRNA-seq datasets, and reconstructed the cell atlas of early mouse cardiovascular development from E6.5 to E9.5. Using this integrated atlas, we identified an Etv2 downstream target, Ebf1, as an important transcription factor for mouse endothelial development.

Rejoinder for “Exponential-Family Embedding With Application to Cell Developmental Trajectories for Single-Cell RNA-Seq Data”

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1892701 ◽

2021 ◽

Vol 116 (534) ◽

pp. 478-480

Author(s):

Kevin Z. Lin ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Exponential Family ◽

Developmental Trajectories ◽

Rna Seq

Discussion of “Exponential-Family Embedding With Application to Cell Developmental Trajectories for Single-Cell RNA-Seq Data”

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1880919 ◽

2021 ◽

Vol 116 (534) ◽

pp. 475-477

Author(s):

Jian Hu ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Exponential Family ◽

Developmental Trajectories ◽

Rna Seq

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract: Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.

scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1820006116 ◽

2019 ◽

Vol 116 (20) ◽

pp. 9775-9784 ◽

Cited By ~ 38

Author(s):

Yingxin Lin ◽

Shila Ghazanfar ◽

Kevin Y. X. Wang ◽

Johann A. Gagnon-Bartsch ◽

Kitty K. Lo ◽

...

Keyword(s):

Factor Analysis ◽

Data Integration ◽

Single Cell ◽

Rna Seq ◽

Cell Type ◽

Large Collection ◽

Single Cell Rna Sequencing ◽

Development Trajectory ◽

Biological Discovery ◽

Public Datasets

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.
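
The idea of removing unwanted factors with stably expressed genes can be illustrated with a small sketch. This is not the scMerge implementation: it is a simplified RUV-style correction written for illustration, it omits the pseudoreplicate step entirely, and the function name `remove_unwanted_variation`, the simulated data, and every parameter value are assumptions. The sketch estimates unwanted factors by SVD on the stably expressed (control) genes only, then regresses those factors out of all genes.

```python
import numpy as np

def remove_unwanted_variation(Y, control_idx, k=1):
    """Simplified RUV-style correction (illustrative, not scMerge itself).

    Y           : cells x genes matrix of log-expression values
    control_idx : column indices of genes assumed to be stably expressed
    k           : number of unwanted factors to estimate and remove
    """
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=0)
    # Factor step: control genes carry no biology by assumption, so their
    # leading singular vectors are taken as estimates of unwanted variation.
    U, s, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]                      # cells x k unwanted factors
    # Regress every gene on the unwanted factors and subtract the fit.
    alpha, *_ = np.linalg.lstsq(W, centered, rcond=None)
    return Y - W @ alpha

def mean_gap(M, labels):
    """Average absolute difference in per-gene means between two groups."""
    return np.abs(M[labels == 1].mean(axis=0) - M[labels == 0].mean(axis=0)).mean()

# Hypothetical simulated data: two batches, two cell types balanced in each.
rng = np.random.default_rng(0)
n_per_batch, n_genes, n_controls = 100, 50, 20
batch = np.repeat([0, 1], n_per_batch)
cell_type = np.tile([0, 1], n_per_batch)
batch_shift = rng.normal(0, 2, n_genes)       # batch effect hits every gene
type_shift = np.zeros(n_genes)                # biology spares control genes
type_shift[n_controls:] = rng.normal(0, 2, n_genes - n_controls)
Y = (np.outer(batch, batch_shift) + np.outer(cell_type, type_shift)
     + rng.normal(0, 0.5, (2 * n_per_batch, n_genes)))

corrected = remove_unwanted_variation(Y, np.arange(n_controls), k=1)

batch_gap_before = mean_gap(Y, batch)
batch_gap_after = mean_gap(corrected, batch)
type_gap_after = mean_gap(corrected[:, n_controls:], cell_type)
print(batch_gap_before, batch_gap_after, type_gap_after)
```

In this toy setting the cell types are balanced across batches, so the estimated factor is nearly orthogonal to the biological signal. When batch and cell type are confounded, a correction this simple can remove biology along with the batch effect, which is the kind of situation where replicate information becomes important.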

The SZS is an efficient statistical method to identify regulated splicing events in droplet-based RNA sequencing

10.1101/2020.11.10.377572 ◽

2020 ◽

Author(s):

Julia Eve Olivieri ◽

Roozbeh Dehghannasiri ◽

Julia Salzman

Keyword(s):

Single Cell ◽

Statistical Method ◽

Rna Seq ◽

Computationally Efficient ◽

Small Set ◽

Biological Discovery ◽

Cell Type Specific ◽

Human Spermatogenesis ◽

Splicing Patterns ◽

Cell Data

Abstract: To date, the field of single-cell genomics has viewed robust splicing analysis as completely out of reach in droplet-based platforms, preventing biological discovery of single-cell regulated splicing. Here, we introduce a novel, robust, and computationally efficient statistical method, the Splicing Z Score (SZS), to detect differential alternative splicing in single cell RNA-Seq technologies including 10x Chromium. We applied the SZS to primary human cells to discover new regulated, cell type-specific splicing patterns. Illustrating the power of the SZS method, splicing of a small set of genes has high predictive power for tissue compartment in the human lung, and the SZS identifies un-annotated, conserved splicing regulation in human spermatogenesis. The SZS is a method that can rapidly identify regulated splicing events from single cell data and prioritize genes predicted to have functionally significant splicing programs.

scDIOR: single cell RNA-seq data IO software

BMC Bioinformatics ◽

10.1186/s12859-021-04528-3 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Huijian Feng ◽

Lihui Lin ◽

Jiekai Chen

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Developmental Trajectories ◽

Rapid Development ◽

Data Transformation ◽

Rna Seq ◽

Data Types ◽

User Friendly ◽

Cell Data

Abstract: Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods promotes insight into heterogeneous single-cell data. An increasing number of tools have been provided for biological analysts, and two programming languages, R and Python, are widely used among researchers. R and Python are complementary, as many methods are implemented specifically in R or Python. However, the different platforms create a data sharing and transformation problem, especially for Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to perform data transformation of single-cell omics between platforms, which makes users spend excessive time on data Input and Output (IO), significantly reducing the efficiency of data analysis. Results: We developed scDIOR for single-cell data transformation between platforms of R and Python based on Hierarchical Data Format Version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or command line interface. For large scale datasets, users can partially load the needed information, e.g., cell annotation without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them. Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably. The software is freely accessible at https://github.com/JiekaiLab/scDIOR.

Power in Numbers: Single-Cell RNA-Seq Strategies to Dissect Complex Tissues

Annual Review of Genetics ◽

10.1146/annurev-genet-120417-031247 ◽

2018 ◽

Vol 52 (1) ◽

pp. 203-221 ◽

Cited By ~ 29

Author(s):

Kenneth D. Birnbaum

Keyword(s):

Single Cell ◽

High Throughput ◽

Genetic Screening ◽

Developmental Trajectories ◽

Fluorescent Microscopy ◽

The Novel ◽

Rna Seq ◽

Single Cell Rna Sequencing ◽

Developmental Dynamics

The growing scale and declining cost of single-cell RNA-sequencing (RNA-seq) now permit a repetition of cell sampling that increases the power to detect rare cell states, reconstruct developmental trajectories, and measure phenotype in new terms such as cellular variance. The characterization of anatomy and developmental dynamics has not had an equivalent breakthrough since groundbreaking advances in live fluorescent microscopy. The new resolution obtained by single-cell RNA-seq is a boon to genetics because the novel description of phenotype offers the opportunity to refine gene function and dissect pleiotropy. In addition, the recent pairing of high-throughput genetic perturbation with single-cell RNA-seq has made practical a scale of genetic screening not previously possible.

Discussion of “Exponential-Family Embedding With Application to Cell Developmental Trajectories for Single-Cell RNA-seq Data”

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1880920 ◽

2021 ◽

Vol 116 (534) ◽

pp. 471-474

Author(s):

Zhicheng Ji ◽

Hongkai Ji

Keyword(s):

Single Cell ◽

Exponential Family ◽

Developmental Trajectories ◽

Rna Seq

Continuous State HMMs for Modeling Time Series Single Cell RNA-Seq Data

10.1101/380568 ◽

2018 ◽

Author(s):

Chieh Lin ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Single Cell ◽

Developmental Process ◽

Developmental Trajectories ◽

Cell Types ◽

Supplementary Information ◽

Rna Seq ◽

Inference Algorithms ◽

Continuous State ◽

Efficient Learning

Abstract: Motivation: Methods for reconstructing developmental trajectories from time series single cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy. Results: We developed a new method based on continuous state HMMs (CSHMMs) for representing and modeling time series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. Availability: Software and supporting website: www.andrew.cmu.edu/user/chiehll/CSHMM/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
