Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference

Mapping Intimacies ◽

10.1101/464479 ◽

2018 ◽

Cited By ~ 4

Author(s):

Pierre-Cyril Aubin-Frankowski ◽

Jean-Philippe Vert

Keyword(s):

Gene Regulation ◽

Single Cell ◽

De Novo ◽

State Of The Art ◽

Real Data ◽

Biological Processes ◽

Rna Seq ◽

Cell Cycles ◽

Linear Differential ◽

Cell Trajectories

AbstractSingle-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulation networks (GRN) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. In this work we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual data, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data.

Download Full-text

Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference

Bioinformatics ◽

10.1093/bioinformatics/btaa576 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4774-4780 ◽

Cited By ~ 2

Author(s):

Pierre-Cyril Aubin-Frankowski ◽

Jean-Philippe Vert

Keyword(s):

Single Cell ◽

De Novo ◽

Real Data ◽

Supplementary Information ◽

Biological Processes ◽

Rna Seq ◽

Cell Cycles ◽

Gene Regulatory ◽

Linear Differential ◽

Cell Trajectories

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulatory network (GRNs) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. Results In this work, we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data. Availability and implementation The MATLAB code of GRISLI is available at: https://github.com/PCAubin/GRISLI. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-Seq data

10.1101/068775 ◽

2016 ◽

Cited By ~ 2

Author(s):

Peijie Lin ◽

Michael Troup ◽

Joshua W. K. Ho

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

State Of The Art ◽

Principal Component ◽

Real Data ◽

Rna Seq ◽

Data Set ◽

Standard Principal Component Analysis ◽

The Impact ◽

Imputation Approach

Most existing dimensionality reduction and clustering packages for single-cell RNA-Seq (scRNA-Seq) data deal with dropouts by heavy modelling and computational machinery. Here we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm which uses a novel yet very simple ‘implicit imputation’ approach to alleviate the impact of dropouts in scRNA-Seq data in a principled manner. Using a range of simulated and real data, we have shown that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds for processing a data set of hundreds of cells, and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.org/VCCRI/CIDR.

Download Full-text

CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009631 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009631

Author(s):

Raquel Linheiro ◽

John Archer

Keyword(s):

De Novo ◽

Simulated Data ◽

Real Data ◽

Gene Families ◽

Classification Systems ◽

Whole Body ◽

Cdna Libraries ◽

Sequence Information ◽

Rna Seq ◽

High Quality

With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStones ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well established, de novo assembers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStones assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools. Within a related side study, we explore the effects that chimera’s within reference sets have on the identification of differentially expression genes. CStone is available at: https://sourceforge.net/projects/cstone/.

Download Full-text

K-mer counting with low memory consumption enables fast clustering of single-cell sequencing data without read alignment

10.1101/723833 ◽

2019 ◽

Author(s):

Christina Huan Shi ◽

Kevin Y. Yip

Keyword(s):

Single Cell ◽

State Of The Art ◽

Rna Seq ◽

Sequencing Data ◽

Memory Consumption ◽

Analysis Pipeline ◽

Cell Clusters ◽

Single Cell Sequencing ◽

Sequencing Errors ◽

Full Analysis

AbstractK-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.

Download Full-text

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

AbstractAs single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available.Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.

Download Full-text

Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization

10.7287/peerj.preprints.1839 ◽

2016 ◽

Author(s):

Xun Zhu ◽

Travers Ching ◽

Xinghua Pan ◽

Sherman Weissman ◽

Lana Garmire

Keyword(s):

Single Cell ◽

Hierarchical Clustering ◽

Matrix Factorization ◽

New Technology ◽

Data Sets ◽

Biological Processes ◽

Rna Seq ◽

Clustering Methods ◽

Hematopoietic Stem ◽

Non Negative Matrix Factorization

Single-cell RNA-Sequencing (scRNA-Seq) is a cutting edge technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well suited bioinformatics tools to analyze the data generated from this new technology are still lacking. Here we have investigated the performance of non-negative matrix factorization (NMF) method to analyze a wide variety of scRNA-Seq data sets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods including K-means and hierarchical clustering, NMF has higher accuracy even when the clustering results of K-means and hierarchical clustering are enhanced by t-SNE. Moreover, NMF successfully detect the subpopulations, such as those in a single glioblastoma patient. Furthermore, in conjugation with the modularity detection method FEM, it reveals unique modules that are indicative of clinical subtypes. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data, and the NMFEM pipeline is suitable for modularity detection among single-cell RNA-Seq data.

Download Full-text

Decision letter: Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila

10.7554/elife.47138.027 ◽

2019 ◽

Author(s):

Helen White-Cooper

Keyword(s):

Single Cell ◽

Gene Transcription ◽

De Novo ◽

Mutational Bias ◽

Rna Seq ◽

De Novo Gene

Download Full-text

A novel single-cell based method for breast cancer prognosis

10.1101/2020.04.26.062794 ◽

2020 ◽

Author(s):

Xiaomei Li ◽

Lin Liu ◽

Greg Goodall ◽

Andreas Schreiber ◽

Taosheng Xu ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Breast Cancer Prognosis ◽

Cancer Prognosis ◽

Biological Processes ◽

Expression Data ◽

Rna Seq ◽

Novel Method

AbstractBreast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performances or ambiguous biological relevance, as a result of the neglect of intra-tumor heterogeneity. Recently, single cell RNA-sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at cellular levels. In this paper, we propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses the scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It firstly infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, we use them as the features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of different biological processes other than EMT.Author summaryVarious computational methods have been developed for breast cancer prognosis. However, those methods mainly use the gene expression data generated by the bulk RNA sequencing techniques, which average the expression level of a gene across different cell types. As breast cancer is a heterogenous disease, the bulk gene expression may not be the ideal resource for cancer prognosis. In this study, we propose a novel method to improve breast cancer prognosis using scRNA-seq data. The proposed method has been applied to the EMT scRNA-seq dataset for identifying breast cancer signatures for prognosis. In comparison with existing bulk expression data based methods in breast cancer prognosis, our method shows a better performance. Our single-cell-based signatures provide clues to the relation between EMT and clinical outcomes of breast cancer. In addition, the proposed method can also be useful when applied to scRNA-seq datasets of different biological processes other than EMT.

Download Full-text

Author response: Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila

10.7554/elife.47138.028 ◽

2019 ◽

Cited By ~ 1

Author(s):

Evan Witt ◽

Sigi Benjamin ◽

Nicolas Svetec ◽

Li Zhao

Keyword(s):

Single Cell ◽

Gene Transcription ◽

De Novo ◽

Author Response ◽

Mutational Bias ◽

Rna Seq ◽

De Novo Gene

Download Full-text

Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference

10.1101/598748 ◽

2019 ◽

Cited By ~ 4

Author(s):

Yuanhua Huang ◽

Davis J McCarthy ◽

Oliver Stegle

Keyword(s):

Single Cell ◽

Real Data ◽

Experimental Designs ◽

Joint Analysis ◽

Rna Seq ◽

Computationally Efficient ◽

Primary Object ◽

Genotype Information ◽

Object Of Study ◽

Multiple Samples

AbstractThe joint analysis of multiple samples using single-cell RNA-seq is a promising experimental design, offering both increased throughput while allowing to account for batch variation. To achieve multi-sample designs, genetic variants that segregate between the samples in the pool have been proposed as natural barcodes for cell demultiplexing. Existing demultiplexing strategies rely on access to complete genotype data from the pooled samples, which greatly limits the applicability of such methods, in particular when genetic variation is not the primary object of study. To address this, we here present Vireo, a computationally efficient Bayesian model to demultiplex single-cell data from pooled experimental designs. Uniquely, our model can be applied in settings when only partial or no genotype information is available. Using simulations based on synthetic mixtures and results on real data, we demonstrate the robustness of our model and illustrate the utility of multi-sample experimental designs for common expression analyses.

Download Full-text