SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRxiv online.
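
To make the idea of simulating from an empirical distribution concrete, here is a minimal Python sketch that draws new values by inverse-transform sampling from a linearly interpolated empirical CDF of observed expression values. It is a generic illustration only and is not the SPsimSeq estimator or its R API; the function name `sample_empirical` and the toy values are hypothetical.

```python
import random

def sample_empirical(observed, n, seed=0):
    """Draw n values from the interpolated empirical distribution of `observed`.

    Hypothetical helper: a uniform draw is mapped to a fractional position
    among the sorted observations, then linearly interpolated.
    """
    rng = random.Random(seed)
    xs = sorted(observed)
    m = len(xs)
    draws = []
    for _ in range(n):
        pos = rng.random() * (m - 1)   # fractional rank in [0, m-1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        draws.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return draws

# Toy log-expression values for one gene (made-up numbers).
observed = [0.0, 0.0, 1.2, 2.5, 2.7, 3.1, 4.8]
simulated = sample_empirical(observed, n=100, seed=1)
```

Every simulated value lies between the smallest and largest observed value, which is the main limitation of a purely empirical approach and one reason a real method adds model-based smoothing.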

scGEApp: a Matlab app for feature selection on single-cell RNA sequencing data

10.1101/544163 ◽

2019 ◽

Cited By ~ 2

Author(s):

James J. Cai

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Single Cell ◽

Rna Sequencing ◽

User Interfaces ◽

Dropout Rate ◽

Supplementary Information ◽

Sequencing Data ◽

Full Spectrum ◽

Single Cell Rna Sequencing

Motivation: The recent development of single-cell technologies, especially single-cell RNA sequencing (scRNA-seq), provides an unprecedented level of resolution of cell type heterogeneity. It also enables the study of gene expression variability across individual cells within a homogeneous cell population. Feature selection algorithms have been used to select biologically meaningful genes while controlling for sampling noise. An easy-to-use application for feature selection on scRNA-seq data requires integration of functions for data filtering, normalization, visualization, and enrichment analyses. Graphical user interfaces (GUIs) are desired for such an application. Results: We used native Matlab and App Designer to develop scGEApp for feature selection on single-cell gene expression data. We specifically designed a new feature selection algorithm based on the 3D spline fitting of expression mean (μ), coefficient of variation (CV), and dropout rate (rdrop), making scGEApp a unique tool for feature selection on scRNA-seq data. Our method can be applied to single-sample or two-sample scRNA-seq data to identify feature genes, e.g., those with unexpectedly high CV for the given μ and rdrop of those genes, or genes with the most feature changes. Users can operate scGEApp through GUIs to use the full spectrum of functions including normalization, batch effect correction, imputation, visualization, feature selection, and downstream analyses with GSEA and GOrilla. Availability: https://github.com/jamesjcai/scGEApp Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
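
The three per-gene statistics named in this abstract (expression mean, CV, and dropout rate) can be computed directly from a count matrix. The Python sketch below is only an illustration of those quantities, not scGEApp's Matlab code, and it does not reproduce the 3D spline fitting step. The function name `gene_stats`, the use of the population standard deviation, and the toy matrix are assumptions.

```python
import statistics

def gene_stats(counts):
    """Return (mean, CV, dropout rate) for one gene's counts across cells."""
    mean = statistics.fmean(counts)
    # CV = standard deviation / mean; undefined when the mean is zero.
    # Assumption: population standard deviation is used here.
    cv = statistics.pstdev(counts) / mean if mean > 0 else float("nan")
    # Dropout rate: fraction of cells in which the gene has zero counts.
    dropout = sum(1 for c in counts if c == 0) / len(counts)
    return mean, cv, dropout

# Toy genes-by-cells matrix (made-up values).
matrix = {
    "geneA": [0, 0, 4, 8],
    "geneB": [5, 5, 5, 5],
}
stats = {gene: gene_stats(row) for gene, row in matrix.items()}
```

In this toy example `geneA` has both a high CV and a high dropout rate, while `geneB` is constant across cells, so only `geneA` would be a candidate for a variability-based feature selection.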

SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa105 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3276-3278 ◽

Cited By ~ 2

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Simulation Method ◽

R Package ◽

Supplementary Information ◽

Expression Data ◽

Sequencing Data ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at Bioinformatics online.

Single-Cell Transcriptome Analysis Reveals Dynamic Cell Populations and Differential Gene Expression Patterns in Control and Aneurysmal Human Aortic Tissue

Circulation ◽

10.1161/circulationaha.120.046528 ◽

2020 ◽

Vol 142 (14) ◽

pp. 1374-1388

Author(s):

Yanming Li ◽

Pingping Ren ◽

Ashley Dawson ◽

Hernan G. Vasquez ◽

Waleed Ageedi ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Aortic Wall ◽

Genome Wide Association ◽

Aortic Tissue ◽

Sequencing Data ◽

Genome Wide ◽

Single Cell Rna Sequencing ◽

Differential Gene

Background: Ascending thoracic aortic aneurysm (ATAA) is caused by the progressive weakening and dilatation of the aortic wall and can lead to aortic dissection, rupture, and other life-threatening complications. To improve our understanding of ATAA pathogenesis, we aimed to comprehensively characterize the cellular composition of the ascending aortic wall and to identify molecular alterations in each cell population of human ATAA tissues. Methods: We performed single-cell RNA sequencing analysis of ascending aortic tissues from 11 study participants, including 8 patients with ATAA (4 women and 4 men) and 3 control subjects (2 women and 1 man). Cells extracted from aortic tissue were analyzed and categorized with single-cell RNA sequencing data to perform cluster identification. ATAA-related changes were then examined by comparing the proportions of each cell type and the gene expression profiles between ATAA and control tissues. We also examined which genes may be critical for ATAA by performing the integrative analysis of our single-cell RNA sequencing data with publicly available data from genome-wide association studies. Results: We identified 11 major cell types in human ascending aortic tissue; the high-resolution reclustering of these cells further divided them into 40 subtypes. Multiple subtypes were observed for smooth muscle cells, macrophages, and T lymphocytes, suggesting that these cells have multiple functional populations in the aortic wall. In general, ATAA tissues had fewer nonimmune cells and more immune cells, especially T lymphocytes, than control tissues did. Differential gene expression data suggested the presence of extensive mitochondrial dysfunction in ATAA tissues. 
In addition, integrative analysis of our single-cell RNA sequencing data with public genome-wide association study data and promoter capture Hi-C data suggested that the erythroblast transformation-specific related gene (ERG) exerts an important role in maintaining normal aortic wall function. Conclusions: Our study provides a comprehensive evaluation of the cellular composition of the ascending aortic wall and reveals how the gene expression landscape is altered in human ATAA tissue. The information from this study makes important contributions to our understanding of ATAA formation and progression.

schex avoids overplotting for large single-cell RNA-sequencing datasets

Bioinformatics ◽

10.1093/bioinformatics/btz907 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2291-2292 ◽

Cited By ~ 1

Author(s):

Saskia Freytag ◽

Ryan Lister

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Summary: Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation: schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information: Supplementary data are available at Bioinformatics online.
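
Hexagonal binning replaces individual points with counts per hexagonal cell, so dense regions no longer hide each other. The Python sketch below shows one common way to do this, using two offset rectangular lattices and assigning each point to the nearer centre. It is a generic illustration, not schex's R implementation; the function name `hexbin_counts` and the key layout `(lattice, i, j)` are hypothetical.

```python
import math
from collections import Counter

def hexbin_counts(points, size):
    """Count points per hexagonal cell.

    `size` is the distance between neighbouring hexagon centres.
    Keys are (lattice, i, j): lattice 0 has centres at (i*size, j*dy),
    lattice 1 is shifted by half a cell in both directions.
    """
    dy = size * math.sqrt(3)  # row spacing within a single lattice
    counts = Counter()
    for x, y in points:
        # Nearest centre on lattice 0.
        i0, j0 = round(x / size), round(y / dy)
        d0 = (x - i0 * size) ** 2 + (y - j0 * dy) ** 2
        # Nearest centre on lattice 1 (half-cell offset).
        i1, j1 = math.floor(x / size), math.floor(y / dy)
        d1 = (x - (i1 + 0.5) * size) ** 2 + (y - (j1 + 0.5) * dy) ** 2
        # The closer of the two centres identifies the hexagon.
        key = (0, i0, j0) if d0 <= d1 else (1, i1, j1)
        counts[key] += 1
    return counts

# Three toy 2D embedding coordinates (made-up values).
bins = hexbin_counts([(0.1, 0.1), (-0.1, 0.05), (0.5, 0.85)], size=1.0)
```

A plot then needs to draw only one hexagon per occupied key, coloured by its count or by a summary of a gene's expression in that cell, rather than one marker per cell.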

PRIME: a probabilistic imputation method to reduce dropout effects in single-cell RNA sequencing

Bioinformatics ◽

10.1093/bioinformatics/btaa278 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4021-4029

Author(s):

Hyundoo Jeong ◽

Zhandong Liu

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Expression Patterns ◽

Imputation Method ◽

Supplementary Information ◽

Single Cell Sequencing ◽

Depth Analysis ◽

Single Cell Rna Sequencing

Summary: Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise. Availability and implementation: The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME. Supplementary information: Supplementary data are available at Bioinformatics online.

Differential gene expression analysis in single-cell RNA sequencing data

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217650 ◽

2017 ◽

Author(s):

Tianyu Wang ◽

Sheida Nabavi

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Differential Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Sequencing Data ◽

Differential Gene Expression Analysis ◽

Single Cell Rna Sequencing ◽

Differential Gene

Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data

Genome Biology ◽

10.1186/gb-2013-14-1-r7 ◽

2013 ◽

Vol 14 (1) ◽

pp. R7 ◽

Cited By ~ 100

Author(s):

Jong Kim ◽

John C Marioni

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Stochastic Gene Expression ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Kinetics Of

SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

Abstract: Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold speed-up while consuming only 66% of the memory used by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
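
The abstract does not spell out which random projection is used, so the Python sketch below assumes the generic Gaussian variant: each cell's expression vector is multiplied by a random matrix with entries drawn from N(0, 1/k), which preserves squared distances in expectation while reducing the dimension to k. It is not the sscClust code; the function name `random_projection` and the toy data are hypothetical.

```python
import math
import random

def random_projection(data, k, seed=0):
    """Project rows of `data` (cells x genes) down to k dimensions."""
    rng = random.Random(seed)
    d = len(data[0])
    sigma = 1.0 / math.sqrt(k)  # entries ~ N(0, 1/k)
    proj = [[rng.gauss(0.0, sigma) for _ in range(k)] for _ in range(d)]
    # Plain matrix product: (cells x genes) @ (genes x k).
    return [
        [sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in data
    ]

# Toy matrix: 3 cells x 6 genes (made-up values).
cells = [
    [1.0, 0.0, 2.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0, 2.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
reduced = random_projection(cells, k=2, seed=42)
```

Any clustering algorithm can then be run on `reduced` instead of the full matrix, which is where the speed and memory savings of such a framework would come from.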

Abstract 4689: Subclone-specific evolution of tumor phenotypes – A framework to study subclone-specific gene expression from a combination of bulk DNA and single cell RNA sequencing data

10.1158/1538-7445.sabcs18-4689 ◽

2019 ◽

Author(s):

Yi Qiao ◽

Xiaomeng Huang ◽

Samuel Brady ◽

Andrea Bild ◽

David Bowtell ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Specific Gene ◽

Sequencing Data ◽

Specific Gene Expression ◽

Single Cell Rna Sequencing ◽

Tumor Phenotypes

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

Abstract: As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
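
A gamma-Poisson hierarchy draws a cell-specific rate from a gamma distribution and then a count from a Poisson distribution with that rate, which yields overdispersed (negative binomial) counts. The Python sketch below shows only that two-step draw for a single gene; it is not the Splat model or the Splatter R API, and the names `gamma_poisson_counts`, `mean`, and `dispersion` are hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small rates."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def gamma_poisson_counts(n_cells, mean, dispersion, seed=0):
    """Simulate counts with E[count] = mean and
    Var[count] = mean + dispersion * mean**2."""
    rng = random.Random(seed)
    shape = 1.0 / dispersion
    scale = mean * dispersion  # shape * scale == mean
    counts = []
    for _ in range(n_cells):
        rate = rng.gammavariate(shape, scale)  # cell-specific expression rate
        counts.append(poisson_draw(rng, rate))
    return counts

counts = gamma_poisson_counts(n_cells=5000, mean=5.0, dispersion=0.5, seed=7)
```

Setting `dispersion` close to zero makes the gamma draw concentrate around `mean`, so the counts approach a plain Poisson distribution.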
