scholarly journals A statistical simulator scDesign for rational scRNA-seq experimental design

2018 ◽  
Author(s):  
Wei Vivian Li ◽  
Jingyi Jessica Li

AbstractMotivationSingle-cell RNA-sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths, and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information.ResultsHere we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and six different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experiment design based on specific research goals and compares various scRNA-seq computational methods.AvailabilityWe have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/[email protected]

2019 ◽  
Vol 35 (14) ◽  
pp. i41-i50 ◽  
Author(s):  
Wei Vivian Li ◽  
Jingyi Jessica Li

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. Results Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and 6 different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experimental design and comparison of scRNA–seq computational methods based on specific research goals. Availability and implementation We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. Supplementary information Supplementary data are available at Bioinformatics online.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6985 ◽  
Author(s):  
Jordan M. Singer ◽  
Darwin Y. Fu ◽  
Jacob J. Hughey

Simulated data are invaluable for assessing a computational method’s ability to distinguish signal from noise. Although many biological systems show rhythmicity, there is no general-purpose tool to simulate large-scale, rhythmic data. Here we present Simphony, an R package for simulating data from experiments in which the abundances of rhythmic and non-rhythmic features (e.g., genes) are measured at multiple time points in multiple conditions. Simphony has parameters for specifying experimental design and each feature’s rhythmic properties (e.g., amplitude and phase). In addition, Simphony can sample measurements from Gaussian and negative binomial distributions, the latter of which approximates read counts from RNA-seq data. We show an example of using Simphony to evaluate the accuracy of rhythm detection. Our results suggest that Simphony will aid experimental design and computational method development. Simphony is thoroughly documented and freely available at https://github.com/hugheylab/simphony.


2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

AbstractBackgroundThe advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies and analytical packages available to guide experimental design and to devise suitable analysis procedure for cell type identification.ResultsWe have developed an empirical methodology to address this important gap in single cell experimental design and analysis into an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, user can choose a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying number of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment.ConclusionsBased on our study, we found that when marker genes are expressed at fold change of 4 or more than the rest of the genes, either Seurat or Simlr algorithm can be used to analyze single cell dataset for any number of single cells isolated (minimum 1000 single cells were tested). However, when marker genes are expected to be only up to fC 2 upregulated, choice of the single cell algorithm is dependent on the number of single cells isolated and proportion of rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.


Author(s):  
Massimo Andreatta ◽  
Santiago J Carmona

Abstract Summary STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. Availability and implementation Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Yingying Cao ◽  
Simo Kitanovski ◽  
Daniel Hoffmann

Abstract Background RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between different cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetically by post-translational histone modifications which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, the integrated analysis of the corresponding data sets becomes, theoretically, a unique option to study gene regulation. However, technically such analyses are still in their infancy. Results Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization. Conclusion intePareto facilitates comprehensive understanding of high dimensional transcriptomic and epigenomic data. Its superiority to a naive differential gene expression analysis with RNA-Seq and available integrative approach is demonstrated by analyzing a public dataset.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Katharina T. Schmid ◽  
Barbara Höllbacher ◽  
Cristiana Cruceanu ◽  
Anika Böttcher ◽  
Heiko Lickert ◽  
...  

AbstractSingle cell RNA-seq has revolutionized transcriptomics by providing cell type resolution for differential gene expression and expression quantitative trait loci (eQTL) analyses. However, efficient power analysis methods for single cell data and inter-individual comparisons are lacking. Here, we present scPower; a statistical framework for the design and power analysis of multi-sample single cell transcriptomic experiments. We modelled the relationship between sample size, the number of cells per individual, sequencing depth, and the power of detecting differentially expressed genes within cell types. We systematically evaluated these optimal parameter combinations for several single cell profiling platforms, and generated broad recommendations. In general, shallow sequencing of high numbers of cells leads to higher overall power than deep sequencing of fewer cells. The model, including priors, is implemented as an R package and is accessible as a web tool. scPower is a highly customizable tool that experimentalists can use to quickly compare a multitude of experimental designs and optimize for a limited budget.


2014 ◽  
Author(s):  
Sofie Demeyer ◽  
Tom Michoel

Transcriptional regulation of gene expression is one of the main processes that affect cell diversification from a single set of genes. Regulatory proteins often interact with DNA regions located distally from the transcription start sites (TSS) of the genes. We developed a computational method that combines open chromatin and gene expression information for a large number of cell types to identify these distal regulatory elements. Our method builds correlation graphs for publicly available DNase-seq and exon array datasets with matching samples and uses graph-based methods to filter findings supported by multiple datasets and remove indirect interactions. The resulting set of interactions was validated with both anecdotal information of known long-range interactions and unbiased experimental data deduced from Hi-C and CAGE experiments. Our results provide a novel set of high-confidence candidate open chromatin regions involved in gene regulation, often located several Mb away from the TSS of their target gene.


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

ABSTRACTA key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fitting on an extensive number of single-cell data sets, comparing to three other state of the art models. In addition, our systems kinetic approach of handling the low and zero expressions and correctness of the identified multimodality are validated on several independent experimental data sets. Application on data of complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE is equipped with higher sensitivity and specificity in detecting differentially expressed genes, compared with other five popular methods, and that LTMG-GCR is capable to retrieve the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.


2018 ◽  
Author(s):  
Jordan M Singer ◽  
Darwin Y Fu ◽  
Jacob J Hughey

Simulated data are invaluable for assessing a computational method's ability to distinguish signal from noise. Although many biological systems show rhythmicity, there is no general-purpose tool to simulate large-scale, rhythmic data. Here we present Simphony, an R package for simulating data from experiments in which the abundances of rhythmic and non-rhythmic features (e.g., genes) are measured at multiple time points in multiple conditions. Simphony has parameters for specifying experimental design and each feature's rhythmic properties (e.g., shape, amplitude, and phase). In addition, Simphony can sample measurements from Gaussian and negative binomial distributions, the latter of which approximates read counts from next-generation sequencing data. We show an example of using Simphony to benchmark a method for detecting rhythms. Our results suggest that Simphony can aid experimental design and computational method development. Simphony is thoroughly documented and freely available at https://github.com/hugheylab/simphony.


2021 ◽  
Author(s):  
Katharina Schmid ◽  
Cristiana Cruceanu ◽  
Anika Böttcher ◽  
Heiko Lickert ◽  
Elisabeth Binder ◽  
...  

Abstract Single cell RNA-seq revolutionizes transcriptomics by providing cell type resolution for interindividual differential gene expression and expression quantitative trait loci analyses. However, efficient power analysis methods accounting for the characteristics of single cell data and interindividual comparison are missing. Here we present a statistical framework for design and power analysis of multi-sample single cell genomics experiments. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes within cell types. It enables fast systematic comparison of alternative experimental designs and optimization for a limited budget. We evaluated data driven priors for a range of applications and single cell platforms. In many settings, shallow sequencing of high numbers of cells leads to higher overall power than deep sequencing of fewer cells. The model including priors is implemented as an R package scPower and is accessible as a web tool.


Sign in / Sign up

Export Citation Format

Share Document