CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-Seq data

2016
Author(s): Peijie Lin, Michael Troup, Joshua W. K. Ho

Most existing dimensionality reduction and clustering packages for single-cell RNA-Seq (scRNA-Seq) data deal with dropouts by heavy modelling and computational machinery. Here we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm which uses a novel yet very simple ‘implicit imputation’ approach to alleviate the impact of dropouts in scRNA-Seq data in a principled manner. Using a range of simulated and real data, we have shown that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds for processing a data set of hundreds of cells, and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR.

2018
Author(s): Pierre-Cyril Aubin-Frankowski, Jean-Philippe Vert

Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulation networks (GRN) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. In this work we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data.
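As a rough illustration of the modelling idea in this abstract, the linear dynamics and the sparse estimate of the interaction matrix could be written as follows. The notation ($x_c$ for the expression profile of cell $c$, $\hat{v}_c$ for its estimated velocity, $A$ for the gene-by-gene interaction matrix, $\lambda$ for the penalty weight) is assumed here for illustration, and the L1 penalty is only one common form of sparse regression; the abstract does not state which procedure GRISLI actually uses.

```latex
\frac{dx(t)}{dt} = A\,x(t),
\qquad
\hat{A} = \arg\min_{A \in \mathbb{R}^{G \times G}}
\sum_{c=1}^{C} \bigl\lVert \hat{v}_c - A\,x_c \bigr\rVert_2^2
+ \lambda \lVert A \rVert_1
```

Under this reading, a non-zero entry $\hat{A}_{ij}$ would be taken as a candidate regulatory edge from gene $j$ to gene $i$.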


2020
Author(s): Felix Raimundo, Celine Vallot, Jean Philippe Vert

Background: Many computational methods have been developed recently to analyze single-cell RNA-seq (scRNA-seq) data. Several benchmark studies have compared these methods on their ability for dimensionality reduction, clustering or differential analysis, often relying on default parameters. Yet given the biological diversity of scRNA-seq datasets, parameter tuning might be essential for the optimal usage of methods, and determining how to tune parameters remains an unmet need.
Results: Here, we propose a benchmark to assess the performance of five methods, systematically varying their tunable parameters, for dimension reduction of scRNA-seq data, a common first step to many downstream applications such as cell type identification or trajectory inference. We run a total of 1.5 million experiments to assess the influence of parameter changes on the performance of each method, and propose two strategies to automatically tune parameters for methods that need it.
Conclusions: We find that principal component analysis (PCA)-based methods like scran and Seurat are competitive with default parameters but do not benefit much from parameter tuning, while more complex models like ZinbWave, DCA and scVI can reach better performance, but only after parameter tuning.


2022, Vol 12 (1)
Author(s): Akram Vasighizaker, Saiteja Danda, Luis Rueda

Identifying relevant disease modules such as target cell types is a significant step for studying diseases. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering are the most suitable approach in scRNA-seq data analysis when the cell types have not been well-characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, due to the sparsity and high dimensionality of scRNA-seq data, classical clustering methods are not efficient. Therefore, the use of non-linear dimensionality reduction techniques to improve clustering results is crucial. We introduce a method that identifies representative clusters of different cell types by combining non-linear dimensionality reduction techniques and clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with the clustering of thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We further performed gene set enrichment analysis to evaluate the proposed method’s performance. Our results show that modified locally linear embedding combined with independent component analysis yields overall the best performance relative to the existing unsupervised methods across different datasets.


2017
Author(s): Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, Jean-Philippe Vert

Single-cell RNA sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts. Here, we present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. We demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.
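For readers unfamiliar with the distribution named in this abstract, a zero-inflated negative binomial count can be written in a standard textbook parameterisation as below. The symbols ($\mu$ for the mean, $\theta$ for the dispersion, $\pi$ for the zero-inflation probability) are assumed here for illustration and may not match the notation of the paper itself.

```latex
P(Y = y \mid \mu, \theta, \pi)
  = \pi\,\delta_0(y) + (1 - \pi)\, f_{\mathrm{NB}}(y; \mu, \theta),
\qquad
f_{\mathrm{NB}}(y; \mu, \theta)
  = \frac{\Gamma(y + \theta)}{\Gamma(y + 1)\,\Gamma(\theta)}
    \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
    \left(\frac{\mu}{\mu + \theta}\right)^{y}
```

Here $\delta_0$ is a point mass at zero (the dropout component), and the negative binomial component has variance $\mu + \mu^2/\theta$, which is how over-dispersion relative to a Poisson model enters.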


2017, Vol 727, pp. 447-449
Author(s): Jun Dai, Hua Yan, Jian Jian Yang, Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.


2020
Author(s): Silvia Llonch, Montserrat Barragán, Paula Nieto, Anna Mallol, Marc Elosua-Bayes, ...

Study question: To which degree does maternal age affect the transcriptome of human oocytes at the germinal vesicle (GV) stage or at metaphase II after maturation in vitro (IVM-MII)?
Summary answer: While the oocytes’ transcriptome is predominantly determined by maturation stage, transcript levels of genes related to chromosome segregation, mitochondria and RNA processing are affected by age after in vitro maturation of denuded oocytes.
What is known already: Female fertility is inversely correlated with maternal age due to both a depletion of the oocyte pool and a reduction in oocyte developmental competence. Few studies have addressed the effect of maternal age on the human mature oocyte (MII) transcriptome, which is established during oocyte growth and maturation, and the pathways involved remain unclear. Here, we characterize and compare the transcriptomes of a large cohort of fully grown GV and IVM-MII oocytes from women of varying reproductive age.
Study design, size, duration: In this prospective molecular study, 37 women were recruited from May 2018 to June 2019. The mean age was 28.8 years (SD=7.7, range 18-43). A total of 72 oocytes were included in the study at GV stage after ovarian stimulation, and analyzed as GV (n=40) and in vitro matured oocytes (IVM-MII; n=32).
Participants/materials, setting, methods: Denuded oocytes were included either as GV at the time of ovum pick-up or as IVM-MII after in vitro maturation for 30 hours in G2™ medium, and processed for transcriptomic analysis by single-cell RNA-seq using the Smart-seq2 technology. Cluster and maturation stage marker analysis were performed using the Seurat R package. Genes with an average fold change greater than 2 and a p-value < 0.01 were considered maturation stage markers. A Pearson correlation test was used to identify genes whose expression levels changed progressively with age. Genes with an absolute correlation value |R| >= 0.3 and a p-value < 0.05 were considered significant.
Main results and the role of chance: First, by exploration of the RNA-seq data using tSNE dimensionality reduction, we identified two clusters of cells reflecting the oocyte maturation stage (GV and IVM-MII) with 4,445 and 324 putative marker genes, respectively. Next, we identified genes for which RNA levels either progressively increased or decreased with age. This analysis was performed independently for GV and IVM-MII oocytes. Our results indicate that the transcriptome is more affected by age in IVM-MII oocytes (1,219 genes) than in GV oocytes (596 genes). In particular, we found that genes involved in chromosome segregation and RNA splicing significantly increase in transcript levels with age, while genes related to mitochondrial activity present lower transcript levels with age. Gene regulatory network analysis revealed potential upstream master regulator functions for genes whose transcript levels present positive (GPBP1, RLF, SON, TTF1) or negative (BNC1, THRB) correlation with age.
Limitations, reasons for caution: IVM-MII oocytes used in this study were obtained after in vitro maturation of denuded GV oocytes; therefore, their transcriptome might not be fully representative of in vivo matured MII oocytes. The Smart-seq2 methodology used in this study detects polyadenylated transcripts only and we could therefore not assess non-polyadenylated transcripts.
Wider implications of the findings: Our analysis suggests that advanced maternal age does not globally affect the oocyte transcriptome at GV or IVM-MII stages. Nonetheless, hundreds of genes displayed altered transcript levels with age, particularly in IVM-MII oocytes. Especially affected by age were genes related to chromosome segregation and mitochondrial function, pathways known to be involved in oocyte ageing. Our study thereby suggests that misregulation of chromosome segregation and mitochondrial pathways also at the RNA level might contribute to the age-related quality decline in human oocytes.
Study funding/competing interest(s): This study was funded by the AXA research fund, the European commission, intramural funding of Clinica EUGIN, the Spanish Ministry of Science, Innovation and Universities, the Catalan Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) and by contributions of the Spanish Ministry of Economy, Industry and Competitiveness (MEIC) to the EMBL partnership and to the “Centro de Excelencia Severo Ochoa”. The authors have no conflict of interest to declare.
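The correlation-threshold part of the age filter described in the methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and values are hypothetical, the function names are invented for this sketch, and the p-value criterion (p < 0.05) that the study also applied is deliberately left out because it needs a t-distribution, which the standard library does not provide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for constant input; treat as no association.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def age_associated_genes(ages, expression, min_abs_r=0.3):
    """Return {gene: r} for genes whose |r| with age reaches min_abs_r."""
    hits = {}
    for gene, values in expression.items():
        r = pearson_r(ages, values)
        if abs(r) >= min_abs_r:
            hits[gene] = r
    return hits


# Hypothetical toy data: one value per oocyte donor.
ages = [22, 28, 35, 41]
expression = {
    "geneA": [1.0, 2.0, 3.1, 4.2],  # rises with age
    "geneB": [5.0, 5.0, 5.0, 5.0],  # flat
    "geneC": [4.0, 3.0, 2.2, 1.0],  # falls with age
}
hits = age_associated_genes(ages, expression)
```

The sign of each retained coefficient separates genes whose transcript levels rise with age from those that fall.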


2019
Author(s): Christina Huan Shi, Kevin Y. Yip

K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
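To make the underlying problem concrete, the sketch below counts k-mers naively with a hash table and then drops rare ones, which is a simple stand-in for the idea that k-mers created by sequencing errors tend to have very low counts. It is not CQF-deNoise: the function names, the `min_count` threshold and the toy reads are assumptions made for illustration, and the abstract does not describe the actual data structure or removal rule in enough detail to reproduce them.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads: the third read carries a single substitution (A -> T).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
kept = drop_rare_kmers(counts, min_count=2)
```

In this toy input, the two k-mers that exist only because of the substitution (`CGTT` and `GTTC`) appear once each and are discarded, while the k-mers shared by the error-free reads are kept.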


2019, Vol 20 (1)
Author(s): F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry

Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
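One way to read "feature selection using deviance" is to score each gene by how badly a null model of constant relative abundance across cells fits its counts, and to keep the highest-scoring genes. The sketch below uses a binomial deviance for that purpose. The exact statistic, the function names and the toy counts are assumptions made for illustration; the abstract itself does not spell out the formula.

```python
import math


def _x_log_ratio(x, expected):
    """Return x * log(x / expected), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / expected)


def binomial_deviance(gene_counts, cell_totals):
    """Deviance of one gene against a constant-proportion null model.

    gene_counts[i] is the gene's UMI count in cell i and cell_totals[i]
    is the total UMI count of cell i.
    """
    pi = sum(gene_counts) / sum(cell_totals)  # pooled relative abundance
    dev = 0.0
    for y, n in zip(gene_counts, cell_totals):
        dev += _x_log_ratio(y, n * pi)
        dev += _x_log_ratio(n - y, n * (1 - pi))
    return 2.0 * dev


# Toy example: four cells with equal sequencing depth.
totals = [100, 100, 100, 100]
flat_gene = [10, 10, 10, 10]      # same proportion in every cell
variable_gene = [0, 0, 20, 20]    # expressed in only half of the cells
flat_score = binomial_deviance(flat_gene, totals)
variable_score = binomial_deviance(variable_gene, totals)
```

A gene with the same proportion in every cell scores essentially zero, while a gene expressed in only some cells scores higher, so ranking genes by this value favours those that distinguish cell populations.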


2017, Vol 9 (2), pp. 169-186
Author(s): Liang Zhao, Tsvi Vinig

Purpose: In the existing literature on crowdfunding project performance, previous studies have given little attention to the impact of investors’ hedonic value and utilitarian value on project results. In a crowdfunding setting, utilitarian value is hard to satisfy because of information asymmetry and adverse selection problems. Therefore, projects with more hedonic value can be more attractive to potential investors. A lucky draw is a method to increase consumer hedonic value, and it can influence investors’ behavior as a result. The authors hypothesize that projects with hedonic treatment (lucky draw) may have a higher probability of succeeding in their campaign than others. The paper aims to discuss these issues.
Design/methodology/approach: A unique, self-extracted data set covering two years of a Chinese crowdfunding platform is used as the analysis sample. The authors first employ propensity score matching methods to control for the endogeneity of hedonic treatment adoption (lucky draw). The authors then run OLS regression and probit regression in order to test the hypotheses.
Findings: The analysis suggests a significant positive relationship not only between project lottery adoption and project results but also between project lottery adoption and project popularity.
Originality/value: The results suggest that an often ignored factor, hedonic treatment (lucky draw), can play an important role in crowdfunding project performance.

