FORKS: Finding Orderings Robustly using k-means and Steiner trees

2017 ◽  
Author(s):  
Mayank Sharma ◽  
Huipeng Li ◽  
Debarka Sengupta ◽  
Shyam Prabhakar ◽  
Jayadeva

Abstract: Recent advances in single-cell RNA-seq technologies have provided researchers with unprecedented details of transcriptomic variation across individual cells. However, it has not been straightforward to infer differentiation trajectories from such data, due to the parameter sensitivity of existing methods. Here, we present Finding Orderings Robustly using k-means and Steiner trees (FORKS), an algorithm that pseudo-temporally orders cells and thereby infers bifurcating state trajectories. FORKS, which is a generic method, can be applied to both single-cell and bulk differentiation data. It is a semi-supervised approach, in that it requires the user to specify the starting point of the time course. We systematically benchmarked FORKS and eight other pseudo-time estimation algorithms on six benchmark datasets, and found it to be more accurate, more reproducible, and more memory-efficient than existing methods for pseudo-temporal ordering. Another major advantage of our approach is its robustness: FORKS can be used with default parameter settings on a wide range of datasets.
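The general idea in the abstract above (cluster the cells, connect the cluster centers with a tree, and measure progress along the tree from a user-chosen start) can be illustrated with a minimal sketch. This is not the FORKS implementation: a minimum spanning tree stands in for the Steiner tree, the cluster centers are assumed to come from a prior k-means run, and every function name and data point below is hypothetical.

```python
import math

def minimum_spanning_tree(centers):
    """Prim's algorithm over cluster centers; returns {node: [(neighbor, weight), ...]}.
    An MST is used here as a simple stand-in for the Steiner tree named in the abstract."""
    n = len(centers)
    adjacency = {i: [] for i in range(n)}
    in_tree = {0}
    while len(in_tree) < n:
        # Cheapest edge that connects the current tree to a node outside it.
        weight, i, j = min(
            (math.dist(centers[i], centers[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        adjacency[i].append((j, weight))
        adjacency[j].append((i, weight))
        in_tree.add(j)
    return adjacency

def tree_distances(adjacency, start):
    """Path length along the tree from the start node to every other node."""
    distances = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, weight in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + weight
                stack.append(neighbor)
    return distances

def pseudotime(cells, centers, start):
    """Coarse pseudotime: each cell inherits the tree distance of its nearest center."""
    center_time = tree_distances(minimum_spanning_tree(centers), start)
    times = []
    for cell in cells:
        nearest = min(range(len(centers)), key=lambda k: math.dist(cell, centers[k]))
        times.append(center_time[nearest])
    return times

# Hypothetical 2-D embedding: center 0 is the user-specified start,
# and the trajectory bifurcates at center 1 into centers 2 and 3.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
cells = [(0.1, 0.0), (1.9, 0.9), (1.1, 0.1), (2.0, -1.2)]
times = pseudotime(cells, centers, start=0)
```

The `start` argument reflects the semi-supervised aspect described in the abstract: the user supplies the starting point, and the ordering follows from distances along the tree. Cells on the two branches receive comparable pseudotimes because both branches lie equally far from the start.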

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


2018 ◽  
Author(s):  
Sean M. Gross ◽  
Mark A. Dane ◽  
Elmar Bucher ◽  
Laura M. Heiser

Abstract: Cells sense and respond to their environment by activating distinct intracellular signaling pathways; however, an individual cell’s ability to faithfully transmit and discriminate environmental signals is thought to be limited. To assess the fidelity of signal transmission in the PI3K-AKT signaling pathway, we first developed an optimized genetically encoded sensor that had an increased dynamic range and reduced variation under basal conditions. We then used this reporter to track responses to varying doses of IGF-I in live cells and found that signaling responses from individual cells overlapped across a wide range of IGF-I doses, suggesting limited transmission accuracy. However, further analysis of individual cell traces revealed that responses were constant over time without stochastic fluctuations. We devised a new information-theoretic approach to calculate the channel capacity using the variance of the single-cell time course data (rather than population-level variance, as has been previously used) and predicted that cells were capable of discriminating multiple growth factor doses. We validated these predictions by tracking individual cell responses to multiple IGF-I doses and found that cells can accurately distinguish at least four different IGF-I concentrations, as demonstrated by their distinct responses. Furthermore, we found a similar discriminatory ability to pathway inhibition, as assessed by responses to the PI3K inhibitor alpelisib. Our studies indicate that cells can faithfully transmit an IGF-I input into a downstream signaling response and that heterogeneous responses result from variation in the input-output relation across the population. These observations reveal the importance of viewing each cell as having its own communication channel and underscore the importance of understanding responses at the single-cell level.
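To make the notion of channel capacity concrete, the sketch below computes the capacity of a discrete channel with the standard Blahut-Arimoto iteration. It is not the variance-based estimator the authors describe; the transition matrices are hypothetical, with rows standing for doses and columns for binned responses. It only illustrates why perfectly distinguishing four doses corresponds to log2(4) = 2 bits, while fully overlapping responses carry no information.

```python
import math

def channel_capacity(transition, iterations=200):
    """Capacity in bits of a discrete channel, where transition[x][y] = P(y | x).
    Uses the Blahut-Arimoto iteration, starting from a uniform input distribution."""
    n_inputs = len(transition)
    n_outputs = len(transition[0])
    p = [1.0 / n_inputs] * n_inputs

    def output_distribution(p):
        return [sum(p[x] * transition[x][y] for x in range(n_inputs))
                for y in range(n_outputs)]

    def divergence(x, q):
        # KL divergence (in nats) between P(. | x) and the output distribution q.
        return sum(transition[x][y] * math.log(transition[x][y] / q[y])
                   for y in range(n_outputs) if transition[x][y] > 0)

    for _ in range(iterations):
        q = output_distribution(p)
        weights = [p[x] * math.exp(divergence(x, q)) for x in range(n_inputs)]
        total = sum(weights)
        p = [w / total for w in weights]

    # Mutual information under the final input distribution, converted to bits.
    q = output_distribution(p)
    return sum(p[x] * divergence(x, q) for x in range(n_inputs)) / math.log(2)

# Hypothetical channels: four doses mapped to four non-overlapping responses,
# and four doses whose responses are indistinguishable.
distinct = [[1.0 if x == y else 0.0 for y in range(4)] for x in range(4)]
overlapping = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]

capacity_distinct = channel_capacity(distinct)
capacity_overlapping = channel_capacity(overlapping)
```

In this toy setting, `capacity_distinct` comes out at 2 bits and `capacity_overlapping` at 0 bits. Real response data falls between these extremes, and the estimate depends heavily on which variance is treated as noise, which is the point the abstract makes.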


2020 ◽  
Author(s):  
Giovana Ravizzoni Onzi ◽  
Juliano Luiz Faccioni ◽  
Alvaro G. Alvarado ◽  
Paula Andreghetto Bracco ◽  
Harley I. Kornblum ◽  
...  

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer the capacity to invade, metastasize, or resist therapy. Here we present the Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application to systematically apply SCOUT to a dataset over a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, outlier cells for the expression of proteins or RNAs were identified and compared to their non-outlier counterparts among different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in the analysis of the whole population.
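The abstract above does not state how outlier cells are selected. As a hedged illustration, the sketch below flags cells whose marker expression exceeds Tukey's upper fence (Q3 + 1.5 × IQR), a common convention assumed here rather than taken from the source. The function name and the expression values are hypothetical.

```python
from statistics import quantiles

def upper_outliers(values, k=1.5):
    """Return the values above Tukey's upper fence, Q3 + k * IQR.
    The fence rule is an assumption; the abstract does not specify a cutoff."""
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + k * (q3 - q1)
    return [v for v in values if v > fence]

# Hypothetical expression of one marker across nine cells.
expression = [1, 2, 3, 4, 5, 6, 7, 8, 100]
outliers = upper_outliers(expression)
```

Applied marker by marker, a rule like this separates a small outlier subpopulation from the bulk, after which the two groups can be compared across samples as the abstract describes.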


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Laith Alzubaidi ◽  
Jinglan Zhang ◽  
Amjad J. Humaidi ◽  
Ayad Al-Dujaili ◽  
Ye Duan ◽  
...  

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the gold standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, achieving outstanding results on several complex cognitive tasks, matching or even beating human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown fast in the last few years and has been extensively used to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), the most utilized DL network type, and describes the development of CNN architectures together with their main features, starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we further present the challenges and suggested solutions to help researchers understand the existing research gaps. It is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and summary and conclusion.


2021 ◽  
Author(s):  
Klebea Carvalho ◽  
Elisabeth Rebboah ◽  
Camden Jansen ◽  
Katherine Williams ◽  
Andrew Dowey ◽  
...  

Summary: Gene regulatory networks (GRNs) provide a powerful framework for studying cellular differentiation. However, it is less clear how GRNs encode cellular responses to everyday microenvironmental cues. Macrophages can be polarized and potentially repolarized based on environmental signaling. In order to identify the GRNs that drive macrophage polarization and the heterogeneous single-cell subpopulations that are present in the process, we used a high-resolution time course of bulk and single-cell RNA-seq and ATAC-seq assays of HL-60-derived macrophages polarized towards M1 or M2 over 24 hours. We identified transient M1 and M2 markers, including the main transcription factors that underlie polarization, and subpopulations of naive, transitional, and terminally polarized macrophages. We built bulk and single-cell polarization GRNs to compare the recovered interactions and found that each technology recovered only a subset of known interactions. Our data provide a resource to study the GRN of cellular maturation in response to microenvironmental stimuli in a variety of contexts in homeostasis and disease.


2018 ◽  
Author(s):  
Shan Jiang ◽  
Katherine Williams ◽  
Xiangduo Kong ◽  
Weihua Zeng ◽  
Xinyi Ma ◽  
...  

Abstract: FSHD is characterized by the misexpression of DUX4 in skeletal muscle. However, DUX4 is lowly expressed in patient samples, and analysis of the consequences of DUX4 expression has largely relied on artificial overexpression. To better understand the native expression profile of DUX4 and its targets, we performed a pooled RNA-seq differentiation time course in FSHD2 patient-derived primary myoblasts and identified early- and late-induced sets of FSHD-associated genes. Using single-cell and single-nucleus RNA-seq on FSHD2 myoblasts and myotubes, respectively, we captured DUX4 expression in single nuclei and found that only some DUX4 targets are coexpressed. We identified two populations of FSHD myotube nuclei with distinct transcriptional profiles. One population is highly enriched with DUX4 and FSHD-related genes, including the DUX4 paralog DUXA (“FSHD-Hi”). The other population has no expression of DUX4 and expresses low amounts of FSHD-related genes (“FSHD-Lo”), but is marked by the expression of CYTL1 and CHI3L1. “FSHD-Hi” myotube nuclei upregulated a set of transcription factors (TFs) that may form a self-sustaining network of gene dysregulation, which perpetuates this disease after DUX4 is no longer expressed.


2016 ◽  
Author(s):  
Ning Leng ◽  
Li-Fang Chu ◽  
Jeea Choi ◽  
Christina Kendziorski ◽  
James A. Thomson ◽  
...  

Motivation: With the development of single-cell RNA-seq (scRNA-seq) technology, scRNA-seq experiments with ordered conditions (e.g. time course) are becoming common. Methods developed for analyzing ordered bulk RNA-seq experiments are not applicable to scRNA-seq, since their distributional assumptions are often violated by additional heterogeneities prevalent in scRNA-seq. Here we present SCPattern, an empirical Bayes model to characterize genes with expression changes in ordered scRNA-seq experiments. SCPattern utilizes the non-parametric Kolmogorov-Smirnov statistic, and thus has the flexibility to identify genes with a wide variety of types of changes. Additionally, the Bayes framework allows SCPattern to classify genes into expression patterns with probability estimates. Results: Simulation results show that SCPattern is well powered for identifying genes with expression changes while the false discovery rate is well controlled. SCPattern is also able to accurately classify these dynamic genes into directional expression patterns. Applied to a scRNA-seq time course dataset studying human embryonic cell differentiation, SCPattern detected a group of important genes that are involved in mesendoderm and definitive endoderm cell fate decisions, positional patterning, and cell cycle. Availability and implementation: SCPattern is implemented as an R package along with a user-friendly graphical interface, which are available at: https://github.com/lengning/SCPattern Contact: [email protected]
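The Kolmogorov-Smirnov statistic mentioned above is the largest gap between two empirical cumulative distribution functions. The sketch below computes only that two-sample statistic for one gene between two conditions; it does not reproduce SCPattern's empirical Bayes model or its pattern classification, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute difference
    between the empirical CDFs of the two samples."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap occurs at a data point.
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical expression of one gene at two consecutive time points.
time_point_1 = [1, 2, 3, 4]
time_point_2 = [3, 4, 5, 6]
shift = ks_statistic(time_point_1, time_point_2)
```

Because the statistic compares whole distributions rather than means, it responds to shifts, changes in spread, and changes in modality alike, which is consistent with the flexibility the abstract attributes to it.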


Author(s):  
Katharina T. Schmid ◽  
Cristiana Cruceanu ◽  
Anika Böttcher ◽  
Heiko Lickert ◽  
Elisabeth B. Binder ◽  
...  

Background: The identification of genes associated with specific experimental conditions, genotypes or phenotypes through differential expression analysis has long been the cornerstone of transcriptomic analysis. Single-cell RNA-seq is revolutionizing transcriptomics and is enabling interindividual differential gene expression analysis and identification of genetic variants associated with gene expression, so-called expression quantitative trait loci, at cell-type resolution. Current methods for power analysis and guidance of experimental design either do not account for the specific characteristics of single-cell data or are not suitable to model interindividual comparisons. Results: Here we present a statistical framework for experimental design and power analysis of single-cell differential gene expression between groups of individuals and expression quantitative trait locus analysis. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes within individual cell types. Power analysis is based on data-driven priors from literature or pilot experiments across a wide range of application scenarios and single-cell RNA-seq platforms. Using these priors we show that, for a fixed budget, the number of cells per individual is the major determinant of power. Conclusion: Our model is general and allows for systematic comparison of alternative experimental designs and can thus be used to guide experimental design to optimize power. For a wide range of applications, shallow sequencing of high numbers of cells per individual leads to higher overall power than deep sequencing of fewer cells. The model is implemented as an R package, scPower.


2016 ◽  
Author(s):  
Vincent Gardeux ◽  
Fabrice David ◽  
Adrian Shajkofci ◽  
Petra C Schwalie ◽  
Bart Deplancke

Motivation: Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq data sets. Results: We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering, and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. Availability: The tool is freely available at http://[email protected]

