Mathematical modeling with single-cell sequencing data

2019 ◽  
Author(s):  
Heyrim Cho ◽  
Russell C. Rockne

Abstract: Single-cell sequencing technologies have revolutionized molecular and cellular biology and stimulated the development of computational tools to analyze the data generated by these platforms. However, despite the recent explosion of computational analysis tools, relatively few mathematical models have been developed to utilize these data. Here we compare and contrast two approaches for building mathematical models of cell state-transitions from single-cell RNA-sequencing data, with hematopoiesis as a model system: solving partial differential equations on a graph representing discrete cell state relationships, and solving the equations on a continuous cell state-space. We demonstrate how to calibrate model parameters from single or multiple time-point single-cell sequencing data, and examine the effects of data processing algorithms on the model calibration and predictions. As an application of our approach, we demonstrate how the calibrated models may be used to mathematically perturb normal hematopoiesis to simulate, predict, and study the emergence of novel cell types during the pathogenesis of acute myeloid leukemia. The mathematical modeling framework we present is general and can be applied to study cell state-transitions in any single-cell genome sequencing dataset.

Author summary: Here we compare and contrast graph- and continuum-based approaches for constructing mathematical models of cell state-transitions using single-cell RNA-sequencing data. Using two publicly available datasets, we demonstrate how to calibrate mathematical models of hematopoiesis and how to use the models to predict dynamics of acute myeloid leukemia pathogenesis by mathematically perturbing the process of cellular proliferation and differentiation. We apply these modeling approaches to study the effects of perturbing individual genes or sets of genes in subsets of cells, and to model the dynamics of cell state-transitions directly in a reduced-dimensional space. We examine the effects of different graph abstraction and trajectory inference algorithms on calibrating the models and the subsequent model predictions. We conclude that both the graph- and continuum-based modeling approaches can be equally well calibrated to data and discuss situations in which one method may be preferable over the other. This work presents a general mathematical modeling framework, applicable to any single-cell sequencing dataset where cell state-transitions are of interest.
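
To make the graph-based idea concrete, the following is a minimal sketch, not the authors' model: it treats each discrete cell state as a node, moves cell mass along directed edges at constant rates, and integrates the resulting ordinary differential equations with forward Euler. The function name, the three-state chain, the rates, and the optional per-node growth term are all illustrative assumptions; the abstract's graph model solves partial differential equations along edges, which this simplification omits.

```python
def simulate_graph_model(edges, u0, growth=None, dt=0.01, steps=1000):
    """Forward-Euler integration of cell mass moving on a directed graph.

    edges  : dict mapping (source, target) node indices to a transition rate
             (hypothetical values, not calibrated to any dataset)
    u0     : initial cell fraction at each node
    growth : optional per-node net proliferation rate (assumed zero if omitted)
    """
    n = len(u0)
    growth = growth or [0.0] * n
    u = list(u0)
    for _ in range(steps):
        # Start with local proliferation or death at each node.
        du = [growth[i] * u[i] for i in range(n)]
        # Move mass along each edge in proportion to the source population.
        for (src, dst), rate in edges.items():
            flux = rate * u[src]
            du[src] -= flux
            du[dst] += flux
        u = [u[i] + dt * du[i] for i in range(n)]
    return u


# Illustrative chain: stem-like state 0 -> progenitor 1 -> mature state 2.
final = simulate_graph_model({(0, 1): 1.0, (1, 2): 0.5}, [1.0, 0.0, 0.0])
```

Setting a nonzero `growth` entry for one node is one simple way to mimic a perturbation of proliferation in a subset of cells; without growth, total cell mass is conserved.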

2021 ◽  
Author(s):  
Thomas Stiehl ◽  
Anna Marciniak-Czochra

Abstract: Acute myeloid leukemia is an aggressive cancer of the blood-forming system. The malignant cell population is composed of multiple clones that evolve over time. Clonal data reflect the mechanisms governing treatment response and relapse. Single-cell sequencing provides the most direct insights into the clonal composition of the leukemic cells; however, it is still not routinely available in clinical practice. In this work we develop a computational algorithm that identifies all clonal hierarchies that are compatible with the bulk variant allele frequencies measured in a patient sample. The clonal hierarchies represent descent relations between the different clones and reveal the order in which mutations have been acquired. The proposed computational approach is tested using single-cell sequencing data, which allow the outcome of the algorithm to be compared with the true structure of the clonal hierarchy. We investigate which problems occur during reconstruction of clonal hierarchies from bulk sequencing data. Our results suggest that in many cases only a small number of possible hierarchies fit the bulk data. This implies that bulk sequencing data can be used to obtain insights into clonal evolution.
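
As a rough illustration of what "compatible with bulk variant allele frequencies" can mean, the sketch below enumerates parent assignments by brute force and keeps those that satisfy a commonly used sum rule: the cell fractions of a clone's direct descendants may not exceed the fraction of the clone itself. This is an assumed, simplified criterion and not the algorithm from the paper; the function name, the tolerance, and the conversion of allele frequencies into cell fractions (left to the caller) are hypothetical.

```python
from itertools import product


def compatible_hierarchies(fractions, tol=1e-9):
    """List parent assignments that satisfy the sum rule.

    fractions : dict mapping a mutation name to the fraction of cells
                carrying it (the caller derives this from bulk VAFs)
    Returns a list of dicts mapping each mutation to its parent mutation,
    with None meaning the mutation arose directly in unmutated cells.
    """
    # Sort by decreasing fraction: a parent can never be rarer than its child.
    muts = sorted(fractions, key=fractions.get, reverse=True)
    # Each mutation may descend from the root or from any earlier mutation.
    choices = [[None] + muts[:i] for i in range(len(muts))]
    results = []
    for parents in product(*choices):
        child_sum = {}
        for mut, parent in zip(muts, parents):
            child_sum[parent] = child_sum.get(parent, 0.0) + fractions[mut]
        # Children of a clone cannot add up to more than the clone itself.
        if all(total <= (1.0 if p is None else fractions[p]) + tol
               for p, total in child_sum.items()):
            results.append(dict(zip(muts, parents)))
    return results


# Large fractions force a single linear hierarchy A -> B -> C.
unique = compatible_hierarchies({"A": 0.9, "B": 0.6, "C": 0.5})
# Small fractions leave both a branched and a linear hierarchy possible.
ambiguous = compatible_hierarchies({"A": 0.5, "B": 0.3})
```

The exhaustive search grows factorially with the number of mutations, so it is only practical for the handful of mutations typical of a small example.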


2021 ◽  
Vol 12 ◽  
Author(s):  
Thomas Stiehl ◽  
Anna Marciniak-Czochra


2021 ◽  
Author(s):  
Lukas J Vlahos ◽  
Aleksandar Obradovic ◽  
Pasquale Laise ◽  
Jeremy Worley ◽  
Xiangtian Tan ◽  
...  

While single-cell RNA sequencing provides a new window on physiologic and pathologic tissue biology and heterogeneity, it suffers from a low signal-to-noise ratio and a high dropout rate at the individual gene level, thus challenging quantitative analyses. To address this problem, we introduce PISCES (Protein-activity Inference for Single Cell Studies), an integrated analytical framework for the protein activity-based analysis of single-cell subpopulations. PISCES leverages the assembly of lineage-specific gene regulatory networks to accurately measure the activity of each protein based on the expression of its transcriptional targets (regulon), using the ARACNe and metaVIPER algorithms, respectively. It implements novel analytical and visualization functions, including activity-based cluster analysis, identification of cell state repertoires, and elucidation of master regulators of cell state and cell state transitions, with full interoperability with Seurat's single-cell data format. Accuracy and reproducibility assessment, via technical and biological validation assays and by assessing concordance with antibody and CITE-Seq-based measurements, shows dramatic improvement in the ability to identify rare subpopulations and to assess the activity of key lineage markers, compared to gene expression analysis.


Author(s):  
Kevin Y. Huang ◽  
Enrico Petretto

Single-cell transcriptomics analyses of the fibrotic lung uncovered two cell states in the alveolar epithelium that are critical to lung injury recovery: a reparative transitional cell state in the mouse and a disease-specific cell state (KRT5-/KRT17+) in human idiopathic pulmonary fibrosis (IPF). The murine transitional cell state lies on the differentiation path from type 2 (AT2) to type 1 (AT1) pneumocytes, and the human KRT5-/KRT17+ cell state may arise from the dysregulation of this differentiation process. We review major findings of single-cell transcriptomics analyses of the fibrotic lung and reanalyze data from 7 single-cell RNA sequencing studies of human and murine models of IPF, focusing on the alveolar epithelium. Our comparative and cross-species single-cell transcriptomics analyses allowed us to further delineate the differentiation trajectories from AT2 to AT1 and from AT2 to the KRT5-/KRT17+ cell state. We observed that AT1 cells in human IPF retain the transcriptional signature of the murine transitional cell state. Using pseudotime analysis, we recapitulated the differentiation trajectories from AT2 to AT1 and from AT2 to the KRT5-/KRT17+ cell state in multiple human IPF studies. We further delineated the transcriptional programs underlying cell state transitions and determined the molecular phenotypes at terminal differentiation. We hypothesize that, in addition to the reactivation of developmental programs (SOX4, SOX9), senescence (TP63, SOX4) and the Notch pathway (HES1) steer intermediate progenitors to the KRT5-/KRT17+ cell state. Our analyses suggest that activation of SMAD3 later in the differentiation process may explain the fibrotic molecular phenotype typical of KRT5-/KRT17+ cells.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A520-A520
Author(s):  
Son Pham ◽  
Tri Le ◽  
Tan Phan ◽  
Minh Pham ◽  
Huy Nguyen ◽  
...  

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitate target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies and performing reanalysis of individual datasets and meta-analysis.
Methods: N/A
Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets, and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.
Conclusions: N/A


2019 ◽  
Author(s):  
Simone Ciccolella ◽  
Murray Patterson ◽  
Paola Bonizzoni ◽  
Gianluca Della Vedova

Abstract
Background: Single-cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees or phylogenies representing an accumulation of cancerous mutations. A drawback of SCS is its elevated false negative and missing value rates, which result in a large space of possible solutions and in turn make some approaches and tools infeasible. While this has not inhibited the development of methods for inferring phylogenies from SCS data, the continuing increase in the size and resolution of these data is beginning to put a strain on such methods. One possible solution is to reduce the size of an SCS instance (usually represented as a matrix of presence, absence and missing values of the mutations found in the different sequenced cells) and infer the tree from this reduced-size instance. Previous approaches have used k-means to this end, clustering groups of mutations and/or cells, and using these means as the reduced instance. Such an approach typically uses the Euclidean distance for computing means. However, since the values in these matrices are categorical (with three categories: present, absent and missing), we explore applying techniques for clustering categorical data, commonly used in data mining and machine learning, to SCS data with this goal in mind.
Results: In this work, we present celluloid, a new procedure for clustering categorical vector or matrix data, here representing SCS instances. We demonstrate that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and that it also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance produced by this method. Finally, we demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in the runtime and considerably raising the upper bound on the size of SCS instances that can be solved in practice.
Availability: Our approach, celluloid: clustering single cell sequencing data around centroids, is available at https://github.com/AlgoLab/celluloid/ under an MIT license.
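
The contrast between Euclidean means and categorical clustering can be illustrated with a small sketch. It assumes an encoding of 1 for present, 0 for absent and 2 for missing, a dissimilarity that counts only positions where both vectors are observed and disagree, and a centroid built from per-position modes. These choices and the function names are illustrative assumptions in the spirit of k-modes, not a description of what celluloid implements.

```python
from collections import Counter

MISSING = 2  # assumed encoding: 1 = present, 0 = absent, 2 = missing


def conflict_distance(a, b):
    """Count positions where both values are observed and they disagree."""
    return sum(1 for x, y in zip(a, b)
               if x != MISSING and y != MISSING and x != y)


def column_mode(vectors):
    """Build a categorical centroid from the most common observed value."""
    centroid = []
    for column in zip(*vectors):
        observed = [v for v in column if v != MISSING]
        if observed:
            centroid.append(Counter(observed).most_common(1)[0][0])
        else:
            # Nothing observed at this position, so the centroid stays missing.
            centroid.append(MISSING)
    return centroid


# Three mutation profiles across three cells, with some missing entries.
profiles = [[1, 0, 2], [1, 2, 2], [0, 0, 2]]
centroid = column_mode(profiles)
```

Unlike an arithmetic mean, the mode-based centroid stays within the three categories, so the reduced instance can be passed to a phylogeny inference tool that expects the same encoding.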


2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract: K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
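
For readers unfamiliar with the task, the sketch below shows plain k-mer counting followed by a simple filter that drops rarely seen k-mers, which are the ones most likely to come from sequencing errors. It uses an ordinary in-memory counter and a fixed threshold applied after counting; it does not reproduce the data structure or the dynamic removal step of CQF-deNoise, and the function names and threshold are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times (assumed threshold)."""
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


# Two reads that differ by one base near the end, as a sequencing error might cause.
reads = ["ACGTAC", "ACGTTC"]
kept = drop_rare_kmers(count_kmers(reads, k=3))
```

Filtering only after all reads are counted still stores every false k-mer in memory, which is exactly the cost that removing them during counting is meant to avoid.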


2018 ◽  
Vol 34 (12) ◽  
pp. 2077-2086 ◽  
Author(s):  
Suoqin Jin ◽  
Adam L MacLean ◽  
Tao Peng ◽  
Qing Nie

Abstract
Motivation: Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data.
Results: Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using ‘single-cell energy’ and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are, in combination, more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates.
Availability and implementation: A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath.
Supplementary information: Supplementary data are available at Bioinformatics online.

